emolib.tokenizer.lexer.spanish
Class SpanishLexer

java.lang.Object
  extended by emolib.util.proc.TextDataProcessor
      extended by emolib.tokenizer.Tokenizer
          extended by emolib.tokenizer.lexer.spanish.SpanishLexer
All Implemented Interfaces:
SpanishLexerConstants, Configurable, DataProcessor

public class SpanishLexer
extends Tokenizer
implements SpanishLexerConstants

Inherits the common methods from the Tokenizer class and implements a Spanish lexical analyzer generated with JavaCC.

Spanish is the language targeted by default in EmoLib. The tokens used by this lexer correspond to the Spanish grammatical tokens proposed by David García. If other tagging guidelines are desired, refer to [Santorini, 1995] or to the tagset proposed by the EAGLES group, which is used by the FreeLing project.

Nouns, adjectives and verbs (that is, the tokens that do not match any of the given tags) are considered to carry affective meaning.

No syntactic analysis is implemented in this lexer; a bag-of-words model is used instead.
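The tagging scheme above amounts to a closed-class filter over a bag of words: every token that matches a function-word tag is discarded, and whatever remains (nouns, adjectives, verbs) is kept as a candidate for affective content. A minimal sketch, assuming a hypothetical and heavily abridged word list in place of the lexer's JavaCC token definitions; the class name `AffectiveBagOfWords` is also hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AffectiveBagOfWords {

    // Hypothetical, tiny closed-class list standing in for the lexer's tags
    // (articles, prepositions, conjunctions, pronouns, adverbs, ...).
    // The real SpanishLexer defines these categories as JavaCC tokens.
    private static final Set<String> CLOSED_CLASS = Set.of(
            "el", "la", "los", "las", "un", "una",   // articles
            "de", "en", "con", "por", "para",       // prepositions
            "y", "o", "pero", "porque",             // conjunctions
            "yo", "ella", "que",                    // pronouns
            "muy", "no", "hoy");                    // adverbs

    // Returns the unordered bag of words that match no closed-class tag;
    // these are the candidates treated as having affective meaning.
    public static List<String> affectiveCandidates(String text) {
        List<String> bag = new ArrayList<>();
        // Split on any run of non-letters; \p{L} also covers accented letters.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!word.isEmpty() && !CLOSED_CLASS.contains(word)) {
                bag.add(word);
            }
        }
        return bag;
    }

    public static void main(String[] args) {
        // The verb "es" is kept: only closed-class words are filtered out.
        System.out.println(affectiveCandidates("La comida es muy buena y el servicio no."));
        // prints [comida, es, buena, servicio]
    }
}
```

No word order or syntax survives the filter, which is exactly the bag-of-words simplification described above.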

--
[Santorini, 1995] Santorini, B., "Part-of-Speech Tagging Guidelines for the Penn Treebank Project", (3rd revision, 2nd printing). Technical Report, Department of Computer and Information Science, University of Pennsylvania, 1995.

Author:
David García, Alexandre Trilla (atrilla@salle.url.edu)

Field Summary
 Token jj_nt
          Next token.
 Token token
          Current token.
 SpanishLexerTokenManager token_source
          Generated Token Manager.
 
Fields inherited from class emolib.tokenizer.Tokenizer
negation, negativeModifier1, negativeModifier2, negativeModifier3, positiveModifier1, positiveModifier2, positiveModifier3, PROP_NEGATION, PROP_NEGATIVE_MODIFIER_1, PROP_NEGATIVE_MODIFIER_2, PROP_NEGATIVE_MODIFIER_3, PROP_POSITIVE_MODIFIER_1, PROP_POSITIVE_MODIFIER_2, PROP_POSITIVE_MODIFIER_3
 
Fields inherited from interface emolib.tokenizer.lexer.spanish.SpanishLexerConstants
ADVERBIO_AFIRMACION, ADVERBIO_CUANTITATIVO_NEG_1, ADVERBIO_CUANTITATIVO_NEG_2, ADVERBIO_CUANTITATIVO_NEG_3, ADVERBIO_CUANTITATIVO_POS_1, ADVERBIO_CUANTITATIVO_POS_2, ADVERBIO_CUANTITATIVO_POS_3, ADVERBIO_LUGAR, ADVERBIO_MODO, ADVERBIO_NEGACION, ADVERBIO_PROBABILIDAD, ADVERBIO_TIEMPO, ARTICULO_DETERMINADO, ARTICULO_FUSION, ARTICULO_INDETERMINADO, BLANK, CONJUNCION_ADVERSATIVA, CONJUNCION_CAUSAL, CONJUNCION_COPULATIVA, CONJUNCION_DISYUNTIVA, CONJUNCION_FINAL, CONJUNCION_TEMPORAL, DEFAULT, DEMOSTRATIVO, DIGITO, EOF, ESPECIAL, ESPECIFICACION, EXCLAMATIVA, FIN_FRASE, INDEFINIDO_CUANTITATIVO, INDEFINIDO_DISTRIBUTIVO, INTERROGATIVA, LETRA, NUMERAL, OTRO, POSESIVO_1, POSESIVO_2, POSESIVO_3, PREPOSICION, PRONOMBRE_1, PRONOMBRE_2, PRONOMBRE_3, PRONOMBRE_REL, SALTO_CR, SALTO_CRLF, SALTO_LF, SIMBOLO_NEUTRO, TAB, tokenImage
 
Constructor Summary
SpanishLexer()
          Empty constructor needed by the configuration manager to perform the instantiation.
SpanishLexer(java.io.InputStream stream)
          Constructor with InputStream.
SpanishLexer(java.io.InputStream stream, java.lang.String encoding)
          Constructor with InputStream and supplied encoding.
SpanishLexer(java.io.Reader stream)
          Constructor.
SpanishLexer(SpanishLexerTokenManager tm)
          Constructor with generated Token Manager.
 
Method Summary
 void disable_tracing()
          Disable tracing.
 void enable_tracing()
          Enable tracing.
 ParseException generateParseException()
          Generate ParseException.
 Tokenizer getNew(java.lang.String initialization)
          Function to obtain a new initialized instance of the Tokenizer.
 Token getNextToken()
          Get the next Token.
 Token getToken(int index)
          Get the token index positions ahead of the current one, without consuming it.
 void parseGrammar()
          Method to parse the incoming text with the well defined grammar.
 void parseSpanishGrammar()
           
 void ReInit(java.io.InputStream stream)
          Reinitialise.
 void ReInit(java.io.InputStream stream, java.lang.String encoding)
          Reinitialise.
 void ReInit(java.io.Reader stream)
          Reinitialise.
 void ReInit(SpanishLexerTokenManager tm)
          Reinitialise.
 
Methods inherited from class emolib.tokenizer.Tokenizer
fillConfigurationValues, getData, getPossibleEmotionalContent, getWord, getWordClass, getWordModifierValue, initialize, inputData, newProperties, putModifierValue, putWord, putWordClass, register, setPossibleEmotionalContent
 
Methods inherited from class emolib.util.proc.TextDataProcessor
flush, getName, getPredecessor, setPredecessor, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

token_source

public SpanishLexerTokenManager token_source
Generated Token Manager.


token

public Token token
Current token.


jj_nt

public Token jj_nt
Next token.

Constructor Detail

SpanishLexer

public SpanishLexer()
Empty constructor needed by the configuration manager to perform the instantiation.


SpanishLexer

public SpanishLexer(java.io.InputStream stream)
Constructor with InputStream.


SpanishLexer

public SpanishLexer(java.io.InputStream stream,
                    java.lang.String encoding)
Constructor with InputStream and supplied encoding.
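When the lexer is given bytes rather than characters, the encoding argument decides how accented Spanish characters (á, ñ, ¿, ¡) are decoded before scanning. A minimal sketch of that decoding step using only java.io and java.nio.charset, assuming (this page does not confirm it) that the stream is turned into a character Reader in this way; the class `EncodingDemo` and its methods are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Wraps a byte stream in a Reader that decodes with the given charset;
    // a null encoding falls back to the platform default charset.
    public static Reader toReader(InputStream stream, String encoding) {
        Charset cs = (encoding == null) ? Charset.defaultCharset() : Charset.forName(encoding);
        return new InputStreamReader(stream, cs);
    }

    // Reads the whole Reader into a String.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "¡Qué alegría!" written with Unicode escapes to stay source-encoding safe.
        String text = "\u00a1Qu\u00e9 alegr\u00eda!";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Matching charset: the accented characters survive.
        System.out.println(readAll(toReader(new ByteArrayInputStream(utf8), "UTF-8")).equals(text));
        // Wrong charset: UTF-8 bytes read as ISO-8859-1 turn into mojibake.
        System.out.println(readAll(toReader(new ByteArrayInputStream(utf8), "ISO-8859-1")).equals(text));
    }
}
```

Passing the encoding explicitly avoids depending on the platform default, which varies between systems and can silently corrupt accented words before they ever reach the tagger.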


SpanishLexer

public SpanishLexer(java.io.Reader stream)
Constructor.


SpanishLexer

public SpanishLexer(SpanishLexerTokenManager tm)
Constructor with generated Token Manager.

Method Detail

getNew

public Tokenizer getNew(java.lang.String initialization)
Description copied from class: Tokenizer
Function to obtain a new initialized instance of the Tokenizer. The concrete (non-abstract) tokenizers should override this function.

Specified by:
getNew in class Tokenizer
Parameters:
initialization - The string to initialize the new Tokenizer.
Returns:
The new Tokenizer.
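Together with the empty constructor used by the configuration manager, getNew lets a configured instance act as a prototype that spawns tokenizers already bound to some text. A minimal sketch of that factory idea, assuming hypothetical stand-in classes (`SketchTokenizer`, `WhitespaceSketchTokenizer`) rather than the real emolib.tokenizer.Tokenizer hierarchy:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the abstract Tokenizer: only the getNew(String)
// factory contract is illustrated here.
abstract class SketchTokenizer {
    protected final String text;

    protected SketchTokenizer(String text) {
        this.text = text;
    }

    // Each concrete tokenizer returns a fresh instance of its own type,
    // already initialized with the given string.
    public abstract SketchTokenizer getNew(String initialization);

    public abstract List<String> tokens();
}

// Hypothetical concrete tokenizer that splits on whitespace.
class WhitespaceSketchTokenizer extends SketchTokenizer {

    // Empty constructor, as a configuration manager would call it.
    WhitespaceSketchTokenizer() {
        this("");
    }

    WhitespaceSketchTokenizer(String text) {
        super(text);
    }

    @Override
    public SketchTokenizer getNew(String initialization) {
        return new WhitespaceSketchTokenizer(initialization);
    }

    @Override
    public List<String> tokens() {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }
}

public class GetNewDemo {
    public static void main(String[] args) {
        // The prototype is created empty; getNew yields initialized copies.
        SketchTokenizer prototype = new WhitespaceSketchTokenizer();
        SketchTokenizer t = prototype.getNew("estoy muy contento");
        System.out.println(t.tokens()); // [estoy, muy, contento]
    }
}
```

Because getNew is declared on the abstract type, calling code can create new tokenizers without knowing which concrete lexer was configured.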

parseGrammar

public void parseGrammar()
                  throws java.lang.Exception
Description copied from class: Tokenizer
Method to parse the incoming text with the well defined grammar.

Specified by:
parseGrammar in class Tokenizer
Throws:
java.lang.Exception - If a ParseException occurs.

parseSpanishGrammar

public final void parseSpanishGrammar()
                               throws ParseException
Throws:
ParseException

ReInit

public void ReInit(java.io.InputStream stream)
Reinitialise.


ReInit

public void ReInit(java.io.InputStream stream,
                   java.lang.String encoding)
Reinitialise.


ReInit

public void ReInit(java.io.Reader stream)
Reinitialise.


ReInit

public void ReInit(SpanishLexerTokenManager tm)
Reinitialise.
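The ReInit overloads let one lexer instance be pointed at new input instead of being reconstructed, as long as every piece of per-input state is reset. A minimal sketch of that reuse pattern, assuming a hypothetical toy whitespace lexer (`ReusableLexer`, `nextWord`, `drain`) rather than the generated SpanishLexer internals:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer illustrating ReInit-style reuse.
public class ReusableLexer {
    private Reader input;
    private int tokensRead; // per-input state that ReInit must clear

    public ReusableLexer(Reader stream) {
        ReInit(stream);
    }

    // Swap in a new input source and reset all per-input state.
    public void ReInit(Reader stream) {
        this.input = stream;
        this.tokensRead = 0;
    }

    // Returns the next whitespace-delimited word, or null at end of input.
    public String nextWord() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) {
                    break; // end of the current word
                }
            } else {
                sb.append((char) c);
            }
        }
        if (sb.length() == 0) {
            return null;
        }
        tokensRead++;
        return sb.toString();
    }

    public int getTokensRead() {
        return tokensRead;
    }

    // Collects every remaining word from the lexer.
    public static List<String> drain(ReusableLexer lexer) throws IOException {
        List<String> out = new ArrayList<>();
        for (String w = lexer.nextWord(); w != null; w = lexer.nextWord()) {
            out.add(w);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        ReusableLexer lexer = new ReusableLexer(new StringReader("hola mundo"));
        System.out.println(drain(lexer));          // [hola, mundo]
        lexer.ReInit(new StringReader("buenas noches amigos"));
        System.out.println(drain(lexer));          // [buenas, noches, amigos]
        System.out.println(lexer.getTokensRead()); // 3, counted since the ReInit
    }
}
```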


getNextToken

public final Token getNextToken()
Get the next Token.


getToken

public final Token getToken(int index)
Get the token index positions ahead of the current one, without consuming it (index 0 returns the current token).
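The token field and the getNextToken/getToken pair follow the usual JavaCC scheme: tokens form a linked list through their next field, so the parser can peek ahead and later consume the same token objects instead of re-scanning. A minimal sketch of that mechanism, assuming hypothetical names throughout (`LookaheadSketch`, `SketchToken`, an Iterator standing in for the generated SpanishLexerTokenManager); the generated code also maintains extra bookkeeping that is omitted here:

```java
import java.util.Iterator;
import java.util.List;

public class LookaheadSketch {

    // Mimics the image/next fields of a JavaCC Token.
    static final class SketchToken {
        final String image;
        SketchToken next;

        SketchToken(String image) {
            this.image = image;
        }
    }

    private final Iterator<String> source;                // stands in for the token manager
    private SketchToken token = new SketchToken(null);     // current token (sentinel at start)

    LookaheadSketch(Iterator<String> source) {
        this.source = source;
    }

    // Scans one fresh token; an exhausted source yields "<EOF>".
    private SketchToken fetch() {
        return new SketchToken(source.hasNext() ? source.next() : "<EOF>");
    }

    // Consume: reuse an already peeked token if one exists, else scan a new one.
    public SketchToken getNextToken() {
        if (token.next != null) {
            token = token.next;
        } else {
            token = token.next = fetch();
        }
        return token;
    }

    // Peek index tokens ahead without consuming; index 0 is the current token.
    public SketchToken getToken(int index) {
        SketchToken t = token;
        for (int i = 0; i < index; i++) {
            if (t.next == null) {
                t.next = fetch(); // extend the list only as far as needed
            }
            t = t.next;
        }
        return t;
    }

    public static void main(String[] args) {
        LookaheadSketch p = new LookaheadSketch(List.of("no", "estoy", "triste").iterator());
        System.out.println(p.getToken(2).image);    // estoy (peeked, not consumed)
        System.out.println(p.getNextToken().image); // no
        System.out.println(p.getNextToken().image); // estoy (reused from the peek)
    }
}
```

Lookahead of this kind is what would let a grammar inspect, for example, a negation adverb together with the word that follows it before committing to a rule.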


generateParseException

public ParseException generateParseException()
Generate ParseException.


enable_tracing

public final void enable_tracing()
Enable tracing.


disable_tracing

public final void disable_tracing()
Disable tracing.