emolib.tokenizer.lexer.english
Class EnglishLexer

java.lang.Object
  extended by emolib.util.proc.TextDataProcessor
      extended by emolib.tokenizer.Tokenizer
          extended by emolib.tokenizer.lexer.english.EnglishLexer
All Implemented Interfaces:
EnglishLexerConstants, Configurable, DataProcessor

public class EnglishLexer
extends Tokenizer
implements EnglishLexerConstants

Inherits the common methods and functions from the Tokenizer and implements an English lexical analyzer with JavaCC.

The tokens used by this lexer correspond to the English grammatical tokens proposed by David García. If other tagging guidelines are desired, refer to [Santorini, 1995] or to the tagset proposed by the EAGLES group, which is used by the FreeLing project.

Tokens that do not match any of the given tags (nouns, adjectives and verbs) are considered to have affective meaning.

No syntax is implemented in this lexer; a bag-of-words model is used instead. Tokens are defined so that they contain no spaces, since embedded spaces could break the modules that follow.
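
The bag-of-words behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use emolib: the class BagOfWordsSketch, its method affectiveCandidates, and the word-to-tag table are hypothetical, and the assignment of English words to tag names is only guessed from the constant names listed in EnglishLexerConstants. The real lexer is generated by JavaCC and may tokenize differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of emolib.
final class BagOfWordsSketch {

    // Assumed mapping of a few function words to tag names taken from
    // EnglishLexerConstants. The actual word lists live in the grammar file.
    private static final Map<String, String> FUNCTION_WORDS = Map.of(
            "the", "ARTICULO_DETERMINADO",
            "a", "ARTICULO_INDETERMINADO",
            "not", "ADVERBIO_NEGACION",
            "and", "CONJUNCION_COPULATIVA",
            "but", "CONJUNCION_ADVERSATIVA",
            "in", "PREPOSICION");

    // Splits on whitespace (so no token contains a space) and keeps the
    // words that match none of the known tags, in their original order.
    static List<String> affectiveCandidates(String text) {
        List<String> candidates = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields a single empty string
            }
            String key = word.toLowerCase(Locale.ROOT);
            if (!FUNCTION_WORDS.containsKey(key)) {
                candidates.add(key);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // prints [happy, dog, sad, cat]
        System.out.println(affectiveCandidates("The happy dog and the sad cat"));
    }
}
```

No word order or syntactic structure is used to decide whether a word is kept, which is the sense in which the approach is a bag of words.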

--
[Santorini, 1995] Santorini, B., "Part-of-Speech Tagging Guidelines for the Penn Treebank Project", (3rd revision, 2nd printing). Technical Report, Department of Computer and Information Science, University of Pennsylvania, 1995.

Author:
David García, Alexandre Trilla (atrilla@salle.url.edu)

Field Summary
 Token jj_nt
          Next token.
 Token token
          Current token.
 EnglishLexerTokenManager token_source
          Generated Token Manager.
 
Fields inherited from class emolib.tokenizer.Tokenizer
negation, negativeModifier1, negativeModifier2, negativeModifier3, positiveModifier1, positiveModifier2, positiveModifier3, PROP_NEGATION, PROP_NEGATIVE_MODIFIER_1, PROP_NEGATIVE_MODIFIER_2, PROP_NEGATIVE_MODIFIER_3, PROP_POSITIVE_MODIFIER_1, PROP_POSITIVE_MODIFIER_2, PROP_POSITIVE_MODIFIER_3
 
Fields inherited from interface emolib.tokenizer.lexer.english.EnglishLexerConstants
ADVERBIO_AFIRMACION, ADVERBIO_CUANTITATIVO_NEG_1, ADVERBIO_CUANTITATIVO_NEG_2, ADVERBIO_CUANTITATIVO_NEG_3, ADVERBIO_CUANTITATIVO_POS_1, ADVERBIO_CUANTITATIVO_POS_2, ADVERBIO_CUANTITATIVO_POS_3, ADVERBIO_LUGAR, ADVERBIO_MODO, ADVERBIO_NEGACION, ADVERBIO_PROBABILIDAD, ADVERBIO_TIEMPO, ARTICULO_DETERMINADO, ARTICULO_INDETERMINADO, BLANK, CONJUNCION_ADVERSATIVA, CONJUNCION_CAUSAL, CONJUNCION_COPULATIVA, CONJUNCION_DISYUNTIVA, CONJUNCION_TEMPORAL, DEFAULT, DEMOSTRATIVO, DIGITO, EOF, ESPECIAL, ESPECIFICACION, EXCLAMATIVA, FIN_FRASE, INDEFINIDO_CUANTITATIVO, INDEFINIDO_DISTRIBUTIVO, INTERROGATIVA, LETRA, NUMERAL, OTRO, POSESIVO_1, POSESIVO_2, POSESIVO_3, PREPOSICION, PRONOMBRE_1, PRONOMBRE_2, PRONOMBRE_3, PRONOMBRE_REL, SALTO_CR, SALTO_CRLF, SALTO_LF, SIMBOLO_NEUTRO, TAB, tokenImage
 
Constructor Summary
EnglishLexer()
          No-argument constructor needed by the configuration manager to perform the instantiation.
EnglishLexer(EnglishLexerTokenManager tm)
          Constructor with generated Token Manager.
EnglishLexer(java.io.InputStream stream)
          Constructor with InputStream.
EnglishLexer(java.io.InputStream stream, java.lang.String encoding)
          Constructor with InputStream and supplied encoding.
EnglishLexer(java.io.Reader stream)
          Constructor with Reader.
 
Method Summary
 void disable_tracing()
          Disable tracing.
 void enable_tracing()
          Enable tracing.
 ParseException generateParseException()
          Generate ParseException.
 Tokenizer getNew(java.lang.String initialization)
          Obtains a new, initialized instance of the Tokenizer.
 Token getNextToken()
          Get the next Token.
 Token getToken(int index)
          Get the Token at the specified index.
 void parseEnglishGrammar()
           
 void parseGrammar()
          Parses the incoming text with the well-defined grammar.
 void ReInit(EnglishLexerTokenManager tm)
          Reinitialise.
 void ReInit(java.io.InputStream stream)
          Reinitialise.
 void ReInit(java.io.InputStream stream, java.lang.String encoding)
          Reinitialise.
 void ReInit(java.io.Reader stream)
          Reinitialise.
 
Methods inherited from class emolib.tokenizer.Tokenizer
fillConfigurationValues, getData, getPossibleEmotionalContent, getWord, getWordClass, getWordModifierValue, initialize, inputData, newProperties, putModifierValue, putWord, putWordClass, register, setPossibleEmotionalContent
 
Methods inherited from class emolib.util.proc.TextDataProcessor
flush, getName, getPredecessor, setPredecessor, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

token_source

public EnglishLexerTokenManager token_source
Generated Token Manager.


token

public Token token
Current token.


jj_nt

public Token jj_nt
Next token.

Constructor Detail

EnglishLexer

public EnglishLexer()
No-argument constructor needed by the configuration manager to perform the instantiation.


EnglishLexer

public EnglishLexer(java.io.InputStream stream)
Constructor with InputStream.


EnglishLexer

public EnglishLexer(java.io.InputStream stream,
                    java.lang.String encoding)
Constructor with InputStream and supplied encoding.


EnglishLexer

public EnglishLexer(java.io.Reader stream)
Constructor with Reader.


EnglishLexer

public EnglishLexer(EnglishLexerTokenManager tm)
Constructor with generated Token Manager.
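
As a hedged illustration of how inputs for the stream-based and reader-based constructors might be prepared, the sketch below builds both kinds of source from an in-memory string using only the JDK. The class LexerInputSketch and its helper methods are hypothetical. The constructor calls appear only in comments because they require emolib on the classpath, and the assumption that "UTF-8" is an accepted encoding name is not confirmed by this page.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of emolib.
final class LexerInputSketch {

    // Encodes the text as UTF-8 bytes, so the matching encoding name
    // would be passed alongside the stream.
    static ByteArrayInputStream asStream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // A Reader already yields characters, so no encoding is involved.
    static StringReader asReader(String text) {
        return new StringReader(text);
    }

    public static void main(String[] args) {
        String text = "I am so happy today";
        ByteArrayInputStream stream = asStream(text);
        StringReader reader = asReader(text);

        // With emolib available, these sources would be handed to the
        // documented constructors (assumed usage, not verified here):
        //   new EnglishLexer(stream, "UTF-8");
        //   new EnglishLexer(reader);
        System.out.println(stream.available() + " bytes prepared");
        reader.close();
    }
}
```

Passing an explicit encoding avoids depending on the platform default charset when the source is a byte stream.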

Method Detail

getNew

public Tokenizer getNew(java.lang.String initialization)
Description copied from class: Tokenizer
Obtains a new, initialized instance of the Tokenizer. Concrete (non-abstract) tokenizers should override this method.

Specified by:
getNew in class Tokenizer
Parameters:
initialization - The string to initialize the new Tokenizer.
Returns:
The new Tokenizer.

parseGrammar

public void parseGrammar()
                  throws java.lang.Exception
Description copied from class: Tokenizer
Parses the incoming text with the well-defined grammar.

Specified by:
parseGrammar in class Tokenizer
Throws:
java.lang.Exception - If a ParseException occurs.

parseEnglishGrammar

public final void parseEnglishGrammar()
                               throws ParseException
Throws:
ParseException

ReInit

public void ReInit(java.io.InputStream stream)
Reinitialise.


ReInit

public void ReInit(java.io.InputStream stream,
                   java.lang.String encoding)
Reinitialise.


ReInit

public void ReInit(java.io.Reader stream)
Reinitialise.


ReInit

public void ReInit(EnglishLexerTokenManager tm)
Reinitialise.


getNextToken

public final Token getNextToken()
Get the next Token.


getToken

public final Token getToken(int index)
Get the Token at the specified index.
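
This page does not say how the index is interpreted. The sketch below assumes the convention commonly found in JavaCC-generated parsers, where index 0 refers to the current token and index n looks n tokens ahead without consuming them. The classes LookaheadSketch and Tok are hypothetical stand-ins for the generated parser and its Token class, and the "<EOF>" marker is an illustration choice.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical model of offset-based token lookahead; not emolib code.
final class LookaheadSketch {

    // Minimal stand-in for a token: its text plus a link to the next token.
    static final class Tok {
        final String image;
        Tok next;

        Tok(String image) {
            this.image = image;
        }
    }

    private final Iterator<String> source;
    private Tok token = new Tok(""); // current token; empty placeholder at start

    LookaheadSketch(List<String> words) {
        this.source = words.iterator();
    }

    // Pulls one more token from the source, or an end marker when exhausted.
    private Tok fetch() {
        return new Tok(source.hasNext() ? source.next() : "<EOF>");
    }

    // Advances the current token by one and returns it.
    Tok getNextToken() {
        if (token.next == null) {
            token.next = fetch();
        }
        token = token.next;
        return token;
    }

    // Returns the token `index` positions ahead of the current one without
    // consuming anything; fetched tokens stay linked for later reads.
    Tok getToken(int index) {
        Tok t = token;
        for (int i = 0; i < index; i++) {
            if (t.next == null) {
                t.next = fetch();
            }
            t = t.next;
        }
        return t;
    }
}
```

Under this assumed convention, peeking with getToken(1) and then calling getNextToken() yields the same token, because the lookahead is cached in the linked list.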


generateParseException

public ParseException generateParseException()
Generate ParseException.


enable_tracing

public final void enable_tracing()
Enable tracing.


disable_tracing

public final void disable_tracing()
Disable tracing.