emolib.pos.stanford
Class EnglishStanford

java.lang.Object
  extended by emolib.util.proc.TextDataProcessor
      extended by emolib.pos.POSTagger
          extended by emolib.pos.stanford.EnglishStanford
All Implemented Interfaces:
Configurable, DataProcessor

public class EnglishStanford
extends POSTagger

The EnglishStanford class performs Part-Of-Speech (POS) tagging of English text using the Stanford POS tagger.

The Stanford POS tagger is a high-performance POS tagger that uses several enriched features together with a bidirectional structure (dependency network) to compute its predictions.

The model files needed to use the Stanford POS tagger in English are available in the dat folder, under stanford-postagger/english. The configuration parameter resources_path must point to the desired model file.
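
A wrong resources_path value is easier to diagnose if the path is checked before the tagger is configured. The following is a minimal sketch of such a check and is not part of EmoLib: the ModelPathCheck class and its requireModelFile helper are hypothetical, and the file name MODEL_FILE used in main is a placeholder, not an actual file name from the distribution.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of EmoLib: validates a resources_path value.
final class ModelPathCheck {

    // Returns the path if it names a readable regular file, otherwise throws.
    static Path requireModelFile(String resourcesPath) {
        if (resourcesPath == null || resourcesPath.trim().isEmpty()) {
            throw new IllegalArgumentException("resources_path is not set");
        }
        Path model = Paths.get(resourcesPath.trim());
        if (!Files.isRegularFile(model) || !Files.isReadable(model)) {
            throw new IllegalArgumentException(
                    "resources_path does not point to a readable file: " + model);
        }
        return model;
    }

    public static void main(String[] args) {
        // MODEL_FILE is a placeholder; pass the real model path as an argument.
        String candidate = args.length > 0
                ? args[0]
                : "dat/stanford-postagger/english/MODEL_FILE";
        try {
            System.out.println("Model file found: " + requireModelFile(candidate));
        } catch (IllegalArgumentException e) {
            System.out.println("Configuration problem: " + e.getMessage());
        }
    }
}
```

Failing early with a message that names the offending path tends to be more helpful than an error raised later, when the model is loaded.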

This POS tagger is not error-free. The Stanford POS tagger is probabilistic, so it may assign wrong tags to some tokens, although its accuracy is slightly better than 97% when the enriched bidirectional architecture is used.
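
To put the 97% figure in perspective, the sketch below estimates how often a whole sentence is tagged without any error. It assumes that the figure is a per-token accuracy and that tagging errors are independent across tokens, which is a simplification; the TaggingAccuracy class is hypothetical and only illustrates the arithmetic.

```java
// Hypothetical illustration, not part of EmoLib.
final class TaggingAccuracy {

    // Probability that all tokens of a sentence are tagged correctly,
    // assuming independent errors: p^n.
    static double sentenceAccuracy(double tokenAccuracy, int tokens) {
        if (tokenAccuracy < 0.0 || tokenAccuracy > 1.0 || tokens < 0) {
            throw new IllegalArgumentException("invalid accuracy or token count");
        }
        return Math.pow(tokenAccuracy, tokens);
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 20, 40}) {
            System.out.printf("%2d tokens: %.1f%% chance of a fully correct sentence%n",
                    n, 100.0 * sentenceAccuracy(0.97, n));
        }
    }
}
```

Under these assumptions, a 20-token sentence is tagged entirely correctly only about 54% of the time, so downstream components should tolerate occasional wrong tags.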

Author:
David García, Alexandre Trilla (atrilla@salle.url.edu)

Field Summary
static java.lang.String PROP_RESOURCES_PATH
          The name of the property indicating the location of the English model.
 
Constructor Summary
EnglishStanford()
          Main constructor of the EnglishStanford.
 
Method Summary
 void applyPOSTagging(TextData inputTextDataObject)
          Method to perform the POS tagging process.
 void initialize()
          Method to initialize the EnglishStanford.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component has new data.
 void register(java.lang.String name, Registry registry)
          Register my properties.
 
Methods inherited from class emolib.pos.POSTagger
getData
 
Methods inherited from class emolib.util.proc.TextDataProcessor
flush, getName, getPredecessor, setPredecessor, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_RESOURCES_PATH

public static final java.lang.String PROP_RESOURCES_PATH
The name of the property indicating the location of the English model.

See Also:
Constant Field Values
Constructor Detail

EnglishStanford

public EnglishStanford()
Main constructor of the EnglishStanford.

Method Detail

register

public void register(java.lang.String name,
                     Registry registry)
              throws PropertyException
Description copied from interface: Configurable
Register my properties. This method is called once early in the lifetime of the component, shortly after the component is constructed. The component should register any configuration properties that it needs. If this configurable extends another configurable, super.register should also be called.

Specified by:
register in interface Configurable
Overrides:
register in class POSTagger
Parameters:
name - the name of the component
registry - the registry for this component
Throws:
PropertyException

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component has new data. The component should first validate the data. If the data is bad, the component should signal the problem with a PropertyException. If the data is good, the component should record the data internally.

Specified by:
newProperties in interface Configurable
Overrides:
newProperties in class POSTagger
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.

initialize

public void initialize()
Method to initialize the EnglishStanford.

Specified by:
initialize in interface DataProcessor
Overrides:
initialize in class POSTagger
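
The configuration methods documented here (register, newProperties, initialize) suggest a three-step lifecycle. The sketch below illustrates one plausible ordering with simplified stand-in types. SketchRegistry and SketchTagger are hypothetical and do not reproduce the real Registry, PropertySheet or EnglishStanford signatures. The property name "resources_path" and the assumption that initialize runs after newProperties are also illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins for illustration only; NOT the EmoLib types.
final class LifecycleSketch {

    // Stand-in for a registry: remembers which property names were declared.
    static final class SketchRegistry {
        private final Set<String> names = new HashSet<>();

        void register(String propertyName) {
            names.add(propertyName);
        }

        boolean isRegistered(String propertyName) {
            return names.contains(propertyName);
        }
    }

    // Stand-in for a configurable tagger component.
    static final class SketchTagger {
        static final String PROP_RESOURCES_PATH = "resources_path"; // assumed value

        private String resourcesPath;
        private boolean initialized;

        // Step 1: declare the properties this component understands.
        void register(SketchRegistry registry) {
            registry.register(PROP_RESOURCES_PATH);
        }

        // Step 2: validate the new data, then record it internally.
        void newProperties(Map<String, String> properties) {
            String path = properties.get(PROP_RESOURCES_PATH);
            if (path == null || path.isEmpty()) {
                throw new IllegalArgumentException("missing " + PROP_RESOURCES_PATH);
            }
            resourcesPath = path;
        }

        // Step 3: a real component would load the model here.
        void initialize() {
            if (resourcesPath == null) {
                throw new IllegalStateException("newProperties must run before initialize");
            }
            initialized = true;
        }

        boolean isInitialized() {
            return initialized;
        }
    }

    public static void main(String[] args) {
        SketchRegistry registry = new SketchRegistry();
        SketchTagger tagger = new SketchTagger();
        tagger.register(registry);

        Map<String, String> properties = new HashMap<>();
        properties.put(SketchTagger.PROP_RESOURCES_PATH, "path/to/model"); // placeholder
        tagger.newProperties(properties);
        tagger.initialize();

        System.out.println("Initialized: " + tagger.isInitialized());
    }
}
```

Keeping validation in newProperties and the expensive model loading in initialize means a bad configuration can be rejected before any resources are loaded.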

applyPOSTagging

public void applyPOSTagging(TextData inputTextDataObject)
Method to perform the POS tagging process.

Specified by:
applyPOSTagging in class POSTagger
Parameters:
inputTextDataObject - The TextData object to process.