emolib.pos.qtag
Class SpanishQTag

java.lang.Object
  extended by emolib.util.proc.TextDataProcessor
      extended by emolib.pos.POSTagger
          extended by emolib.pos.qtag.SpanishQTag
All Implemented Interfaces:
Configurable, DataProcessor

public class SpanishQTag
extends POSTagger

The SpanishQTag class performs the Part-Of-Speech (POS) tagging process in Spanish using the QTag library.

To obtain a Spanish version of QTag, the guidelines posted on the blog "Pythonner Zone!", in the entry "Building a Spanish Part-of-Speech Tagger for Java in 5 Easy Steps...", have been followed. In short, the steps were the following:

  1. Obtain QTag, a probabilistic POS tagger developed in Java by Oliver Mason at the University of Birmingham (UK).
  2. Obtain the Spanish corpus of the CoNLL 2002 Shared Task with POS tags, available at Resources on Named Entity Recognition and Classification (NERC).
  3. Modify the corpus: remove the second tag at the end of each line, remove the empty lines, and remove the lines with long strings of "=" signs.
  4. Train QTag with: java qtag.ResourceCreator esp.train.txt qtag-spanish
  5. Use the new POS tagger.
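
The corpus modification in step 3 could be sketched as follows. This is a minimal illustration, not the procedure actually used to build the tagger: the class CorpusCleaner is hypothetical, the columns are assumed to be separated by whitespace, and a "long string" of "=" signs is assumed to mean five or more in a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of EmoLib or QTag. */
final class CorpusCleaner {

    // Assumption: a "long string" of "=" signs is five or more in a row.
    private static final Pattern SEPARATOR = Pattern.compile("={5,}");

    /** Returns the corpus lines with the trailing second tag removed. */
    static List<String> clean(List<String> lines) {
        List<String> cleaned = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // drop empty lines
            }
            if (SEPARATOR.matcher(trimmed).find()) {
                continue; // drop lines containing a long run of "=" signs
            }
            // Assumption: columns are separated by whitespace.
            String[] columns = trimmed.split("\\s+");
            if (columns.length >= 3) {
                // Drop the last column (the second tag), keep the rest.
                String[] kept = Arrays.copyOf(columns, columns.length - 1);
                cleaned.add(String.join(" ", kept));
            } else {
                cleaned.add(trimmed); // nothing to strip on this line
            }
        }
        return cleaned;
    }
}
```

For instance, given made-up input lines such as "casa NC O", an empty line and a line of "=" signs, CorpusCleaner.clean would keep only "casa NC". The cleaned lines would then be written to the file passed to qtag.ResourceCreator in step 4.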

The files needed to generate the Spanish tagger with QTag are available in the dat/dataset/conll02task folder.

This POS tagger makes mistakes: QTag is a probabilistic tagger, so its output may be inaccurate, and the Spanish training corpus also contains inconsistencies. However, when used for its intended purpose (disambiguating the function of nouns, verbs and adjectives in a sentence), this tool does its job successfully.

Author:
David García, Alexandre Trilla (atrilla@salle.url.edu)

Field Summary
static java.lang.String PROP_RESOURCES_PATH
          The name of the property indicating the location of the lexicon and matrix Spanish files.
 
Constructor Summary
SpanishQTag()
          Main constructor of the SpanishQTag.
 
Method Summary
 void applyPOSTagging(TextData inputTextDataObject)
          Method to perform the POS tagging process.
 void initialize()
          Method to initialize the SpanishQTag.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component has new data.
 void register(java.lang.String name, Registry registry)
          Register my properties.
 
Methods inherited from class emolib.pos.POSTagger
getData
 
Methods inherited from class emolib.util.proc.TextDataProcessor
flush, getName, getPredecessor, setPredecessor, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_RESOURCES_PATH

public static final java.lang.String PROP_RESOURCES_PATH
The name of the property indicating the location of the lexicon and matrix Spanish files.

See Also:
Constant Field Values

Constructor Detail

SpanishQTag

public SpanishQTag()
Main constructor of the SpanishQTag.

Method Detail

register

public void register(java.lang.String name,
                     Registry registry)
              throws PropertyException
Description copied from interface: Configurable
Register my properties. This method is called once, early in the lifetime of the component, shortly after the component is constructed. The component should register any configuration properties that it needs. If this configurable extends another configurable, super.register should also be called.

Specified by:
register in interface Configurable
Overrides:
register in class POSTagger
Parameters:
name - the name of the component
registry - the registry for this component
Throws:
PropertyException

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component has new data. The component should first validate the data. Since the method returns void, bad data is signalled by throwing a PropertyException. If the data is good, the component should record the data internally.

Specified by:
newProperties in interface Configurable
Overrides:
newProperties in class POSTagger
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.

initialize

public void initialize()
Method to initialize the SpanishQTag.

Specified by:
initialize in interface DataProcessor
Overrides:
initialize in class POSTagger
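
The three configuration methods above suggest a lifecycle that could be sketched as follows. This is a self-contained illustration with stand-in types, not the EmoLib API: DemoTagger, DemoPropertyException, the property name "resources_path" and the use of a plain Map in place of PropertySheet are all hypothetical, and the assumption that initialize runs after newProperties is not stated in this page.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the framework's PropertyException. */
class DemoPropertyException extends Exception {
    DemoPropertyException(String message) {
        super(message);
    }
}

/** Hypothetical component mimicking register / newProperties / initialize. */
final class DemoTagger {

    // Hypothetical name; the real value of PROP_RESOURCES_PATH is not given here.
    static final String PROP_RESOURCES_PATH = "resources_path";

    private final Set<String> registered = new HashSet<>();
    private String resourcesPath;
    private boolean ready;

    /** Called once, shortly after construction: declare the known properties. */
    void register() {
        registered.add(PROP_RESOURCES_PATH);
    }

    /** Validate the new data first, then record it; reject bad data with an exception. */
    void newProperties(Map<String, String> sheet) throws DemoPropertyException {
        if (!registered.contains(PROP_RESOURCES_PATH)) {
            throw new DemoPropertyException("Property was never registered");
        }
        String path = sheet.get(PROP_RESOURCES_PATH);
        if (path == null || path.trim().isEmpty()) {
            throw new DemoPropertyException("Missing value for " + PROP_RESOURCES_PATH);
        }
        resourcesPath = path; // record only after validation succeeded
    }

    /** Assumed to run after newProperties: prepare the component for use. */
    void initialize() {
        if (resourcesPath == null) {
            throw new IllegalStateException("newProperties must succeed before initialize");
        }
        ready = true; // a real tagger would load its lexicon and matrix files here
    }

    boolean isReady() {
        return ready;
    }
}
```

In this sketch a caller would invoke register(), then newProperties(...) with a map holding the resources path, then initialize(). An empty map is rejected with the stand-in exception before anything is recorded.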

applyPOSTagging

public void applyPOSTagging(TextData inputTextDataObject)
Method to perform the POS tagging process.

Specified by:
applyPOSTagging in class POSTagger
Parameters:
inputTextDataObject - The TextData object to process.