emolib.splitter.bdt
Class SentenceSplitterBDT

java.lang.Object
  extended by emolib.util.proc.TextDataProcessor
      extended by emolib.splitter.SentenceSplitter
          extended by emolib.splitter.bdt.SentenceSplitterBDT
All Implemented Interfaces:
Configurable, DataProcessor

public class SentenceSplitterBDT
extends SentenceSplitter

The SentenceSplitterBDT class performs the sentence segmentation process through a hand-crafted Binary Decision Tree (BDT).

The decision tree for sentence boundary detection is inspired by the one described in (Reichel and Pfitzinger, 2006). Because the tokens handled here are independent (a token does not carry a trailing punctuation mark; punctuation marks are tokens of their own), the tree has been adapted rather than kept identical. Refer to the article for more details.

The diagram below shows this decision tree implementation (each branch is labelled YES or NO):


Figure 1: Decision tree for sentence boundary detection.

All input sentences must be terminated by a dot (full stop), an exclamation mark, or a question mark.
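To make the idea of a hand-crafted BDT over independent tokens more concrete, the following is a minimal sketch, assuming that punctuation marks arrive as separate tokens. The class name `BoundaryTreeSketch`, the abbreviation list and the exact sequence of questions are hypothetical illustrations; they are not the tree in Figure 1 and not the emolib implementation.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a hand-crafted binary decision tree for sentence
 * boundary detection. Each "node" is a yes/no question about the current
 * token or its neighbours. The questions and the abbreviation list are
 * illustrative assumptions, not the tree shown in Figure 1.
 */
final class BoundaryTreeSketch {

    // Hypothetical abbreviation list; a real system would load a lexicon.
    private static final Set<String> ABBREVIATIONS = Set.of("mr", "mrs", "dr", "prof", "etc");

    private BoundaryTreeSketch() {}

    /** Returns true if the token at position i closes a sentence. */
    static boolean isBoundary(List<String> tokens, int i) {
        String token = tokens.get(i);
        // Node 1: is the token a sentence terminator at all?
        if (!token.equals(".") && !token.equals("!") && !token.equals("?")) {
            return false;                       // NO -> not a boundary
        }
        // Node 2: "!" and "?" are treated as unambiguous terminators.
        if (!token.equals(".")) {
            return true;                        // YES -> boundary
        }
        // Node 3: the last token of the input always closes a sentence.
        if (i == tokens.size() - 1) {
            return true;
        }
        // Node 4: a dot right after a known abbreviation is not a boundary.
        if (i > 0 && ABBREVIATIONS.contains(tokens.get(i - 1).toLowerCase())) {
            return false;
        }
        // Node 5: otherwise, require the next token to start with an upper-case letter.
        String next = tokens.get(i + 1);
        return !next.isEmpty() && Character.isUpperCase(next.charAt(0));
    }
}
```

For the tokens `Dr . Smith arrived . Was he late ?`, this sketch rejects the dot after `Dr` (abbreviation node) and accepts the dot after `arrived` and the final question mark. Ordering the cheap, unambiguous questions first keeps most tokens near the root of the tree.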

--
(Reichel and Pfitzinger, 2006) Reichel, U.D. and Pfitzinger, H.R., "Text Preprocessing for Speech Synthesis", In Proc. TC-Star Speech to Speech Translation Workshop, pp. 207-212, 2006.

Author:
Alexandre Trilla (atrilla@salle.url.edu)

Constructor Summary
SentenceSplitterBDT()
          Main constructor of the SentenceSplitterBDT.
 
Method Summary
 void applySentenceSplitting(TextData inputTextDataObject)
          Method to perform the sentence segmentation process.
 void initialize()
          Method to initialize the SentenceSplitterBDT.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component has new data.
 void register(java.lang.String name, Registry registry)
          Register my properties.
 
Methods inherited from class emolib.splitter.SentenceSplitter
getData
 
Methods inherited from class emolib.util.proc.TextDataProcessor
flush, getName, getPredecessor, setPredecessor, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SentenceSplitterBDT

public SentenceSplitterBDT()
Main constructor of the SentenceSplitterBDT.

Method Detail

register

public void register(java.lang.String name,
                     Registry registry)
              throws PropertyException
Description copied from interface: Configurable
Register my properties. This method is called once, early in the lifetime of the component, shortly after the component is constructed. The component should register any configuration properties that it needs. If this configurable extends another configurable, super.register should also be called.

Specified by:
register in interface Configurable
Overrides:
register in class SentenceSplitter
Parameters:
name - the name of the component
registry - the registry for this component
Throws:
PropertyException

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component has new data. The component should first validate the data. If the data is invalid, the component should report the problem (with this void signature, by throwing a PropertyException); if the data is valid, the component should record it internally.

Specified by:
newProperties in interface Configurable
Overrides:
newProperties in class SentenceSplitter
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.
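The validate-then-record contract of newProperties can be illustrated with a small, self-contained sketch. It does not use emolib: a plain `Map<String, String>` stands in for `PropertySheet`, `IllegalArgumentException` stands in for `PropertyException`, and the property name `maxTokens` is invented purely for illustration (SentenceSplitterBDT is not documented as having such a property).

```java
import java.util.Map;

/**
 * Hypothetical sketch of the validate-then-record contract described for
 * newProperties. A Map stands in for emolib's PropertySheet and
 * IllegalArgumentException stands in for PropertyException; the property
 * "maxTokens" is invented for illustration only.
 */
final class ConfigurableSketch {

    static final String PROP_MAX_TOKENS = "maxTokens";   // hypothetical property name
    private int maxTokens = 100;                          // currently recorded value

    void newProperties(Map<String, String> sheet) {
        String raw = sheet.get(PROP_MAX_TOKENS);
        if (raw == null) {
            return;                                       // nothing new: keep the current value
        }
        // 1. Validate the incoming data before touching internal state.
        int candidate;
        try {
            candidate = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("maxTokens is not an integer: " + raw, e);
        }
        if (candidate <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive: " + candidate);
        }
        // 2. Only data that passed validation is recorded internally.
        maxTokens = candidate;
    }

    int maxTokens() {
        return maxTokens;
    }
}
```

Validating into a local variable first means a rejected update leaves the previously recorded configuration untouched.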

initialize

public void initialize()
Method to initialize the SentenceSplitterBDT.

Specified by:
initialize in interface DataProcessor
Overrides:
initialize in class SentenceSplitter

applySentenceSplitting

public void applySentenceSplitting(TextData inputTextDataObject)
Method to perform the sentence segmentation process. Each data object passed into or out of the system is assumed to correspond to one paragraph of the text/document.

Specified by:
applySentenceSplitting in class SentenceSplitter
Parameters:
inputTextDataObject - The TextData object to process.
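The paragraph-in, sentences-out flow of applySentenceSplitting can be sketched as follows. This is a simplified illustration under stated assumptions: plain token lists stand in for emolib's `TextData`, the class name `ParagraphSplitterSketch` is hypothetical, and the boundary test is a bare terminator check rather than the BDT described above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the paragraph-in, sentences-out flow of
 * applySentenceSplitting. Plain token lists stand in for emolib's TextData,
 * and the boundary rule is a simplified terminator check, not the BDT.
 */
final class ParagraphSplitterSketch {

    private ParagraphSplitterSketch() {}

    /** Simplified boundary rule: any ".", "!" or "?" token ends a sentence. */
    private static boolean isTerminator(String token) {
        return token.equals(".") || token.equals("!") || token.equals("?");
    }

    /** Splits the tokens of one paragraph into a list of sentences. */
    static List<List<String>> split(List<String> paragraphTokens) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : paragraphTokens) {
            current.add(token);
            if (isTerminator(token)) {
                sentences.add(current);       // close the sentence, keeping its terminator
                current = new ArrayList<>();
            }
        }
        // Input is required to end with a terminator; any leftover tokens are
        // kept as a final sentence rather than silently dropped.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }
}
```

For example, `split(List.of("It", "rains", ".", "Really", "?", "Yes", "!"))` yields three sentences. Replacing `isTerminator` with a decision-tree test such as the one sketched earlier is where a BDT would plug in.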