emolib.util.eval
Class CorpusVocabularyStatistics

java.lang.Object
  extended by emolib.util.eval.CorpusVocabularyStatistics

public class CorpusVocabularyStatistics
extends java.lang.Object

The CorpusVocabularyStatistics class performs a vocabulary analysis on the input text file.

CorpusVocabularyStatistics outputs the total vocabulary size (size of the training corpus), the vocabulary size at several frequency cutoffs (number of words with a frequency over 20, 15, 10, 5 and 3), and the number of observed bigrams relative to the number of possible events.
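
The following is a minimal, self-contained sketch of the kind of statistics described above. It is not the EmoLib implementation. The class name VocabularyStatsSketch, its helper methods, the lowercase whitespace tokenizer, and the reading of "possible events" as the squared vocabulary size are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of emolib.util.eval.
public class VocabularyStatsSketch {

    // Assumed tokenizer: lowercases the text and splits on whitespace.
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Counts how many times each distinct word occurs.
    static Map<String, Integer> wordFrequencies(String[] tokens) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : tokens) {
            freq.merge(token, 1, Integer::sum);
        }
        return freq;
    }

    // Number of distinct words whose frequency is strictly over the threshold.
    static int vocabularySizeOver(Map<String, Integer> freq, int threshold) {
        int count = 0;
        for (int f : freq.values()) {
            if (f > threshold) {
                count++;
            }
        }
        return count;
    }

    // Number of distinct adjacent word pairs seen in the token sequence.
    static int observedBigrams(String[] tokens) {
        Set<String> bigrams = new HashSet<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            bigrams.add(tokens[i] + " " + tokens[i + 1]);
        }
        return bigrams.size();
    }

    public static void main(String[] args) {
        String text = "the cat sat on the mat the cat ran";
        String[] tokens = tokenize(text);
        Map<String, Integer> freq = wordFrequencies(tokens);

        System.out.println("Total vocabulary size: " + freq.size());
        for (int threshold : new int[] {20, 15, 10, 5, 3}) {
            System.out.println("Words with frequency over " + threshold + ": "
                    + vocabularySizeOver(freq, threshold));
        }

        // Assumption: every ordered pair of vocabulary words is a possible event.
        long possible = (long) freq.size() * freq.size();
        System.out.println("Observed bigrams: " + observedBigrams(tokens)
                + " of " + possible + " possible");
    }
}
```

On a realistic training corpus the ratio of observed to possible bigrams is typically very small, which is why it is reported alongside the frequency cutoffs.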

Author:
Alexandre Trilla (atrilla@salle.url.edu)

Constructor Summary
CorpusVocabularyStatistics()
          Default constructor; takes no arguments.
 
Method Summary
 FeatureBox getFeatures(java.lang.String text)
          Extracts the features from the given text.
static void main(java.lang.String[] args)
           
 void printSynopsis()
          Prints the synopsis.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CorpusVocabularyStatistics

public CorpusVocabularyStatistics()
Default constructor; takes no arguments.

Method Detail

printSynopsis

public void printSynopsis()
Prints the synopsis.


getFeatures

public FeatureBox getFeatures(java.lang.String text)
Extracts the features from the given text.

Parameters:
text - The given text.
Returns:
The extracted features.

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception