emolib.classifier.eval
Class KFoldXValidation

java.lang.Object
  extended by emolib.classifier.eval.KFoldXValidation

public class KFoldXValidation
extends java.lang.Object

The KFoldXValidation class performs the k-fold cross-validation method on the stratified input data file.

The k-fold cross-validation method divides the input file into `k' parts, trains the specified classifier with `k-1' parts and tests it with the remaining part, repeating the process `k' times so that each part is used once for testing. The effectiveness is scored with macroaveraging (precision and recall are computed for each category and then averaged), and the results obtained over all iterations are averaged with the arithmetic mean.
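The fold loop described above can be sketched as follows. This is a minimal, hypothetical helper (KFoldSketch.crossValidate is not part of EmoLib): it splits an input list into k contiguous folds, trains and tests through a caller-supplied function, and averages the per-fold scores with the arithmetic mean.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Minimal k-fold cross-validation loop (hypothetical sketch, not EmoLib code). */
class KFoldSketch {

    /**
     * For each fold i, trains on the other k-1 folds and tests on fold i, then
     * returns the arithmetic mean of the k fold scores. trainAndScore receives
     * (trainingSet, testSet) and returns the score obtained on that fold.
     */
    static <T> double crossValidate(List<T> data, int k,
                                    BiFunction<List<T>, List<T>, Double> trainAndScore) {
        if (k < 2 || k > data.size()) {
            throw new IllegalArgumentException("k must be in [2, data.size()]");
        }
        int n = data.size();
        double sum = 0.0;
        for (int fold = 0; fold < k; fold++) {
            // Integer boundaries fold*n/k give folds whose sizes differ by at most one.
            int start = fold * n / k;
            int end = (fold + 1) * n / k;
            List<T> test = new ArrayList<>(data.subList(start, end));
            // Training set: everything before and after the held-out fold.
            List<T> train = new ArrayList<>(data.subList(0, start));
            train.addAll(data.subList(end, n));
            sum += trainAndScore.apply(train, test);
        }
        return sum / k;
    }
}
```

In this sketch the folds are contiguous, so they only keep the category balance if the input order has already been stratified.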

Note that the input dataset must already be stratified, i.e., each fold must preserve the category balance of the whole dataset.
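Since the class expects the stratification to be done beforehand, one possible way to prepare it is sketched below. StratifiedFolds.assign is a hypothetical helper, not EmoLib code: it groups instances by category and deals each group round-robin over the k folds, so every fold ends up with roughly the category proportions of the whole dataset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stratified fold assignment (hypothetical sketch, not EmoLib code). */
class StratifiedFolds {

    /**
     * Assigns each instance, given by its category label, to one of k folds.
     * Returns the folds as lists of instance indices.
     */
    static List<List<Integer>> assign(List<String> labels, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        // Group instance indices by category, preserving first-seen order.
        Map<String, List<Integer>> byCategory = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            byCategory.computeIfAbsent(labels.get(i), c -> new ArrayList<>()).add(i);
        }
        // Deal each category's instances round-robin over the folds; the counter
        // carries over between categories so that fold sizes also stay balanced.
        int next = 0;
        for (List<Integer> members : byCategory.values()) {
            for (int idx : members) {
                folds.get(next).add(idx);
                next = (next + 1) % k;
            }
        }
        return folds;
    }
}
```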

This class uses the whole textual affect processing pipeline of EmoLib, as defined in an external configuration file, and thus benefits from the partial contributions of each module. The KFoldXValidation class looks up the kfoldcv component in the XML configuration file, so make sure that this component exists and is correctly defined.

If the KFoldXValidation is launched with a single fold (test data) and a fixed dataset (training data), it performs a plain train-test process instead: the classifier is trained on the fixed dataset and tested on the fold.

Author:
Alexandre Trilla (atrilla@salle.url.edu)

Constructor Summary
KFoldXValidation()
          Void constructor.
 
Method Summary
 void evaluate(Classifier theClassifier)
          Method to evaluate the dataset and output the result of the k-fold cross-validation process.
 int indexOf(java.lang.String query, java.lang.String[] theArray)
          Function to get the index of a query in an array of strings.
 void inputFixedInstance(java.lang.String inputFixedInstance)
          Method to include a new fixed instance.
 void inputInstance(java.lang.String inputInstance)
          Method to include a new input corpus instance into the system.
static void main(java.lang.String[] args)
          The main method of the KFoldXValidation application.
 void printSynopsis()
          Prints the synopsis.
 void setBasicCategories(java.lang.String inputCategories)
          Method to set the basic categories of the system.
 void setFixedDataset()
          Method to call only when a fixed dataset is given for training.
 void setNumberOfFolds(int nf)
          Method to set the number of folds.
 void setTextProcessingPipeline(AffectiveTagger pipe)
          Method to set the text processing pipeline.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KFoldXValidation

public KFoldXValidation()
Void constructor.

Method Detail

setNumberOfFolds

public void setNumberOfFolds(int nf)
Method to set the number of folds.

Parameters:
nf - The number of folds.

setTextProcessingPipeline

public void setTextProcessingPipeline(AffectiveTagger pipe)
Method to set the text processing pipeline.

Parameters:
pipe - Reference to the pipeline.

setFixedDataset

public void setFixedDataset()
Method to call only when a fixed dataset is given for training.


setBasicCategories

public void setBasicCategories(java.lang.String inputCategories)
Method to set the basic categories of the system. This method requires a string with the different categories separated by a hyphen.

Parameters:
inputCategories - The categories.
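A minimal sketch of how such a hyphen-separated string can be split into category names. CategoryParser is hypothetical, the category names in the example are made up, and the sketch assumes that no category name contains a hyphen itself.

```java
/** Splits a hyphen-separated category list (hypothetical sketch, not EmoLib code). */
class CategoryParser {

    /** For example, "positive-negative-neutral" yields {"positive", "negative", "neutral"}. */
    static String[] parse(String inputCategories) {
        String[] cats = inputCategories.trim().split("-");
        // Tolerate stray spaces around the separators.
        for (int i = 0; i < cats.length; i++) cats[i] = cats[i].trim();
        return cats;
    }
}
```

Using the hyphen as separator keeps the list on a single string argument, at the cost of ruling out hyphenated category names.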

inputInstance

public void inputInstance(java.lang.String inputInstance)
Method to include a new input corpus instance into the system. The category of this training instance must be appended at the end of the text, separated by a space.

Parameters:
inputInstance - The input instance.
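The expected instance format (text, then a space, then the category) can be parsed by cutting at the last space, as in this hypothetical sketch. It assumes that the category name is a single token without spaces.

```java
/** Parses "text ... category" instances (hypothetical sketch, not EmoLib code). */
class InstanceParser {

    /** Text and category of one corpus instance. */
    static final class Instance {
        final String text;
        final String category;

        Instance(String text, String category) {
            this.text = text;
            this.category = category;
        }
    }

    /** Splits at the last space: the text is before it, the category after it. */
    static Instance parse(String line) {
        String trimmed = line.trim();
        int cut = trimmed.lastIndexOf(' ');
        if (cut < 0) {
            throw new IllegalArgumentException("No space-separated category in: " + line);
        }
        return new Instance(trimmed.substring(0, cut).trim(), trimmed.substring(cut + 1));
    }
}
```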

inputFixedInstance

public void inputFixedInstance(java.lang.String inputFixedInstance)
Method to include a new fixed instance. Similar to inputInstance, but for each fold in the cross-validation procedure, these fixed instances are always present in the training set of the classifier.

Parameters:
inputFixedInstance - The input fixed instance.
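The role of fixed instances in each fold can be sketched as follows. FixedTrainingSets.trainingSet is a hypothetical helper, not EmoLib's API: the training set of a fold is made of the fixed instances plus every fold except the held-out one.

```java
import java.util.ArrayList;
import java.util.List;

/** Per-fold training set with fixed instances (hypothetical sketch, not EmoLib code). */
class FixedTrainingSets {

    /**
     * Builds the training set for one fold: the fixed instances are always included,
     * followed by every corpus instance that is not in the held-out test fold.
     */
    static <T> List<T> trainingSet(List<T> fixed, List<List<T>> folds, int testFold) {
        List<T> train = new ArrayList<>(fixed);
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) train.addAll(folds.get(f));
        }
        return train;
    }
}
```

With a single fold and a fixed dataset, there is only one iteration and the training set reduces to the fixed instances, which matches the plain train-test mode described for this class.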

evaluate

public void evaluate(Classifier theClassifier)
Method to evaluate the dataset and output the result of the k-fold cross-validation process. The performance metrics are computed with macroaveraging, so the category balance of the dataset does not bias the results, and the partial results of each fold are finally averaged with the arithmetic mean.

Parameters:
theClassifier - The classifier.
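Macroaveraging computes precision and recall for each category and then averages them with equal weight, so minority categories count as much as majority ones. A minimal sketch from a confusion matrix follows; MacroAveraging is hypothetical, and the confusion[actual][predicted] layout and the choice of scoring an empty denominator as 0 are assumptions, not documented EmoLib behaviour.

```java
/** Macroaveraged precision and recall (hypothetical sketch, not EmoLib code). */
class MacroAveraging {

    /**
     * confusion[actual][predicted] holds instance counts. Returns
     * {macroPrecision, macroRecall}: per-category values averaged with equal weight.
     */
    static double[] macroPrecisionRecall(int[][] confusion) {
        int c = confusion.length;
        double precisionSum = 0.0, recallSum = 0.0;
        for (int cat = 0; cat < c; cat++) {
            int tp = confusion[cat][cat];
            int predicted = 0, actual = 0;
            for (int j = 0; j < c; j++) {
                predicted += confusion[j][cat]; // column sum: everything labelled cat
                actual += confusion[cat][j];    // row sum: everything truly cat
            }
            // Assumed convention: an empty denominator contributes 0.
            precisionSum += predicted == 0 ? 0.0 : (double) tp / predicted;
            recallSum += actual == 0 ? 0.0 : (double) tp / actual;
        }
        return new double[] {precisionSum / c, recallSum / c};
    }
}
```

The per-fold values returned here would then be averaged over the k folds with the arithmetic mean.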

printSynopsis

public void printSynopsis()
Prints the synopsis.


indexOf

public int indexOf(java.lang.String query,
                   java.lang.String[] theArray)
Function to get the index of a query in an array of strings.

Parameters:
query - The query.
theArray - The array of strings.
Returns:
The index of the query in the array.
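A linear search is enough for this kind of lookup, for example to map a category name to its position in the category list. The sketch below is a hypothetical stand-in for the documented method; the documentation does not say what is returned when the query is absent, so returning -1 here is an assumption.

```java
import java.util.Objects;

/** Linear search over a string array (hypothetical sketch, not EmoLib code). */
class ArraySearch {

    /** Returns the first index whose element equals query, or -1 if absent (assumed). */
    static int indexOf(String query, String[] theArray) {
        for (int i = 0; i < theArray.length; i++) {
            // Objects.equals also handles null elements safely.
            if (Objects.equals(theArray[i], query)) return i;
        }
        return -1;
    }
}
```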

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
The main method of the KFoldXValidation application.

Parameters:
args - The input arguments.
Throws:
java.lang.Exception