java.lang.Object
  emolib.util.proc.TextDataProcessor
    emolib.classifier.Classifier
      emolib.classifier.machinelearning.ARNReduced
public class ARNReduced
The ARNReduced classifies according to a cosine similarity in a weighted Vector Space Model with co-occurrences, which are assumed to capture the style in text.
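To make "co-occurrences" concrete, the sketch below counts adjacent word pairs (bigrams) in a token list. It assumes whitespace-tokenised, lower-cased input and a hypothetical `CooccurrenceSketch` class that is not part of EmoLib. The library's own tuple extraction (enabled with `setCOF`) may define co-occurrence differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of EmoLib: counts adjacent word pairs.
final class CooccurrenceSketch {
    private CooccurrenceSketch() {}

    /** Returns a map from "w1 w2" (two adjacent tokens) to its frequency. */
    static Map<String, Integer> countBigrams(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String pair = tokens.get(i) + " " + tokens.get(i + 1);
            counts.merge(pair, 1, Integer::sum); // increment, starting at 1
        }
        return counts;
    }
}
```

For the tokens `a b a b`, the pair `a b` is counted twice and `b a` once.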
The Associative Relational Network - Reduced (ARN-R) is a word co-occurrence network-based model (see the figure below). It constructs a Vector Space Model (VSM) with an "on the fly" term selection method based on the observation of test features (Alías et al., 2008). This term selection refinement is reported to improve on the classical VSM for classification. Dense vectors representing the input text and the class are retrieved (no learning process is involved) and compared with the cosine similarity measure. The basic hypothesis in using the ARN-R for classification is the contiguity hypothesis: texts in the same class form a contiguous region, and regions of different classes do not overlap.
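The following sketch illustrates dense vectors built "on the fly". Only the terms observed in the input text span the vector space. The category's frequencies are projected onto those same terms before the cosine similarity is computed. This is a simplified assumption about how ARN-R behaves: it uses raw term frequencies, ignores co-occurrence edges, and relies on a hypothetical `ArnCosineSketch` class. It is not EmoLib's implementation.

```java
import java.util.Map;

// Hypothetical illustration, not EmoLib's ARNReduced implementation.
final class ArnCosineSketch {
    private ArnCosineSketch() {}

    /**
     * Cosine similarity between an input text and a category model, where only
     * the terms that occur in the input text span the vector space.
     */
    static double similarity(Map<String, Integer> inputFreqs,
                             Map<String, Integer> categoryFreqs) {
        double dot = 0.0;
        double normIn = 0.0;
        double normCat = 0.0;
        for (Map.Entry<String, Integer> e : inputFreqs.entrySet()) {
            double x = e.getValue();
            // Project the category onto the input terms; unseen terms count as 0.
            double y = categoryFreqs.getOrDefault(e.getKey(), 0);
            dot += x * y;
            normIn += x * x;
            normCat += y * y;
        }
        if (normIn == 0.0 || normCat == 0.0) {
            return 0.0; // no shared terms: treat the texts as unrelated
        }
        return dot / (Math.sqrt(normIn) * Math.sqrt(normCat));
    }
}
```

In this sketch, category terms that never appear in the input do not affect the score. That is the intuition behind restricting the space to the observed test features.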
  
 
The ARN-R also provides several methods (i.e., criteria) 1) to weight the features in order to enhance their discriminative power, and 2) to select the most relevant features in order to reduce the sparsity of the VSM. These approaches aim to simplify the model so that it generalises better.
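Chi-square is one of the selection criteria listed below. The sketch computes the chi-square statistic between a term and a class from document counts in the standard 2x2 contingency table. One plausible reading of `setFeatSelChi2` is that terms are ranked by this score and the top `numFeats` per class are kept. That reading is an assumption, not a description of EmoLib's code, and the `ChiSquareSketch` class is hypothetical.

```java
// Hypothetical helper, not part of EmoLib.
final class ChiSquareSketch {
    private ChiSquareSketch() {}

    /**
     * Chi-square score of a term t for a class.
     * a: documents in the class that contain t
     * b: documents outside the class that contain t
     * c: documents in the class that do not contain t
     * d: documents outside the class that do not contain t
     */
    static double score(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double num = n * Math.pow((double) a * d - (double) c * b, 2);
        double den = (double) (a + c) * (b + d) * (a + b) * (c + d);
        return den == 0.0 ? 0.0 : num / den; // degenerate table: no evidence
    }
}
```

A term that occurs in exactly the class's documents gets a high score. A term spread evenly across classes scores 0.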
In addition, the ARNReduced provides a classical VSM implementation which enables the retrieval of sparse vectors, and therefore standardises the interface to the textual features for any vector-based classifier.
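A classical sparse VSM representation can be sketched as a map from vocabulary indices to term counts. This gives every vector-based classifier the same fixed-dimension interface. The vocabulary map and the `SparseVsmSketch` class below are hypothetical. Dropping out-of-vocabulary tokens is an assumption made for the illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of EmoLib.
final class SparseVsmSketch {
    private SparseVsmSketch() {}

    /**
     * Builds a sparse term-frequency vector (index -> count).
     * Only non-zero entries are stored; out-of-vocabulary tokens are ignored.
     */
    static TreeMap<Integer, Integer> toSparse(List<String> tokens,
                                              Map<String, Integer> vocabulary) {
        TreeMap<Integer, Integer> vector = new TreeMap<>();
        for (String token : tokens) {
            Integer index = vocabulary.get(token);
            if (index != null) {
                vector.merge(index, 1, Integer::sum);
            }
        }
        return vector;
    }
}
```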
 (Alías et al., 2008) Francesc Alías, Xavier Sevillano, Joan Claudi Socoró and Xavier Gonzalvo,
 "Towards high quality next-generation Text-to-Speech synthesis: a multidomain approach by automatic
 domain classification", IEEE Transactions on Audio, Speech and Language Processing (Special issue on
 New Approaches to Statistical Speech and Text Processing), ISSN 1558-7916, vol. 16, no. 7,
 pp. 1340-1354, September 2008.
 
| Nested Class Summary | | |
|---|---|---|
| class | ARNReduced.Graph | Generic graph inner class. |
| class | ARNReduced.GraphElement | Inner class representing an element of the graph. |
| Field Summary | | |
|---|---|---|
| static java.lang.String | PROP_EXTERNAL_FILE | Property to indicate a pre-trained classifier. |
| Constructor Summary | |
|---|---|
| ARNReduced() | Main constructor of this classifier. |
| Method Summary | | |
|---|---|---|
| void | applyModelTermWeighing(ARNReduced.Graph inputGraph, int cat) | Method to weight the terms corresponding to the model vector. |
| void | applyTermWeighing(ARNReduced.Graph inputGraph, int cat) | Method to apply a term weighting methodology to the given graph. |
| ARNReduced.Graph | buildFullGraph(ARNReduced.Graph input) | Function to build a full graph with the term frequencies given by the input terms. |
| ARNReduced.Graph | buildGraph(FeatureBox inputFeatures) | Function to build a graph from input features. |
| int | getBigramVocabularySize(int bigramFreqThreshold, java.lang.String cat) | Function to retrieve the number of terms (vocabulary size, bigrams alone) whose frequency is greater than the given threshold, with respect to a given category. |
| java.lang.String | getCategory(FeatureBox inputFeatures) | The function that decides the most appropriate emotional category. |
| java.util.ArrayList<ARNReduced.Graph> | getCategoryGraphs() | Function to recover the category-specific graphs. |
| java.util.HashMap | getCategoryHash() | Function to retrieve a hash map of the categories to deal with. |
| java.util.ArrayList<java.lang.String> | getCategoryList() | Function to retrieve a list of the categories to deal with. |
| int | getCorpusSize(java.lang.String cat) | Function to retrieve the corpus size (number of words) of the given category. |
| int | getCorpusTupleSize(java.lang.String cat) | Function to retrieve the corpus size (number of tuples) of the given category. |
| void | getOrderedList(java.lang.String cat, java.util.ArrayList<java.lang.String> wList, java.util.ArrayList<java.lang.Integer> fList) | Function to retrieve a list of words sorted in descending order of frequency. |
| void | getOrderedTupleList(java.lang.String cat, java.util.ArrayList<java.lang.String> wList, java.util.ArrayList<java.lang.Integer> fList) | Function to retrieve a list of tuples sorted in descending order of frequency. |
| float | getSimilarity(FeatureBox inputText, java.lang.String cat) | Function to retrieve the similarity of a given text with a given category. |
| ARNReduced.Graph | getVocabularyGraph() | Function to recover the full vocabulary graph. |
| int | getVocabularySize(int wordFreqThreshold, java.lang.String cat) | Function to retrieve the number of terms (vocabulary size, words alone) whose frequency is greater than the given threshold, with respect to a given category. |
| void | initialize() | Method to initialize the Classifier. |
| void | load(java.lang.String path) | Generic function to load a previously saved classifier. |
| void | newProperties(PropertySheet ps) | This method is called when this configurable component has new data. |
| void | register(java.lang.String name, Registry registry) | Register my properties. |
| void | resetExamples() | Method to reset the classifier and flush the training examples. |
| void | save(java.lang.String path) | Generic method to save the fully fledged classifier to a given file path. |
| void | setCOF(boolean flag) | Method to set the assessment of co-occurrence frequencies (actually, tuples). |
| void | setFeatSelChi2(boolean flag, int numFeats) | Method to set the chi-square global feature selection. |
| void | setFeatSelMI(boolean flag, int numFeats) | Method to set the mutual information global feature selection. |
| void | setFeatSelTF(boolean flag, int numFeats) | Method to set the term frequency global feature selection. |
| void | setPOS(boolean flag) | Method to set the assessment of POS tags (grammatical analysis). |
| void | setSimilarityMeasure(java.lang.String simil) | Method to set the similarity measure. |
| void | setStems(boolean flag) | Method to set the assessment of stemmed terms. |
| void | setSynonyms(boolean flag) | Method to set the assessment of synonyms. |
| void | setTermWeighingMeasure(java.lang.String twm) | Method to set the term weighting measure. |
| void | simpleClassification() | Functionality test. |
| void | trainingProcedure() | Generic training procedure. |
| Methods inherited from class emolib.classifier.Classifier | 
|---|
| applyClassification, getData, getListOfExampleCategories, getListOfExampleFeatures, inputTrainingExample, train | 
| Methods inherited from class emolib.util.proc.TextDataProcessor | 
|---|
| flush, getName, getPredecessor, setPredecessor, toString | 
| Methods inherited from class java.lang.Object | 
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait | 
| Field Detail | 
|---|
public static final java.lang.String PROP_EXTERNAL_FILE
| Constructor Detail | 
|---|
public ARNReduced()
| Method Detail | 
|---|
public void register(java.lang.String name,
                     Registry registry)
              throws PropertyException
Description copied from interface: Configurable
Register my properties.
Specified by: register in interface Configurable
Overrides: register in class Classifier
name - the name of the component
registry - the registry for this component
Throws: PropertyException
public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component has new data.
Specified by: newProperties in interface Configurable
Overrides: newProperties in class Classifier
ps - a property sheet holding the new data
Throws: PropertyException - if there is a problem with the properties.
public void initialize()
Specified by: initialize in interface DataProcessor
Overrides: initialize in class Classifier
public java.util.ArrayList<java.lang.String> getCategoryList()
public java.util.HashMap getCategoryHash()
public java.util.ArrayList<ARNReduced.Graph> getCategoryGraphs()
public ARNReduced.Graph getVocabularyGraph()
public int getCorpusSize(java.lang.String cat)
cat - The given category.
public int getCorpusTupleSize(java.lang.String cat)
cat - The given category.
public int getVocabularySize(int wordFreqThreshold,
                             java.lang.String cat)
wordFreqThreshold - The word frequency threshold.
cat - The given category.
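Counting the vocabulary above a frequency threshold amounts to filtering a term-frequency table. The sketch assumes the per-category frequencies are available as a `Map<String, Integer>`, a hypothetical stand-in for the category graph. It uses a strict "greater than" comparison, as the method summary states. The `VocabularySizeSketch` class is hypothetical.

```java
import java.util.Map;

// Hypothetical helper, not part of EmoLib.
final class VocabularySizeSketch {
    private VocabularySizeSketch() {}

    /** Number of terms whose frequency is strictly greater than the threshold. */
    static int vocabularySize(Map<String, Integer> termFreqs, int threshold) {
        return (int) termFreqs.values().stream()
                .filter(f -> f > threshold)
                .count();
    }
}
```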
public int getBigramVocabularySize(int bigramFreqThreshold,
                                   java.lang.String cat)
bigramFreqThreshold - The bigram frequency threshold.
cat - The given category.
public void getOrderedList(java.lang.String cat,
                           java.util.ArrayList<java.lang.String> wList,
                           java.util.ArrayList<java.lang.Integer> fList)
cat - The given category.
wList - The list of words to produce.
fList - The list of frequencies to produce.
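`getOrderedList` returns its results through the two caller-supplied lists rather than through a return value. The sketch below shows that output-parameter pattern, assuming the frequencies come from a `Map<String, Integer>`. Breaking ties alphabetically is an additional assumption that makes the order deterministic. The `OrderedListSketch` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of EmoLib.
final class OrderedListSketch {
    private OrderedListSketch() {}

    /** Fills wList/fList with terms and their frequencies, most frequent first. */
    static void orderedList(Map<String, Integer> termFreqs,
                            List<String> wList, List<Integer> fList) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(termFreqs.entrySet());
        // Descending frequency, then ascending term for ties.
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        wList.clear();
        fList.clear();
        for (Map.Entry<String, Integer> e : entries) {
            wList.add(e.getKey());
            fList.add(e.getValue());
        }
    }
}
```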
public void getOrderedTupleList(java.lang.String cat,
                                java.util.ArrayList<java.lang.String> wList,
                                java.util.ArrayList<java.lang.Integer> fList)
cat - The given category.
wList - The list of tuples to produce.
fList - The list of frequencies to produce.
public void setTermWeighingMeasure(java.lang.String twm)
twm - The term weighting measure.
public void setSimilarityMeasure(java.lang.String simil)
simil - The similarity measure.
public void setCOF(boolean flag)
flag - Whether the option is enabled.
public void setPOS(boolean flag)
flag - Whether the option is enabled.
public void setSynonyms(boolean flag)
flag - Whether the option is enabled.
public void setStems(boolean flag)
flag - Whether the option is enabled.
public void setFeatSelMI(boolean flag,
                         int numFeats)
flag - Whether the option is enabled.
numFeats - The number of features per class to select.
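The mutual-information criterion behind `setFeatSelMI` can be sketched from four document counts for a term and a class. The sketch uses the standard expected-mutual-information formula for text classification, in bits. The hypothetical `MutualInformationSketch` class is not part of EmoLib. How EmoLib estimates these counts, and whether it ranks terms per class in exactly this way, are assumptions.

```java
// Hypothetical helper, not part of EmoLib.
final class MutualInformationSketch {
    private MutualInformationSketch() {}

    /**
     * Expected mutual information (bits) between term occurrence and class membership.
     * a = in class, has term;  b = not in class, has term;
     * c = in class, lacks term; d = not in class, lacks term.
     */
    static double score(long a, long b, long c, long d) {
        double n = a + b + c + d;
        return term(a, n, a + b, a + c)
             + term(b, n, a + b, b + d)
             + term(c, n, c + d, a + c)
             + term(d, n, c + d, b + d);
    }

    // One summand: (nij/n) * log2(n * nij / (rowTotal * colTotal)); 0 * log 0 is taken as 0.
    private static double term(long nij, double n, long rowTotal, long colTotal) {
        if (nij == 0) {
            return 0.0;
        }
        double ratio = n * nij / ((double) rowTotal * colTotal);
        return (nij / n) * (Math.log(ratio) / Math.log(2));
    }
}
```

A term that perfectly predicts a balanced class yields 1 bit. A term independent of the class yields 0.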
public void setFeatSelChi2(boolean flag,
                           int numFeats)
flag - Whether the option is enabled.
numFeats - The number of features per class to select.
public void setFeatSelTF(boolean flag,
                         int numFeats)
flag - Whether the option is enabled.
numFeats - The number of features per class to select.
public float getSimilarity(FeatureBox inputText,
                           java.lang.String cat)
inputText - The given text.
cat - The given category.
public void applyModelTermWeighing(ARNReduced.Graph inputGraph,
                                   int cat)
inputGraph - The given graph to weight.
cat - The given category, for supervised term weighting methods.
public void applyTermWeighing(ARNReduced.Graph inputGraph,
                              int cat)
inputGraph - The given graph to weight.
cat - The given category, for supervised term weighting methods.
public ARNReduced.Graph buildGraph(FeatureBox inputFeatures)
inputFeatures - The input features.
public ARNReduced.Graph buildFullGraph(ARNReduced.Graph input)
input - The input text graph.
public java.lang.String getCategory(FeatureBox inputFeatures)
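Description copied from class: Classifier
The function that decides the most appropriate emotional category.
Overrides: getCategory in class Classifier
inputFeatures - The input emotional features.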
public void trainingProcedure()
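Description copied from class: Classifier
Generic training procedure.
Overrides: trainingProcedure in class Classifier
public void save(java.lang.String path)
Description copied from class: Classifier
Generic method to save the fully fledged classifier to a given file path.
Overrides: save in class Classifier
path - The file path to save the classifier to.
public void load(java.lang.String path)
Description copied from class: Classifier
Generic function to load a previously saved classifier.
Overrides: load in class Classifier
path - The path of the file that contains the previously saved classifier.
public void resetExamples()
Description copied from class: Classifier
Method to reset the classifier and flush the training examples.
Overrides: resetExamples in class Classifier
public void simpleClassification()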