Enum Class TermWeightingStrategy

java.lang.Object
java.lang.Enum<TermWeightingStrategy>
opennlp.tools.ml.libsvm.doccat.TermWeightingStrategy
All Implemented Interfaces:
Serializable, Comparable<TermWeightingStrategy>, Constable

public enum TermWeightingStrategy extends Enum<TermWeightingStrategy>
Defines strategies for weighting term features in SVM-based text classification.

The weighting strategy determines how raw term occurrences are converted into numeric feature values for the SVM feature vectors.

  • Nested Class Summary

    Nested classes/interfaces inherited from class java.lang.Enum

    Enum.EnumDesc<E extends Enum<E>>
  • Enum Constant Summary

    Enum Constants
    Enum Constant
    Description
    BINARY
    Binary weighting: 1.0 if the term is present in the document, 0.0 otherwise.
    LOG_NORMALIZED_TF
    Logarithmically normalized term frequency: 1 + log(tf) for terms that appear at least once, 0.0 otherwise.
    TERM_FREQUENCY
    Raw term frequency: the number of times a term occurs in a document.
    TF_IDF
    TF-IDF (Term Frequency - Inverse Document Frequency): tf * log(N / df), where N is the total number of documents and df is the number of documents containing the term.
  • Method Summary

    Modifier and Type
    Method
    Description
    static TermWeightingStrategy
    valueOf(String name)
    Returns the enum constant of this class with the specified name.
    static TermWeightingStrategy[]
    values()
    Returns an array containing the constants of this enum class, in the order they are declared.
    abstract double
    weight(int termFrequency, double inverseDocumentFrequency)
    Computes the feature weight for a term.

    Methods inherited from class java.lang.Enum

    compareTo, describeConstable, equals, getDeclaringClass, hashCode, name, ordinal, toString, valueOf

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, wait, wait, wait
  • Enum Constant Details

    • BINARY

      public static final TermWeightingStrategy BINARY
      Binary weighting: 1.0 if the term is present in the document, 0.0 otherwise. Ignores term frequency.
    • TERM_FREQUENCY

      public static final TermWeightingStrategy TERM_FREQUENCY
      Raw term frequency: the number of times a term occurs in a document.
    • TF_IDF

      public static final TermWeightingStrategy TF_IDF
      TF-IDF (Term Frequency - Inverse Document Frequency): tf * log(N / df), where N is the total number of documents and df is the number of documents containing the term.

      This downweights terms that appear in many documents (common terms) and upweights terms that are discriminative.

    • LOG_NORMALIZED_TF

      public static final TermWeightingStrategy LOG_NORMALIZED_TF
      Logarithmically normalized term frequency: 1 + log(tf) for terms that appear at least once, 0.0 otherwise.

      Dampens the effect of high term frequencies while still distinguishing between present and absent terms.
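
      The four formulas documented for the constants above can be compared side by side in a minimal sketch. WeightingSketch and WeightingSketchDemo are hypothetical stand-ins written for illustration, not the OpenNLP source. The sketch assumes the natural logarithm, since the documentation does not state the base, and assumes that strategies other than TF_IDF ignore the IDF argument.

```java
// Hypothetical stand-in for illustration only; it mirrors the documented
// formulas and is not the OpenNLP implementation.
enum WeightingSketch {
    BINARY {
        @Override
        double weight(int termFrequency, double inverseDocumentFrequency) {
            // 1.0 if the term is present, 0.0 otherwise.
            return termFrequency > 0 ? 1.0 : 0.0;
        }
    },
    TERM_FREQUENCY {
        @Override
        double weight(int termFrequency, double inverseDocumentFrequency) {
            // Raw count of the term in the document.
            return termFrequency;
        }
    },
    TF_IDF {
        @Override
        double weight(int termFrequency, double inverseDocumentFrequency) {
            // tf * idf, where the caller supplies idf = log(N / df).
            return termFrequency * inverseDocumentFrequency;
        }
    },
    LOG_NORMALIZED_TF {
        @Override
        double weight(int termFrequency, double inverseDocumentFrequency) {
            // Assumption: natural logarithm; the documentation does not state the base.
            return termFrequency > 0 ? 1.0 + Math.log(termFrequency) : 0.0;
        }
    };

    abstract double weight(int termFrequency, double inverseDocumentFrequency);
}

class WeightingSketchDemo {
    public static void main(String[] args) {
        int tf = 3;        // term occurs three times in the document
        double idf = 2.0;  // illustrative IDF value, not derived from a real corpus
        for (WeightingSketch s : WeightingSketch.values()) {
            System.out.println(s + " -> " + s.weight(tf, idf));
        }
    }
}
```

      With tf = 3 and idf = 2.0, the sketch's BINARY yields 1.0, TERM_FREQUENCY yields 3.0, and TF_IDF yields 6.0, while LOG_NORMALIZED_TF falls between the binary and raw values because the logarithm dampens the count.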

  • Method Details

    • values

      public static TermWeightingStrategy[] values()
      Returns an array containing the constants of this enum class, in the order they are declared.
      Returns:
      an array containing the constants of this enum class, in the order they are declared
    • valueOf

      public static TermWeightingStrategy valueOf(String name)
      Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
      Parameters:
      name - the name of the enum constant to be returned.
      Returns:
      the enum constant with the specified name
      Throws:
      IllegalArgumentException - if this enum class has no constant with the specified name
      NullPointerException - if the argument is null
    • weight

      public abstract double weight(int termFrequency, double inverseDocumentFrequency)
      Computes the feature weight for a term.
      Parameters:
      termFrequency - The number of occurrences of the term in the document.
      inverseDocumentFrequency - The IDF value for the term. Only TF_IDF uses this value.
      Returns:
      The computed feature weight.
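
      Because weight takes the IDF as an argument, the caller is responsible for computing it from corpus statistics first. The following is a rough sketch of that flow under stated assumptions: FeatureVectorSketch, Weigher, idf, and weigh are hypothetical names, Weigher only mimics the documented weight(int, double) signature, the natural logarithm is assumed, and terms with no known document frequency are given an IDF of 0.0 as a choice of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FeatureVectorSketch {

    /** Hypothetical stand-in for the documented weight(int, double) signature. */
    interface Weigher {
        double weight(int termFrequency, double inverseDocumentFrequency);
    }

    /** idf = log(N / df); natural log assumed, docsWithTerm must be positive. */
    static double idf(int totalDocs, int docsWithTerm) {
        return Math.log((double) totalDocs / docsWithTerm);
    }

    /** Turns per-document term counts into weighted feature values. */
    static Map<String, Double> weigh(Map<String, Integer> termCounts,
                                     Map<String, Integer> docFrequencies,
                                     int totalDocs,
                                     Weigher strategy) {
        Map<String, Double> features = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            Integer df = docFrequencies.get(entry.getKey());
            // Sketch-specific choice: unknown terms get an IDF of 0.0.
            double termIdf = (df == null || df == 0) ? 0.0 : idf(totalDocs, df);
            features.put(entry.getKey(), strategy.weight(entry.getValue(), termIdf));
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("svm", 2);  // rare term, appears in 10 of 100 documents
        counts.put("the", 5);  // common term, appears in all 100 documents
        Map<String, Integer> docFrequencies = Map.of("svm", 10, "the", 100);

        // Stand-in for a TF-IDF strategy: tf * idf.
        Weigher tfIdf = (tf, termIdf) -> tf * termIdf;
        System.out.println(weigh(counts, docFrequencies, 100, tfIdf));
    }
}
```

      In this sketch the common term "the" receives a weight of 0.0 because log(100 / 100) is 0, while the rarer term "svm" keeps a positive weight, which is the downweighting effect described for TF_IDF. With the real enum, the lambda would presumably be replaced by a call such as TermWeightingStrategy.TF_IDF.weight(tf, idf), following the signature documented above.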