Package opennlp.tools.ml.libsvm.doccat
Enum Class TermWeightingStrategy
- All Implemented Interfaces:
Serializable, Comparable<TermWeightingStrategy>, Constable
Defines strategies for weighting term features in SVM-based text classification.
The weighting strategy determines how raw term occurrences are converted into numeric feature values for the SVM feature vectors.
-
Nested Class Summary
Nested classes/interfaces inherited from class java.lang.Enum
Enum.EnumDesc<E extends Enum<E>> -
Enum Constant Summary
Enum Constants
BINARY
Binary weighting: 1.0 if the term is present in the document, 0.0 otherwise.
LOG_NORMALIZED_TF
Logarithmically normalized term frequency: 1 + log(tf) for terms that appear at least once, 0.0 otherwise.
TERM_FREQUENCY
Raw term frequency: the number of times a term occurs in a document.
TF_IDF
TF-IDF (Term Frequency - Inverse Document Frequency): tf * log(N / df), where N is the total number of documents and df is the number of documents containing the term.
-
Method Summary
Modifier and Type, Method, Description
static TermWeightingStrategy valueOf(String name)
Returns the enum constant of this class with the specified name.
static TermWeightingStrategy[] values()
Returns an array containing the constants of this enum class, in the order they are declared.
abstract double weight(int termFrequency, double inverseDocumentFrequency)
Computes the feature weight for a term.
Methods inherited from class java.lang.Enum
compareTo, describeConstable, equals, getDeclaringClass, hashCode, name, ordinal, toString, valueOf
-
Enum Constant Details
-
BINARY
Binary weighting: 1.0 if the term is present in the document, 0.0 otherwise. Ignores term frequency. -
TERM_FREQUENCY
Raw term frequency: the number of times a term occurs in a document. -
TF_IDF
TF-IDF (Term Frequency - Inverse Document Frequency): tf * log(N / df), where N is the total number of documents and df is the number of documents containing the term. This downweights terms that appear in many documents (common terms) and upweights terms that are discriminative.
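As a worked illustration of the formula above, the following minimal sketch computes tf * log(N / df) directly. It is not taken from the OpenNLP sources: the class name TfIdfSketch, the helper names, the use of the natural logarithm (the documentation does not state the base), and the zero result for a document frequency of 0 are all assumptions made for the example.

```java
// Illustrative sketch only; not OpenNLP source code.
final class TfIdfSketch {

    // IDF as described above: log(N / df).
    // Assumption: natural logarithm; the documentation does not state the base.
    static double inverseDocumentFrequency(int totalDocuments, int documentFrequency) {
        if (documentFrequency <= 0) {
            return 0.0; // assumption: a term seen in no document gets zero weight
        }
        return Math.log((double) totalDocuments / documentFrequency);
    }

    // TF-IDF weight: tf * log(N / df).
    static double tfIdf(int termFrequency, int totalDocuments, int documentFrequency) {
        return termFrequency * inverseDocumentFrequency(totalDocuments, documentFrequency);
    }

    public static void main(String[] args) {
        int n = 1000; // hypothetical corpus size
        System.out.println(tfIdf(3, n, 10));   // rare term: 3 * log(100)
        System.out.println(tfIdf(3, n, 1000)); // term in every document: prints 0.0
    }
}
```

A term that occurs in every document has N / df = 1, so its logarithm and therefore its weight are zero, which is the downweighting of common terms described above.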
-
LOG_NORMALIZED_TF
Logarithmically normalized term frequency: 1 + log(tf) for terms that appear at least once, 0.0 otherwise. Dampens the effect of high term frequencies while still distinguishing between present and absent terms.
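The four strategies can be compared side by side with a small stand-in enum. This is a sketch written from the descriptions above, not the actual OpenNLP implementation: the enum name SketchWeighting is hypothetical, the logarithm is assumed to be the natural logarithm, and the IDF value is assumed to be precomputed by the caller as log(N / df).

```java
// Illustrative stand-in, NOT the OpenNLP source: a hypothetical enum that
// mirrors the documented formulas and the documented weight(...) signature.
enum SketchWeighting {

    // 1.0 if the term is present, 0.0 otherwise; ignores how often it occurs.
    BINARY {
        @Override
        double weight(int termFrequency, double inverseDocumentFrequency) {
            return termFrequency > 0 ? 1.0 : 0.0;
        }
    },

    // Raw count of the term in the document.
    TERM_FREQUENCY {
        @Override
        double weight(int termFrequency, double inverseDocumentFrequency) {
            return termFrequency;
        }
    },

    // tf * idf, where idf is assumed to be a precomputed log(N / df).
    TF_IDF {
        @Override
        double weight(int termFrequency, double inverseDocumentFrequency) {
            return termFrequency * inverseDocumentFrequency;
        }
    },

    // 1 + log(tf) for present terms, 0.0 otherwise (natural log assumed).
    LOG_NORMALIZED_TF {
        @Override
        double weight(int termFrequency, double inverseDocumentFrequency) {
            return termFrequency > 0 ? 1.0 + Math.log(termFrequency) : 0.0;
        }
    };

    abstract double weight(int termFrequency, double inverseDocumentFrequency);
}
```

Under these assumptions, a term that occurs 100 times gets a weight of 100.0 from TERM_FREQUENCY but only about 5.6 from LOG_NORMALIZED_TF, which illustrates the dampening effect.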
-
-
Method Details
-
values
Returns an array containing the constants of this enum class, in the order they are declared.
- Returns:
- an array containing the constants of this enum class, in the order they are declared
-
valueOf
Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
- Parameters:
name - the name of the enum constant to be returned.
- Returns:
- the enum constant with the specified name
- Throws:
IllegalArgumentException - if this enum class has no constant with the specified name
NullPointerException - if the argument is null
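Because valueOf requires an exact match and throws on unknown or null input, code that reads a strategy name from configuration may want to normalize the string first. The sketch below shows one way to do that. It uses a hypothetical stand-in enum named Strategy with the same constant names so that it compiles on its own; the parse helper and its fallback behavior are assumptions for the example, not part of the documented API.

```java
import java.util.Locale;

// Illustrative sketch only; Strategy is a hypothetical stand-in for the
// documented enum so that this example compiles without OpenNLP.
final class StrategyParsing {

    enum Strategy { BINARY, TERM_FREQUENCY, TF_IDF, LOG_NORMALIZED_TF }

    // Normalizes user input before calling valueOf, which itself accepts
    // only the exact constant name (no whitespace, case-sensitive).
    static Strategy parse(String raw, Strategy fallback) {
        if (raw == null) {
            return fallback; // valueOf(null) would throw NullPointerException
        }
        try {
            return Strategy.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return fallback; // no constant with that name
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" tf_idf ", Strategy.BINARY)); // prints TF_IDF
        System.out.println(parse("bogus", Strategy.BINARY));    // prints BINARY
    }
}
```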
-
weight
public abstract double weight(int termFrequency, double inverseDocumentFrequency)
Computes the feature weight for a term.
- Parameters:
termFrequency - The number of occurrences of the term in the document.
inverseDocumentFrequency - The IDF value for the term (only used by TF_IDF).
- Returns:
- The computed feature weight.
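To show how a method with this signature might be used when building SVM feature vectors, the sketch below maps per-document term counts to weighted features. Everything in it is hypothetical: the Weighting interface stands in for the documented weight method, and toFeatures, the choice to drop zero-weight features, and the default IDF of 0.0 for unknown terms are assumptions for illustration rather than OpenNLP behavior.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not OpenNLP source code.
final class FeatureVectorSketch {

    // Hypothetical stand-in with the same shape as the documented method
    // weight(int termFrequency, double inverseDocumentFrequency).
    @FunctionalInterface
    interface Weighting {
        double weight(int termFrequency, double inverseDocumentFrequency);
    }

    // Converts term counts for one document into weighted feature values.
    static Map<String, Double> toFeatures(Map<String, Integer> termCounts,
                                          Map<String, Double> idfByTerm,
                                          Weighting weighting) {
        Map<String, Double> features = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            // Assumption: terms without a known IDF are treated as IDF 0.0.
            double idf = idfByTerm.getOrDefault(entry.getKey(), 0.0);
            double value = weighting.weight(entry.getValue(), idf);
            if (value != 0.0) {
                features.put(entry.getKey(), value); // keep the vector sparse
            }
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("svm", 2, "the", 5);
        Map<String, Double> idf = Map.of("svm", 1.5, "the", 0.0);
        // A TF-IDF style weighting expressed as a lambda.
        System.out.println(toFeatures(counts, idf, (tf, i) -> tf * i)); // prints {svm=3.0}
    }
}
```

With the real enum on the classpath, a bound method reference such as TermWeightingStrategy.TF_IDF::weight should fit the same functional shape, since the parameter and return types match the signature documented above.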
-