Enum Class FeatureSelectionStrategy

java.lang.Object
java.lang.Enum<FeatureSelectionStrategy>
opennlp.tools.ml.libsvm.doccat.FeatureSelectionStrategy
All Implemented Interfaces:
Serializable, Comparable<FeatureSelectionStrategy>, Constable

public enum FeatureSelectionStrategy extends Enum<FeatureSelectionStrategy>
Defines strategies for selecting the most informative features for SVM-based text classification.

Feature selection reduces the dimensionality of the feature space by retaining only the features that are most useful for distinguishing between categories.
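
The "top-k" retention step shared by the ranking strategies can be sketched as follows. This is a minimal illustration, not the library's implementation: the class `TopKSketch`, the method `topK`, and the tie-breaking rule (alphabetical by feature name) are all assumptions made for the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative sketch only; names and tie-breaking are hypothetical. */
final class TopKSketch {

  /**
   * Returns the k highest-scoring feature names, best first.
   * Ties are broken by feature name so the result is deterministic.
   */
  static List<String> topK(Map<String, Double> scores, int k) {
    return scores.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
            .thenComparing(Map.Entry.comparingByKey()))
        .limit(Math.max(0, k)) // guard against negative k
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```

The scores passed in would come from whichever statistic the chosen strategy uses; with NONE, this step would simply be skipped.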

  • Nested Class Summary

    Nested classes/interfaces inherited from class java.lang.Enum

    Enum.EnumDesc<E extends Enum<E>>
  • Enum Constant Summary

    Enum Constants
    Enum Constant - Description
    CHI_SQUARE - Chi-Square based feature selection: features are ranked by the maximum chi-square statistic across all categories, and only the top-k features are retained.
    DOCUMENT_FREQUENCY - Document Frequency based feature selection: features are ranked by the number of documents they appear in, and only the top-k features are retained.
    INFORMATION_GAIN - Information Gain based feature selection: features are ranked by their information gain score, and only the top-k features are retained.
    NONE - No feature selection: all features from the vocabulary are used.
    TERM_FREQUENCY - Term Frequency based feature selection: features are ranked by their total occurrence count across all documents in the corpus, and only the top-k features are retained.
  • Method Summary

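    Modifier and Type - Method - Description
    static FeatureSelectionStrategy - valueOf(String name) - Returns the enum constant of this class with the specified name.
    static FeatureSelectionStrategy[] - values() - Returns an array containing the constants of this enum class, in the order they are declared.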

    Methods inherited from class java.lang.Enum

    compareTo, describeConstable, equals, getDeclaringClass, hashCode, name, ordinal, toString, valueOf

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, wait, wait, wait
  • Enum Constant Details

    • NONE

      public static final FeatureSelectionStrategy NONE
      No feature selection: all features from the vocabulary are used.
    • INFORMATION_GAIN

      public static final FeatureSelectionStrategy INFORMATION_GAIN
      Information Gain based feature selection: features are ranked by their information gain score, and only the top-k features are retained.

      Information gain measures the reduction in entropy of the class variable achieved by observing the presence or absence of a feature.
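
The entropy-reduction definition above can be made concrete with a small sketch. It assumes the textbook formulation IG(f) = H(C) - P(f) H(C | f present) - P(not f) H(C | f absent), computed from per-category document counts. The class and method names are hypothetical, and the library's actual computation may differ (for example in logarithm base or smoothing).

```java
/** Illustrative sketch only; not the library's implementation. */
final class InformationGainSketch {

  /** Shannon entropy, in bits, of a distribution given as raw counts. */
  static double entropy(long[] counts) {
    long total = 0;
    for (long c : counts) {
      total += c;
    }
    if (total == 0) {
      return 0.0;
    }
    double h = 0.0;
    for (long c : counts) {
      if (c > 0) {
        double p = (double) c / total;
        h -= p * Math.log(p) / Math.log(2);
      }
    }
    return h;
  }

  /**
   * Information gain of one feature.
   * docsPerCategory[i]            = number of documents in category i
   * docsWithFeaturePerCategory[i] = how many of those contain the feature
   */
  static double informationGain(long[] docsPerCategory, long[] docsWithFeaturePerCategory) {
    int k = docsPerCategory.length;
    long[] docsWithoutFeature = new long[k];
    long n = 0;
    long nWith = 0;
    for (int i = 0; i < k; i++) {
      docsWithoutFeature[i] = docsPerCategory[i] - docsWithFeaturePerCategory[i];
      n += docsPerCategory[i];
      nWith += docsWithFeaturePerCategory[i];
    }
    if (n == 0) {
      return 0.0;
    }
    double pWith = (double) nWith / n;
    // Class entropy minus the expected entropy after observing presence/absence.
    return entropy(docsPerCategory)
        - pWith * entropy(docsWithFeaturePerCategory)
        - (1.0 - pWith) * entropy(docsWithoutFeature);
  }
}
```

With two equally sized categories, a feature that occurs in every document of one category and in none of the other yields a gain of 1 bit, while a feature spread evenly over both categories yields 0.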

    • CHI_SQUARE

      public static final FeatureSelectionStrategy CHI_SQUARE
      Chi-Square based feature selection: features are ranked by the maximum chi-square statistic across all categories, and only the top-k features are retained.

      Chi-square measures the statistical dependence between a feature and a class label. A high chi-square value indicates that the feature and the class are not independent.
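
As a rough illustration of the statistic described above, the sketch below computes chi-square for one feature/category pair from a 2x2 contingency table and then takes the maximum across categories. It assumes the standard formula N (ad - cb)^2 / ((a+c)(b+d)(a+b)(c+d)). The class and method names are hypothetical, and this is not taken from the library's source.

```java
import java.util.List;

/** Illustrative sketch only; not the library's implementation. */
final class ChiSquareSketch {

  /**
   * Chi-square statistic for one feature/category pair.
   * a: documents in the category that contain the feature
   * b: documents outside the category that contain the feature
   * c: documents in the category that lack the feature
   * d: documents outside the category that lack the feature
   */
  static double chiSquare(long a, long b, long c, long d) {
    double n = a + b + c + d;
    double denominator = (double) (a + c) * (b + d) * (a + b) * (c + d);
    if (denominator == 0.0) {
      return 0.0; // degenerate table: treat as no evidence of dependence
    }
    double diff = (double) a * d - (double) c * b;
    return n * diff * diff / denominator;
  }

  /** Maximum chi-square over per-category tables, each given as {a, b, c, d}. */
  static double maxChiSquare(List<long[]> tablesPerCategory) {
    double max = 0.0;
    for (long[] t : tablesPerCategory) {
      max = Math.max(max, chiSquare(t[0], t[1], t[2], t[3]));
    }
    return max;
  }
}
```

A feature distributed identically inside and outside a category scores 0, while a feature that perfectly separates 10 in-category documents from 10 out-of-category documents scores 20 (the total document count).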

    • TERM_FREQUENCY

      public static final FeatureSelectionStrategy TERM_FREQUENCY
      Term Frequency based feature selection: features are ranked by their total occurrence count across all documents in the corpus, and only the top-k features are retained.

      This is a simple baseline strategy that favors frequent terms. It can be useful for filtering out very rare features that may be noise.

    • DOCUMENT_FREQUENCY

      public static final FeatureSelectionStrategy DOCUMENT_FREQUENCY
      Document Frequency based feature selection: features are ranked by the number of documents they appear in, and only the top-k features are retained.

      Unlike TERM_FREQUENCY, this counts each feature at most once per document, regardless of how often it occurs within that document.
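
The difference between the two counting schemes can be shown with a short sketch. Documents are modeled here as lists of feature strings, which is an assumption made for the example. The class and method names are hypothetical and do not reflect the library's internals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not the library's implementation. */
final class FrequencyCountSketch {

  /** Total occurrences of each feature across all documents. */
  static Map<String, Integer> termFrequency(List<List<String>> docs) {
    Map<String, Integer> counts = new HashMap<>();
    for (List<String> doc : docs) {
      for (String feature : doc) {
        counts.merge(feature, 1, Integer::sum);
      }
    }
    return counts;
  }

  /** Number of documents containing each feature (at most one count per document). */
  static Map<String, Integer> documentFrequency(List<List<String>> docs) {
    Map<String, Integer> counts = new HashMap<>();
    for (List<String> doc : docs) {
      // De-duplicate within the document so repeats are counted once.
      for (String feature : new HashSet<>(doc)) {
        counts.merge(feature, 1, Integer::sum);
      }
    }
    return counts;
  }
}
```

For the two documents `[spam, spam, offer]` and `[spam, hello]`, the term frequency of `spam` is 3 while its document frequency is 2.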

  • Method Details

    • values

      public static FeatureSelectionStrategy[] values()
      Returns an array containing the constants of this enum class, in the order they are declared.
      Returns:
      an array containing the constants of this enum class, in the order they are declared
    • valueOf

      public static FeatureSelectionStrategy valueOf(String name)
      Returns the enum constant of this class with the specified name. The string must match exactly an identifier used to declare an enum constant in this class. (Extraneous whitespace characters are not permitted.)
      Parameters:
      name - the name of the enum constant to be returned.
      Returns:
      the enum constant with the specified name
      Throws:
      IllegalArgumentException - if this enum class has no constant with the specified name
      NullPointerException - if the argument is null
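
Because valueOf requires an exact match and throws on unknown or null input, callers reading a strategy name from configuration may want a lenient wrapper. The sketch below is one possible approach. It uses a local stand-in enum that mirrors the documented constants so the example compiles without the library, and `parseOrDefault` is a hypothetical helper, not part of this API.

```java
import java.util.Locale;

/** Illustrative sketch only; parseOrDefault is a hypothetical helper. */
final class StrategyParsingSketch {

  /** Local stand-in mirroring the documented constants; use the real enum in practice. */
  enum FeatureSelectionStrategy {
    NONE, INFORMATION_GAIN, CHI_SQUARE, TERM_FREQUENCY, DOCUMENT_FREQUENCY
  }

  /** Parses a strategy name leniently, returning the fallback for null or unknown input. */
  static FeatureSelectionStrategy parseOrDefault(String raw, FeatureSelectionStrategy fallback) {
    if (raw == null) {
      return fallback; // valueOf(null) would throw NullPointerException
    }
    try {
      // valueOf needs the exact constant name, so normalize whitespace and case first.
      return FeatureSelectionStrategy.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      return fallback; // no constant with that name
    }
  }
}
```

Whether to fall back silently or surface the error depends on the caller; failing fast is often preferable when a misspelled strategy name would otherwise go unnoticed.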