Index
All Classes and Interfaces|All Packages|Serialized Form
B
- BINARY - Enum constant in enum class opennlp.tools.ml.libsvm.doccat.TermWeightingStrategy
-
Binary weighting: 1.0 if the term is present in the document, 0.0 otherwise.
- build() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration.Builder
- Builder() - Constructor for class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration.Builder
C
- categorize(String[]) - Method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- categorize(String[], Map<String, Object>) - Method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- CHI_SQUARE - Enum constant in enum class opennlp.tools.ml.libsvm.doccat.FeatureSelectionStrategy
-
Chi-Square based feature selection: features are ranked by the maximum chi-square statistic across all categories, and only the top-k features are retained.
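The CHI_SQUARE description above (rank each feature by its maximum chi-square statistic across categories, keep the top k) can be sketched as follows. This is a minimal illustration, not the library's code: the class ChiSquareSketch, its method names, and the use of the standard 2x2 contingency-table formula are all assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of chi-square feature ranking; not the OpenNLP API. */
final class ChiSquareSketch {

    /**
     * Chi-square statistic for one term/category pair (assumed 2x2 formula).
     * a: documents in the category that contain the term
     * b: documents outside the category that contain the term
     * c: documents in the category without the term
     * d: documents outside the category without the term
     */
    static double chiSquare(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double denom = (double) (a + b) * (c + d) * (a + c) * (b + d);
        if (denom == 0.0) {
            return 0.0; // assumption: a degenerate table carries no signal
        }
        double diff = (double) a * d - (double) b * c;
        return n * diff * diff / denom;
    }

    /** Maximum statistic over per-category tables, each row being {a, b, c, d}. */
    static double maxOverCategories(long[][] tables) {
        double best = 0.0;
        for (long[] t : tables) {
            best = Math.max(best, chiSquare(t[0], t[1], t[2], t[3]));
        }
        return best;
    }

    /** Keeps the k highest-scoring features; ties are broken by feature name. */
    static List<String> topK(Map<String, Double> scores, int k) {
        return scores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()))
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long[][] tables = {{10, 0, 0, 10}, {5, 5, 5, 5}};
        System.out.println(maxOverCategories(tables));
    }
}
```

The same topK step would apply to the other ranking strategies in this index; only the score passed in changes.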
D
- DEFAULT - Static variable in record class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel.DeserializationLimits
-
Default limits.
- DeserializationLimits(long, long, long) - Constructor for record class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel.DeserializationLimits
-
Creates an instance of a DeserializationLimits record class.
- deserialize(InputStream) - Static method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- deserialize(InputStream, SvmDoccatModel.DeserializationLimits) - Static method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- DOCUMENT_FREQUENCY - Enum constant in enum class opennlp.tools.ml.libsvm.doccat.FeatureSelectionStrategy
-
Document Frequency based feature selection: features are ranked by the number of documents they appear in, and only the top-k features are retained.
- DocumentCategorizerSVM - Class in opennlp.tools.ml.libsvm.doccat
-
An implementation of DocumentCategorizer that uses Support Vector Machines (SVM) via the zlibsvm library for document classification.
- DocumentCategorizerSVM(SvmDoccatModel, FeatureGenerator...) - Constructor for class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
-
Instantiates a DocumentCategorizerSVM with a trained model and feature generators.
E
- equals(Object) - Method in record class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel.DeserializationLimits
-
Indicates whether some other object is "equal to" this one.
F
- FeatureSelectionStrategy - Enum Class in opennlp.tools.ml.libsvm.doccat
-
Defines strategies for selecting the most informative features for SVM-based text classification.
G
- getAllResults(double[]) - Method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- getBestCategory(double[]) - Method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- getCategory(int) - Method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- getCategoryToIndex() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getConfiguration() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getFeatureMaxValues() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getFeatureMinValues() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getFeatureSelectionStrategy() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration
- getFeatureVocabulary() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getIdfValues() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getIndex(String) - Method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- getIndexToCategory() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getLanguageCode() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getMaxFeatures() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration
- getNumberOfCategories() - Method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- getNumberOfCategories() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getScaleLower() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration
- getScaleUpper() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration
- getSvmConfiguration() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration
- getSvmModel() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
- getTermWeightingStrategy() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration
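The accessors above expose per-feature minimum and maximum values together with a lower and upper scale bound, which suggests min-max scaling. The sketch below shows that technique under this assumption; MinMaxScaleSketch and its scale method are hypothetical, and the handling of a constant feature is a guess rather than documented behavior.

```java
/** Hypothetical sketch of min-max feature scaling; not the OpenNLP API. */
final class MinMaxScaleSketch {

    /**
     * Linearly maps value from [min, max] to [lower, upper].
     * Assumption: a feature with max == min maps to the lower bound.
     */
    static double scale(double value, double min, double max,
                        double lower, double upper) {
        if (max == min) {
            return lower;
        }
        return lower + (upper - lower) * (value - min) / (max - min);
    }

    public static void main(String[] args) {
        // Scale a raw weight of 5.0, observed in [0, 10], to the range [0, 1].
        System.out.println(scale(5.0, 0.0, 10.0, 0.0, 1.0));
    }
}
```

Values seen at prediction time can fall outside the training [min, max]; this sketch does not clamp them, so they would land outside [lower, upper].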
H
- hashCode() - Method in record class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel.DeserializationLimits
-
Returns a hash code value for this object.
I
- INFORMATION_GAIN - Enum constant in enum class opennlp.tools.ml.libsvm.doccat.FeatureSelectionStrategy
-
Information Gain based feature selection: features are ranked by their information gain score, and only the top-k features are retained.
- isScaleFeatures() - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration
L
- LOG_NORMALIZED_TF - Enum constant in enum class opennlp.tools.ml.libsvm.doccat.TermWeightingStrategy
-
Logarithmically normalized term frequency: 1 + log(tf) for terms that appear at least once, 0.0 otherwise.
M
- maxArrayLength() - Method in record class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel.DeserializationLimits
-
Returns the value of the maxArrayLength record component.
- maxDepth() - Method in record class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel.DeserializationLimits
-
Returns the value of the maxDepth record component.
- maxRefs() - Method in record class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel.DeserializationLimits
-
Returns the value of the maxRefs record component.
N
- NONE - Enum constant in enum class opennlp.tools.ml.libsvm.doccat.FeatureSelectionStrategy
-
No feature selection: all features from the vocabulary are used.
O
- opennlp.tools.ml.libsvm.doccat - package opennlp.tools.ml.libsvm.doccat
S
- scoreMap(String[]) - Method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- serialize(OutputStream) - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel
-
Serializes this model to the given OutputStream using Java object serialization.
- setFeatureSelectionStrategy(FeatureSelectionStrategy) - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration.Builder
-
Sets the feature selection strategy.
- setMaxFeatures(int) - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration.Builder
-
Sets the maximum number of features to retain after feature selection.
- setScaleFeatures(boolean) - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration.Builder
-
Sets whether feature values should be scaled to the range [lower, upper].
- setScaleRange(double, double) - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration.Builder
-
Sets the lower and upper bounds of the feature scaling range.
- setSvmConfiguration(SvmConfiguration) - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration.Builder
-
Sets the underlying SVM configuration.
- setTermWeightingStrategy(TermWeightingStrategy) - Method in class opennlp.tools.ml.libsvm.doccat.SvmDoccatConfiguration.Builder
-
Sets the term weighting strategy.
- sortedScoreMap(String[]) - Method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- SvmDoccatConfiguration - Class in opennlp.tools.ml.libsvm.doccat
-
Configuration for SVM-based document categorization, combining the underlying SVM classifier settings with text-specific parameters for term weighting, feature selection, and feature scaling.
- SvmDoccatConfiguration.Builder - Class in opennlp.tools.ml.libsvm.doccat
-
A builder for SvmDoccatConfiguration.
- SvmDoccatModel - Class in opennlp.tools.ml.libsvm.doccat
-
A model for SVM-based document categorization.
- SvmDoccatModel.DeserializationLimits - Record Class in opennlp.tools.ml.libsvm.doccat
-
Resource limits applied to the ObjectInputFilter used by SvmDoccatModel.deserialize(InputStream, DeserializationLimits).
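To illustrate how depth, reference, and array-length limits can be attached to an ObjectInputFilter, here is a minimal sketch that uses only the JDK (java.io.ObjectInputFilter.Config.createFilter and ObjectInputStream.setObjectInputFilter). The class FilteredReadSketch and its methods are hypothetical, and how SvmDoccatModel actually builds its filter is not stated in this index, so the mapping of the three limits to the maxdepth, maxrefs, and maxarray pattern keys is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of limit-based deserialization filtering; JDK calls only. */
final class FilteredReadSketch {

    /** Serializes a value to bytes with plain Java object serialization. */
    static byte[] write(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads one object while enforcing the three resource limits. */
    static Object read(byte[] bytes, long maxDepth, long maxRefs, long maxArrayLength)
            throws IOException, ClassNotFoundException {
        // Assumed mapping of the limits onto the JDK filter pattern syntax.
        ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(
            "maxdepth=" + maxDepth + ";maxrefs=" + maxRefs + ";maxarray=" + maxArrayLength);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.setObjectInputFilter(filter); // must be set before the first read
            return in.readObject();
        }
    }

    /** Returns false when the filter rejects the stream, true when it is read. */
    static boolean accepts(byte[] bytes, long maxDepth, long maxRefs, long maxArrayLength) {
        try {
            read(bytes, maxDepth, maxRefs, maxArrayLength);
            return true;
        } catch (InvalidClassException rejected) {
            return false; // ObjectInputStream reports filter rejection this way
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = write(new double[4]);
        System.out.println(accepts(bytes, 10, 100, 8));
        System.out.println(accepts(bytes, 10, 100, 2));
    }
}
```

A filter built this way only bounds resource use; restricting which classes may be deserialized would need additional class-name patterns.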
T
- TERM_FREQUENCY - Enum constant in enum class opennlp.tools.ml.libsvm.doccat.FeatureSelectionStrategy
-
Term Frequency based feature selection: features are ranked by their total occurrence count across all documents in the corpus, and only the top-k features are retained.
- TERM_FREQUENCY - Enum constant in enum class opennlp.tools.ml.libsvm.doccat.TermWeightingStrategy
-
Raw term frequency: the number of times a term occurs in a document.
- TermWeightingStrategy - Enum Class in opennlp.tools.ml.libsvm.doccat
-
Defines strategies for weighting term features in SVM-based text classification.
- TF_IDF - Enum constant in enum class opennlp.tools.ml.libsvm.doccat.TermWeightingStrategy
-
TF-IDF (Term Frequency - Inverse Document Frequency): tf * log(N / df), where N is the total number of documents and df is the number of documents containing the term.
- toString() - Method in record class opennlp.tools.ml.libsvm.doccat.SvmDoccatModel.DeserializationLimits
-
Returns a string representation of this record class.
- train(String, ObjectStream<DocumentSample>, FeatureGenerator...) - Static method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
-
Trains an SVM-based document categorization model using default configuration (TF-IDF weighting, no feature selection, scaling to [0, 1]).
- train(String, ObjectStream<DocumentSample>, SvmDoccatConfiguration, FeatureGenerator...) - Static method in class opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
-
Trains an SVM-based document categorization model with a custom configuration.
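The four weighting formulas quoted in this index (binary, raw term frequency, 1 + log(tf), and tf * log(N / df)) can be written out in plain Java. The class TermWeightSketch and its method names are hypothetical and do not mirror the library's weight(int, double) signature. The natural logarithm and the zero result when df is 0 are assumptions, since the index does not state either.

```java
/** Hypothetical sketch of the term weighting formulas; not the OpenNLP API. */
final class TermWeightSketch {

    /** Binary weighting: 1.0 if the term occurs in the document, 0.0 otherwise. */
    static double binary(int tf) {
        return tf > 0 ? 1.0 : 0.0;
    }

    /** Raw term frequency: the occurrence count itself. */
    static double termFrequency(int tf) {
        return tf;
    }

    /** 1 + log(tf) for tf >= 1, 0.0 otherwise (natural log assumed). */
    static double logNormalizedTf(int tf) {
        return tf > 0 ? 1.0 + Math.log(tf) : 0.0;
    }

    /**
     * tf * log(N / df), with N the total number of documents and df the
     * number of documents containing the term.
     * Assumption: returns 0.0 when tf or df is not positive.
     */
    static double tfIdf(int tf, long totalDocs, long docFreq) {
        if (tf <= 0 || docFreq <= 0) {
            return 0.0;
        }
        return tf * Math.log((double) totalDocs / docFreq);
    }

    public static void main(String[] args) {
        int tf = 3;
        System.out.println(binary(tf));
        System.out.println(termFrequency(tf));
        System.out.println(logNormalizedTf(tf));
        System.out.println(tfIdf(tf, 1000, 10));
    }
}
```

Note that under this formula a term found in every document (df = N) gets a TF-IDF weight of 0, because log(1) = 0.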
V
- valueOf(String) - Static method in enum class opennlp.tools.ml.libsvm.doccat.FeatureSelectionStrategy
-
Returns the enum constant of this class with the specified name.
- valueOf(String) - Static method in enum class opennlp.tools.ml.libsvm.doccat.TermWeightingStrategy
-
Returns the enum constant of this class with the specified name.
- values() - Static method in enum class opennlp.tools.ml.libsvm.doccat.FeatureSelectionStrategy
-
Returns an array containing the constants of this enum class, in the order they are declared.
- values() - Static method in enum class opennlp.tools.ml.libsvm.doccat.TermWeightingStrategy
-
Returns an array containing the constants of this enum class, in the order they are declared.
W
- weight(int, double) - Method in enum class opennlp.tools.ml.libsvm.doccat.TermWeightingStrategy
-
Computes the feature weight for a term.