Package opennlp.tools.ml.libsvm.doccat
Class DocumentCategorizerSVM
java.lang.Object
opennlp.tools.ml.libsvm.doccat.DocumentCategorizerSVM
- All Implemented Interfaces:
opennlp.tools.doccat.DocumentCategorizer
public class DocumentCategorizerSVM
extends Object
implements opennlp.tools.doccat.DocumentCategorizer
An implementation of DocumentCategorizer that uses Support Vector Machines (SVM) via the zlibsvm library for document classification.
This categorizer supports configurable:
- Term weighting (binary, TF, TF-IDF, log-normalized TF)
- Feature selection (information gain, chi-square, term frequency, document frequency)
- Feature scaling to a configurable range (e.g., [0, 1])
- SVM classifier parameters (kernel, cost, gamma, etc.) via SvmConfiguration
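The term-weighting options listed above can be illustrated with a minimal, self-contained sketch. It uses textbook definitions (raw counts for TF, 1 + ln(tf) for log-normalized TF, tf * ln(N / df) for TF-IDF); the exact formulas used by this class are not stated on this page, so the class name TermWeightingSketch and every method in it are hypothetical and are not part of the library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative term-weighting schemes; NOT the library's implementation. */
final class TermWeightingSketch {

  /** Raw term frequency: number of occurrences of each token in one document. */
  static Map<String, Double> tf(List<String> tokens) {
    Map<String, Double> counts = new HashMap<>();
    for (String token : tokens) {
      counts.merge(token, 1.0, Double::sum);
    }
    return counts;
  }

  /** Binary weighting: 1.0 for every term that occurs at least once. */
  static Map<String, Double> binary(List<String> tokens) {
    Map<String, Double> weights = new HashMap<>();
    for (String token : tokens) {
      weights.put(token, 1.0);
    }
    return weights;
  }

  /** Log-normalized TF (assumed form): 1 + ln(tf) for every occurring term. */
  static Map<String, Double> logTf(List<String> tokens) {
    Map<String, Double> weights = tf(tokens);
    weights.replaceAll((term, freq) -> 1.0 + Math.log(freq));
    return weights;
  }

  /** TF-IDF (assumed form): tf * ln(N / df), with N documents and df documents containing the term. */
  static Map<String, Double> tfIdf(List<String> tokens, List<List<String>> corpus) {
    Map<String, Double> weights = tf(tokens);
    weights.replaceAll((term, freq) -> {
      long df = corpus.stream().filter(doc -> doc.contains(term)).count();
      return df == 0 ? 0.0 : freq * Math.log((double) corpus.size() / df);
    });
    return weights;
  }
}
```

With this definition, a term that occurs in every document of the corpus receives a TF-IDF weight of 0, which is why the scheme favors terms that discriminate between documents.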
-
Constructor Summary
Constructors
Constructor: DocumentCategorizerSVM(SvmDoccatModel model, opennlp.tools.doccat.FeatureGenerator... featureGenerators)
Description: Instantiates a DocumentCategorizerSVM with a trained model and feature generators.
-
Method Summary
Modifier and Type, Method, Description:
- double[] categorize(String[] text)
- double[] categorize(String[] text, Map<String, Object> extraInformation)
- String getAllResults(double[] results)
- String getBestCategory(double[] outcome)
- String getCategory(int index)
- int getIndex(String category)
- int getNumberOfCategories()
- Map<String, Double> scoreMap(String[] text)
- SortedMap<Double, Set<String>> sortedScoreMap(String[] text)
- static SvmDoccatModel train(String lang, opennlp.tools.util.ObjectStream<opennlp.tools.doccat.DocumentSample> samples, opennlp.tools.doccat.FeatureGenerator... featureGenerators): Trains an SVM-based document categorization model using the default configuration (TF-IDF weighting, no feature selection, scaling to [0, 1]).
- static SvmDoccatModel train(String lang, opennlp.tools.util.ObjectStream<opennlp.tools.doccat.DocumentSample> samples, SvmDoccatConfiguration config, opennlp.tools.doccat.FeatureGenerator... featureGenerators): Trains an SVM-based document categorization model with a custom configuration.
-
Constructor Details
-
DocumentCategorizerSVM
public DocumentCategorizerSVM(SvmDoccatModel model, opennlp.tools.doccat.FeatureGenerator... featureGenerators)
Instantiates a DocumentCategorizerSVM with a trained model and feature generators.
- Parameters:
model - The trained SvmDoccatModel. Must not be null.
featureGenerators - The FeatureGenerator instances used to extract features. Must not be null or empty.
-
-
Method Details
-
categorize
- Specified by:
categorize in interface opennlp.tools.doccat.DocumentCategorizer
-
categorize
- Specified by:
categorize in interface opennlp.tools.doccat.DocumentCategorizer
-
getBestCategory
- Specified by:
getBestCategory in interface opennlp.tools.doccat.DocumentCategorizer
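As a rough illustration of how a score array such as the one returned by categorize relates to a single best label, the sketch below picks the index with the highest score. It assumes that "best" means the maximum score and that ties go to the lowest index; this page does not confirm either detail for this class. BestCategorySketch and the category names in the test data are hypothetical.

```java
/** Illustrative arg-max over category scores; NOT the library's implementation. */
final class BestCategorySketch {

  /**
   * Returns the label whose score is highest.
   * Ties resolve to the lowest index (an assumption made for this sketch).
   */
  static String bestCategory(String[] categories, double[] scores) {
    if (scores.length == 0 || categories.length != scores.length) {
      throw new IllegalArgumentException("categories and scores must be non-empty and of equal length");
    }
    int best = 0;
    for (int i = 1; i < scores.length; i++) {
      if (scores[i] > scores[best]) {
        best = i;
      }
    }
    return categories[best];
  }
}
```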
-
getIndex
- Specified by:
getIndex in interface opennlp.tools.doccat.DocumentCategorizer
-
getCategory
- Specified by:
getCategory in interface opennlp.tools.doccat.DocumentCategorizer
-
getNumberOfCategories
public int getNumberOfCategories()
- Specified by:
getNumberOfCategories in interface opennlp.tools.doccat.DocumentCategorizer
-
getAllResults
- Specified by:
getAllResults in interface opennlp.tools.doccat.DocumentCategorizer
-
scoreMap
- Specified by:
scoreMap in interface opennlp.tools.doccat.DocumentCategorizer
-
sortedScoreMap
- Specified by:
sortedScoreMap in interface opennlp.tools.doccat.DocumentCategorizer
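The two map-shaped views of a classification result can be sketched without the library. The example below builds a label-to-score map and a score-to-labels map from parallel arrays; grouping labels that share a score into one Set is what makes a sorted map keyed by score workable. The return types mirror the DocumentCategorizer interface, but ScoreMapSketch, its methods, and the sample category names are hypothetical, and how this class orders or groups its results is not stated on this page.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Illustrative score-map construction; NOT the library's implementation. */
final class ScoreMapSketch {

  /** Maps each category label to its score, preserving category order. */
  static Map<String, Double> scoreMap(String[] categories, double[] scores) {
    Map<String, Double> result = new LinkedHashMap<>();
    for (int i = 0; i < categories.length; i++) {
      result.put(categories[i], scores[i]);
    }
    return result;
  }

  /** Maps each distinct score to the set of labels that received it, keys ascending. */
  static SortedMap<Double, Set<String>> sortedScoreMap(String[] categories, double[] scores) {
    SortedMap<Double, Set<String>> result = new TreeMap<>();
    for (int i = 0; i < categories.length; i++) {
      result.computeIfAbsent(scores[i], score -> new HashSet<>()).add(categories[i]);
    }
    return result;
  }
}
```

Because a TreeMap orders keys ascending, `lastKey()` on the sketch's result yields the highest score and its value holds every label tied at that score.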
-
train
public static SvmDoccatModel train(String lang, opennlp.tools.util.ObjectStream<opennlp.tools.doccat.DocumentSample> samples, opennlp.tools.doccat.FeatureGenerator... featureGenerators) throws IOException
Trains an SVM-based document categorization model using the default configuration (TF-IDF weighting, no feature selection, scaling to [0, 1]).
- Parameters:
lang - The ISO-conformant language code.
samples - The ObjectStream of DocumentSample used as input for training.
featureGenerators - The FeatureGenerator instances used to extract features.
- Returns:
A trained SvmDoccatModel.
- Throws:
IOException - Thrown if I/O errors occur during training.
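The default configuration scales features to [0, 1]. A common way to do this is min-max scaling of each feature across the training samples, sketched below under that assumption; this page does not say which scaling formula the class applies. MinMaxScalingSketch is a hypothetical helper, and mapping a constant feature to the lower bound is a choice made only for this sketch.

```java
/** Illustrative min-max scaling of one feature's values; NOT the library's implementation. */
final class MinMaxScalingSketch {

  /**
   * Linearly maps the values of a single feature into [lower, upper].
   * A constant feature (max == min) is mapped to lower, an assumption of this sketch.
   */
  static double[] scale(double[] values, double lower, double upper) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (double v : values) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    double span = max - min;
    double[] scaled = new double[values.length];
    for (int i = 0; i < values.length; i++) {
      scaled[i] = span == 0.0 ? lower : lower + (upper - lower) * (values[i] - min) / span;
    }
    return scaled;
  }
}
```

The minimum and maximum would normally be computed on the training data only and then reused unchanged when scaling documents at classification time, so that both are mapped consistently.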
-
train
public static SvmDoccatModel train(String lang, opennlp.tools.util.ObjectStream<opennlp.tools.doccat.DocumentSample> samples, SvmDoccatConfiguration config, opennlp.tools.doccat.FeatureGenerator... featureGenerators) throws IOException
Trains an SVM-based document categorization model with a custom configuration.
- Parameters:
lang - The ISO-conformant language code.
samples - The ObjectStream of DocumentSample used as input for training.
config - The SvmDoccatConfiguration controlling term weighting, feature selection, scaling, and SVM parameters.
featureGenerators - The FeatureGenerator instances used to extract features.
- Returns:
A trained SvmDoccatModel.
- Throws:
IOException - Thrown if I/O errors occur during training.
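Feature selection is one of the settings a custom configuration controls. As a minimal illustration of the simplest listed criterion, document frequency, the sketch below keeps the k terms that occur in the most documents. DocumentFrequencySelectionSketch and its method are hypothetical; how SvmDoccatConfiguration exposes feature selection, and how the class breaks ties, is not shown on this page (the sketch breaks ties alphabetically).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative document-frequency feature selection; NOT the library's implementation. */
final class DocumentFrequencySelectionSketch {

  /** Keeps the k terms that appear in the most documents (k must be non-negative). */
  static Set<String> topKByDocumentFrequency(List<List<String>> corpus, int k) {
    // Count each term once per document, regardless of how often it repeats there.
    Map<String, Integer> df = new HashMap<>();
    for (List<String> doc : corpus) {
      for (String term : new HashSet<>(doc)) {
        df.merge(term, 1, Integer::sum);
      }
    }
    List<String> terms = new ArrayList<>(df.keySet());
    terms.sort((a, b) -> {
      int byDf = Integer.compare(df.get(b), df.get(a)); // descending document frequency
      return byDf != 0 ? byDf : a.compareTo(b);          // ties: alphabetical order
    });
    return new LinkedHashSet<>(terms.subList(0, Math.min(k, terms.size())));
  }
}
```

Information gain and chi-square follow the same pattern of scoring every term and keeping the top-ranked ones, but they score terms by their association with the category labels rather than by raw counts.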
-