Class DocumentCategorizerDL
- All Implemented Interfaces:
AutoCloseable, opennlp.tools.doccat.DocumentCategorizer
DocumentCategorizer that performs document classification
using ONNX models.
Tokenization applies BERT basic tokenization (text normalization)
before WordPiece; see BertTokenizer. Input
text is lower cased and accent stripped by default, matching the uncased
models commonly used for classification. For cased models, set
InferenceOptions.setLowerCase(boolean) to false.
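The default normalization described above can be illustrated with a small, self-contained sketch. It is not the BertTokenizer implementation: the class and method names are hypothetical, and it assumes that accent stripping is applied only when lower casing is enabled, which this page does not state.

```java
import java.text.Normalizer;
import java.util.Locale;

// Illustrative only: a hypothetical helper, not part of opennlp.dl.
final class TextNormalizationSketch {

  // Lower cases the text and strips combining accent marks when lowerCase is true.
  // Assumption: with lowerCase == false (cased models) the text is left untouched.
  static String normalize(String text, boolean lowerCase) {
    if (!lowerCase) {
      return text;
    }
    String lowered = text.toLowerCase(Locale.ROOT);
    // NFD decomposes an accented letter into a base letter plus a combining mark.
    String decomposed = Normalizer.normalize(lowered, Normalizer.Form.NFD);
    // \p{Mn} matches the non-spacing (combining) marks, which are removed here.
    return decomposed.replaceAll("\\p{Mn}", "");
  }

  public static void main(String[] args) {
    System.out.println(normalize("Caf\u00e9 Z\u00fcrich", true));  // prints "cafe zurich"
    System.out.println(normalize("Caf\u00e9 Z\u00fcrich", false)); // prints the input unchanged
  }
}
```

The WordPiece step that follows normalization is omitted here; only the text normalization stage is shown.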
This class is thread-safe and may be shared across threads, provided the supplied
ClassificationScoringStrategy is thread-safe (the built-in
AverageClassificationScoringStrategy is stateless).
Three properties support this: inference holds no per-call instance state; the
relevant InferenceOptions values are snapshotted into final fields at construction,
so mutating the passed options afterwards does not affect a shared instance; and
the underlying OrtSession supports concurrent execution. The guarantee holds until
AbstractDL.close() is called; callers must not race close() with inference
methods.
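The snapshotting behaviour can be sketched in isolation. The two classes below are hypothetical stand-ins, not the real InferenceOptions or DocumentCategorizerDL; they only show why copying a value into a final field at construction makes later mutation of the options object harmless.

```java
// Hypothetical stand-in for a mutable options object.
final class OptionsSketch {
  private boolean lowerCase = true;

  boolean isLowerCase() {
    return lowerCase;
  }

  void setLowerCase(boolean lowerCase) {
    this.lowerCase = lowerCase;
  }
}

// Hypothetical stand-in for a component that snapshots its options.
final class SnapshotSketch {
  // Copied once in the constructor and never reassigned, so it is safe to read from any thread.
  private final boolean lowerCase;

  SnapshotSketch(OptionsSketch options) {
    this.lowerCase = options.isLowerCase();
  }

  boolean usesLowerCase() {
    return lowerCase;
  }

  public static void main(String[] args) {
    OptionsSketch options = new OptionsSketch();
    SnapshotSketch shared = new SnapshotSketch(options);
    options.setLowerCase(false);                // mutate after construction
    System.out.println(shared.usesLowerCase()); // prints "true": the snapshot is unaffected
  }
}
```

Holding a reference to the options object instead of copying its values would let a later setter call change the behaviour of an instance already shared between threads.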
-
Field Summary
Fields inherited from class opennlp.dl.AbstractDL
ATTENTION_MASK, INPUT_IDS, TOKEN_TYPE_IDS -
Constructor Summary
Constructors
Constructor / Description
DocumentCategorizerDL(File model, File vocabulary, File config, ClassificationScoringStrategy classificationScoringStrategy, InferenceOptions inferenceOptions)
    Instantiates a document categorizer using ONNX models.
DocumentCategorizerDL(File model, File vocabulary, Map<Integer, String> categories, ClassificationScoringStrategy classificationScoringStrategy, InferenceOptions inferenceOptions)
    Instantiates a document categorizer using ONNX models.
-
Method Summary
Modifier and Type / Method / Description
double[] categorize(String[] strings)
    Categorizes the document, failing loudly rather than returning an invalid distribution: malformed input is rejected with IllegalArgumentException, and any failure executing the model is surfaced as an IllegalStateException (cause preserved).
double[] categorize(String[] strings, Map<String, Object> map)
getAllResults(double[] doubles)
getBestCategory(double[] doubles)
getCategory(int i)
int getIndex
int getNumberOfCategories()
sortedScoreMap(String[] strings)
Methods inherited from class opennlp.dl.AbstractDL
close, loadVocab
-
Constructor Details
-
DocumentCategorizerDL
public DocumentCategorizerDL(File model, File vocabulary, Map<Integer, String> categories, ClassificationScoringStrategy classificationScoringStrategy, InferenceOptions inferenceOptions) throws IOException, ai.onnxruntime.OrtException
Instantiates a document categorizer using ONNX models.
- Parameters:
model - The ONNX model file.
vocabulary - The model file's vocabulary file.
categories - The categories.
classificationScoringStrategy - Implementation of ClassificationScoringStrategy used to calculate the classification scores given the score of each individual document part.
inferenceOptions - InferenceOptions to control the inference.
- Throws:
ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
IOException - Thrown if errors occurred loading the model or vocabulary.
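How a categories map and a score array might fit together can be sketched as follows. This is an assumption-laden illustration, not the class's implementation: it assumes the map keys are the model's output indices and that the best category is the one with the highest score. The class name and labels are hypothetical.

```java
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of opennlp.dl.
final class BestCategorySketch {

  // Returns the label mapped to the index of the highest score.
  // Assumption: categories is keyed by the model's output index.
  static String bestCategory(double[] scores, Map<Integer, String> categories) {
    if (scores == null || scores.length == 0) {
      throw new IllegalArgumentException("scores must not be null or empty");
    }
    int best = 0;
    for (int i = 1; i < scores.length; i++) {
      if (scores[i] > scores[best]) {
        best = i;
      }
    }
    return categories.get(best);
  }

  public static void main(String[] args) {
    Map<Integer, String> categories = Map.of(0, "negative", 1, "positive");
    System.out.println(bestCategory(new double[] {0.1, 0.9}, categories)); // prints "positive"
  }
}
```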
-
DocumentCategorizerDL
public DocumentCategorizerDL(File model, File vocabulary, File config, ClassificationScoringStrategy classificationScoringStrategy, InferenceOptions inferenceOptions) throws IOException, ai.onnxruntime.OrtException
Instantiates a document categorizer using ONNX models.
- Parameters:
model - The ONNX model file.
vocabulary - The model file's vocabulary file.
config - The model's config file. The file will be used to determine the classification categories.
classificationScoringStrategy - Implementation of ClassificationScoringStrategy used to calculate the classification scores given the score of each individual document part.
inferenceOptions - InferenceOptions to control the inference.
- Throws:
ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
IOException - Thrown if errors occurred loading the model or vocabulary.
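The role of the scoring strategy, combining the scores of the individual document parts into one result, can be sketched with a simple element-wise mean. This is a guess at what an averaging strategy does based on its name, not the code of AverageClassificationScoringStrategy. The class and method names are hypothetical.

```java
import java.util.List;

// Illustrative only: a hypothetical helper, not part of opennlp.dl.
final class AverageScoresSketch {

  // Averages the per-category scores across all document parts.
  // Assumption: every part yields an array with the same number of categories.
  static double[] average(List<double[]> partScores) {
    if (partScores == null || partScores.isEmpty()) {
      throw new IllegalArgumentException("at least one part is required");
    }
    int categories = partScores.get(0).length;
    double[] result = new double[categories];
    for (double[] part : partScores) {
      for (int i = 0; i < categories; i++) {
        result[i] += part[i];
      }
    }
    for (int i = 0; i < categories; i++) {
      result[i] /= partScores.size();
    }
    return result;
  }
}
```

The method keeps no state between calls, which is the property the thread-safety note relies on for a strategy shared across threads.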
-
-
Method Details
-
categorize
Categorizes the document, failing loudly rather than returning an invalid distribution: malformed input is rejected with IllegalArgumentException, and any failure executing the model is surfaced as an IllegalStateException (cause preserved).
- Specified by:
categorize in interface opennlp.tools.doccat.DocumentCategorizer
- Parameters:
strings - The document to categorize; strings[0] is classified.
- Returns:
The per-category probabilities.
- Throws:
IllegalArgumentException - If strings is null or empty.
IllegalStateException - If inference fails or the model returns an unexpected output.
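The error contract above can be sketched as a validate-then-wrap pattern. This is a minimal illustration under stated assumptions, not the method's source: the class name is hypothetical, and the Callable parameter stands in for the model execution step.

```java
import java.util.concurrent.Callable;

// Illustrative only: a hypothetical helper, not part of opennlp.dl.
final class FailLoudlySketch {

  // Rejects malformed input up front and wraps any inference failure,
  // keeping the original exception as the cause.
  static double[] categorize(String[] strings, Callable<double[]> inference) {
    if (strings == null || strings.length == 0) {
      throw new IllegalArgumentException("strings must not be null or empty");
    }
    double[] scores;
    try {
      scores = inference.call();
    } catch (Exception e) {
      throw new IllegalStateException("inference failed", e);
    }
    if (scores == null || scores.length == 0) {
      throw new IllegalStateException("model returned an unexpected output");
    }
    return scores;
  }
}
```

A caller can therefore treat IllegalArgumentException as a problem with its own input and IllegalStateException as a problem with the model or runtime, inspecting getCause() for the underlying failure.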
-
categorize
- Specified by:
categorize in interface opennlp.tools.doccat.DocumentCategorizer
-
getBestCategory
- Specified by:
getBestCategory in interface opennlp.tools.doccat.DocumentCategorizer
-
getIndex
- Specified by:
getIndex in interface opennlp.tools.doccat.DocumentCategorizer
-
getCategory
- Specified by:
getCategory in interface opennlp.tools.doccat.DocumentCategorizer
-
getNumberOfCategories
public int getNumberOfCategories()
- Specified by:
getNumberOfCategories in interface opennlp.tools.doccat.DocumentCategorizer
-
getAllResults
- Specified by:
getAllResults in interface opennlp.tools.doccat.DocumentCategorizer
-
scoreMap
- Specified by:
scoreMap in interface opennlp.tools.doccat.DocumentCategorizer
-
sortedScoreMap
- Specified by:
sortedScoreMap in interface opennlp.tools.doccat.DocumentCategorizer
-