Class NameFinderDL
- All Implemented Interfaces:
AutoCloseable, opennlp.tools.namefind.TokenNameFinder
A TokenNameFinder that uses ONNX models.
Before WordPiece tokenization, input text goes through BERT basic tokenization
(text normalization); see BertTokenizer. Input text is not lowercased by default,
because named entity recognition models are commonly cased: capitalization is a
strong signal for entity boundaries. For uncased models, call
InferenceOptions.setLowerCase(true).
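To show what the basic tokenization step does, here is a simplified Java sketch of BERT-style basic tokenization: whitespace splitting, optional lowercasing with accent stripping (the behavior uncased BERT models usually expect), and splitting punctuation into separate tokens. BasicTokenizerSketch is hypothetical. It is not BertTokenizer, it leaves out CJK handling and control-character cleanup, and whether OpenNLP strips accents when lowercasing is an assumption here.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, simplified BERT-style basic tokenizer (not BertTokenizer). */
final class BasicTokenizerSketch {
  private BasicTokenizerSketch() {}

  static List<String> tokenize(String text, boolean lowerCase) {
    List<String> tokens = new ArrayList<>();
    for (String word : text.trim().split("\\s+")) {
      if (word.isEmpty()) {
        continue; // the input was empty or all whitespace
      }
      if (lowerCase) {
        // Decompose accented characters, drop the combining marks, then lowercase.
        word = Normalizer.normalize(word, Normalizer.Form.NFD)
            .replaceAll("\\p{Mn}", "")
            .toLowerCase(Locale.ROOT);
      }
      splitOnPunctuation(word, tokens);
    }
    return tokens;
  }

  /** Emits each punctuation character as its own token; other runs stay together. */
  private static void splitOnPunctuation(String word, List<String> out) {
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < word.length(); ) {
      int cp = word.codePointAt(i);
      if (isPunctuation(cp)) {
        if (current.length() > 0) {
          out.add(current.toString());
          current.setLength(0);
        }
        out.add(new String(Character.toChars(cp)));
      } else {
        current.appendCodePoint(cp);
      }
      i += Character.charCount(cp);
    }
    if (current.length() > 0) {
      out.add(current.toString());
    }
  }

  private static boolean isPunctuation(int cp) {
    // ASCII symbols count as punctuation (as in BERT), plus Unicode punctuation classes.
    if ((cp >= 33 && cp <= 47) || (cp >= 58 && cp <= 64)
        || (cp >= 91 && cp <= 96) || (cp >= 123 && cp <= 126)) {
      return true;
    }
    int type = Character.getType(cp);
    return type == Character.CONNECTOR_PUNCTUATION || type == Character.DASH_PUNCTUATION
        || type == Character.START_PUNCTUATION || type == Character.END_PUNCTUATION
        || type == Character.INITIAL_QUOTE_PUNCTUATION
        || type == Character.FINAL_QUOTE_PUNCTUATION
        || type == Character.OTHER_PUNCTUATION;
  }
}
```

With lowerCase set to false, "George Washington, Jr." stays cased and comes out as George, Washington, ",", Jr, ".". That is the casing signal the description says cased NER models depend on.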
This class is thread-safe and may be shared across threads, provided the supplied
SentenceDetector is itself thread-safe (e.g.
opennlp.tools.sentdetect.SentenceDetectorME@ThreadSafe). Three properties make this
hold: inference keeps no per-call instance state; the relevant InferenceOptions values
are snapshotted into final fields at construction, so mutating the passed options
afterwards does not affect a shared instance; and the underlying OrtSession supports
concurrent execution. The guarantee holds only until AbstractDL.close() is called;
callers must not call close() concurrently with inference methods.
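The snapshotting described above is a standard Java pattern. The constructor copies each option value into a final field and never reads the options object again. This isolates a shared instance from later mutation. It also means that, under Java's final-field semantics (JLS §17.5), other threads see the fields' initialized values once construction completes, as long as `this` does not escape the constructor. MutableOptions and SnapshottingFinder are hypothetical stand-ins for InferenceOptions and NameFinderDL, not their actual code.

```java
import java.util.Locale;

/** Hypothetical mutable options holder, standing in for InferenceOptions. */
final class MutableOptions {
  private boolean lowerCase;

  boolean isLowerCase() {
    return lowerCase;
  }

  void setLowerCase(boolean lowerCase) {
    this.lowerCase = lowerCase;
  }
}

/** Copies option values into final fields at construction; never reads the options again. */
final class SnapshottingFinder {
  private final boolean lowerCase;

  SnapshottingFinder(MutableOptions options) {
    this.lowerCase = options.isLowerCase(); // snapshot; the options object is not retained
  }

  /** Reads only final fields and locals, so concurrent calls share no mutable state. */
  String normalize(String text) {
    return lowerCase ? text.toLowerCase(Locale.ROOT) : text;
  }
}
```

If a caller calls setLowerCase(false) after constructing the finder, the finder still lowercases, because it copied the value at construction time.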
Field Summary
Fields (Modifier and Type, Field, Description):
- static final String B_PER
- static final String CLS_TOKEN
- static final String I_PER: Example person labels; retained for reference.
- NO_SPACE_AFTER_TOKENS: Tokens after which the following token attaches directly when span text is reconstructed.
- NO_SPACE_BEFORE_TOKENS: Tokens that attach directly to the preceding token when span text is reconstructed.
- static final String PREFIX_BEGIN: Prefix used by BIO labels for the first token in an entity span.
- static final String PREFIX_INSIDE: Prefix used by BIO labels for continuation tokens in an entity span.
- static final String SEPARATOR
Fields inherited from class opennlp.dl.AbstractDL:
ATTENTION_MASK, INPUT_IDS, TOKEN_TYPE_IDS
Constructor Summary
Constructors (Constructor, Description):
- NameFinderDL(File model, File vocabulary, Map<Integer, String> ids2Labels, InferenceOptions inferenceOptions, opennlp.tools.sentdetect.SentenceDetector sentenceDetector): Instantiates a name finder using ONNX models.
- NameFinderDL(File model, File vocabulary, Map<Integer, String> ids2Labels, opennlp.tools.sentdetect.SentenceDetector sentenceDetector): Instantiates a name finder using ONNX models.
Method Summary
Methods (Modifier and Type, Method):
- void clearAdaptiveData()
- opennlp.tools.util.Span[] find(String[])
Methods inherited from class opennlp.dl.AbstractDL:
close, loadVocab
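The class description says close() (inherited from AbstractDL) must not race with inference. A caller that shares one instance across threads therefore has to coordinate before closing it. One option is a read-write lock, sketched here on the assumption that every use of the finder goes through the wrapper. Inference calls share the read lock. close() takes the write lock, so it waits for in-flight calls to finish and causes later calls to fail. CloseGuard is a hypothetical helper and is not part of OpenNLP.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical caller-side guard: close() never overlaps an in-flight call. */
final class CloseGuard<R extends AutoCloseable> implements AutoCloseable {
  private final R resource;
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private boolean closed; // read under the read lock, written under the write lock

  CloseGuard(R resource) {
    this.resource = resource;
  }

  /** Runs one inference call; any number of calls may hold the read lock at once. */
  <T> T call(Function<R, T> inference) {
    lock.readLock().lock();
    try {
      if (closed) {
        throw new IllegalStateException("Already closed");
      }
      return inference.apply(resource);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Waits for in-flight calls to drain, then closes the resource exactly once. */
  @Override
  public void close() throws Exception {
    lock.writeLock().lock();
    try {
      if (!closed) {
        closed = true;
        resource.close();
      }
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

For example, worker threads could share a CloseGuard<NameFinderDL> and call guard.call(f -> f.find(tokens)), and a single shutdown path would call guard.close().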
Field Details
I_PER
Example person labels; retained for reference. Decoding is not limited to person entities: it handles labels of any type that carry a B- or I- prefix.
B_PER
SEPARATOR
CLS_TOKEN
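CLS_TOKEN and SEPARATOR are the special tokens that frame a BERT input sequence. The sketch below shows greedy longest-match-first WordPiece splitting followed by [CLS] ... [SEP] framing. It throws IllegalArgumentException when a produced token has no vocabulary id, which matches the failure mode documented for find(String[]). The literals "[CLS]", "[SEP]", "[UNK]" and the "##" continuation prefix follow standard BERT conventions and are assumed here, because this page does not show the constants' values. WordPieceSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical WordPiece encoder; special-token literals are assumed BERT conventions. */
final class WordPieceSketch {
  static final String CLS = "[CLS]";
  static final String SEP = "[SEP]";
  static final String UNK = "[UNK]";

  private WordPieceSketch() {}

  /** Greedy longest-match-first split of one basic token into vocabulary pieces. */
  static List<String> wordPiece(String token, Map<String, Integer> vocab) {
    List<String> pieces = new ArrayList<>();
    int start = 0;
    while (start < token.length()) {
      int end = token.length();
      String match = null;
      while (start < end) {
        String candidate = token.substring(start, end);
        if (start > 0) {
          candidate = "##" + candidate; // continuation pieces carry the ## prefix
        }
        if (vocab.containsKey(candidate)) {
          match = candidate;
          break;
        }
        end--; // try a shorter piece
      }
      if (match == null) {
        return List.of(UNK); // nothing matches: the whole token becomes [UNK]
      }
      pieces.add(match);
      start = end;
    }
    return pieces;
  }

  /** Encodes basic tokens as ids for [CLS] pieces... [SEP]; a missing id is an error. */
  static int[] encode(List<String> basicTokens, Map<String, Integer> vocab) {
    List<String> pieces = new ArrayList<>();
    pieces.add(CLS);
    for (String t : basicTokens) {
      pieces.addAll(wordPiece(t, vocab));
    }
    pieces.add(SEP);
    int[] ids = new int[pieces.size()];
    for (int i = 0; i < ids.length; i++) {
      Integer id = vocab.get(pieces.get(i));
      if (id == null) {
        // The vocabulary does not match what the tokenizer produces.
        throw new IllegalArgumentException("Token not in vocabulary: " + pieces.get(i));
      }
      ids[i] = id;
    }
    return ids;
  }
}
```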
PREFIX_BEGIN
Prefix used by BIO labels for the first token in an entity span.
PREFIX_INSIDE
Prefix used by BIO labels for continuation tokens in an entity span.
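PREFIX_BEGIN and PREFIX_INSIDE drive BIO decoding, which turns one label per token into entity spans of any type. The sketch assumes the prefixes are "B-" and "I-" (the B-/I- form mentioned under I_PER) and that non-entity tokens carry some other label, such as "O". This page does not say how NameFinderDL treats an I- label that cannot continue an open span of the same type. The sketch uses a common lenient policy and starts a new span in that case. BioDecoder and EntitySpan are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoded entity over token indices [start, end). */
final class EntitySpan {
  final int start;
  final int end;
  final String type;

  EntitySpan(int start, int end, String type) {
    this.start = start;
    this.end = end;
    this.type = type;
  }

  @Override
  public String toString() {
    return type + "[" + start + ", " + end + ")";
  }
}

/** Hypothetical BIO decoder that works for any entity type. */
final class BioDecoder {
  static final String PREFIX_BEGIN = "B-";  // assumed value
  static final String PREFIX_INSIDE = "I-"; // assumed value

  private BioDecoder() {}

  static List<EntitySpan> decode(String[] labels) {
    List<EntitySpan> spans = new ArrayList<>();
    int start = -1;     // start of the currently open span, or -1 if none
    String type = null; // entity type of the open span
    for (int i = 0; i < labels.length; i++) {
      String label = labels[i];
      String labelType = null; // stays null for labels such as "O"
      boolean inside = false;
      if (label.startsWith(PREFIX_BEGIN)) {
        labelType = label.substring(PREFIX_BEGIN.length());
      } else if (label.startsWith(PREFIX_INSIDE)) {
        labelType = label.substring(PREFIX_INSIDE.length());
        inside = true;
      }
      boolean continues = inside && start >= 0 && labelType.equals(type);
      if (start >= 0 && !continues) {
        spans.add(new EntitySpan(start, i, type)); // close the open span
        start = -1;
        type = null;
      }
      if (labelType != null && !continues) {
        // B-X opens a span; a stray I-X that cannot continue is treated like B-X.
        start = i;
        type = labelType;
      }
    }
    if (start >= 0) {
      spans.add(new EntitySpan(start, labels.length, type));
    }
    return spans;
  }
}
```

For the labels B-PER, I-PER, O, B-LOC, I-PER, I-PER, B-ORG, this yields PER[0, 2), LOC[3, 4), PER[4, 6) and ORG[6, 7). The I-PER that follows B-LOC starts a new span instead of being dropped.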
NO_SPACE_BEFORE_TOKENS
Tokens that attach directly to the preceding token when span text is reconstructed.
NO_SPACE_AFTER_TOKENS
Tokens after which the following token attaches directly when span text is reconstructed.
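NO_SPACE_BEFORE_TOKENS and NO_SPACE_AFTER_TOKENS control spacing when span text is rebuilt from tokens. The sketch joins tokens with single spaces. It drops the space before any token in the first set and the space after any token in the second. The set members here are hypothetical, because this page does not list the actual contents, and SpanTextSketch is not the class's implementation.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical span-text reconstruction with no-space-before/after token sets. */
final class SpanTextSketch {
  // Hypothetical members, for illustration only.
  static final Set<String> NO_SPACE_BEFORE = Set.of(".", ",", ")", "%", "'s");
  static final Set<String> NO_SPACE_AFTER = Set.of("(", "$");

  private SpanTextSketch() {}

  static String join(List<String> tokens) {
    StringBuilder text = new StringBuilder();
    String previous = null;
    for (String token : tokens) {
      boolean noSpace = previous == null            // first token
          || NO_SPACE_BEFORE.contains(token)        // attaches to the preceding token
          || NO_SPACE_AFTER.contains(previous);     // preceding token attaches to this one
      if (!noSpace) {
        text.append(' ');
      }
      text.append(token);
      previous = token;
    }
    return text.toString();
  }
}
```

With these sets, the tokens Acme, (, U.S., ), Inc, . are rebuilt as "Acme (U.S.) Inc." instead of "Acme ( U.S. ) Inc .".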
Constructor Details
NameFinderDL
public NameFinderDL(File model, File vocabulary, Map<Integer, String> ids2Labels, opennlp.tools.sentdetect.SentenceDetector sentenceDetector) throws IOException, ai.onnxruntime.OrtException
Instantiates a name finder using ONNX models.
- Parameters:
model - The ONNX model file.
vocabulary - The vocabulary file for the model.
ids2Labels - The mapping of model output indices to BIO labels. It must cover every output index of the model; if a token's predicted index is unmapped, find(String[]) throws an IllegalStateException.
sentenceDetector - The SentenceDetector to use.
- Throws:
ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
IOException - Thrown if an error occurs while loading the model or vocabulary.
NameFinderDL
public NameFinderDL(File model, File vocabulary, Map<Integer, String> ids2Labels, InferenceOptions inferenceOptions, opennlp.tools.sentdetect.SentenceDetector sentenceDetector) throws IOException, ai.onnxruntime.OrtException
Instantiates a name finder using ONNX models.
- Parameters:
model - The ONNX model file.
vocabulary - The vocabulary file for the model.
ids2Labels - The mapping of model output indices to BIO labels. It must cover every output index of the model; if a token's predicted index is unmapped, find(String[]) throws an IllegalStateException.
inferenceOptions - The InferenceOptions that control inference.
sentenceDetector - The SentenceDetector to use.
- Throws:
ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
IOException - Thrown if an error occurs while loading the model or vocabulary.
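The ids2Labels requirement follows from how per-token predictions are usually decoded. The decoder takes the argmax over the label dimension of the float[batch][token][label] output and looks up the winning index in the map. This sketch mirrors the failure modes documented for find(String[]): a wrong output shape, a token with no usable score, and an unmapped index each throw IllegalStateException. Argmax decoding, and treating NaN scores as unusable, are assumptions. The output argument stands for something like the Object an ONNX Runtime tensor returns from getValue(). LabelDecodingSketch is hypothetical.

```java
import java.util.Map;

/** Hypothetical argmax decoding of token-classification scores into BIO labels. */
final class LabelDecodingSketch {
  private LabelDecodingSketch() {}

  static String[] labelsFor(Object output, int batchIndex, Map<Integer, String> ids2Labels) {
    if (!(output instanceof float[][][])) {
      throw new IllegalStateException("Expected float[batch][token][label] output");
    }
    float[][] sequence = ((float[][][]) output)[batchIndex];
    String[] labels = new String[sequence.length];
    for (int t = 0; t < sequence.length; t++) {
      int best = -1;
      for (int k = 0; k < sequence[t].length; k++) {
        float score = sequence[t][k];
        // Skip NaN; otherwise keep the first index holding the highest score.
        if (!Float.isNaN(score) && (best < 0 || score > sequence[t][best])) {
          best = k;
        }
      }
      if (best < 0) {
        throw new IllegalStateException("No usable label score for token " + t);
      }
      String label = ids2Labels.get(best);
      if (label == null) {
        throw new IllegalStateException("Predicted index " + best + " is not in ids2Labels");
      }
      labels[t] = label;
    }
    return labels;
  }
}
```

As a design suggestion, callers could compare ids2Labels.keySet() with the model's label count once after loading. That catches an incomplete map immediately, rather than when a particular prediction first hits a missing index.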
Method Details
find
Joins the provided tokens with spaces, splits the joined text into sentences, runs each sentence through the ONNX token-classification model, decodes the BIO labels into spans, and maps those spans back to character offsets in the joined text.
- Specified by:
find in interface opennlp.tools.namefind.TokenNameFinder
- Throws:
IllegalStateException - Thrown if inference fails, if the model output is not shaped float[batch][token][label], if the output contains no usable label score for a token, or if a token's predicted index is not present in the configured label map.
IllegalArgumentException - Thrown if a token produced from the input is not present in the vocabulary, which indicates that the vocabulary file does not match the model.
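Because find(String[]) resolves spans to character offsets in the text formed by joining the input tokens, callers that need token indices must map those offsets back. The sketch assumes the joining uses single spaces, as String.join(" ", tokens) does. It shifts a sentence-relative span by the sentence's start offset and converts a character span into the range of tokens it overlaps. JoinedOffsets and its methods are hypothetical helpers, not part of NameFinderDL.

```java
/** Hypothetical offset helpers for text built as String.join(" ", tokens). */
final class JoinedOffsets {
  private JoinedOffsets() {}

  /** Start offset of each token in the single-space-joined text. */
  static int[] tokenStarts(String[] tokens) {
    int[] starts = new int[tokens.length];
    int pos = 0;
    for (int i = 0; i < tokens.length; i++) {
      starts[i] = pos;
      pos += tokens[i].length() + 1; // the token plus one joining space
    }
    return starts;
  }

  /** Shifts a sentence-relative [relStart, relEnd) span into joined-text offsets. */
  static int[] toJoined(int sentenceStart, int relStart, int relEnd) {
    return new int[] {sentenceStart + relStart, sentenceStart + relEnd};
  }

  /** Token range [first, end) whose characters overlap [charStart, charEnd). */
  static int[] toTokenRange(int[] starts, String[] tokens, int charStart, int charEnd) {
    int first = -1;
    int end = -1;
    for (int i = 0; i < tokens.length; i++) {
      int tokenEnd = starts[i] + tokens[i].length();
      if (starts[i] < charEnd && tokenEnd > charStart) { // half-open overlap test
        if (first < 0) {
          first = i;
        }
        end = i + 1;
      }
    }
    if (first < 0) {
      throw new IllegalArgumentException("Span covers no token");
    }
    return new int[] {first, end};
  }
}
```

For the tokens I, met, George, Washington, ., the joined text is "I met George Washington .". The character span [6, 23) covers "George Washington", which corresponds to tokens [2, 4).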
clearAdaptiveData
public void clearAdaptiveData()
- Specified by:
clearAdaptiveData in interface opennlp.tools.namefind.TokenNameFinder