Class NameFinderDL

java.lang.Object
opennlp.dl.AbstractDL
opennlp.dl.namefinder.NameFinderDL
All Implemented Interfaces:
AutoCloseable, opennlp.tools.namefind.TokenNameFinder

@ThreadSafe public class NameFinderDL extends AbstractDL implements opennlp.tools.namefind.TokenNameFinder
An implementation of TokenNameFinder that uses ONNX models.

Tokenization performs BERT basic tokenization (text normalization) before wordpiece; see BertTokenizer. Input text is not lower-cased by default, because named entity recognition models are commonly cased: capitalization is a strong signal for entity boundaries. For uncased models, call InferenceOptions.setLowerCase(true).
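
The effect of the lower-case switch can be illustrated with a simplified, standard-library-only sketch of BERT-style basic tokenization: whitespace splitting, optional lower-casing, and punctuation splitting. This is not BertTokenizer. The class name BasicTokenizerSketch is hypothetical. Accent stripping on lower-casing follows the reference BERT behaviour and is assumed here, not confirmed for this class. Text cleaning, CJK handling, and the wordpiece step are omitted.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of BERT-style basic tokenization (not OpenNLP's BertTokenizer).
final class BasicTokenizerSketch {

    private BasicTokenizerSketch() {}

    // Splits on whitespace, then on punctuation. Casing is preserved unless lowerCase is true.
    static List<String> tokenize(String text, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // input was empty or all whitespace
            }
            if (lowerCase) {
                // Uncased BERT vocabularies typically also drop accents (assumption for this sketch).
                word = stripAccents(word.toLowerCase(Locale.ROOT));
            }
            tokens.addAll(splitOnPunctuation(word));
        }
        return tokens;
    }

    // Decomposes characters (NFD), then removes combining marks (Unicode category Mn).
    private static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{Mn}", "");
    }

    private static boolean isPunctuation(int cp) {
        // Non-alphanumeric printable ASCII counts as punctuation, as in BERT's basic tokenizer.
        if ((cp >= 33 && cp <= 47) || (cp >= 58 && cp <= 64)
                || (cp >= 91 && cp <= 96) || (cp >= 123 && cp <= 126)) {
            return true;
        }
        // Otherwise fall back to the Unicode punctuation (P*) categories.
        return new String(Character.toChars(cp)).matches("\\p{P}");
    }

    // Emits each punctuation character as its own token.
    private static List<String> splitOnPunctuation(String word) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        word.codePoints().forEach(cp -> {
            if (isPunctuation(cp)) {
                if (current.length() > 0) {
                    out.add(current.toString());
                    current.setLength(0);
                }
                out.add(new String(Character.toChars(cp)));
            } else {
                current.appendCodePoint(cp);
            }
        });
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }
}
```

Under these assumptions, tokenize("Angela Merkel visited Zürich.", false) yields [Angela, Merkel, visited, Zürich, .]. Passing true yields [angela, merkel, visited, zurich, .], which discards the capitalization cue that cased NER models rely on.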

This class is thread-safe and may be shared across threads, provided the supplied SentenceDetector is itself thread-safe (e.g. opennlp.tools.sentdetect.SentenceDetectorME, which is @ThreadSafe). Three properties support this. Inference keeps no per-call state in instance fields. The relevant InferenceOptions values are snapshotted into final fields at construction, so mutating the passed options afterwards does not affect a shared instance. The underlying OrtSession supports concurrent execution. This thread-safety guarantee holds only until AbstractDL.close() is called; callers must not race close() against inference methods.
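
The snapshotting described above is a standard Java idiom. The following minimal sketch shows it with two hypothetical classes: MutableOptions stands in for InferenceOptions, and SnapshottingComponent stands in for a shared finder. Neither class is part of OpenNLP. Because the copied value lives in a final field, the Java memory model guarantees every thread sees it once the constructor has finished, provided `this` does not escape during construction.

```java
import java.util.Locale;

// Hypothetical mutable options holder standing in for InferenceOptions.
final class MutableOptions {
    private boolean lowerCase;

    boolean isLowerCase() {
        return lowerCase;
    }

    void setLowerCase(boolean lowerCase) {
        this.lowerCase = lowerCase;
    }
}

// Hypothetical shared component that copies the option values it needs at construction.
final class SnapshottingComponent {
    // Final field: assigned once in the constructor, safely published to all threads.
    private final boolean lowerCase;

    SnapshottingComponent(MutableOptions options) {
        this.lowerCase = options.isLowerCase();
    }

    // Per-call state is kept in locals and parameters only, so concurrent calls cannot interfere.
    String normalize(String text) {
        return lowerCase ? text.toLowerCase(Locale.ROOT) : text;
    }
}
```

Calling options.setLowerCase(false) after constructing the component has no effect: normalize("Berlin") still returns "berlin".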

  • Field Details

    • I_PER

      public static final String I_PER
      Example person labels; retained for reference. Decoding handles any B-/I- type.
    • B_PER

      public static final String B_PER
    • SEPARATOR

      public static final String SEPARATOR
    • CLS_TOKEN

      public static final String CLS_TOKEN
    • PREFIX_BEGIN

      public static final String PREFIX_BEGIN
      Prefix used by BIO labels for the first token in an entity span.
    • PREFIX_INSIDE

      public static final String PREFIX_INSIDE
      Prefix used by BIO labels for continuation tokens in an entity span.
    • NO_SPACE_BEFORE_TOKENS

      public static final Set<String> NO_SPACE_BEFORE_TOKENS
      Tokens that attach directly to the preceding token when span text is reconstructed.
    • NO_SPACE_AFTER_TOKENS

      public static final Set<String> NO_SPACE_AFTER_TOKENS
      Tokens after which the following token attaches directly when span text is reconstructed.
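
The NO_SPACE_BEFORE_TOKENS and NO_SPACE_AFTER_TOKENS rules can be illustrated with a short sketch of span-text reconstruction. The actual contents of those sets are not listed on this page. The example sets below are hypothetical, and so is the class name SpanTextSketch.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of reconstructing span text from tokens with attachment rules.
final class SpanTextSketch {
    // Example contents only (assumption); not the real NO_SPACE_BEFORE_TOKENS values.
    static final Set<String> NO_SPACE_BEFORE = Set.of(".", ",", ")", "'s");
    // Example contents only (assumption); not the real NO_SPACE_AFTER_TOKENS values.
    static final Set<String> NO_SPACE_AFTER = Set.of("(", "$");

    static String join(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        String previous = null;
        for (String token : tokens) {
            // Attach without a space if this token glues to the left, or the previous one glues to the right.
            boolean attach = previous == null
                    || NO_SPACE_BEFORE.contains(token)
                    || NO_SPACE_AFTER.contains(previous);
            if (!attach) {
                sb.append(' ');
            }
            sb.append(token);
            previous = token;
        }
        return sb.toString();
    }
}
```

With these example sets, the tokens Acme ( Europe ) , Inc . are rebuilt as "Acme (Europe), Inc." rather than the space-joined "Acme ( Europe ) , Inc .".
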
  • Constructor Details

    • NameFinderDL

      public NameFinderDL(File model, File vocabulary, Map<Integer,String> ids2Labels, opennlp.tools.sentdetect.SentenceDetector sentenceDetector) throws IOException, ai.onnxruntime.OrtException
      Instantiates a name finder using ONNX models.
      Parameters:
      model - The ONNX model file.
      vocabulary - The vocabulary file that matches the model.
      ids2Labels - The mapping of model output indices to BIO labels. This must be exhaustive over the model's output indices; a token whose predicted index is unmapped raises an IllegalStateException during find(String[]).
      sentenceDetector - The SentenceDetector to be used.
      Throws:
      ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
      IOException - Thrown if an error occurs while loading the model or vocabulary.
    • NameFinderDL

      public NameFinderDL(File model, File vocabulary, Map<Integer,String> ids2Labels, InferenceOptions inferenceOptions, opennlp.tools.sentdetect.SentenceDetector sentenceDetector) throws IOException, ai.onnxruntime.OrtException
      Instantiates a name finder using ONNX models.
      Parameters:
      model - The ONNX model file.
      vocabulary - The vocabulary file that matches the model.
      ids2Labels - The mapping of model output indices to BIO labels. This must be exhaustive over the model's output indices; a token whose predicted index is unmapped raises an IllegalStateException during find(String[]).
      inferenceOptions - InferenceOptions to control the inference.
      sentenceDetector - The SentenceDetector to be used.
      Throws:
      ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
      IOException - Thrown if an error occurs while loading the model or vocabulary.
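
The requirement that ids2Labels be exhaustive can be illustrated with a sketch of per-token label lookup. This page does not say how the predicted index is chosen. The sketch assumes it is the arg-max over the token's label scores, and it treats NaN scores as unusable. The class name LabelLookupSketch is hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of mapping one token's label scores to a BIO label.
final class LabelLookupSketch {

    // Returns the index of the highest non-NaN score (arg-max is an assumption).
    static int argmax(float[] scores) {
        int best = -1;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (!Float.isNaN(scores[i]) && (best < 0 || scores[i] > bestScore)) {
                best = i;
                bestScore = scores[i];
            }
        }
        if (best < 0) {
            throw new IllegalStateException("No usable label score for token");
        }
        return best;
    }

    // Looks up the predicted index; an unmapped index is an error, not a silent "O".
    static String labelFor(float[] scores, Map<Integer, String> ids2Labels) {
        int index = argmax(scores);
        String label = ids2Labels.get(index);
        if (label == null) {
            throw new IllegalStateException("Predicted index " + index + " has no entry in ids2Labels");
        }
        return label;
    }
}
```

With ids2Labels = {0: O, 1: B-PER, 2: I-PER}, the scores {0.1, 2.5, 0.3} map to B-PER. A model that emits a fourth score and favours index 3 triggers the IllegalStateException, which is why the map must cover every output index.
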
  • Method Details

    • find

      public opennlp.tools.util.Span[] find(String[] input)

      This method joins the provided tokens with spaces, sentence-splits the joined text, runs each sentence through the ONNX token-classification model, decodes BIO labels into spans, and resolves those spans back to character offsets in the joined text.

      Specified by:
      find in interface opennlp.tools.namefind.TokenNameFinder
      Throws:
      IllegalStateException - Thrown if inference fails, if the model output shape is not the expected float[batch][token][label] form, if the model output contains no usable label score for a token, or if the model's predicted index for a token is not present in the configured label map.
      IllegalArgumentException - Thrown if a token produced for the input is not present in the vocabulary, which indicates the vocabulary file does not match the model.
    • clearAdaptiveData

      public void clearAdaptiveData()
      Specified by:
      clearAdaptiveData in interface opennlp.tools.namefind.TokenNameFinder
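
The decoding and offset-resolution steps of find(String[]) can be sketched with the standard library alone. This is a simplified illustration, not the class's implementation, and it rests on four assumptions. First, PREFIX_BEGIN and PREFIX_INSIDE are taken to be the conventional "B-" and "I-". Second, the sketch works on one label per whole input token, whereas the real model labels wordpieces sentence by sentence. Third, an I- label that does not continue an open span of the same type starts a new span, which is a common lenient choice. Fourth, a hypothetical TypedSpan class replaces opennlp.tools.util.Span.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of BIO decoding plus character-offset resolution.
final class BioDecodingSketch {

    // Hypothetical stand-in for opennlp.tools.util.Span: token range [start, end) plus a type.
    static final class TypedSpan {
        final int start;
        final int end;
        final String type;

        TypedSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + start + ".." + end + ")";
        }
    }

    // Groups per-token BIO labels into typed token spans; any entity type is accepted.
    static List<TypedSpan> decode(List<String> labels) {
        List<TypedSpan> spans = new ArrayList<>();
        int start = -1;
        String type = null;
        for (int i = 0; i < labels.size(); i++) {
            String label = labels.get(i);
            boolean begin = label.startsWith("B-");
            boolean inside = label.startsWith("I-");
            String labelType = (begin || inside) ? label.substring(2) : null;
            // An I- label only extends an open span of the same type.
            boolean continues = inside && labelType.equals(type);
            if (start >= 0 && !continues) {
                spans.add(new TypedSpan(start, i, type)); // close the open span
                start = -1;
                type = null;
            }
            if ((begin || inside) && !continues) {
                start = i; // open a new span
                type = labelType;
            }
        }
        if (start >= 0) {
            spans.add(new TypedSpan(start, labels.size(), type));
        }
        return spans;
    }

    // Maps a token span to [charStart, charEnd) in String.join(" ", tokens).
    static int[] toCharOffsets(String[] tokens, TypedSpan span) {
        int[] tokenStarts = new int[tokens.length];
        int offset = 0;
        for (int i = 0; i < tokens.length; i++) {
            tokenStarts[i] = offset;
            offset += tokens[i].length() + 1; // +1 for the joining space
        }
        int last = span.end - 1;
        return new int[] {tokenStarts[span.start], tokenStarts[last] + tokens[last].length()};
    }
}
```

Take the tokens Yesterday Angela Merkel visited Paris with the labels O B-PER I-PER O B-LOC. decode returns PER over tokens [1, 3) and LOC over [4, 5). toCharOffsets then maps these to the substrings "Angela Merkel" and "Paris" of the space-joined text.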