Class SentenceVectorsDL

java.lang.Object
opennlp.dl.AbstractDL
opennlp.dl.vectors.SentenceVectorsDL
All Implemented Interfaces:
AutoCloseable

@ThreadSafe public class SentenceVectorsDL extends AbstractDL
Facilitates the generation of sentence vectors using a sentence-transformers model converted to ONNX.

The model inputs follow the standard single-segment BERT encoding: attention_mask is 1 for every real token and token_type_ids is 0 throughout.
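The encoding described above can be illustrated with a minimal, self-contained sketch. It is not the class's actual implementation: the class name SingleSegmentEncoding, the use of long arrays, and the absence of padding are assumptions made for illustration, and the token ids are example values only.

```java
import java.util.Arrays;

public class SingleSegmentEncoding {

    /** attention_mask: 1 for every real token. Assuming no padding, that is every position. */
    public static long[] attentionMask(int tokenCount) {
        long[] mask = new long[tokenCount];
        Arrays.fill(mask, 1L);
        return mask;
    }

    /** token_type_ids: 0 throughout, because the input is a single segment. */
    public static long[] tokenTypeIds(int tokenCount) {
        return new long[tokenCount]; // Java zero-initializes new arrays
    }

    public static void main(String[] args) {
        // Illustrative wordpiece ids for a four-token input; the values are an assumption.
        long[] inputIds = {101L, 7592L, 2088L, 102L};
        System.out.println(Arrays.toString(attentionMask(inputIds.length)));
        System.out.println(Arrays.toString(tokenTypeIds(inputIds.length)));
    }
}
```

All three arrays have the same length, one entry per token.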

Release note (OpenNLP 3.0.0): prior releases sent an all-zero attention_mask and all-one token_type_ids, so the encoder attended to nothing and the output vectors were incorrect. Additionally, tokenization now performs BERT basic tokenization (lower casing and accent stripping by default, see BertTokenizer) before wordpiece. Output vectors change with the corrected encoding and tokenization; any embeddings persisted from the previous behavior are not comparable with the corrected output and must be re-embedded.
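To make the lower casing and accent stripping step concrete, the sketch below approximates it with the JDK alone. It assumes the common BERT approach of NFD decomposition followed by removal of combining marks; the class and method names are hypothetical, and BertTokenizer's actual behavior may differ in detail.

```java
import java.text.Normalizer;
import java.util.Locale;

public class UncasedNormalizationSketch {

    /**
     * Approximates the normalization applied for uncased models.
     * When lowerCase is false (cased models), the text is returned unchanged.
     */
    public static String normalize(String text, boolean lowerCase) {
        if (!lowerCase) {
            return text;
        }
        String lowered = text.toLowerCase(Locale.ROOT);
        // NFD splits an accented letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(lowered, Normalizer.Form.NFD);
        // Remove the combining marks (Unicode category Mn), keeping the base letters.
        return decomposed.replaceAll("\\p{Mn}", "");
    }

    public static void main(String[] args) {
        // "Caf\u00e9 \u00dcbung" is "Cafe Ubung" written with an acute accent and an umlaut.
        System.out.println(normalize("Caf\u00e9 \u00dcbung", true));  // prints "cafe ubung"
        System.out.println(normalize("Caf\u00e9 \u00dcbung", false)); // prints the input unchanged
    }
}
```

Because the same input maps to different token sequences before and after this step, vectors produced without it cannot be mixed with vectors produced with it.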

This class is thread-safe and may be shared across threads: getVectors(String) holds no per-call instance state and the underlying OrtSession supports concurrent execution. This thread-safety guarantee applies until AbstractDL.close() is called; callers must not race close() with inference methods.
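One way to honor this contract is to share a single instance across a thread pool and close it only after every inference call has returned. The sketch below is an illustration under assumptions, not part of the OpenNLP API: Embedder is a hypothetical stand-in for SentenceVectorsDL so that the example runs without a model, and embedAll is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedEmbedderSketch {

    /** Hypothetical stand-in for SentenceVectorsDL: one inference method plus close(). */
    public interface Embedder extends AutoCloseable {
        float[] getVectors(String sentence) throws Exception;

        @Override
        default void close() {
            // A real instance would release its native session here.
        }
    }

    /** Embeds all sentences with one shared instance and closes it only after all calls finish. */
    public static List<float[]> embedAll(Embedder shared, List<String> sentences, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (Embedder embedder = shared) {
            List<Callable<float[]>> tasks = new ArrayList<>();
            for (String sentence : sentences) {
                tasks.add(() -> embedder.getVectors(sentence));
            }
            List<float[]> vectors = new ArrayList<>();
            // invokeAll blocks until every task has completed, so close() cannot race inference.
            for (Future<float[]> future : pool.invokeAll(tasks)) {
                vectors.add(future.get());
            }
            return vectors;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stub embedder: returns a one-element vector holding the sentence length.
        Embedder stub = sentence -> new float[] {sentence.length()};
        List<float[]> vectors = embedAll(stub, List.of("a", "bb", "ccc"), 2);
        System.out.println(vectors.size());
    }
}
```

The try-with-resources block closes the instance only after invokeAll has returned, which keeps close() strictly after the last inference call.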

  • Constructor Details

    • SentenceVectorsDL

      public SentenceVectorsDL(File model, File vocabulary) throws ai.onnxruntime.OrtException, IOException
      Instantiates a sentence vector generator for an uncased model. Input text is lower cased and accent stripped during tokenization, as required by uncased models such as the sentence-transformers MiniLM family.
      Parameters:
      model - The sentence vectors ONNX model file.
      vocabulary - The vocabulary file for the model.
      Throws:
      ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
      IOException - Thrown if an error occurs while loading the model or vocabulary.
    • SentenceVectorsDL

      public SentenceVectorsDL(File model, File vocabulary, boolean lowerCase) throws ai.onnxruntime.OrtException, IOException
      Instantiates a sentence vector generator for a cased or an uncased model, as selected by lowerCase.
      Parameters:
      model - The sentence vectors ONNX model file.
      vocabulary - The vocabulary file for the model.
      lowerCase - true for uncased models (lower casing and accent stripping during tokenization), false for cased models.
      Throws:
      ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
      IOException - Thrown if an error occurs while loading the model or vocabulary.
  • Method Details

    • getVectors

      public float[] getVectors(String sentence) throws ai.onnxruntime.OrtException
      Generates the sentence vector for the given sentence.
      Parameters:
      sentence - The input sentence.
      Returns:
      The sentence vector.
      Throws:
      ai.onnxruntime.OrtException - Thrown if an error occurs during inference.