Class SentenceVectorsDL
- All Implemented Interfaces:
AutoCloseable
The model inputs follow the standard single-segment BERT
encoding: attention_mask is 1 for every real
token and token_type_ids is 0 throughout.
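Below is a minimal sketch of how the single-segment inputs are laid out. It assumes the sentence has already been wordpiece-encoded into ids that include [CLS] and [SEP]. It also assumes the model takes int64 inputs, as exported BERT models commonly do. The class and record names are hypothetical, and this is not the OpenNLP implementation. Padding is left out because a single unpadded sentence has no padding positions; any padding positions would get attention_mask 0.

```java
import java.util.Arrays;

public class SingleSegmentEncoding {

    /** Hypothetical container for the three parallel model inputs of one sentence. */
    record BertInputs(long[] inputIds, long[] attentionMask, long[] tokenTypeIds) { }

    /**
     * Builds single-segment BERT inputs from wordpiece ids that already include
     * [CLS] and [SEP]: attention_mask is 1 for every real token and
     * token_type_ids is 0 throughout.
     */
    static BertInputs encode(long[] tokenIds) {
        long[] inputIds = tokenIds.clone();
        long[] attentionMask = new long[tokenIds.length];
        Arrays.fill(attentionMask, 1L);                   // attend to every real token
        long[] tokenTypeIds = new long[tokenIds.length];  // zero-initialized: segment 0 everywhere
        return new BertInputs(inputIds, attentionMask, tokenTypeIds);
    }

    public static void main(String[] args) {
        // 101 = [CLS] and 102 = [SEP] in the bert-base-uncased vocabulary; the middle ids are illustrative
        BertInputs in = encode(new long[] {101, 7592, 2088, 102});
        System.out.println(Arrays.toString(in.attentionMask())); // [1, 1, 1, 1]
        System.out.println(Arrays.toString(in.tokenTypeIds()));  // [0, 0, 0, 0]
    }
}
```

The pre-3.0.0 bug described below is the inverse of this layout: every mask entry was 0 and every segment id was 1.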
Release note (OpenNLP 3.0.0): prior releases sent an
all-zero attention_mask and all-one token_type_ids,
so the encoder attended to nothing and the output vectors were
incorrect. Additionally, tokenization now performs BERT basic
tokenization (lower casing and accent stripping by default, see
BertTokenizer) before wordpiece.
Output vectors change with the corrected encoding and tokenization;
any embeddings persisted from the previous behavior are not
comparable with the corrected output and must be re-embedded.
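For uncased models, BERT basic tokenization runs before wordpiece. It lower cases the text and strips accents by decomposing characters (Unicode NFD) and dropping the combining marks. It then splits on whitespace and punctuation. The sketch below approximates that step with `java.text.Normalizer`. It is an illustration under those assumptions, not OpenNLP's BertTokenizer. It skips details such as control-character cleanup and CJK character handling.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BasicTokenizationSketch {

    /** Lower cases, then removes accents by NFD-decomposing and dropping non-spacing marks (Mn). */
    static String lowerCaseAndStripAccents(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        // NFD turns "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder(decomposed.length());
        decomposed.codePoints()
                .filter(cp -> Character.getType(cp) != Character.NON_SPACING_MARK)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** ASCII symbols plus Unicode punctuation categories count as punctuation, as in BERT. */
    static boolean isPunctuation(int cp) {
        if ((cp >= 33 && cp <= 47) || (cp >= 58 && cp <= 64)
                || (cp >= 91 && cp <= 96) || (cp >= 123 && cp <= 126)) {
            return true;
        }
        int t = Character.getType(cp);
        return t == Character.CONNECTOR_PUNCTUATION || t == Character.DASH_PUNCTUATION
                || t == Character.START_PUNCTUATION || t == Character.END_PUNCTUATION
                || t == Character.INITIAL_QUOTE_PUNCTUATION || t == Character.FINAL_QUOTE_PUNCTUATION
                || t == Character.OTHER_PUNCTUATION;
    }

    /** Splits on whitespace and emits each punctuation character as its own token. */
    static List<String> basicTokenize(String text, boolean lowerCase) {
        String normalized = lowerCase ? lowerCaseAndStripAccents(text) : text;
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < normalized.length(); ) {
            int cp = normalized.codePointAt(i);
            i += Character.charCount(cp);
            boolean punct = isPunctuation(cp);
            if (punct || Character.isWhitespace(cp)) {
                if (current.length() > 0) {       // close the word in progress
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                if (punct) {
                    tokens.add(new String(Character.toChars(cp)));
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(basicTokenize("H\u00e9llo, W\u00f6rld!", true)); // [hello, ,, world, !]
    }
}
```

The resulting tokens are what wordpiece would then split further against the vocabulary.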
This class is thread-safe and may be shared across threads. getVectors(String) keeps no per-call state in instance fields, and the underlying OrtSession supports
concurrent execution. This guarantee holds only until AbstractDL.close()
is called. Callers must ensure that close() is not invoked while any inference call is still running.
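Because one instance may be shared, a typical pattern is a single SentenceVectorsDL used by a thread pool. In that pattern, close() is called only after every inference call has returned. The sketch below uses a hypothetical `Embedder` interface so it runs without OpenNLP or a model file. With OpenNLP on the classpath, `vectors::getVectors` should satisfy it, since getVectors(String) returns float[] and throws the checked OrtException. The helper name `embedAll` is also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SharedEmbedderSketch {

    /** Hypothetical stand-in for getVectors(String). */
    @FunctionalInterface
    interface Embedder {
        float[] embed(String sentence) throws Exception;
    }

    /**
     * Embeds every sentence on a fixed pool that shares one Embedder instance.
     * Returns only after the pool has terminated, so no inference is still
     * running when the caller later closes the shared instance.
     */
    static List<float[]> embedAll(Embedder embedder, List<String> sentences, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<float[]>> futures = new ArrayList<>();
            for (String sentence : sentences) {
                Callable<float[]> task = () -> embedder.embed(sentence);
                futures.add(pool.submit(task));
            }
            List<float[]> vectors = new ArrayList<>();
            for (Future<float[]> future : futures) {
                vectors.add(future.get()); // results keep the input order
            }
            return vectors;
        } finally {
            pool.shutdown();
            // Wait for any task still running (e.g. after a failure) before returning
            while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                // keep waiting
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo embedder: a one-dimensional "vector" holding the sentence length
        Embedder fake = s -> new float[] {s.length()};
        List<float[]> out = embedAll(fake, List.of("a", "bb", "ccc"), 2);
        System.out.println(out.size() + " vectors"); // 3 vectors
    }
}
```

With OpenNLP available, open the shared instance in a try-with-resources block and pass it as `vectors::getVectors`. Because `embedAll` returns only after its pool has terminated, the implicit `close()` at the end of the block cannot overlap an inference call.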
Field Summary
Fields inherited from class opennlp.dl.AbstractDL
ATTENTION_MASK, INPUT_IDS, TOKEN_TYPE_IDS
Constructor Summary
Constructors
- SentenceVectorsDL(File model, File vocabulary): Instantiates a sentence vector generator for an uncased model.
- SentenceVectorsDL(File model, File vocabulary, boolean lowerCase): Instantiates a sentence vector generator using ONNX models.
Method Summary
Methods
- float[] getVectors(String sentence): Generates vectors given a sentence.
Methods inherited from class opennlp.dl.AbstractDL
close, loadVocab
Constructor Details
SentenceVectorsDL
public SentenceVectorsDL(File model, File vocabulary) throws ai.onnxruntime.OrtException, IOException
Instantiates a sentence vector generator for an uncased model. Input text is lower cased and accent stripped during tokenization, as required by uncased models such as the sentence-transformers MiniLM family.
- Parameters:
model - The sentence vectors ONNX model file.
vocabulary - The vocabulary file for the model.
- Throws:
ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
IOException - Thrown if an error occurs while loading the model or vocabulary.
SentenceVectorsDL
public SentenceVectorsDL(File model, File vocabulary, boolean lowerCase) throws ai.onnxruntime.OrtException, IOException
Instantiates a sentence vector generator using ONNX models.
- Parameters:
model - The sentence vectors ONNX model file.
vocabulary - The vocabulary file for the model.
lowerCase - true for uncased models (lower casing and accent stripping during tokenization), false for cased models.
- Throws:
ai.onnxruntime.OrtException - Thrown if the model cannot be loaded.
IOException - Thrown if an error occurs while loading the model or vocabulary.
Method Details
getVectors
public float[] getVectors(String sentence) throws ai.onnxruntime.OrtException
Generates vectors given a sentence.
- Parameters:
sentence - The input sentence.
- Returns:
The sentence vector.
- Throws:
ai.onnxruntime.OrtException - Thrown if an error occurs during inference.
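getVectors returns a plain float[], so comparing two sentences is ordinary vector arithmetic. Cosine similarity is the usual choice for sentence-transformers style embeddings. The helper below is a hypothetical sketch, not part of OpenNLP. It assumes both vectors come from the same model and the same OpenNLP version. Vectors produced before the 3.0.0 encoding fix are not comparable with current output.

```java
public class VectorSimilarity {

    /** Cosine similarity of two equal-length vectors, e.g. two results of getVectors. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            // accumulate in double to limit rounding error on long vectors
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosine(new float[] {3f, 4f}, new float[] {3f, 4f}));  // 1.0
        System.out.println(cosine(new float[] {3f, 4f}, new float[] {4f, -3f})); // 0.0
    }
}
```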