Package opennlp.spellcheck.dictionary
Class SymSpellModels
java.lang.Object
opennlp.spellcheck.dictionary.SymSpellModels
Convenience factory and (de)serialization helpers for
SymSpellModel.
This is the high-level entry point for the persistence layer: build a model from
plain-text frequency dictionaries, write a model to / read it from a stream, and emit
the model.properties descriptor consumed by the OpenNLP model-resolver.
Classpath resolution of packaged models lives in SymSpellModelResolver.
-
Field Summary
Fields
Modifier and Type     Field                  Description
static final String   MODEL_ARTIFACT_PREFIX  The Maven artifactId pattern for packaged spellcheck model jars.
static final String   PROP_LANGUAGE          Property key for the model language tag.
static final String   PROP_NAME              Property key for the model name.
static final String   PROP_SHA256            Property key for the SHA-256 of the binary model.
static final String   PROP_VERSION           Property key for the model version.
-
Method Summary
Modifier and Type, Method, Description
static String artifactId(String language)
static SymSpellModel buildModel(String language, SymSpellConfig config, Charset charset, opennlp.tools.util.InputStreamFactory unigramSource, opennlp.tools.util.InputStreamFactory bigramSource)
    Builds a SymSpellModel from a unigram dictionary and an optional bigram dictionary using the supplied configuration.
static Properties buildProperties(SymSpellModel model, byte[] modelBytes)
    Builds the model.properties descriptor for a serialized model, computing the model.sha256 over the supplied binary form.
static SymSpellModel deserialize(InputStream in)
    Deserializes a model from the given stream using SymSpellModelSerializer.
static SymSpellModel fromBytes(byte[] bytes)
    Deserializes a model from a byte array.
static void serialize(SymSpellModel model, OutputStream out)
    Serializes a model to the given stream using SymSpellModelSerializer.
static byte[] toBytes(SymSpellModel model)
    Serializes a model to a byte array.
static void writePackage(SymSpellModel model, OutputStream binaryOut, OutputStream propertiesOut)
    Writes a packaged model pair to the given streams: the binary model and the matching model.properties.
-
Field Details
-
PROP_LANGUAGE
public static final String PROP_LANGUAGE
Property key for the model language tag.
-
PROP_NAME
public static final String PROP_NAME
Property key for the model name.
-
PROP_VERSION
public static final String PROP_VERSION
Property key for the model version.
-
PROP_SHA256
public static final String PROP_SHA256
Property key for the SHA-256 of the binary model.
-
MODEL_ARTIFACT_PREFIX
public static final String MODEL_ARTIFACT_PREFIX
The Maven artifactId pattern for packaged spellcheck model jars.
-
-
Method Details
-
buildModel
public static SymSpellModel buildModel(String language, SymSpellConfig config, Charset charset, opennlp.tools.util.InputStreamFactory unigramSource, opennlp.tools.util.InputStreamFactory bigramSource) throws IOException
Builds a SymSpellModel from a unigram dictionary and an optional bigram dictionary using the supplied configuration.
The dictionaries are parsed with a FrequencyDictionaryLoader into source count maps (with duplicate keys accumulated, mirroring engine semantics); the engine itself is then built by SymSpellModel.
Parameters:
language - the language tag (e.g. "en"); must not be blank
config - the engine configuration; must not be null
charset - the charset used to decode the dictionaries; must not be null
unigramSource - the word<TAB>count dictionary source; must not be null
bigramSource - the w1 w2<TAB>count dictionary source; may be null to skip bigrams
Returns:
the built model
Throws:
IOException - Thrown on IO errors or a malformed dictionary line.
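The parsing step described above can be illustrated with a minimal, JDK-only sketch. It is not the library's FrequencyDictionaryLoader: the class name FrequencyDictionarySketch, the skipping of blank lines, and the exact error messages are assumptions made for illustration. It reads key<TAB>count lines, sums the counts of duplicate keys, and reports a malformed line as an IOException.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration; not the library's loader. */
final class FrequencyDictionarySketch {

    private FrequencyDictionarySketch() {}

    /** Parses key<TAB>count lines, summing the counts of duplicate keys. */
    static Map<String, Long> parse(InputStream in, Charset charset) throws IOException {
        Map<String, Long> counts = new LinkedHashMap<>();
        // The reader is deliberately not closed: the caller owns the stream.
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            int tab = line.indexOf('\t');
            if (tab <= 0 || tab == line.length() - 1) {
                throw new IOException("Malformed dictionary line " + lineNo + ": " + line);
            }
            long count;
            try {
                count = Long.parseLong(line.substring(tab + 1).trim());
            } catch (NumberFormatException e) {
                throw new IOException("Malformed count on line " + lineNo + ": " + line, e);
            }
            // Duplicate keys accumulate rather than overwrite.
            counts.merge(line.substring(0, tab), count, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello\t10\nworld\t5\nhello\t2\n".getBytes(StandardCharsets.UTF_8);
        Map<String, Long> counts = parse(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(counts); // prints {hello=12, world=5}
    }
}
```

Because the key is everything before the tab, the same sketch also accepts bigram lines of the form w1 w2<TAB>count, where the key contains a space.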
-
serialize
public static void serialize(SymSpellModel model, OutputStream out) throws IOException
Serializes a model to the given stream using SymSpellModelSerializer.
Parameters:
model - the model to write; must not be null
out - the destination stream; must not be null. The stream is not closed by this method.
Throws:
IOException - Thrown on IO errors.
-
deserialize
public static SymSpellModel deserialize(InputStream in) throws IOException
Deserializes a model from the given stream using SymSpellModelSerializer.
Parameters:
in - the source stream; must not be null. The stream is not closed by this method.
Returns:
the deserialized model
Throws:
IOException - Thrown on IO errors or on a malformed stream.
-
toBytes
public static byte[] toBytes(SymSpellModel model) throws IOException
Serializes a model to a byte array.
Parameters:
model - the model to serialize; must not be null
Returns:
the binary model bytes
Throws:
IOException - Thrown on IO errors.
-
fromBytes
public static SymSpellModel fromBytes(byte[] bytes) throws IOException
Deserializes a model from a byte array.
Parameters:
bytes - the binary model bytes; must not be null
Returns:
the deserialized model
Throws:
IOException - Thrown on IO errors or on a malformed stream.
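One plausible way to relate the byte-array helpers to the stream-based ones is to bridge them through in-memory streams. The sketch below is an assumption about the general pattern, not a description of how SymSpellModels is implemented: ByteArrayBridgeSketch, its Writer and Reader interfaces, and the toy writeWord/readWord codec are all hypothetical and stand in for a real model serializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch: byte-array helpers layered over stream-based ones. */
final class ByteArrayBridgeSketch {

    interface Writer<T> { void write(T value, OutputStream out) throws IOException; }

    interface Reader<T> { T read(InputStream in) throws IOException; }

    private ByteArrayBridgeSketch() {}

    /** Serializes into an in-memory buffer and returns its contents. */
    static <T> byte[] toBytes(T value, Writer<T> writer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.write(value, buffer);
        return buffer.toByteArray();
    }

    /** Deserializes by wrapping the array in an in-memory stream. */
    static <T> T fromBytes(byte[] bytes, Reader<T> reader) throws IOException {
        return reader.read(new ByteArrayInputStream(bytes));
    }

    /** Toy codec: writes one string, flushing but not closing the caller's stream. */
    static void writeWord(String word, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeUTF(word);
        data.flush();
    }

    /** Toy codec: a truncated stream surfaces as an EOFException, which is an IOException. */
    static String readWord(InputStream in) throws IOException {
        return new DataInputStream(in).readUTF();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = toBytes("hello", ByteArrayBridgeSketch::writeWord);
        System.out.println(fromBytes(bytes, ByteArrayBridgeSketch::readWord)); // prints hello
    }
}
```

Leaving the stream open in the writer is what lets a caller compose several writes on one stream, which matches the "not closed by this method" contract stated for serialize and deserialize.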
-
buildProperties
public static Properties buildProperties(SymSpellModel model, byte[] modelBytes)
Builds the model.properties descriptor for a serialized model, computing the model.sha256 over the supplied binary form.
Parameters:
model - the model the properties describe; must not be null
modelBytes - the serialized binary form of model (see toBytes(SymSpellModel)); must not be null
Returns:
the populated Properties
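As a rough illustration of what such a descriptor could contain, the following JDK-only sketch computes a SHA-256 digest over the model bytes and stores it in a Properties object. Only the key model.sha256 is named in this documentation; the keys model.language, model.name and model.version, the lowercase hex encoding, and the class name ModelPropertiesSketch are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Properties;

/** Hypothetical sketch of a model descriptor; key names other than model.sha256 are assumed. */
final class ModelPropertiesSketch {

    private ModelPropertiesSketch() {}

    /** Returns the SHA-256 digest of the bytes as lowercase hex (encoding is an assumption). */
    static String sha256Hex(byte[] bytes) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(bytes);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Populates a descriptor; the digest is taken over the serialized model bytes. */
    static Properties describe(String language, String name, String version, byte[] modelBytes) {
        Properties props = new Properties();
        props.setProperty("model.language", language); // assumed key
        props.setProperty("model.name", name);         // assumed key
        props.setProperty("model.version", version);   // assumed key
        props.setProperty("model.sha256", sha256Hex(modelBytes));
        return props;
    }

    public static void main(String[] args) {
        byte[] modelBytes = "abc".getBytes(StandardCharsets.UTF_8);
        Properties props = describe("en", "symspell-en", "1.0", modelBytes);
        // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(props.getProperty("model.sha256"));
    }
}
```

A consumer can recompute the digest over the binary entry it loads and compare it with the stored value to detect a mismatched or corrupted pair.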
-
writePackage
public static void writePackage(SymSpellModel model, OutputStream binaryOut, OutputStream propertiesOut) throws IOException
Writes a packaged model pair to the given streams: the binary model and the matching model.properties. This is the on-disk shape expected inside an opennlp-models-spellcheck-{lang} jar (the binary entry must have a .bin suffix to be discoverable by the model-resolver).
Parameters:
model - the model to package; must not be null
binaryOut - destination for the binary model; must not be null. Not closed by this method.
propertiesOut - destination for model.properties; must not be null. Not closed by this method.
Throws:
IOException - Thrown on IO errors.
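To make the packaged shape more concrete, here is a JDK-only sketch that places a binary entry and a model.properties entry in one zip-format archive. It does not call writePackage; the class PackageLayoutSketch, the entry name symspell-en.bin, and placing both entries at the archive root are assumptions, since the exact layout is defined by the model-resolver.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of a packaged model pair inside a zip-format archive. */
final class PackageLayoutSketch {

    private PackageLayoutSketch() {}

    /** Writes the binary entry and model.properties into one in-memory archive. */
    static byte[] writeArchive(byte[] modelBytes, Properties props, String binaryEntryName)
            throws IOException {
        if (!binaryEntryName.endsWith(".bin")) {
            throw new IllegalArgumentException("binary entry must end with .bin: " + binaryEntryName);
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(binaryEntryName));
            zip.write(modelBytes);
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("model.properties"));
            props.store(zip, null); // store() flushes but leaves the stream open
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    /** Lists entry names in archive order. */
    static List<String> entryNames(byte[] archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("model.sha256", "placeholder"); // illustrative value only
        byte[] archive = writeArchive(new byte[] {1, 2, 3}, props, "symspell-en.bin");
        System.out.println(entryNames(archive)); // prints [symspell-en.bin, model.properties]
    }
}
```

The sketch relies on writers that leave the target stream open, so the archive stream can receive the second entry after the first and is closed exactly once by its owner.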
-
artifactId
public static String artifactId(String language)
Parameters:
language - a language tag; must not be null
Returns:
the conventional Maven artifactId for a packaged model of that language, e.g. "opennlp-models-spellcheck-en"
-
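The naming convention can be sketched in a few lines. This is an illustration only: the class ArtifactIdSketch is hypothetical, and the prefix value is inferred from the opennlp-models-spellcheck-{lang} pattern and the example above rather than read from MODEL_ARTIFACT_PREFIX.

```java
import java.util.Objects;

/** Hypothetical sketch of the artifactId naming convention. */
final class ArtifactIdSketch {

    /** Assumed prefix, inferred from the documented example. */
    static final String PREFIX = "opennlp-models-spellcheck-";

    private ArtifactIdSketch() {}

    /** Appends the language tag to the prefix; rejects null as documented. */
    static String artifactId(String language) {
        Objects.requireNonNull(language, "language");
        return PREFIX + language;
    }

    public static void main(String[] args) {
        System.out.println(artifactId("en")); // prints opennlp-models-spellcheck-en
    }
}
```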