Class SymSpellModel

java.lang.Object
opennlp.spellcheck.dictionary.SymSpellModel
All Implemented Interfaces:
opennlp.tools.util.model.SerializableArtifact

public final class SymSpellModel extends Object implements opennlp.tools.util.model.SerializableArtifact
A serializable spell-correction model: a built SymSpell engine together with the source frequency data and the metadata needed to reproduce and identify it.

The model stores the source dictionary (unigram counts and optional bigram counts) and the configuration, not the derived delete index. That source data and configuration are what SymSpellModelSerializer writes; the much larger delete index is rebuilt when the engine is constructed, by replaying the source through SymSpell.add(java.lang.String, long) and SymSpell.addBigram(java.lang.String, java.lang.String, long). See the serializer's class javadoc for the rationale.
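
The rebuild-on-load idea can be illustrated with a minimal sketch. DeleteIndexSketch and all of its members are hypothetical stand-ins, not the library's SymSpell class; the sketch assumes a plain symmetric-delete index and only mirrors the role that SymSpell.add plays above (bigrams are left out).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration; NOT the library's SymSpell engine.
final class DeleteIndexSketch {

  // Derived data: delete variant -> source words that produce it.
  private final Map<String, Set<String>> deletes = new HashMap<>();
  // Source data: word -> count.
  private final Map<String, Long> counts = new HashMap<>();
  private final int maxEditDistance;

  DeleteIndexSketch(int maxEditDistance) {
    this.maxEditDistance = maxEditDistance;
  }

  // Registers one word and indexes all of its delete variants.
  void add(String word, long count) {
    counts.merge(word, count, Long::sum);
    Set<String> variants = new HashSet<>();
    collectDeletes(word, maxEditDistance, variants);
    for (String variant : variants) {
      deletes.computeIfAbsent(variant, k -> new HashSet<>()).add(word);
    }
  }

  // Collects the word itself plus every string reachable by deleting up to
  // 'remaining' characters. Single-character strings are not shortened further.
  private static void collectDeletes(String word, int remaining, Set<String> out) {
    if (!out.add(word) || remaining == 0 || word.length() <= 1) {
      return;
    }
    for (int i = 0; i < word.length(); i++) {
      collectDeletes(word.substring(0, i) + word.substring(i + 1), remaining - 1, out);
    }
  }

  // Rebuilds the derived index by replaying the (small) source map.
  static DeleteIndexSketch rebuild(Map<String, Long> unigrams, int maxEditDistance) {
    DeleteIndexSketch engine = new DeleteIndexSketch(maxEditDistance);
    unigrams.forEach(engine::add);
    return engine;
  }

  Set<String> candidates(String variant) {
    return deletes.getOrDefault(variant, Set.of());
  }

  long count(String word) {
    return counts.getOrDefault(word, 0L);
  }

  int indexSize() {
    return deletes.size();
  }

  public static void main(String[] args) {
    Map<String, Long> unigrams = new LinkedHashMap<>();
    unigrams.put("hello", 120L);
    unigrams.put("help", 80L);

    DeleteIndexSketch engine = rebuild(unigrams, 1);
    System.out.println("source entries: " + unigrams.size());
    System.out.println("index entries:  " + engine.indexSize());
    System.out.println("candidates for 'hel': " + engine.candidates("hel"));
  }
}
```

Even with two source words and an edit distance of 1, the derived index holds several times more entries than the source map, which is the size gap the paragraph above refers to.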

Instances are immutable: the engine is built once at construction time and exposed read-only via getSymSpell(); the source maps returned by unigrams() and bigrams() are unmodifiable views.
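
As a rough illustration of what an unmodifiable view means for callers, the following sketch uses a hypothetical SourceHolderSketch class, not SymSpellModel itself. The defensive copy in the constructor is an assumption made for the sketch; this page does not say whether the model copies its input maps.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the source-holding part of a model.
final class SourceHolderSketch {

  private final Map<String, Long> unigrams;

  SourceHolderSketch(Map<String, Long> unigrams) {
    // Assumption: copy first, so later changes to the caller's map do not leak in,
    // then wrap so that callers of unigrams() cannot write through the view.
    this.unigrams = Collections.unmodifiableMap(new LinkedHashMap<>(unigrams));
  }

  Map<String, Long> unigrams() {
    return unigrams;
  }

  // Returns true if the given map refuses a write attempt.
  static boolean rejectsWrites(Map<String, Long> view) {
    try {
      view.put("x", 1L);
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, Long> source = new LinkedHashMap<>();
    source.put("hello", 120L);

    SourceHolderSketch holder = new SourceHolderSketch(source);
    source.put("later", 1L); // does not affect the holder's copy

    System.out.println("view rejects writes: " + rejectsWrites(holder.unigrams()));
    System.out.println("view contains 'later': " + holder.unigrams().containsKey("later"));
  }
}
```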

As a SerializableArtifact this type can be embedded in OpenNLP model containers and is round-tripped by SymSpellModelSerializer.

  • Field Details

    • DEFAULT_MODEL_NAME

      public static final String DEFAULT_MODEL_NAME
      The default model name fragment used for classpath discovery.
    • DEFAULT_MODEL_VERSION

      public static final String DEFAULT_MODEL_VERSION
      The default model version used when none is supplied.
  • Constructor Details

    • SymSpellModel

      public SymSpellModel(String language, SymSpellConfig config, Map<String,Long> unigrams, Map<String,Long> bigrams)
      Creates a model and builds its SymSpell engine from the supplied source data.
      Parameters:
      language - an IETF/ISO language tag (e.g. "en"); must not be null or blank
      config - the engine configuration; must not be null
      unigrams - the word -> count source map; must not be null
      bigrams - the "w1 w2" -> count source map; must not be null (may be empty)
    • SymSpellModel

      public SymSpellModel(String language, String name, String version, SymSpellConfig config, Map<String,Long> unigrams, Map<String,Long> bigrams)
      Creates a model with explicit name and version, and builds its SymSpell engine from the supplied source data.
      Parameters:
      language - an IETF/ISO language tag (e.g. "en"); must not be null or blank
      name - the model name (becomes model.name); must not be null or blank
      version - the model version (becomes model.version); must not be null or blank
      config - the engine configuration; must not be null
      unigrams - the word -> count source map; must not be null
      bigrams - the "w1 w2" -> count source map; must not be null (may be empty)
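
Both constructors take a word -> count map and a "w1 w2" -> count map. One way such maps might be prepared from a token stream is sketched below. SourceMapsSketch and its methods are hypothetical helpers, not part of the library, and the single-space separator in the bigram key is inferred from the "w1 w2" notation above rather than confirmed elsewhere.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for preparing constructor inputs; not part of the library.
final class SourceMapsSketch {

  // word -> count
  static Map<String, Long> countUnigrams(List<String> tokens) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (String token : tokens) {
      counts.merge(token, 1L, Long::sum);
    }
    return counts;
  }

  // "w1 w2" -> count, keyed by adjacent tokens joined with one space (assumed format)
  static Map<String, Long> countBigrams(List<String> tokens) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (int i = 0; i + 1 < tokens.size(); i++) {
      counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> tokens = List.of("new", "york", "is", "new");
    System.out.println("unigrams: " + countUnigrams(tokens));
    System.out.println("bigrams:  " + countBigrams(tokens));
  }
}
```

The resulting maps have the shapes the unigrams and bigrams parameters describe; an empty bigram map is also acceptable according to the parameter notes.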
  • Method Details

    • getSymSpell

      public SymSpell getSymSpell()
      Returns:
      the ready-to-query engine backed by this model.
    • getLanguage

      public String getLanguage()
      Returns:
      the language tag of this model.
    • getName

      public String getName()
      Returns:
      the model name (also emitted as model.name).
    • getVersion

      public String getVersion()
      Returns:
      the model version (also emitted as model.version).
    • getConfig

      public SymSpellConfig getConfig()
      Returns:
      the configuration used to build the engine.
    • unigrams

      public Map<String,Long> unigrams()
      Returns:
      an unmodifiable view of the word -> count source map.
    • bigrams

      public Map<String,Long> bigrams()
      Returns:
      an unmodifiable view of the "w1 w2" -> count source map.
    • getArtifactSerializerClass

      public Class<?> getArtifactSerializerClass()
      Specified by:
      getArtifactSerializerClass in interface opennlp.tools.util.model.SerializableArtifact