Class SymSpell

java.lang.Object
opennlp.spellcheck.symspell.SymSpell
All Implemented Interfaces:
SpellChecker

public final class SymSpell extends Object implements SpellChecker
Symmetric Delete spelling correction engine (SymSpell).

The engine precomputes a deletes-only index: for every dictionary term it derives all strings obtained by deleting up to maxDictionaryEditDistance symbols from the term's prefix (its leading characters, up to the configured prefix length), and maps each such delete back to the terms it came from. A query is answered by generating the deletes of the query in the same way and looking each one up in the index, which replaces a costly fuzzy search over the whole dictionary with hash-map lookups; the candidates found this way are then verified with the injected EditDistance.
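The delete-and-look-up idea fits in a few dozen lines. The class below is a hypothetical, self-contained illustration, not the SymSpell API: DeleteIndexSketch and its methods are invented names. It generates deletes of the whole term rather than of a fixed-length prefix, which keeps recall exact for this metric at the cost of a larger index. It also hard-codes plain Levenshtein distance where the engine would call the injected EditDistance.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of a symmetric-delete index; not the SymSpell class itself. */
final class DeleteIndexSketch {
    private final int maxDistance;
    // delete key -> dictionary terms from which that key can be derived
    private final Map<String, Set<String>> index = new HashMap<>();

    DeleteIndexSketch(int maxDistance) {
        this.maxDistance = maxDistance;
    }

    /** Every string obtainable from word by deleting 0..maxDistance characters. */
    static Set<String> deletes(String word, int maxDistance) {
        Set<String> result = new HashSet<>();
        result.add(word);
        Deque<String> frontier = new ArrayDeque<>(List.of(word));
        for (int d = 0; d < maxDistance; d++) {
            Deque<String> next = new ArrayDeque<>();
            for (String s : frontier) {
                for (int i = 0; i < s.length(); i++) {
                    String shorter = s.substring(0, i) + s.substring(i + 1);
                    if (result.add(shorter)) {
                        next.add(shorter); // expand only strings not seen before
                    }
                }
            }
            frontier = next;
        }
        return result;
    }

    /** Indexes a term under each of its deletes (done once, at population time). */
    void add(String term) {
        for (String key : deletes(term, maxDistance)) {
            index.computeIfAbsent(key, k -> new HashSet<>()).add(term);
        }
    }

    /** Hash lookups on the query's deletes, then verification by edit distance. */
    SortedSet<String> lookup(String query) {
        SortedSet<String> hits = new TreeSet<>();
        for (String key : deletes(query, maxDistance)) {
            for (String term : index.getOrDefault(key, Set.of())) {
                if (levenshtein(query, term) <= maxDistance) {
                    hits.add(term);
                }
            }
        }
        return hits;
    }

    /** Two-row dynamic-programming Levenshtein distance (stand-in for EditDistance). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With maxDistance 1 and the terms hello, help and world indexed, lookup("helo") reaches hello through the shared key helo and help through the shared key hel, and returns [hello, help]; world shares no key with the query and is never compared. A term within distance d of the query always shares a key with it, because an insertion or deletion is undone by one delete on one side and a substitution by one delete on each side.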

The algorithm and its compound-correction heuristic are ported from the SymSpell reference implementation (MIT, Wolf Garbe). This is an independent Java 21 rewrite, not a verbatim copy; attribution is recorded in the project NOTICE file.

Populate the engine through add(String, long) and addBigram(String, String, long) (typically driven by a separate loader), then issue lookup(String) / lookupCompound(String, int) queries. After population the engine is safe for concurrent reads.

  • Constructor Details

    • SymSpell

      public SymSpell(SymSpellConfig config)
      Creates an engine from the given configuration. The configuration fixes the index geometry and scoring constants for the engine's lifetime: the maximum dictionary edit distance, the delete-generation prefix length, the count threshold below which terms stay unindexed, the verification EditDistance metric, and the corpus normalization constant. See SymSpellConfig for the individual tunables and their defaults.
      Parameters:
      config - the engine configuration; must not be null
      Throws:
      NullPointerException - if config is null
    • SymSpell

      public SymSpell()
      Creates an engine with the default configuration (see SymSpellConfig for the default value of each tunable).
  • Method Details

    • add

      public boolean add(String word, long count)
      Adds (or accumulates) a dictionary term and its count, updating the deletes index.

      If the term already exists, count is added to the existing count. Terms whose accumulated count stays below the configured countThreshold are tracked but not indexed until they reach the threshold.
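      The accumulate-then-index rule can be modelled with a plain count map. This is a hypothetical sketch, not the engine's code: CountThresholdSketch is an invented name, the threshold is passed directly rather than through SymSpellConfig, and a comment marks where the real engine would write the term's deletes into the index. Overflow of very large counts is not modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of add(word, count) accumulation against a count threshold. */
final class CountThresholdSketch {
    private final long countThreshold;
    // every term seen so far, indexed or not, with its accumulated count
    private final Map<String, Long> counts = new HashMap<>();
    private int indexedTerms;

    CountThresholdSketch(long countThreshold) {
        this.countThreshold = countThreshold;
    }

    /** Returns true if the term became (or remained) indexed after this call. */
    boolean add(String word, long count) {
        Objects.requireNonNull(word, "word");
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0, got " + count);
        }
        Long previous = counts.get(word);
        long total = (previous == null ? 0L : previous) + count;
        counts.put(word, total);
        boolean indexedBefore = previous != null && previous >= countThreshold;
        if (total >= countThreshold && !indexedBefore) {
            indexedTerms++; // the real engine would add the term's deletes to the index here
        }
        return total >= countThreshold;
    }

    /** Distinct terms tracked, including sub-threshold ones. */
    int wordCount() {
        return counts.size();
    }

    /** Terms whose accumulated count has reached the threshold. */
    int indexedCount() {
        return indexedTerms;
    }
}
```

      Keeping sub-threshold terms in the map is what lets a rare word that arrives across several add calls eventually cross the threshold and become searchable.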

      Parameters:
      word - the dictionary term; must not be null
      count - the corpus count to add; must be >= 0
      Returns:
      true if the term became (or remained) indexed
      Throws:
      NullPointerException - if word is null
      IllegalArgumentException - if count is negative
    • addBigram

      public void addBigram(String w1, String w2, long count)
      Adds (or accumulates) a bigram and its count for compound correction.
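      A bigram table needs only an ordered pair key and the same accumulation rule as unigrams. The sketch below is hypothetical: BigramTableSketch and its nested Pair record are invented names, and how the engine actually keys its bigrams is not documented here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of bigram accumulation; not the engine's storage layout. */
final class BigramTableSketch {
    /** Ordered word pair; ("new", "york") and ("york", "new") are distinct keys. */
    record Pair(String w1, String w2) {
        Pair {
            Objects.requireNonNull(w1, "w1");
            Objects.requireNonNull(w2, "w2");
        }
    }

    private final Map<Pair, Long> counts = new HashMap<>();

    void addBigram(String w1, String w2, long count) {
        Pair key = new Pair(w1, w2); // null checks happen in the record's constructor
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0, got " + count);
        }
        counts.merge(key, count, Long::sum); // accumulate repeated additions
    }

    long count(String w1, String w2) {
        return counts.getOrDefault(new Pair(w1, w2), 0L);
    }

    int bigramCount() {
        return counts.size();
    }
}
```

      A record key avoids the ambiguity of a concatenated string key, where ("a b", "c") and ("a", "b c") would both collapse to "a b c".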
      Parameters:
      w1 - the first word; must not be null
      w2 - the second word; must not be null
      count - the corpus count to add; must be >= 0
      Throws:
      NullPointerException - if w1 or w2 is null
      IllegalArgumentException - if count is negative
    • wordCount

      public int wordCount()
      Returns:
      the number of distinct unigram terms tracked by the engine, including sub-threshold terms that are not yet indexed.
    • entryCount

      public int entryCount()
      Returns:
      the number of distinct delete keys in the index.
    • bigramCount

      public int bigramCount()
      Returns:
      the number of bigram entries.
    • maxEditDistance

      public int maxEditDistance()
      Specified by:
      maxEditDistance in interface SpellChecker
      Returns:
      the largest edit distance this checker can answer queries for (the configured maximum dictionary edit distance); a maxEditDistance argument to SpellChecker.lookup(String, Verbosity, int) must not exceed this value.
    • lookup

      public List<SuggestItem> lookup(String term)
      Description copied from interface: SpellChecker
      Convenience overload that uses Verbosity.TOP and the implementation's configured maximum dictionary edit distance.

      As with SpellChecker.lookup(String, Verbosity, int), a blank term is looked up verbatim and normally yields an empty list.

      Specified by:
      lookup in interface SpellChecker
      Parameters:
      term - the (possibly misspelled) term to correct; must not be null
      Returns:
      the matching suggestions in natural order (best first); never null
    • lookup

      public List<SuggestItem> lookup(String term, Verbosity verbosity, int maxEditDistance)
      Description copied from interface: SpellChecker
      Looks up suggestions for a single term within maxEditDistance.

      A blank (empty or whitespace-only) term is a valid argument: it is looked up verbatim and, as it matches no dictionary entry, normally yields an empty list rather than an error.
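      This page does not spell out what "natural order (best first)" means. Assuming it matches the reference SymSpell ordering (smaller edit distance first, ties broken by higher corpus count) and that TOP keeps only the first element, the ranking can be sketched as follows. Suggestion is a hypothetical stand-in for SuggestItem, and rank is an invented helper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for SuggestItem: a candidate term, its edit distance, and its count. */
record Suggestion(String term, int distance, long count) implements Comparable<Suggestion> {
    // assumed natural order: smaller distance first, then higher count first
    private static final Comparator<Suggestion> ORDER =
            Comparator.comparingInt(Suggestion::distance)
                    .thenComparing(Comparator.comparingLong(Suggestion::count).reversed());

    @Override
    public int compareTo(Suggestion other) {
        return ORDER.compare(this, other);
    }

    /** Sorts candidates best first; with top == true keeps at most the single best one. */
    static List<Suggestion> rank(List<Suggestion> candidates, boolean top) {
        List<Suggestion> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.naturalOrder());
        return top && !sorted.isEmpty() ? List.of(sorted.get(0)) : sorted;
    }
}
```

      Under this assumed ordering an exact dictionary match (distance 0) always ranks first, however rare it is.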

      Specified by:
      lookup in interface SpellChecker
      Parameters:
      term - the (possibly misspelled) term to correct; must not be null
      verbosity - controls how many suggestions are returned; must not be null
      maxEditDistance - the maximum edit distance to consider; must not be negative and must not exceed SpellChecker.maxEditDistance()
      Returns:
      the matching suggestions in natural order (best first); never null
    • lookupCompound

      public List<SuggestItem> lookupCompound(String input, int maxEditDistance)
      Description copied from interface: SpellChecker
      Corrects a whole input string (a phrase or sentence), supporting word splits and merges, and combining candidates using a bigram language model.

      A blank (empty or whitespace-only) input is a valid argument: it contains no tokens to correct, so the returned singleton holds a suggestion whose term is the empty string at edit distance 0.
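      A single run-together token shows most clearly how a bigram model can arbitrate between word splits. The sketch below uses a deliberately simplified, hypothetical scoring rule, not the ported compound heuristic. It only considers two-way splits into exact dictionary words. It scores a split by its bigram count when one is known, and otherwise falls back to the independence estimate c(w1) * c(w2) / N, where N stands in for the corpus normalization constant. SplitScorerSketch and bestSplit are invented names.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: pick the best two-word split of a token using bigram and unigram counts. */
final class SplitScorerSketch {
    private final Map<String, Long> unigrams;
    private final Map<List<String>, Long> bigrams; // key: List.of(w1, w2)
    private final double corpusSize;              // stand-in for the corpus normalization constant

    SplitScorerSketch(Map<String, Long> unigrams, Map<List<String>, Long> bigrams, double corpusSize) {
        this.unigrams = unigrams;
        this.bigrams = bigrams;
        this.corpusSize = corpusSize;
    }

    /** Returns the highest-scoring "left right" split, or empty if no split uses two known words. */
    Optional<String> bestSplit(String token) {
        String best = null;
        double bestScore = -1;
        for (int i = 1; i < token.length(); i++) {
            String left = token.substring(0, i);
            String right = token.substring(i);
            Long c1 = unigrams.get(left);
            Long c2 = unigrams.get(right);
            if (c1 == null || c2 == null) {
                continue; // this sketch only splits into exact dictionary words
            }
            Long pairCount = bigrams.get(List.of(left, right));
            // observed bigram count if known, otherwise assume the two words are independent
            double score = pairCount != null ? pairCount.doubleValue() : c1 * (double) c2 / corpusSize;
            if (score > bestScore) {
                bestScore = score;
                best = left + " " + right;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

      With the counts for=200, me=100, form=50 and e=5 and N = 1000, "forme" splits as "for me" (score 20 versus 0.25 for "form e"); adding a bigram count of 30 for ("form", "e") flips the choice to "form e".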

      Specified by:
      lookupCompound in interface SpellChecker
      Parameters:
      input - the input phrase to correct; must not be null
      maxEditDistance - the maximum edit distance per token; must not be negative
      Returns:
      a singleton list holding the best correction of the whole input; never null