Class SymSpellConfig

java.lang.Object
opennlp.spellcheck.symspell.SymSpellConfig

public final class SymSpellConfig extends Object
Immutable configuration for SymSpell, created through builder().

The tunables mirror the SymSpell reference implementation:

  • maxDictionaryEditDistance – the largest edit distance for which the deletes index is precomputed; queries cannot exceed this.
  • prefixLength – only the first prefixLength symbols of each term are used to generate deletes, which shrinks the index at some cost in recall on long words.
  • countThreshold – the minimum corpus count for a term to be indexed.
  • editDistance – the verification metric, defaulting to DamerauOSADistance.
  • corpusWordCount – the corpus normalization constant N used by the Naive-Bayes word combine/split scoring in SymSpell.lookupCompound(String, int). Defaults to DERIVE_CORPUS_WORD_COUNT, in which case the engine derives N from the summed counts of the loaded dictionary, so N always matches the data actually loaded. Set it explicitly to pin N, for example to the total word count of the full corpus a reference dictionary was drawn from.
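
How prefixLength and maxDictionaryEditDistance bound the deletes index can be illustrated with a minimal sketch of SymSpell-style delete generation. This is not the OpenNLP implementation: the class DeletesSketch and its method names are hypothetical, and it operates on Java chars, which may differ from the symbol unit the library uses.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; names are hypothetical and not part of the library. */
final class DeletesSketch {

    /**
     * Returns the prefix of term (at most prefixLength chars) plus every string
     * obtained from that prefix by deleting up to maxDistance characters.
     */
    static Set<String> deletes(String term, int prefixLength, int maxDistance) {
        String prefix = term.length() > prefixLength ? term.substring(0, prefixLength) : term;
        Set<String> out = new HashSet<>();
        out.add(prefix);
        expand(prefix, maxDistance, out);
        return out;
    }

    // Removes one character at each position, then recurses while deletions remain.
    // A string of a given length is always reached at the same depth, so skipping
    // already-seen strings does not lose any results.
    private static void expand(String word, int remaining, Set<String> out) {
        if (remaining == 0 || word.isEmpty()) {
            return;
        }
        for (int i = 0; i < word.length(); i++) {
            String shorter = word.substring(0, i) + word.substring(i + 1);
            if (out.add(shorter)) {
                expand(shorter, remaining - 1, out);
            }
        }
    }
}
```

In this sketch, a ten-character term with prefixLength 3 produces the same deletes as its three-character prefix alone, which is why a shorter prefix keeps the index small.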
  • Field Details

    • DERIVE_CORPUS_WORD_COUNT

      public static final long DERIVE_CORPUS_WORD_COUNT
      Sentinel for corpusWordCount() meaning "derive N from the summed counts of the loaded dictionary" rather than pinning it to a fixed value.
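
The derive-or-pin behaviour described above can be sketched as follows. This is a hedged illustration, not the engine's code: the class CorpusCountSketch, the method names, and in particular the sentinel value -1 are assumptions, since the actual value of DERIVE_CORPUS_WORD_COUNT is not stated here.

```java
import java.util.Map;

/** Illustrative sketch only; the sentinel value and all names are assumptions. */
final class CorpusCountSketch {

    // Hypothetical stand-in for DERIVE_CORPUS_WORD_COUNT; the real value may differ.
    static final long DERIVE = -1L;

    /** Returns the pinned N, or the summed dictionary counts when configured == DERIVE. */
    static long resolveN(long configured, Map<String, Long> dictionaryCounts) {
        if (configured != DERIVE) {
            return configured;
        }
        long sum = 0L;
        for (long count : dictionaryCounts.values()) {
            sum += count;
        }
        return sum;
    }

    /** Relative frequency count(word) / N, the kind of term a Naive-Bayes score multiplies. */
    static double probability(String word, long n, Map<String, Long> dictionaryCounts) {
        return dictionaryCounts.getOrDefault(word, 0L) / (double) n;
    }
}
```

Pinning N to a larger full-corpus total lowers every word probability by the same factor, which matters when scores for one word are compared against products of probabilities for two words.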
  • Method Details

    • maxDictionaryEditDistance

      public int maxDictionaryEditDistance()
    • prefixLength

      public int prefixLength()
    • countThreshold

      public long countThreshold()
    • editDistance

      public EditDistance editDistance()
    • corpusWordCount

      public long corpusWordCount()
      Returns:
      the pinned corpus normalization constant N, or DERIVE_CORPUS_WORD_COUNT when N is derived from the loaded dictionary's summed counts.
    • builder

      public static SymSpellConfig.Builder builder()
      Returns:
      a builder preset with the SymSpell reference defaults (maxDictionaryEditDistance=2, prefixLength=7, countThreshold=1, editDistance=DamerauOSADistance); corpusWordCount defaults to DERIVE_CORPUS_WORD_COUNT.
    • defaultConfig

      public static SymSpellConfig defaultConfig()
      Returns:
      a configuration with all reference defaults.
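
The immutable-configuration-plus-builder pattern used by this class can be sketched with a self-contained stand-in. ConfigSketch is not the OpenNLP class: the builder setter names and build() are hypothetical, because SymSpellConfig.Builder is not documented on this page, and editDistance and corpusWordCount are omitted for brevity. Only the default values are taken from the text above.

```java
/** Stand-in for illustration; setter names and build() are assumptions. */
final class ConfigSketch {
    private final int maxDictionaryEditDistance;
    private final int prefixLength;
    private final long countThreshold;

    private ConfigSketch(Builder b) {
        this.maxDictionaryEditDistance = b.maxDictionaryEditDistance;
        this.prefixLength = b.prefixLength;
        this.countThreshold = b.countThreshold;
    }

    int maxDictionaryEditDistance() { return maxDictionaryEditDistance; }
    int prefixLength() { return prefixLength; }
    long countThreshold() { return countThreshold; }

    /** Returns a builder preset with the documented reference defaults. */
    static Builder builder() { return new Builder(); }

    /** Returns a configuration with all defaults left unchanged. */
    static ConfigSketch defaultConfig() { return builder().build(); }

    static final class Builder {
        // Defaults as listed for builder(): 2, 7 and 1.
        private int maxDictionaryEditDistance = 2;
        private int prefixLength = 7;
        private long countThreshold = 1L;

        Builder maxDictionaryEditDistance(int value) { this.maxDictionaryEditDistance = value; return this; }
        Builder prefixLength(int value) { this.prefixLength = value; return this; }
        Builder countThreshold(long value) { this.countThreshold = value; return this; }

        // Copies the current settings into a new immutable instance.
        ConfigSketch build() { return new ConfigSketch(this); }
    }
}
```

Because every field is final and set once from the builder, an instance can be shared between threads or lookups without defensive copying.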