Package opennlp.spellcheck.symspell
Class SymSpellConfig
java.lang.Object
opennlp.spellcheck.symspell.SymSpellConfig
Immutable configuration for SymSpell, created through builder().
The tunables mirror the SymSpell reference implementation:
- maxDictionaryEditDistance – the largest edit distance for which the deletes index is precomputed; queries cannot exceed this.
- prefixLength – only the first prefixLength symbols of each term are used to generate deletes, trading index size for recall on long words.
- countThreshold – the minimum corpus count for a term to be indexed.
- editDistance – the verification metric, defaulting to DamerauOSADistance.
- corpusWordCount – the corpus normalization constant N used by the Naive-Bayes word combine/split scoring in SymSpell.lookupCompound(String, int). Defaults to DERIVE_CORPUS_WORD_COUNT, which makes the engine derive N from the summed counts of the loaded dictionary so it is always corpus-correct; set it explicitly to pin N (e.g. to the full-corpus total a reference dictionary was drawn from).
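To make the interaction between prefixLength and maxDictionaryEditDistance concrete, the following is a minimal sketch of delete generation over a term prefix. It is not the engine's code: the class PrefixDeletesSketch and its deletes method are hypothetical names introduced for illustration, and the sketch treats each Java char as one symbol, which is a simplification.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative stand-in; NOT part of opennlp.spellcheck.symspell. */
final class PrefixDeletesSketch {

    /**
     * Returns the term's prefix plus every string obtained by removing
     * up to maxDistance characters from that prefix.
     */
    static Set<String> deletes(String term, int prefixLength, int maxDistance) {
        // Only the first prefixLength chars take part in delete generation.
        String prefix = term.length() > prefixLength
                ? term.substring(0, prefixLength)
                : term;
        Set<String> out = new HashSet<>();
        out.add(prefix);
        collect(prefix, maxDistance, out);
        return out;
    }

    private static void collect(String s, int remaining, Set<String> out) {
        if (remaining == 0 || s.isEmpty()) {
            return;
        }
        for (int i = 0; i < s.length(); i++) {
            String shorter = s.substring(0, i) + s.substring(i + 1);
            // A given string is always reached at the same depth, so
            // skipping already-seen strings loses nothing.
            if (out.add(shorter)) {
                collect(shorter, remaining - 1, out);
            }
        }
    }

    public static void main(String[] args) {
        // A long word indexed with a short prefix yields far fewer entries.
        System.out.println(deletes("misspelling", 7, 2).size());
        System.out.println(deletes("misspelling", 11, 2).size());
    }
}
```

In this sketch a smaller prefixLength shrinks the set of index entries per term, at the cost that characters beyond the prefix play no part in candidate generation.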
Nested Class Summary
Nested Classes -
Field Summary
Fields
- static final long DERIVE_CORPUS_WORD_COUNT – Sentinel for corpusWordCount() meaning "derive N from the summed counts of the loaded dictionary" rather than pinning it to a fixed value.
-
Method Summary
- static SymSpellConfig.Builder builder()
- long corpusWordCount()
- long countThreshold()
- static SymSpellConfig defaultConfig()
- editDistance()
- int maxDictionaryEditDistance()
- int prefixLength()
-
Field Details
-
DERIVE_CORPUS_WORD_COUNT
public static final long DERIVE_CORPUS_WORD_COUNT
Sentinel for corpusWordCount() meaning "derive N from the summed counts of the loaded dictionary" rather than pinning it to a fixed value.
-
Method Details
-
maxDictionaryEditDistance
public int maxDictionaryEditDistance()
-
prefixLength
public int prefixLength()
-
countThreshold
public long countThreshold()
-
editDistance
-
corpusWordCount
public long corpusWordCount()
- Returns:
- the pinned corpus normalization constant N, or DERIVE_CORPUS_WORD_COUNT when N is derived from the loaded dictionary's summed counts.
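The choice between a pinned and a derived N can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: the class CorpusCountSketch, the sentinel value -1, and the unigram estimate count / N are all assumptions made here, since this page states neither the constant's value nor the exact scoring formula.

```java
import java.util.Map;

/** Illustrative stand-in; NOT part of opennlp.spellcheck.symspell. */
final class CorpusCountSketch {

    /** Hypothetical sentinel value; the real constant's value is not given on this page. */
    static final long DERIVE = -1L;

    /** Returns the pinned N, or the summed dictionary counts when the sentinel is set. */
    static long resolveN(long configured, Map<String, Long> dictionary) {
        if (configured != DERIVE) {
            return configured; // N was pinned explicitly
        }
        long sum = 0;
        for (long count : dictionary.values()) {
            sum += count;
        }
        return sum;
    }

    /** Assumed unigram estimate: relative frequency of the term under N. */
    static double probability(String term, Map<String, Long> dictionary, long n) {
        return dictionary.getOrDefault(term, 0L) / (double) n;
    }

    public static void main(String[] args) {
        Map<String, Long> dictionary = Map.of("the", 6L, "cat", 4L);
        long derived = resolveN(DERIVE, dictionary);
        long pinned = resolveN(1_000L, dictionary);
        System.out.println(probability("cat", dictionary, derived));
        System.out.println(probability("cat", dictionary, pinned));
    }
}
```

Pinning N to a larger full-corpus total lowers every word's estimated probability by the same factor, which matters when the loaded dictionary is only a subset of the corpus its counts came from.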
-
builder
- Returns:
- a builder with the SymSpell reference defaults (maxDictionaryEditDistance=2, prefixLength=7, countThreshold=1, editDistance=DamerauOSADistance).
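The pattern of an immutable configuration whose builder starts from the reference defaults might look roughly like the sketch below. ConfigSketch and its builder setters are hypothetical names: this page does not list the methods of SymSpellConfig.Builder, so the setter names are assumptions, and the edit-distance metric is left out because its type is not shown here.

```java
/** Illustrative stand-in; NOT the real SymSpellConfig. */
final class ConfigSketch {
    private final int maxDictionaryEditDistance;
    private final int prefixLength;
    private final long countThreshold;

    private ConfigSketch(Builder b) {
        this.maxDictionaryEditDistance = b.maxDictionaryEditDistance;
        this.prefixLength = b.prefixLength;
        this.countThreshold = b.countThreshold;
    }

    static Builder builder() {
        return new Builder();
    }

    int maxDictionaryEditDistance() { return maxDictionaryEditDistance; }
    int prefixLength() { return prefixLength; }
    long countThreshold() { return countThreshold; }

    /** Starts from the documented reference defaults; setter names are assumed. */
    static final class Builder {
        private int maxDictionaryEditDistance = 2;
        private int prefixLength = 7;
        private long countThreshold = 1;

        Builder maxDictionaryEditDistance(int value) {
            this.maxDictionaryEditDistance = value;
            return this;
        }

        Builder prefixLength(int value) {
            this.prefixLength = value;
            return this;
        }

        Builder countThreshold(long value) {
            this.countThreshold = value;
            return this;
        }

        ConfigSketch build() {
            return new ConfigSketch(this);
        }
    }

    public static void main(String[] args) {
        ConfigSketch defaults = ConfigSketch.builder().build();
        ConfigSketch tuned = ConfigSketch.builder().prefixLength(5).build();
        System.out.println(defaults.prefixLength() + " vs " + tuned.prefixLength());
    }
}
```

Because the fields are final and set only in the private constructor, a built instance can be shared freely, and each override is visible at the call site.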
-
defaultConfig
- Returns:
- a configuration with all reference defaults.
-