Class SymSpell
- All Implemented Interfaces:
SpellChecker
The engine precomputes a deletes-only index: for every dictionary term it derives
all strings obtained by deleting up to maxDictionaryEditDistance symbols from
the term's prefix, and maps each such delete back to the originating terms. A query is
answered by generating the deletes of the query and intersecting them with the index,
which turns the costly fuzzy search into hash-map lookups; candidates are then verified
with the injected EditDistance.
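The deletes-only idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code of this class: DeletesIndexSketch and its methods are hypothetical names, candidate verification with an EditDistance is left out, and counts and thresholds are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not the SymSpell class documented here. */
final class DeletesIndexSketch {
    private final int maxDistance;   // stands in for maxDictionaryEditDistance
    private final int prefixLength;  // stands in for the delete-generation prefix length
    private final Map<String, Set<String>> index = new HashMap<>();

    DeletesIndexSketch(int maxDistance, int prefixLength) {
        this.maxDistance = maxDistance;
        this.prefixLength = prefixLength;
    }

    /** The prefix of s plus every string obtained by deleting up to maxDistance characters from it. */
    Set<String> deletes(String s) {
        String prefix = s.length() > prefixLength ? s.substring(0, prefixLength) : s;
        Set<String> result = new HashSet<>();
        result.add(prefix);
        List<String> frontier = List.of(prefix);
        for (int d = 0; d < maxDistance; d++) {
            List<String> next = new ArrayList<>();
            for (String w : frontier) {
                for (int i = 0; i < w.length(); i++) {
                    String shorter = w.substring(0, i) + w.substring(i + 1);
                    if (result.add(shorter)) {
                        next.add(shorter); // expand each new delete only once
                    }
                }
            }
            frontier = next;
        }
        return result;
    }

    /** Maps every delete of the term back to the term. */
    void add(String term) {
        for (String key : deletes(term)) {
            index.computeIfAbsent(key, k -> new HashSet<>()).add(term);
        }
    }

    /** Unverified candidates: terms sharing at least one delete with the query. */
    Set<String> candidates(String query) {
        Set<String> out = new HashSet<>();
        for (String key : deletes(query)) {
            out.addAll(index.getOrDefault(key, Set.of()));
        }
        return out;
    }
}
```

Because both sides only delete characters, a transposition such as "wrold" still meets "world" on a shared delete ("wrld"). The candidate set can contain false positives, which is why a verification pass with a real edit-distance metric follows in the engine.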
The algorithm and its compound-correction heuristic are ported from the SymSpell reference implementation (MIT, Wolf Garbe). This is an independent Java 21 rewrite, not a verbatim copy; attribution is recorded in the project NOTICE file.
Populate the engine through add(String, long) and
addBigram(String, String, long) (typically driven by a separate loader), then
issue lookup(String) / lookupCompound(String, int) queries. After population the engine is
safe for concurrent reads.
-
Constructor Summary
Constructors
SymSpell() - Creates an engine with the default config.
SymSpell(SymSpellConfig config) - Creates an engine from the given configuration.
-
Method Summary
Modifier and Type, Method, Description
boolean add(String word, long count) - Adds (or accumulates) a dictionary term and its count, updating the deletes index.
void addBigram(String w1, String w2, long count) - Adds (or accumulates) a bigram and its count for compound correction.
int bigramCount()
int entryCount()
lookup(String term) - Convenience overload that uses Verbosity.TOP and the implementation's configured maximum dictionary edit distance.
lookup(String term, Verbosity verbosity, int maxEditDistance) - Looks up suggestions for a single term within maxEditDistance.
lookupCompound(String input, int maxEditDistance) - Corrects a whole input string (a phrase or sentence), supporting word splits and merges, and combining candidates using a bigram language model.
int maxEditDistance()
int wordCount()
-
Constructor Details
-
SymSpell
Creates an engine from the given configuration. The configuration fixes the index geometry and scoring constants for the engine's lifetime: the maximum dictionary edit distance, the delete-generation prefix length, the count threshold below which terms stay unindexed, the verification EditDistance metric, and the corpus normalization constant. See SymSpellConfig for the individual tunables and their defaults.
Parameters:
config - the engine configuration; must not be null
Throws:
NullPointerException - if config is null
-
SymSpell
public SymSpell()
Creates an engine with the default config.
-
-
Method Details
-
add
Adds (or accumulates) a dictionary term and its count, updating the deletes index.
If the term already exists, count is added to the existing count. Terms whose accumulated count stays below the configured countThreshold are tracked but not indexed until they reach the threshold.
Parameters:
word - the dictionary term; must not be null
count - the corpus count to add; must be >= 0
Returns:
true if the term became (or remained) indexed
Throws:
NullPointerException - if word is null
IllegalArgumentException - if count is negative
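The accumulate-then-threshold contract above can be illustrated with a small stand-alone sketch. ThresholdCounterSketch is a hypothetical class, not part of this API; it only models the counting and the return value, assumes a term counts as indexed once its total reaches the threshold, and does not guard against long overflow or touch any deletes index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical illustration of count accumulation with an indexing threshold. */
final class ThresholdCounterSketch {
    private final long countThreshold; // stands in for the configured countThreshold
    private final Map<String, Long> counts = new HashMap<>();

    ThresholdCounterSketch(long countThreshold) {
        this.countThreshold = countThreshold;
    }

    /** Adds count to the word's running total; returns true once the total has reached the threshold. */
    boolean add(String word, long count) {
        Objects.requireNonNull(word, "word");
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        // Sub-threshold words are still tracked so that later additions can promote them.
        long total = counts.merge(word, count, Long::sum);
        return total >= countThreshold;
    }
}
```

With a threshold of 3, three calls adding 1 each return false, false, true; a further call with count 0 keeps returning true because the term remains at or above the threshold.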
-
addBigram
Adds (or accumulates) a bigram and its count for compound correction.
Parameters:
w1 - the first word; must not be null
w2 - the second word; must not be null
count - the corpus count to add; must be >= 0
Throws:
NullPointerException - if w1 or w2 is null
IllegalArgumentException - if count is negative
-
wordCount
public int wordCount()
Returns:
the number of unigram entries tracked by the engine, including sub-threshold terms that are not yet indexed.
-
entryCount
public int entryCount()
Returns:
the number of distinct delete keys in the index.
-
bigramCount
public int bigramCount()
Returns:
the number of bigram entries.
-
maxEditDistance
public int maxEditDistance()
Specified by:
maxEditDistance in interface SpellChecker
Returns:
the largest edit distance this checker can answer queries for (the configured maximum dictionary edit distance); a maxEditDistance argument to SpellChecker.lookup(String, Verbosity, int) must not exceed this value.
-
lookup
Description copied from interface: SpellChecker
Convenience overload that uses Verbosity.TOP and the implementation's configured maximum dictionary edit distance.
As with SpellChecker.lookup(String, Verbosity, int), a blank term is looked up verbatim and normally yields an empty list.
Specified by:
lookup in interface SpellChecker
Parameters:
term - the (possibly misspelled) term to correct; must not be null
Returns:
the matching suggestions in natural order (best first); never null
-
lookup
Description copied from interface: SpellChecker
Looks up suggestions for a single term within maxEditDistance.
A blank (empty or whitespace-only) term is a valid argument: it is looked up verbatim and, as it matches no dictionary entry, normally yields an empty list rather than an error.
Specified by:
lookup in interface SpellChecker
Parameters:
term - the (possibly misspelled) term to correct; must not be null
verbosity - controls how many suggestions are returned; must not be null
maxEditDistance - the maximum edit distance to consider; must not be negative and must not exceed SpellChecker.maxEditDistance()
Returns:
the matching suggestions in natural order (best first); never null
-
lookupCompound
Description copied from interface: SpellChecker
Corrects a whole input string (a phrase or sentence), supporting word splits and merges, and combining candidates using a bigram language model.
A blank (empty or whitespace-only) input is a valid argument: it contains no tokens to correct, so the returned singleton holds a suggestion whose term is the empty string at edit distance 0.
Specified by:
lookupCompound in interface SpellChecker
Parameters:
input - the input phrase to correct; must not be null
maxEditDistance - the maximum edit distance per token; must not be negative
Returns:
a singleton list holding the best correction of the whole input; never null
-