Class SpellCheckingCharSequenceNormalizer

java.lang.Object
opennlp.spellcheck.normalizer.SpellCheckingCharSequenceNormalizer
All Implemented Interfaces:
Serializable, opennlp.tools.util.normalizer.CharSequenceNormalizer

public class SpellCheckingCharSequenceNormalizer extends Object implements opennlp.tools.util.normalizer.CharSequenceNormalizer
A CharSequenceNormalizer that corrects spelling in text using a SpellChecker (typically a SymSpell engine).

The normalizer works in one of two modes:

Several guards, configurable through SpellCheckingCharSequenceNormalizer.Builder, keep the corrector from "fixing" tokens that should be left as they are:

  • tokens shorter than minTokenLength are skipped;
  • numeric tokens are skipped (skipNumbers, on by default);
  • URL- and email-like tokens are skipped (skipUrls, on by default);
  • a token whose lower-cased form is already in the dictionary is never changed (the engine returns it at edit distance 0).
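
The guards above can be pictured as a single pass-through predicate. The following is a minimal sketch, not the normalizer's actual code: the class GuardSketch, the method shouldSkip, and both regular expressions are hypothetical, and a plain Set stands in for the dictionary lookup that the real engine answers at edit distance 0.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical illustration of the skip guards; not the library's implementation. */
final class GuardSketch {

  // Assumed heuristics: the patterns the real normalizer uses are not documented here.
  private static final Pattern NUMERIC = Pattern.compile("[+-]?\\d+([.,]\\d+)*");
  private static final Pattern URL_OR_EMAIL =
      Pattern.compile("(?i)(https?://\\S+|www\\.\\S+|[^@\\s]+@[^@\\s]+\\.[^@\\s]+)");

  private GuardSketch() { }

  /** Returns true when the token should be emitted as-is, with no correction attempt. */
  static boolean shouldSkip(String token, int minTokenLength, boolean skipNumbers,
                            boolean skipUrls, Set<String> dictionary) {
    if (token.length() < minTokenLength) {
      return true;                                   // too short to correct safely
    }
    if (skipNumbers && NUMERIC.matcher(token).matches()) {
      return true;                                   // numeric token
    }
    if (skipUrls && URL_OR_EMAIL.matcher(token).matches()) {
      return true;                                   // URL- or email-like token
    }
    // A known word is never changed; the lookup uses the lower-cased form.
    return dictionary.contains(token.toLowerCase(Locale.ROOT));
  }
}
```

Under these assumptions, a call such as GuardSketch.shouldSkip("helo", 3, true, true, Set.of("hello")) returns false, so only that kind of token would be handed to the spell checker.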

Casing. Dictionaries are normally lower-cased, so lookups are performed on the lower-cased token and the original casing pattern is re-applied to the correction: an all-upper token yields an all-upper correction and a leading-capital token yields a leading-capital correction; otherwise the suggestion's own casing is used. When no correction applies, the original token (including its casing and any surrounding punctuation) is emitted unchanged.
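
The casing rule can be sketched as follows. This is an illustration only: the class CasingSketch and the method applyCasing are hypothetical names, and the handling of edge cases (for example a one-letter upper-case token, which is treated as all-upper here) is an assumption rather than documented behaviour.

```java
import java.util.Locale;

/** Hypothetical illustration of re-applying the original casing pattern to a correction. */
final class CasingSketch {

  private CasingSketch() { }

  static String applyCasing(String original, String suggestion) {
    if (original.isEmpty() || suggestion.isEmpty()) {
      return suggestion;
    }
    boolean hasLetter = original.chars().anyMatch(Character::isLetter);
    // All-upper token: upper-case the whole correction.
    if (hasLetter && original.equals(original.toUpperCase(Locale.ROOT))) {
      return suggestion.toUpperCase(Locale.ROOT);
    }
    // Leading-capital token: capitalize only the first character of the correction.
    if (Character.isUpperCase(original.charAt(0))) {
      return Character.toUpperCase(suggestion.charAt(0)) + suggestion.substring(1);
    }
    // Otherwise keep the suggestion's own casing.
    return suggestion;
  }
}
```

With this sketch, applyCasing("TEH", "the") gives "THE", applyCasing("Teh", "the") gives "The", and applyCasing("teh", "the") gives "the".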

This normalizer composes cleanly inside an AggregateCharSequenceNormalizer; place it after noise-removing normalizers (URL, emoji, shrink) so it sees clean tokens.

Serialization. CharSequenceNormalizer is Serializable, but the backing SpellChecker usually is not. It is therefore held in a transient field and is null after Java deserialization, which leaves a deserialized instance inert until a checker is re-attached. To obtain a working copy with the same settings, call withSpellChecker(SpellChecker); this mirrors how the engine is rebuilt from a model rather than being Java-serialized. Calling normalize(java.lang.CharSequence) on an instance with no checker throws IllegalStateException.
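
The transient-field behaviour can be reproduced with plain Java serialization. The sketch below is a stand-in, not the real class: TransientCheckerSketch, withChecker, and roundTrip are hypothetical names, and a UnaryOperator<String> takes the place of the SpellChecker. It shows the setting surviving a round trip while the checker does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in showing a transient checker that is lost on deserialization. */
final class TransientCheckerSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  private final int minTokenLength;                       // setting: serialized
  private final transient UnaryOperator<String> checker;  // engine: skipped, null after readObject

  TransientCheckerSketch(int minTokenLength, UnaryOperator<String> checker) {
    this.minTokenLength = minTokenLength;
    this.checker = checker;
  }

  /** Returns a working copy with the same settings and the given checker attached. */
  TransientCheckerSketch withChecker(UnaryOperator<String> newChecker) {
    return new TransientCheckerSketch(minTokenLength, newChecker);
  }

  String normalize(String token) {
    if (checker == null) {
      throw new IllegalStateException("no spell checker attached");
    }
    return token.length() < minTokenLength ? token : checker.apply(token);
  }

  /** Serializes and deserializes the instance in memory. */
  static TransientCheckerSketch roundTrip(TransientCheckerSketch in) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(in);
      }
      try (ObjectInputStream oin =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (TransientCheckerSketch) oin.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new RuntimeException("round trip failed", e);
    }
  }
}
```

In this sketch, calling normalize on the result of roundTrip throws IllegalStateException, and calling withChecker on that same result yields a usable instance that keeps the original minTokenLength.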

See Also: