Class SpellCorrectingTokenStream
- All Implemented Interfaces:
AutoCloseable, opennlp.tools.util.ObjectStream<String>
FilterObjectStream for tokenized data: each element read from the
wrapped ObjectStream is a string of tokens separated by a known delimiter
(whitespace by default). Every token is spell-corrected independently and the tokens
are re-joined with the same delimiter.
This is the shape produced by OpenNLP tokenizers / token-sample formats and is
what the trainable components consume: a fixed sequence of tokens per element. Unlike
SpellCorrectingObjectStream in compound mode, this stream is
token-count preserving – it never splits or merges tokens, so the
corrected element stays aligned with any parallel annotation (tags, spans).
Correction always runs in per-token mode and reuses the guards of
SpellCheckingCharSequenceNormalizer (minimum length, skip numbers/URLs, never change
a word the dictionary already contains) and its casing preservation.
null (end of stream) is forwarded unchanged; FilterObjectStream.reset() and
FilterObjectStream.close() delegate to the wrapped stream.
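The per-token behaviour described above can be illustrated with a small stand-alone sketch. It does not use the library: the class TokenLineCorrectionSketch, the FIXES lookup table, the MIN_LENGTH threshold, and the exact form of each guard are assumptions made for illustration only. It shows a split on the literal delimiter, an independent correction of each token behind simple guards, and a re-join that keeps the token count unchanged.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Illustrative stand-in, NOT the library's implementation. */
final class TokenLineCorrectionSketch {

    // Hypothetical toy lookup (misspelling -> correction), standing in for a real spell checker.
    private static final Map<String, String> FIXES = Map.of("teh", "the", "quikc", "quick");

    // Assumed minimum-length guard; the real threshold is configured on the normalizer.
    private static final int MIN_LENGTH = 3;

    /** Corrects one token, or returns it unchanged when a guard applies. */
    static String correctToken(String token) {
        if (token.length() < MIN_LENGTH) {
            return token;                                  // too short
        }
        if (token.chars().anyMatch(Character::isDigit)) {
            return token;                                  // looks like a number
        }
        if (token.contains("://")) {
            return token;                                  // looks like a URL
        }
        String fix = FIXES.get(token.toLowerCase(Locale.ROOT));
        if (fix == null) {
            return token;                                  // nothing to correct
        }
        // Simplified casing preservation: keep a leading capital.
        if (Character.isUpperCase(token.charAt(0))) {
            return Character.toUpperCase(fix.charAt(0)) + fix.substring(1);
        }
        return fix;
    }

    /** Splits on the literal delimiter, corrects each token, re-joins with the same delimiter. */
    static String correctLine(String line, String delimiter) {
        if (line == null) {
            return null;                                   // end of stream is forwarded unchanged
        }
        // Limit -1 keeps trailing empty tokens, so the token count never changes.
        String[] tokens = line.split(Pattern.quote(delimiter), -1);
        StringJoiner joined = new StringJoiner(delimiter);
        for (String token : tokens) {
            joined.add(correctToken(token));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // prints "The quick fox 42 http://x.y"
        System.out.println(correctLine("Teh quikc fox 42 http://x.y", " "));
    }
}
```

Because each input token maps to exactly one output token, a tag sequence that was aligned with the five input tokens stays aligned with the five output tokens.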
-
Field Summary
Fields
static final String DEFAULT_DELIMITER
    The default delimiter splitting and re-joining tokens (a single space).
-
Constructor Summary
Constructors
SpellCorrectingTokenStream(opennlp.tools.util.ObjectStream<String> samples, SymSpellModel model)
SpellCorrectingTokenStream(opennlp.tools.util.ObjectStream<String> samples, SpellCheckingCharSequenceNormalizer normalizer, String delimiter)
    Wraps samples with an explicitly configured corrector and delimiter.
SpellCorrectingTokenStream(opennlp.tools.util.ObjectStream<String> samples, SpellChecker spellChecker)
-
Method Summary
Methods inherited from class opennlp.tools.util.FilterObjectStream
close, reset
-
Field Details
-
DEFAULT_DELIMITER
The default delimiter splitting and re-joining tokens (a single space).
- See Also:
-
-
Constructor Details
-
SpellCorrectingTokenStream
public SpellCorrectingTokenStream(opennlp.tools.util.ObjectStream<String> samples, SpellChecker spellChecker)
- Parameters:
samples - the source token-line stream; must not be null
spellChecker - the engine used to correct tokens; must not be null
-
SpellCorrectingTokenStream
public SpellCorrectingTokenStream(opennlp.tools.util.ObjectStream<String> samples, SymSpellModel model)
- Parameters:
samples - the source token-line stream; must not be null
model - the loaded model whose engine is used; must not be null
-
SpellCorrectingTokenStream
public SpellCorrectingTokenStream(opennlp.tools.util.ObjectStream<String> samples, SpellCheckingCharSequenceNormalizer normalizer, String delimiter)
Wraps samples with an explicitly configured corrector and delimiter.
The corrector is forced into per-token mode regardless of how it was built, so the token count is always preserved.
- Parameters:
samples - the source token-line stream; must not be null
normalizer - the corrector whose guards/config are reused; must not be null
delimiter - the literal token delimiter to split and re-join on; must not be null or empty
- Throws:
NullPointerException - if normalizer or delimiter is null
IllegalArgumentException - if delimiter is empty
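The delimiter contract documented for this constructor can be sketched in isolation. The class DelimiterContractSketch below is hypothetical and mirrors only the documented checks. Splitting with Pattern.quote is an assumed way to treat the delimiter as literal text, not a confirmed detail of the library.

```java
import java.util.Objects;
import java.util.regex.Pattern;

/** Hypothetical stand-in that mirrors only the documented delimiter contract. */
final class DelimiterContractSketch {

    private final String delimiter;

    DelimiterContractSketch(String delimiter) {
        // Documented: NullPointerException when the delimiter is null.
        this.delimiter = Objects.requireNonNull(delimiter, "delimiter must not be null");
        // Documented: IllegalArgumentException when the delimiter is empty.
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
    }

    /** Splits on the delimiter as literal text (assumed approach: Pattern.quote). */
    String[] split(String line) {
        // Limit -1 keeps trailing empty tokens so the token count is preserved.
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        DelimiterContractSketch pipe = new DelimiterContractSketch("|");
        // "|" is a regex metacharacter; quoting makes it an ordinary separator.
        System.out.println(pipe.split("teh|cat|").length); // prints 3
    }
}
```

Treating the delimiter literally matters for characters such as | or ., which would otherwise be interpreted as regular-expression syntax by String.split.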
-
-
Method Details
-
read
- Throws:
IOException
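A typical consumer calls read() until it returns null. The sketch below shows that loop against a minimal stand-in: the LineSource interface, filter, and drain are hypothetical names that model only the read() part of ObjectStream<String>, and the per-line function is a placeholder for the actual token correction.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

/** Illustrative read loop; LineSource is a hypothetical stand-in for ObjectStream<String>. */
final class ReadLoopSketch {

    interface LineSource {
        String read() throws IOException;
    }

    /** Wraps a source so each non-null element is transformed; null passes through unchanged. */
    static LineSource filter(LineSource wrapped, UnaryOperator<String> perLine) {
        return () -> {
            String line = wrapped.read();
            return line == null ? null : perLine.apply(line);
        };
    }

    /** Reads until the end-of-stream marker (null) and collects every element. */
    static List<String> drain(LineSource source) throws IOException {
        List<String> out = new ArrayList<>();
        for (String line = source.read(); line != null; line = source.read()) {
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Iterator<String> lines = List.of("teh cat", "a dog").iterator();
        LineSource raw = () -> lines.hasNext() ? lines.next() : null;
        // Placeholder correction; a real stream would correct every token of the line.
        LineSource corrected = filter(raw, line -> line.replace("teh", "the"));
        System.out.println(drain(corrected)); // prints [the cat, a dog]
    }
}
```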
-