Class DictionaryStopwordFilter
- All Implemented Interfaces:
opennlp.tools.stopword.StopwordFilter
A StopwordFilter backed by an OpenNLP Dictionary.
The backing store supports both 1-gram and n-gram entries. Multi-word
entries are queried via isStopword(String...); the
filter(String[]) method performs a greedy left-to-right window
scan, preferring the longest registered match at each position.
Instances are constructed once and never modified afterwards. Use the
DictionaryStopwordFilter.Builder (builder()) to assemble a filter from one or
more sources (programmatic entries, an input stream, an existing
Dictionary), or the public constructors for the common cases.
Thread-safety: instances are immutable after
construction and may be shared freely across threads without external
synchronization. All fields are final; the only mutation of the
backing Dictionary happens inside the constructor / builder before
the instance is published.
-
Nested Class Summary
Nested Classes
DictionaryStopwordFilter.Builder
    Assembles a DictionaryStopwordFilter (obtained via builder()).
Constructor Summary
Constructors
Constructor / Description
DictionaryStopwordFilter(InputStream in, Charset cs, boolean caseSensitive)
    Loads a stopword list from the given input stream and freezes it into an immutable filter.
DictionaryStopwordFilter(Dictionary source)
    Creates an immutable filter from a defensive copy of source.
Method Summary
Modifier and Type / Method / Description
DictionaryStopwordFilter.Builder  builder()
String[]  filter(String[] tokens)
boolean  isCaseSensitive()
boolean  isStopword(CharSequence token)
boolean  isStopword(String... tokens)
static DictionaryStopwordFilter  loadUnchecked(InputStream in, Charset cs, boolean caseSensitive)
    Convenience factory equivalent to DictionaryStopwordFilter(InputStream, Charset, boolean) but wrapping any IOException thrown during reading in an UncheckedIOException.
Set  stopwords()
-
Constructor Details
-
DictionaryStopwordFilter
public DictionaryStopwordFilter(InputStream in, Charset cs, boolean caseSensitive) throws IOException
Loads a stopword list from the given input stream and freezes it into an immutable filter.
Format: UTF-8 (or the supplied Charset), one entry per line. Whitespace-separated tokens on the same line form one multi-word entry. Blank lines and lines starting with # are skipped.
- Parameters:
in - The input stream to read from. Must not be null.
cs - The Charset to decode with. Must not be null.
caseSensitive - Whether matching is case-sensitive.
- Throws:
IllegalArgumentException - if in or cs is null.
IOException - if an IO error occurs while reading.
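The line format described above can be illustrated with a small stand-alone parser. This is only a sketch of the documented rules, not the constructor's actual implementation: the class StopwordFormatSketch and its parse method are hypothetical names, and trimming each line before the # check is an assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of OpenNLP): parses the documented line format only.
final class StopwordFormatSketch {

    // Returns one token list per entry; a multi-word entry yields several tokens.
    static List<List<String>> parse(InputStream in, Charset cs) {
        if (in == null || cs == null) {
            throw new IllegalArgumentException("in and cs must not be null");
        }
        List<List<String>> entries = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs));
        // lines() reports read failures as UncheckedIOException, whereas the
        // documented constructor declares IOException. The caller closes the stream.
        reader.lines().forEach(line -> {
            // Assumption: surrounding whitespace is ignored before the '#' check.
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                // Whitespace-separated tokens on one line form one entry.
                entries.add(List.of(trimmed.split("\\s+")));
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        String text = "# articles\nthe\n\nas well as\n";
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        System.out.println(parse(in, StandardCharsets.UTF_8));
    }
}
```

In this sketch the comment line and the blank line are skipped, "the" becomes a single-token entry, and "as well as" becomes one three-token entry.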
-
DictionaryStopwordFilter
public DictionaryStopwordFilter(Dictionary source)
Creates an immutable filter from a defensive copy of source. Subsequent mutation of source does not affect this filter.
- Parameters:
source - The dictionary whose contents seed the filter. Must not be null.
- Throws:
IllegalArgumentException - if source is null.
-
-
Method Details
-
builder
- Returns:
A new DictionaryStopwordFilter.Builder that assembles a DictionaryStopwordFilter.
-
loadUnchecked
public static DictionaryStopwordFilter loadUnchecked(InputStream in, Charset cs, boolean caseSensitive)
Convenience factory equivalent to DictionaryStopwordFilter(InputStream, Charset, boolean) but wrapping any IOException thrown during reading in an UncheckedIOException. Useful in contexts where a checked exception is inconvenient (e.g. lambdas, static initializers).
- Parameters:
in - The input stream. Must not be null.
cs - The charset. Must not be null.
caseSensitive - Whether matching is case-sensitive.
- Returns:
A new filter loaded from in.
- Throws:
IllegalArgumentException - if in or cs is null.
UncheckedIOException - if an IO error occurs while reading from in.
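The wrapping behaviour described for loadUnchecked follows a common Java pattern, sketched here without any OpenNLP dependency. The names UncheckedWrapSketch, IoSupplier and unchecked are hypothetical and stand in for whatever the factory does internally; only IOException and UncheckedIOException are real JDK types.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of the "wrap checked IOException" pattern.
final class UncheckedWrapSketch {

    // Stand-in for any operation that declares a checked IOException,
    // such as a constructor that reads from a stream.
    interface IoSupplier<T> {
        T get() throws IOException;
    }

    // Runs the body and rethrows any IOException as UncheckedIOException,
    // keeping the original exception as the cause.
    static <T> T unchecked(IoSupplier<T> body) {
        try {
            return body.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // No checked exception has to be declared or caught at the call site.
        String value = unchecked(() -> "loaded");
        System.out.println(value);
    }
}
```

A wrapper of this shape is what makes the call usable inside a lambda or a static initializer, where a throws IOException clause cannot be declared.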
-
isStopword
public boolean isStopword(CharSequence token)
- Specified by:
isStopword in interface opennlp.tools.stopword.StopwordFilter
- Parameters:
token - The token to test. May be null, in which case this method returns false.
- Returns:
true if token is registered as a single-token stopword, false otherwise.
-
isStopword
public boolean isStopword(String... tokens)
- Specified by:
isStopword in interface opennlp.tools.stopword.StopwordFilter
- Parameters:
tokens - The tokens to test as one entry. May be null or empty, in which case this method returns false.
- Returns:
true if the sequence is registered as a stopword, false otherwise.
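The null handling of the two isStopword overloads can be modelled with plain collections. The sketch below is an assumption-laden stand-in, not the class's implementation: LookupSketch is a hypothetical name, entries are held in a HashSet rather than an OpenNLP Dictionary, case-insensitive matching is assumed to lowercase with Locale.ROOT, and a null element inside the varargs array is assumed to yield false.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of single-token and multi-token lookups.
final class LookupSketch {
    private final Set<List<String>> entries = new HashSet<>();
    private final boolean caseSensitive;

    LookupSketch(List<List<String>> rawEntries, boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
        for (List<String> entry : rawEntries) {
            entries.add(normalize(entry));
        }
    }

    // A null token is never a stopword.
    boolean isStopword(CharSequence token) {
        return token != null && entries.contains(List.of(fold(token.toString())));
    }

    // A null or empty sequence is never a stopword.
    boolean isStopword(String... tokens) {
        if (tokens == null || tokens.length == 0) {
            return false;
        }
        for (String t : tokens) {
            if (t == null) {
                return false; // assumption: a null element cannot match
            }
        }
        return entries.contains(normalize(List.of(tokens)));
    }

    private List<String> normalize(List<String> entry) {
        List<String> out = new ArrayList<>(entry.size());
        for (String t : entry) {
            out.add(fold(t));
        }
        return out;
    }

    // Assumption: case-insensitive matching lowercases with Locale.ROOT.
    private String fold(String s) {
        return caseSensitive ? s : s.toLowerCase(Locale.ROOT);
    }
}
```

Note that a call with a single String argument, such as isStopword("the"), resolves to the CharSequence overload, because Java prefers a non-varargs match.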
-
filter
public String[] filter(String[] tokens)
Performs a greedy left-to-right window scan: at each position the longest registered window is tried first, then progressively shorter ones. If a window matches, its tokens are dropped and the scan resumes after them; if none matches, the current token is kept and the position advances by one.
null elements never participate in a window and are kept as-is.
- Specified by:
filter in interface opennlp.tools.stopword.StopwordFilter
- Throws:
IllegalArgumentException - if tokens is null.
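The greedy longest-match scan can be sketched as follows. This is one possible reading of the description above, not the class's source: GreedyWindowSketch is a hypothetical name, entries are modelled as a Set of token lists instead of an OpenNLP Dictionary, and matching is exact (case-sensitive).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical model of the greedy left-to-right window scan.
final class GreedyWindowSketch {

    static String[] filter(String[] tokens, Set<List<String>> entries) {
        if (tokens == null) {
            throw new IllegalArgumentException("tokens must not be null");
        }
        // The longest registered entry bounds the window size.
        int maxLen = 0;
        for (List<String> entry : entries) {
            maxLen = Math.max(maxLen, entry.size());
        }
        List<String> all = Arrays.asList(tokens);
        List<String> kept = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            if (tokens[i] != null) {
                // Try the longest window first, then shorter ones.
                for (int len = Math.min(maxLen, tokens.length - i); len >= 1; len--) {
                    List<String> window = all.subList(i, i + len);
                    // A window containing null never matches.
                    if (!window.contains(null) && entries.contains(window)) {
                        matched = len;
                        break;
                    }
                }
            }
            if (matched > 0) {
                i += matched;        // drop the whole matched window
            } else {
                kept.add(tokens[i]); // keep the token (including null) and advance
                i++;
            }
        }
        return kept.toArray(new String[0]);
    }
}
```

With the entries "new", "new york" and "the", the input the / new / york / subway drops "new york" as one two-token window rather than dropping "new" alone and keeping "york", which is the effect of preferring the longest match.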
-
isCaseSensitive
public boolean isCaseSensitive()
- Specified by:
isCaseSensitive in interface opennlp.tools.stopword.StopwordFilter
-
stopwords
- Specified by:
stopwords in interface opennlp.tools.stopword.StopwordFilter
- Returns:
An unmodifiable Set of single-token stopwords. Never null.
- Throws:
UnsupportedOperationException - if a caller attempts to mutate the returned Set.
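The contract of the returned Set (unmodifiable, never null) matches what the JDK's Collections.unmodifiableSet provides. The sketch below shows that pattern in isolation; UnmodifiableViewSketch is a hypothetical class, and whether DictionaryStopwordFilter uses this exact JDK call is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical holder that exposes its single-token stopwords read-only.
final class UnmodifiableViewSketch {
    private final Set<String> singles;

    UnmodifiableViewSketch(Set<String> source) {
        // Copy first so later changes to source are not visible,
        // then wrap so callers cannot mutate the copy.
        this.singles = Collections.unmodifiableSet(new LinkedHashSet<>(source));
    }

    // Never null; add/remove on the result throw UnsupportedOperationException.
    Set<String> stopwords() {
        return singles;
    }
}
```

Callers that need a mutable collection would copy the result, for example into a new HashSet, instead of modifying it in place.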
-