Class DictionaryStopwordFilter

java.lang.Object
opennlp.tools.stopword.DictionaryStopwordFilter
All Implemented Interfaces:
opennlp.tools.stopword.StopwordFilter

@ThreadSafe public final class DictionaryStopwordFilter extends Object implements opennlp.tools.stopword.StopwordFilter
An immutable, thread-safe StopwordFilter backed by an OpenNLP Dictionary.

The backing store supports both single-token and multi-token (n-gram) entries. Multi-word entries are queried via isStopword(String...); the filter(String[]) method performs a greedy left-to-right window scan, preferring the longest registered match at each position.

Instances are constructed once and never modified afterwards. Use the DictionaryStopwordFilter.Builder (builder()) to assemble a filter from one or more sources (programmatic entries, an input stream, an existing Dictionary), or the public constructors for the common cases.

Thread-safety: instances are immutable after construction and may be shared freely across threads without external synchronization. All fields are final; the only mutation of the backing Dictionary happens inside the constructor / builder before the instance is published.

  • Constructor Details

    • DictionaryStopwordFilter

      public DictionaryStopwordFilter(InputStream in, Charset cs, boolean caseSensitive) throws IOException
      Loads a stopword list from the given input stream and freezes it into an immutable filter.

      Format: plain text decoded with the supplied Charset (UTF-8 is typical), one entry per line. Whitespace-separated tokens on the same line form one multi-word entry. Blank lines and lines starting with # are skipped.

      Parameters:
      in - The input stream to read from. Must not be null.
      cs - The Charset to decode with. Must not be null.
      caseSensitive - Whether matching is case-sensitive.
      Throws:
      IllegalArgumentException - if in or cs is null.
      IOException - if an IO error occurs while reading from in.
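
The line format described for this constructor can be illustrated with a small JDK-only sketch. StopwordListFormatSketch and its parse method are hypothetical names used only for illustration; they are not part of OpenNLP and do not reflect how the library reads the stream. The sketch also assumes that leading whitespace before # is ignored, which the description above does not specify.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the OpenNLP implementation.
final class StopwordListFormatSketch {

  // Reads one entry per line; each entry is the list of its whitespace-separated tokens.
  static List<List<String>> parse(InputStream in, Charset cs) throws IOException {
    if (in == null || cs == null) {
      throw new IllegalArgumentException("in and cs must not be null");
    }
    List<List<String>> entries = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Assumption: surrounding whitespace is ignored before the checks below.
        String trimmed = line.trim();
        // Blank lines and comment lines are skipped.
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
          continue;
        }
        // Tokens on the same line form one (possibly multi-word) entry.
        entries.add(Arrays.asList(trimmed.split("\\s+")));
      }
    }
    return entries;
  }

  public static void main(String[] args) throws IOException {
    String text = "# English stopwords\nthe\n\nas well as\n";
    InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    System.out.println(parse(in, StandardCharsets.UTF_8)); // prints [[the], [as, well, as]]
  }
}
```

In this sketch, "as well as" becomes a single three-token entry, while the comment line and the blank line produce no entry.
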
    • DictionaryStopwordFilter

      public DictionaryStopwordFilter(Dictionary source)
      Creates an immutable filter from a defensive copy of source. Subsequent mutation of source does not affect this filter.
      Parameters:
      source - The dictionary whose contents seed the filter. Must not be null.
      Throws:
      IllegalArgumentException - if source is null.
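
The defensive-copy guarantee can be sketched with plain JDK collections. DefensiveCopySketch is a hypothetical stand-in that uses a Set<String> in place of the OpenNLP Dictionary type, so it illustrates the idea only and makes no claim about how the constructor copies its argument.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; it does not use the OpenNLP Dictionary type.
final class DefensiveCopySketch {

  private final Set<String> entries;

  DefensiveCopySketch(Set<String> source) {
    if (source == null) {
      throw new IllegalArgumentException("source must not be null");
    }
    // Set.copyOf takes an unmodifiable snapshot, so later changes to source are not visible here.
    this.entries = Set.copyOf(source);
  }

  boolean contains(String token) {
    // The null check avoids the NullPointerException that unmodifiable sets throw for null lookups.
    return token != null && entries.contains(token);
  }

  public static void main(String[] args) {
    Set<String> source = new HashSet<>(Set.of("the", "a"));
    DefensiveCopySketch filter = new DefensiveCopySketch(source);
    source.add("an");       // mutate the source after construction
    source.remove("the");
    System.out.println(filter.contains("an"));  // prints false
    System.out.println(filter.contains("the")); // prints true
  }
}
```
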
  • Method Details

    • builder

      public static DictionaryStopwordFilter.Builder builder()
      Returns:
      A new DictionaryStopwordFilter.Builder that assembles a DictionaryStopwordFilter.
    • loadUnchecked

      public static DictionaryStopwordFilter loadUnchecked(InputStream in, Charset cs, boolean caseSensitive)
      Convenience factory equivalent to DictionaryStopwordFilter(InputStream, Charset, boolean) but wrapping any IOException thrown during reading in an UncheckedIOException. Useful in contexts where a checked exception is inconvenient (e.g. lambdas, static initializers).
      Parameters:
      in - The input stream. Must not be null.
      cs - The charset. Must not be null.
      caseSensitive - Whether matching is case-sensitive.
      Returns:
      A new filter loaded from in.
      Throws:
      IllegalArgumentException - if in or cs is null.
      UncheckedIOException - if an IO error occurs while reading from in.
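
The wrapping pattern behind this factory can be shown without the library. In the sketch below, loadChecked is a hypothetical checked loader that stands in for the checked constructor; only the try/catch that rethrows as UncheckedIOException mirrors what the description above states.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the checked-to-unchecked wrapping pattern.
final class UncheckedLoadSketch {

  // Stand-in for a loader that declares a checked IOException.
  static String loadChecked(InputStream in, Charset cs) throws IOException {
    return new String(in.readAllBytes(), cs);
  }

  // Same result, but any IOException surfaces as an UncheckedIOException.
  static String loadUnchecked(InputStream in, Charset cs) {
    try {
      return loadChecked(in, cs);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    InputStream ok = new ByteArrayInputStream("the\n".getBytes(StandardCharsets.UTF_8));
    System.out.println(loadUnchecked(ok, StandardCharsets.UTF_8).trim()); // prints the

    // A stream that always fails, to exercise the wrapping path.
    InputStream broken = new InputStream() {
      @Override
      public int read() throws IOException {
        throw new IOException("simulated failure");
      }
    };
    try {
      loadUnchecked(broken, StandardCharsets.UTF_8);
    } catch (UncheckedIOException e) {
      System.out.println(e.getCause().getMessage()); // prints simulated failure
    }
  }
}
```

Because loadUnchecked declares no checked exception, it can be called from a lambda or a static initializer without a surrounding try block; the original IOException remains available through getCause().
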
    • isStopword

      public boolean isStopword(CharSequence token)
      Specified by:
      isStopword in interface opennlp.tools.stopword.StopwordFilter
      Parameters:
      token - The token to test. May be null, in which case this method returns false.
      Returns:
      true if token is registered as a single-token stopword, false otherwise.
    • isStopword

      public boolean isStopword(String... tokens)
      Specified by:
      isStopword in interface opennlp.tools.stopword.StopwordFilter
      Parameters:
      tokens - The tokens to test as one entry. May be null or empty, in which case this method returns false.
      Returns:
      true if the sequence is registered as a stopword, false otherwise.
    • filter

      public String[] filter(String[] tokens)

      Performs a greedy left-to-right window scan: at each position the longest candidate window is tried first, then progressively shorter ones. The tokens of the first window that matches a registered entry are dropped and the scan resumes after them; if no window matches, the current token is kept and the position advances by one. null elements never participate in a window and are kept as-is.

      Specified by:
      filter in interface opennlp.tools.stopword.StopwordFilter
      Throws:
      IllegalArgumentException - if tokens is null.
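
A minimal sketch of a greedy longest-match scan, assuming entries are held as a Set of token lists and that the longest registered entry length is passed in as maxWindow. GreedyWindowScanSketch, its parameters, and the storage layout are all hypothetical; the actual filter is backed by a Dictionary and may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a greedy left-to-right, longest-match-first scan.
final class GreedyWindowScanSketch {

  static String[] filter(String[] tokens, Set<List<String>> entries, int maxWindow) {
    if (tokens == null) {
      throw new IllegalArgumentException("tokens must not be null");
    }
    List<String> kept = new ArrayList<>();
    int i = 0;
    while (i < tokens.length) {
      int matched = 0;
      if (tokens[i] != null) {
        int longest = Math.min(maxWindow, tokens.length - i);
        // Try the longest window first, then shrink until something matches.
        for (int len = longest; len >= 1 && matched == 0; len--) {
          List<String> window = Arrays.asList(Arrays.copyOfRange(tokens, i, i + len));
          // A window containing a null element never matches.
          if (!window.contains(null) && entries.contains(window)) {
            matched = len;
          }
        }
      }
      if (matched > 0) {
        i += matched;          // drop the whole matched window
      } else {
        kept.add(tokens[i]);   // keep the current token (including null) and advance by one
        i++;
      }
    }
    return kept.toArray(new String[0]);
  }

  public static void main(String[] args) {
    Set<List<String>> entries = Set.of(List.of("as", "well", "as"), List.of("as"), List.of("the"));
    String[] tokens = {"he", "came", "as", "well", "as", "the", "dog"};
    System.out.println(Arrays.toString(filter(tokens, entries, 3))); // prints [he, came, dog]
  }
}
```

In the usage above, the three-token entry "as well as" is preferred over the single-token entry "as" at the same position, which is the longest-match behavior the description refers to.
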
    • isCaseSensitive

      public boolean isCaseSensitive()
      Specified by:
      isCaseSensitive in interface opennlp.tools.stopword.StopwordFilter
    • stopwords

      public Set<String> stopwords()
      Specified by:
      stopwords in interface opennlp.tools.stopword.StopwordFilter
      Returns:
      An unmodifiable Set of single-token stopwords. Never null.
      Throws:
      UnsupportedOperationException - if a caller attempts to mutate the returned Set.
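
From the caller's side, the read-only contract of the returned Set looks roughly like the following. UnmodifiableViewSketch is a hypothetical class built on Collections.unmodifiableSet; whether the real filter uses that wrapper or another unmodifiable implementation is an assumption.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of exposing stopwords through an unmodifiable Set.
final class UnmodifiableViewSketch {

  private final Set<String> singles;

  UnmodifiableViewSketch(Set<String> words) {
    // Copy first, then wrap, so neither the caller's set nor the returned view can change the contents.
    this.singles = Collections.unmodifiableSet(new HashSet<>(words));
  }

  Set<String> stopwords() {
    return singles;
  }

  public static void main(String[] args) {
    Set<String> view = new UnmodifiableViewSketch(Set.of("the")).stopwords();
    System.out.println(view.contains("the")); // prints true
    try {
      view.add("an"); // any mutation attempt is rejected
    } catch (UnsupportedOperationException e) {
      System.out.println("read-only"); // prints read-only
    }
  }
}
```
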