Class FrequencyDictionaryLoader

java.lang.Object
opennlp.spellcheck.dictionary.FrequencyDictionaryLoader

public final class FrequencyDictionaryLoader extends Object
Loads plain-text frequency dictionaries into a SymSpell engine.

Two text formats are supported, both consumed line-by-line through an ObjectStream (a PlainTextByLineStream over a caller-supplied InputStreamFactory):

  • Unigram: word<sep>count (see loadUnigrams)
  • Bigram: w1<sep>w2<sep>count (see loadBigrams)

Columns are separated by whitespace – a TAB or one or more spaces – so the canonical space-delimited SymSpell reference dictionaries (e.g. frequency_dictionary_en_82_765.txt) load as-is, as do TAB-delimited files.
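The column rule above can be illustrated with a minimal plain-Java sketch. It is not the loader's actual implementation: the class ColumnSplitSketch, the method splitColumns, and the sample words and counts are all hypothetical, and trimming the line before splitting is an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the library.
final class ColumnSplitSketch {

    // Splits a dictionary line into columns on runs of whitespace
    // (a TAB or one or more spaces). Trimming first is an assumption.
    static String[] splitColumns(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Unigram shape, space-delimited: word<sep>count (illustrative values)
        System.out.println(Arrays.toString(splitColumns("hello 120")));
        // Unigram shape, TAB-delimited
        System.out.println(Arrays.toString(splitColumns("hello\t120")));
        // Bigram shape with mixed separators: w1<sep>w2<sep>count
        System.out.println(Arrays.toString(splitColumns("new   york\t75")));
    }
}
```

Splitting on a whitespace run rather than a single character is what lets space-delimited and TAB-delimited files share one code path in this sketch.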

The loader is encoding-aware (UTF-8 by default) and tolerant of input noise: a leading UTF-8 byte-order mark is stripped; blank lines, lines that are entirely whitespace, and lines starting with # (comments) are skipped. A line that does not match the expected shape (too few columns, unparsable count) is reported through MalformedDictionaryLineException.
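The skipping rules above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the loader's code: the class TolerantUnigramParseSketch and its parse method are hypothetical, IllegalArgumentException stands in for MalformedDictionaryLineException, and the comment check is applied to the trimmed line, which the description above does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the library.
final class TolerantUnigramParseSketch {

    // Parses already-decoded unigram text (word<sep>count) into a map.
    static Map<String, Long> parse(String text) {
        Map<String, Long> counts = new LinkedHashMap<>();
        // A UTF-8 byte-order mark decodes to U+FEFF; drop it if present.
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        int lineNo = 0;
        for (String line : text.split("\\R")) {
            lineNo++;
            String trimmed = line.trim();
            // Skip blank lines, whitespace-only lines, and # comments
            // (checking the trimmed line is an assumption).
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length < 2) {
                // Stand-in for MalformedDictionaryLineException.
                throw new IllegalArgumentException("line " + lineNo + ": too few columns");
            }
            try {
                counts.put(cols[0], Long.parseLong(cols[1]));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("line " + lineNo + ": unparsable count", e);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative input: BOM, a comment, blank lines, then two entries.
        String sample = "\uFEFF# sample dictionary\n\n   \nhello 120\nworld\t45\n";
        System.out.println(parse(sample));
    }
}
```

Reporting the line number with each failure is a design choice of this sketch; it makes a malformed entry in a large dictionary file easier to locate.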

This class performs only parsing and dispatch; it never mutates the engine's configuration. Build the SymSpell with the desired SymSpellConfig first, then load one or more dictionaries into it.

  • Field Details

    • DEFAULT_CHARSET

      public static final Charset DEFAULT_CHARSET
      The default character set (UTF-8), used when none is supplied.
  • Constructor Details

    • FrequencyDictionaryLoader

      public FrequencyDictionaryLoader()
      Creates a loader using the default UTF-8 charset.
    • FrequencyDictionaryLoader

      public FrequencyDictionaryLoader(Charset charset)
      Creates a loader using the supplied charset.
      Parameters:
      charset - the character set used to decode the dictionary text; must not be null
  • Method Details

    • loadUnigrams

      public long loadUnigrams(SymSpell target, opennlp.tools.util.InputStreamFactory factory) throws IOException
      Loads a unigram frequency dictionary (word<sep>count) into target.
      Parameters:
      target - the engine to populate; must not be null
      factory - the source of the dictionary text; must not be null
      Returns:
      the number of dictionary entries that were read (after skipping blank and comment lines)
      Throws:
      IOException - Thrown on IO errors or on a malformed line.
    • loadBigrams

      public long loadBigrams(SymSpell target, opennlp.tools.util.InputStreamFactory factory) throws IOException
      Loads a bigram frequency dictionary (w1<sep>w2<sep>count) into target.
      Parameters:
      target - the engine to populate; must not be null
      factory - the source of the dictionary text; must not be null
      Returns:
      the number of bigram entries that were read (after skipping blank and comment lines)
      Throws:
      IOException - Thrown on IO errors or on a malformed line.