The moder package determines single or multiple modes (most frequent
values). By default, its functions check whether missing values make
this impossible, and return NA in this case. They have no dependencies.
Mode functions fill a gap in measures of central tendency in R. mean()
and median() are built into the standard library, but there is a lack
of properly NA-sensitive functions for calculating the mode. Use moder
for this!
You can install the development version of moder like so:
remotes::install_github("lhdjung/moder")
library(moder)
mode_first()
Everything is fine here:
mode_first(c(7, 8, 8, 9, 9, 9))
#> [1] 9
But what if some values are missing? Maybe there are so many missings
that it’s impossible to tell which value is the most frequent one. If
both NAs below are secretly 2, then 2 is the (first) mode. Otherwise, 1
is. The mode is unclear, so the function returns NA:
mode_first(c(1, 1, 2, NA, NA))
#> [1] NA
Ignore NAs using na.rm = TRUE if there is a strong rationale for it:
mode_first(c(1, 1, 2, NA, NA), na.rm = TRUE)
#> [1] 1
The next example is different. Even if the NA stands in for 8, there
will only be three instances of 8 but four instances of 7. The mode is
7, independent of the true value behind NA.
mode_first(c(7, 7, 7, 7, 8, 8, NA))
#> [1] 7
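The reasoning in these examples can be sketched in base R. This is a minimal illustration, not moder's actual implementation: `first_mode_sketch()` is a hypothetical helper, and its rule (the most frequent known value must lead the runner-up by more than the number of NAs) is deliberately conservative, so it may return NA in some tie cases where the first mode could still be determined.

```r
# Hypothetical helper, not part of moder: a conservative NA-aware first mode.
first_mode_sketch <- function(x, na.rm = FALSE) {
  n_na <- sum(is.na(x))
  known <- x[!is.na(x)]
  if (length(known) == 0L) {
    return(x[NA_integer_])  # NA of the same type as `x`
  }
  # Count each known value, keeping the order of first appearance
  values <- unique(known)
  counts <- tabulate(match(known, values))
  top <- which.max(counts)  # which.max() picks the first maximum
  if (na.rm || n_na == 0L) {
    return(values[top])
  }
  # Count of the best competitor (0 if there is none)
  runner_up <- max(counts[-top], 0L)
  # The mode is certain only if the NAs cannot close the gap
  if (counts[top] - runner_up > n_na) values[top] else x[NA_integer_]
}

first_mode_sketch(c(1, 1, 2, NA, NA))        # NA: two NAs can overturn a lead of 1
first_mode_sketch(c(7, 7, 7, 7, 8, 8, NA))   # 7: one NA cannot overturn a lead of 2
```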
mode_all()
This function captures multiple modes:
mode_all(c("a", "a", "b", "b", "c", "d", "e"))
#> [1] "a" "b"
If some values are missing but there would be multiple modes when
ignoring NAs, mode_all() returns NA. That’s because missings can easily
create an imbalance between the equally-frequent known values:
mode_all(c(1, 1, 2, 2, NA))
#> [1] NA
If NA masks either 1 or 2, that number is the (single) mode. As before,
if the mode depends on missing values, the function returns NA.
Yet na.rm = TRUE makes the function ignore this:
mode_all(c(1, 1, 2, 2, NA), na.rm = TRUE)
#> [1] 1 2
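The same idea extends to multiple modes. The sketch below is an assumption about how such a check could look, not moder's code: `all_modes_sketch()` is a hypothetical helper that returns NA whenever the known values are tied or the missing values could create a tie.

```r
# Hypothetical helper, not part of moder: NA-aware set of all modes.
all_modes_sketch <- function(x, na.rm = FALSE) {
  n_na <- sum(is.na(x))
  known <- x[!is.na(x)]
  if (length(known) == 0L) {
    return(x[NA_integer_])
  }
  values <- unique(known)
  counts <- tabulate(match(known, values))
  top_count <- max(counts)
  modes <- values[counts == top_count]
  if (na.rm || n_na == 0L) {
    return(modes)
  }
  # Tied known values: a single NA could break the tie either way
  if (length(modes) > 1L) {
    return(x[NA_integer_])
  }
  # Single known mode: certain only if the NAs cannot let a rival catch up
  runner_up <- max(counts[counts < top_count], 0L)
  if (top_count - runner_up > n_na) modes else x[NA_integer_]
}

all_modes_sketch(c(1, 1, 2, 2, NA))                # NA
all_modes_sketch(c(1, 1, 2, 2, NA), na.rm = TRUE)  # 1 2
```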
mode_single()
mode_single() is stricter than mode_first(): It returns NA by default
if there are multiple modes. Otherwise, it works the same way.
mode_single(c(3, 4, 4, 5, 5, 5))
#> [1] 5
mode_single(c("x", "x", "y", "y", "z"))
#> [1] NA
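To illustrate the stricter rule only, here is a small base R sketch. `single_mode_sketch()` is a hypothetical helper rather than moder's implementation, and it assumes the input contains no missing values:

```r
# Hypothetical helper, not part of moder: return the mode only if it is unique.
# Assumes `x` has no missing values.
single_mode_sketch <- function(x) {
  values <- unique(x)
  counts <- tabulate(match(x, values))
  modes <- values[counts == max(counts)]
  # More than one mode means there is no single mode to report
  if (length(modes) == 1L) modes else x[NA_integer_]
}

single_mode_sketch(c(3, 4, 4, 5, 5, 5))           # 5
single_mode_sketch(c("x", "x", "y", "y", "z"))    # NA
```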
These minimal and maximal sets of modes are possible given the missing value:
mode_possible_min(c("a", "a", "a", "b", "b", "c", NA))
#> [1] "a"
mode_possible_max(c("a", "a", "a", "b", "b", "c", NA))
#> [1] "a" "b"
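One way to see where these two sets come from is to try out every scenario for the missing value. The sketch below is a brute-force illustration under narrow assumptions, not moder's algorithm: both helpers are hypothetical, the input must contain exactly one NA, and some known value must occur at least twice (so that an unseen value behind the NA cannot itself be a mode).

```r
# Hypothetical helper: all modes of a vector without missing values
modes_known <- function(x) {
  values <- unique(x)
  counts <- tabulate(match(x, values))
  values[counts == max(counts)]
}

# Hypothetical helper: smallest and largest mode sets across all scenarios.
# Assumes exactly one NA and a known value that occurs at least twice.
possible_modes_sketch <- function(x) {
  known <- x[!is.na(x)]
  stopifnot(
    sum(is.na(x)) == 1L,
    max(tabulate(match(known, unique(known)))) >= 2L
  )
  # Scenario 0: the NA hides an unseen value, so the known modes stand
  scenarios <- list(modes_known(known))
  # Remaining scenarios: the NA hides one of the known values
  for (v in unique(known)) {
    scenarios <- c(scenarios, list(modes_known(c(known, v))))
  }
  sizes <- lengths(scenarios)
  list(
    min = scenarios[[which.min(sizes)]],
    max = scenarios[[which.max(sizes)]]
  )
}

possible_modes_sketch(c("a", "a", "a", "b", "b", "c", NA))
```

For the vector above, only the scenario in which the NA hides "b" produces a tie, which is why the larger set contains both "a" and "b".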
Ken Williams’ mode functions on Stack Overflow were pivotal to moder.