This release is intended to be the last before stable version 1.0.0.

Passing a background dataset `bg_X` is now optional. If the explanation data `X` is sufficiently large (>= 50 rows), `bg_X` is derived as a random sample of `bg_n = 200` rows from `X`. If `X` has fewer than `bg_n` rows, then simply `bg_X = X`. If `X` has too few rows (< 50), you will have to pass an explicit `bg_X`.
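A minimal sketch of the new default behaviour, fitting a linear model on `iris` purely for illustration (model and data are not from the release notes):

```r
library(kernelshap)

fit <- lm(Sepal.Length ~ Petal.Length + Petal.Width + Species, data = iris)
X <- iris[, c("Petal.Length", "Petal.Width", "Species")]

# X has 150 rows (>= 50), so bg_X is derived from X automatically
# (here all of X, since 150 < bg_n = 200)
s <- kernelshap(fit, X)

# Fewer than 50 rows to explain: an explicit background data set is required
s_small <- kernelshap(fit, X[1:10, ], bg_X = X)
```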
- `ranger()` survival models now also work out-of-the-box without passing a tailored prediction function. Use the new argument `survival = "chf"` in `kernelshap()` and `permshap()` to distinguish cumulative hazards (default) and survival probabilities per time point (see the sketch below).
- The objects returned by `kernelshap()` and `permshap()` now contain the `bg_X` and `bg_w` used to calculate the SHAP values.
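A sketch of the survival interface, using a `ranger()` forest on the `lung` data purely for illustration; the value `"prob"` for survival probabilities is an assumption here, as only `"chf"` is spelled out above:

```r
library(survival)   # lung data and Surv()
library(ranger)
library(kernelshap)

dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
fit <- ranger(Surv(time, status) ~ age + sex + ph.ecog, data = dat)
X <- dat[1:20, c("age", "sex", "ph.ecog")]

# SHAP values for cumulative hazards (the default) ...
shap_chf <- kernelshap(fit, X, bg_X = dat)

# ... and for survival probabilities per time point ("prob" is an assumed value)
shap_prob <- kernelshap(fit, X, bg_X = dat, survival = "prob")
```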
- New additive explainer `additive_shap()` that works for models fitted via `lm()`, `glm()`, `mgcv::gam()`, `mgcv::bam()`, `gam::gam()`, `survival::coxph()`, or `survival::survreg()`. The explainer uses `predict(..., type = "terms")`, a beautiful trick used in `fastshap::explain.lm()`. The results will be identical to those returned by `kernelshap()` and `permshap()`, but exponentially faster. Thanks to David Watson for the great idea discussed in #130.
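A minimal sketch of the new explainer next to `permshap()`, fitting a small linear model purely for illustration:

```r
library(kernelshap)

fit <- lm(mpg ~ wt + hp + drat, data = mtcars)
X <- mtcars[, c("wt", "hp", "drat")]

s_add  <- additive_shap(fit, X)            # uses predict(..., type = "terms")
s_perm <- permshap(fit, X, bg_X = mtcars)  # same SHAP values, but much slower
```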
- `permshap()` now returns an object of class “kernelshap” to reduce the number of redundant methods.
- The output of `kernelshap()`, `permshap()` (and `additive_shap()`) got an element “algorithm”.
- `is.permshap()` has been removed.
- `predict_type = "prob"`.
- Speed-up of `permshap()` by caching calculations for the two special permutations of all 0 and all 1. Consequently, the `m_exact` component in the output is reduced by 2.
- Added `permshap()` to calculate exact permutation SHAP values. The function currently works for up to 14 features.
- `S` and `SE` lists.
- `feature_names` as dimnames (https://github.com/ModelOriented/kernelshap/issues/96).
- Removed the `ks_extract()` function. It was designed to extract objects like the matrix `S` of SHAP values from the resulting “kernelshap” object `x`. We feel that the standard extraction options (`x$S`, `x[["S"]]`, or `getElement(x, "S")`) are sufficient.
- `X`, and \(K\) is the dimension of a single prediction (usually 1).
- `verbose = FALSE` no longer suppresses the warning on too large background data. Use `suppressWarnings()` instead.
- Bug fix: if `bg_X` contained more columns than `X`, inflexible prediction functions could fail when applied to `bg_X`.
- New argument `feature_names` allows specifying the features for which SHAP values are calculated. The default equals `colnames(X)`. This should be changed only when `X` (the dataset to be explained) contains non-feature columns.
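A small sketch of `feature_names` with a data set that carries non-feature columns, chosen purely for illustration:

```r
library(kernelshap)

df <- na.omit(airquality)  # Ozone, Solar.R, Wind, Temp, Month, Day
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = df)

# X contains non-feature columns (Ozone, Month, Day), so restrict via feature_names
s <- kernelshap(fit, X = df, bg_X = df, feature_names = c("Solar.R", "Wind", "Temp"))
```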
Thanks to David Watson, exact calculations are now also possible for \(p>5\) features. By default, the algorithm uses exact calculations for \(p \le 8\) and a hybrid strategy otherwise; see the next section. At the same time, the exact algorithm became much more efficient.
A word of caution: Exact calculations mean creating \(2^p-2\) on-off vectors \(z\) (cheap step) and evaluating the model on a whopping \((2^p-2)N\) rows, where \(N\) is the number of rows of the background data (expensive step). As this explodes with large \(p\), we do not recommend the exact strategy for \(p > 10\).
The iterative Kernel SHAP sampling algorithm of Covert and Lee (2021) [1] works by randomly sampling \(m\) on-off vectors \(z\) so that their sum follows the SHAP Kernel weight distribution (renormalized to the range from \(1\) to \(p-1\)). Based on these vectors, many predictions are formed. Then, Kernel SHAP values are derived as the solution of a constrained linear regression; see [1] for details. This is done multiple times until convergence.
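For reference, the Shapley kernel behind this distribution gives an on-off vector \(z\) with \(s = \sum z\) ones the weight

\[
k(p, s) = \frac{p - 1}{\binom{p}{s}\, s\, (p - s)}, \qquad s = 1, \dots, p - 1,
\]

so the total mass on a given sum \(s\) is proportional to \(\binom{p}{s}\, k(p, s) = \frac{p - 1}{s (p - s)}\).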
A drawback of this strategy is that many (at least 75%) of the \(z\) vectors will have \(\sum z \in \{1, p-1\}\), producing many duplicates. Similarly, at least 92% of the mass will be used for the \(p(p+1)\) possible vectors with \(\sum z \in \{1, 2, p-1, p-2\}\) etc. This inefficiency can be fixed by a hybrid strategy, combining exact calculations with sampling. The hybrid algorithm has two steps:

1. Step 1 (exact part): The \(2p\) on-off vectors \(z\) with \(\sum z \in \{1, p-1\}\) are evaluated exactly with their Kernel SHAP weights (degree 1 hybrid). A degree 2 hybrid additionally includes the \(p(p-1)\) vectors with \(\sum z \in \{2, p-2\}\).
2. Step 2 (sampling part): The remaining weight is covered by sampling \(z\) vectors as in the pure sampling approach, avoiding the vectors already used in Step 1.
The default behaviour of `kernelshap()` is as follows:

- \(p \le 8\): exact calculations.
- \(9 \le p \le 16\): degree 2 hybrid.
- \(p > 16\): degree 1 hybrid.
It is also possible to use a pure sampling strategy, see Section “User visible changes” below. While this is usually not advisable compared to a hybrid approach, the options of `kernelshap()` allow studying different properties of Kernel SHAP and doing empirical research on the topic.
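A sketch of the three strategies on a linear model with \(p = 10\) features, chosen purely for illustration:

```r
library(kernelshap)

fit <- lm(mpg ~ ., data = mtcars)  # p = 10 features
X <- mtcars[, -1]

s_exact  <- kernelshap(fit, X, bg_X = X, exact = TRUE)       # all 2^p - 2 on-off vectors
s_hybrid <- kernelshap(fit, X, bg_X = X, hybrid_degree = 2)  # partly exact, partly sampled
s_pure   <- kernelshap(fit, X, bg_X = X, exact = FALSE, hybrid_degree = 0)  # pure sampling
```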
Kernel SHAP in the Python implementation “shap” uses a quite similar hybrid strategy, but without iterating. The new logic in the R package thus combines the efficiency of the Python implementation with the convergence monitoring of [1].
[1] Ian Covert and Su-In Lee. Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3457-3465, 2021.
- The default `m` is reduced from \(8p\) to \(2p\), except when `hybrid_degree = 0` (pure sampling).
- `exact` is now `TRUE` for \(p \le 8\) instead of \(p \le 5\).
- New argument `hybrid_degree` controls the exact part of the hybrid algorithm. The default is 2 for \(4 \le p \le 16\) and degree 1 otherwise. Set it to 0 to force a pure sampling strategy (not recommended, but useful to demonstrate the superiority of hybrid approaches).
- `tol` was reduced from 0.01 to 0.005.
- `max_iter` was reduced from 250 to 100.
- `m`.
- `print()` is now slimmer.
- The `summary()` function shows more information.
- New output components: `m_exact` (the number of on-off vectors used for the exact part), `prop_exact` (the proportion of mass treated in exact fashion), the `exact` flag, and `txt` (the info message shown when starting the algorithm).
- Bug fix: predictions of `mgcv::gam()` would cause an error in `check_pred()` (they are 1D-arrays).

The interface of `kernelshap()` has been revised. Instead of specifying a prediction function, it now suffices to pass the fitted model object. The default `pred_fun` is now `stats::predict`, which works in most cases. Some other cases are caught via the model class (“ranger” and mlr3 “Learner”). The `pred_fun` can be overwritten by a function of the form `function(object, X, ...)`. Additional arguments to the prediction function are passed via `...` of `kernelshap()`.
Some examples:
```r
kernelshap(fit, X, bg_X)
kernelshap(fit, X, bg_X, type = "response")
kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))
```
- `kernelshap()` has received a more intuitive interface; see the breaking change above.
- Parallel computing: register a parallel backend before calling `kernelshap()`, e.g., using the “doFuture” package, and then set `parallel = TRUE` (see the sketch after this list). Especially on Windows, sometimes not all global variables or packages are loaded in the parallel instances. These can be specified via `parallel_args`, a list of arguments passed to `foreach()`.
- `kernelshap()` has become much faster.
- Besides `matrix`, `data.frame`s, and `tibble`s, the package now also accepts `data.table`s (if the prediction function can deal with them).
- `kernelshap()` is less picky regarding the output structure of `pred_fun()`.
- `kernelshap()` is less picky about the column structure of the background data `bg_X`. It should simply contain the columns of `X` (but can have more, or in a different order). The old behaviour was to throw an error if `colnames(X) != colnames(bg_X)`.
- The formula behind `m = "auto"` has been changed from `trunc(20 * sqrt(p))` to `max(trunc(20 * sqrt(p)), 5 * p)`. This will have an effect for cases where the number of features \(p > 16\). The change will imply more robust results for large \(p\).
- `ks_extract(, what = "S")`.
- `MASS::ginv()`, the Moore-Penrose pseudoinverse using `svd()`.
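A sketch of the parallel setup described in the list above, assuming the “doFuture” package; the `.packages` entry is only an illustrative `foreach()` argument:

```r
library(kernelshap)
library(future)
library(doFuture)

registerDoFuture()  # use futures as the foreach() backend
plan(multisession)  # parallel workers; plan(sequential) switches back

fit <- lm(mpg ~ ., data = mtcars)
s <- kernelshap(
  fit, mtcars[, -1], bg_X = mtcars[, -1],
  parallel = TRUE,
  parallel_args = list(.packages = "stats")  # arguments handed to foreach()
)
```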
This is the initial release.