The diffpriv package makes privacy-aware data science in R easy.
diffpriv implements the formal framework of differential privacy:
differentially-private mechanisms can safely release to untrusted third
parties statistics computed, models fit, or arbitrary structures derived
on privacy-sensitive data. Due to the worst-case nature of the
framework, mechanism development typically requires involved theoretical
analysis. diffpriv offers a turn-key approach to differential privacy by
automating this process with sensitivity sampling in place of
theoretical sensitivity analysis.
Obtaining diffpriv is easy. From within R:
## Install the development version of diffpriv from GitHub:
install.packages("devtools")
devtools::install_github("brubinstein/diffpriv")
A typical example in differential privacy is privately releasing a
simple target function of privacy-sensitive input data X, say the mean
of numeric data:
## a target function we'd like to run on private data X, releasing the result
target <- function(X) mean(X)
First load the diffpriv package (installed as above) and construct a
chosen differentially-private mechanism for privatizing target.
## target seeks to release a numeric, so we'll use the Laplace mechanism---a
## standard generic mechanism for privatizing numeric responses
library(diffpriv)
mech <- DPMechLaplace(target = target)
To run mech on a dataset X we must first determine the sensitivity of
target to small changes in the input dataset. One avenue is to bound
sensitivity analytically (on paper; see the vignette) and supply it via
the sensitivity argument of mechanism construction. That is not hard in
this case if we assume bounded data, but in general sensitivity can be
very non-trivial to calculate manually. The other approach, which we
follow in this example, is sensitivity sampling: repeated probing of
target to estimate sensitivity automatically. We need only specify a
distribution for generating random probe datasets; sensitivitySampler()
takes care of the rest. The price we pay for this convenience is the
weaker form of random differential privacy.
## set a dataset sampling distribution, then estimate target sensitivity with
## sufficient samples for subsequent mechanism responses to achieve random
## differential privacy with confidence 1-gamma
distr <- function(n) rnorm(n)
mech <- sensitivitySampler(mech, oracle = distr, n = 5, gamma = 0.1)
#> Sampling sensitivity with m=285 gamma=0.1 k=285
mech@sensitivity    ## DPMech and subclasses are S4: slots accessed via @
#> [1] 0.8089517
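The idea behind sensitivity sampling can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions, not diffpriv's implementation: `sample_sensitivity()` is a hypothetical helper, neighbouring datasets are formed by swapping a single record, and the estimate is simply the largest of `m` observed changes (matching `k = m` in the output above). How the package chooses `m` and `k` from `gamma` is not reproduced here.

```r
## Illustrative base-R sketch of sensitivity sampling; NOT diffpriv's code.
## sample_sensitivity() and its arguments are hypothetical names.
sample_sensitivity <- function(target, oracle, n, m) {
  diffs <- vapply(seq_len(m), function(i) {
    D  <- oracle(n + 1)           # draw n + 1 random records
    D1 <- D[-(n + 1)]             # dataset made of records 1..n
    D2 <- D[-n]                   # neighbour: record n swapped for record n + 1
    abs(target(D1) - target(D2))  # change in the target between neighbours
  }, numeric(1))
  max(diffs)                      # largest observed change (the k = m case)
}

set.seed(1)
target <- function(X) mean(X)
distr <- function(n) rnorm(n)
sens <- sample_sensitivity(target, distr, n = 5, m = 285)
sens
```

Because the estimate comes from random probes rather than a worst-case bound, it can be exceeded by rare dataset pairs, which is why the resulting guarantee is the weaker random differential privacy.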
With a sensitivity-calibrated mechanism in hand, we can release private
responses on a dataset X, displayed alongside the non-private response
for comparison:
X <- c(0.328, -1.444, -0.511, 0.154, -2.062) # length is sensitivitySampler() n
r <- releaseResponse(mech, privacyParams = DPParamsEps(epsilon = 1), X = X)
cat("Private response r$response: ", r$response,
    "\nNon-private response target(X):", target(X))
#> Private response r$response: -1.119506
#> Non-private response target(X): -0.707
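To make the release step concrete, here is a minimal base-R sketch of what a Laplace mechanism does, this time following the analytic route mentioned earlier: for data assumed to lie in [0, 1], replacing one of n records changes the mean by at most 1/n. The helpers `rlaplace()` and `laplace_release()` are hypothetical names used for illustration; they are not part of diffpriv, and the package's own implementation may differ.

```r
## Illustrative base-R sketch of the Laplace mechanism; NOT diffpriv's code.
## rlaplace() and laplace_release() are hypothetical helper names.
rlaplace <- function(n, scale) {
  ## the difference of two i.i.d. exponentials is Laplace-distributed
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}
laplace_release <- function(target, X, sensitivity, epsilon) {
  ## add zero-mean Laplace noise with scale sensitivity / epsilon
  target(X) + rlaplace(1, scale = sensitivity / epsilon)
}

set.seed(42)
n <- 100
X01 <- runif(n)                     # assumed bounded data in [0, 1]
r_sketch <- laplace_release(mean, X01, sensitivity = 1 / n, epsilon = 1)
cat("Sketch private mean:", r_sketch, "\nNon-private mean:", mean(X01), "\n")
```

A smaller sensitivity or a larger epsilon shrinks the noise scale, so the private response tracks the non-private one more closely at the cost of a weaker privacy guarantee.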
The above example demonstrates the main components of diffpriv:
- DPMech for generic mechanisms that capture the non-private target and
  release privatized responses from it. Current subclasses:
  - DPMechLaplace, DPMechGaussian: the Laplace and Gaussian mechanisms
    for releasing numeric responses with additive noise;
  - DPMechExponential: the exponential mechanism for privately
    optimizing over finite sets (which need not be numeric); and
  - DPMechBernstein: the Bernstein mechanism for privately releasing
    multivariate real-valued functions. See the bernstein vignette for
    more.
- DPParamsEps and subclasses for encapsulating privacy parameters.
- The sensitivitySampler() method of DPMech subclasses estimates the
  target sensitivity necessary to run releaseResponse() of DPMech
  generic mechanisms. This provides an easy alternative to exact
  sensitivity bounds requiring mathematical analysis. The sampler
  repeatedly probes DPMech@target to estimate sensitivity to data
  perturbation. Running mechanisms with obtained sensitivities yields
  random differential privacy.
Read the package vignette for more, or news for the latest release notes.
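As a rough illustration of the "privately optimizing over finite sets" idea behind the exponential mechanism listed above, the following base-R sketch picks a category from non-numeric data with probability that grows exponentially in its score. It is not diffpriv's DPMechExponential: `exp_mech_release()` is a hypothetical helper, the score is assumed to be a simple count (which changes by at most 1 when one record is replaced), and the standard weighting exp(epsilon * score / (2 * sensitivity)) is used.

```r
## Illustrative base-R sketch of the exponential mechanism; NOT diffpriv's code.
## exp_mech_release() and its arguments are hypothetical names.
exp_mech_release <- function(candidates, scores, sensitivity, epsilon) {
  ## subtract the max score for numerical stability; probabilities are unchanged
  w <- exp(epsilon * (scores - max(scores)) / (2 * sensitivity))
  candidates[sample.int(length(candidates), size = 1, prob = w)]
}

set.seed(7)
Xcat <- c("a", "b", "b", "c", "b")            # private categorical data
candidates <- c("a", "b", "c")                # finite, non-numeric response set
scores <- vapply(candidates, function(v) sum(Xcat == v), numeric(1))
picked <- exp_mech_release(candidates, scores, sensitivity = 1, epsilon = 1)
picked
```

With a small epsilon the choice is close to uniform over the candidates; as epsilon grows, the highest-scoring candidate is returned with probability approaching one.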
diffpriv is an open-source package offered with a permissive MIT
License. Please acknowledge use of diffpriv by citing the paper on the
sensitivity sampler:
Benjamin I. P. Rubinstein and Francesco Aldà. “Pain-Free Random Differential Privacy with Sensitivity Sampling”, to appear in the 34th International Conference on Machine Learning (ICML’2017), 2017.
Other relevant references to cite depending on usage: