diffpriv


Overview

The diffpriv package makes privacy-aware data science in R easy. diffpriv implements the formal framework of differential privacy: differentially private mechanisms can safely release statistics computed, models fit, or arbitrary structures derived on privacy-sensitive data to untrusted third parties. Because the framework is worst-case in nature, mechanism development typically requires involved theoretical analysis. diffpriv offers a turn-key approach to differential privacy by automating this process, using sensitivity sampling in place of theoretical sensitivity analysis.

Installation

Obtaining diffpriv is easy. From within R:

##  Install the development version of diffpriv from GitHub:
install.packages("devtools")
devtools::install_github("brubinstein/diffpriv")

Example

A typical example in differential privacy is privately releasing a simple target function of privacy-sensitive input data X, such as the mean of numeric data:

## a target function we'd like to run on private data X, releasing the result
target <- function(X) mean(X)

First load the diffpriv package (installed as above) and construct a chosen differentially-private mechanism for privatizing target.

## target seeks to release a numeric, so we'll use the Laplace mechanism---a
## standard generic mechanism for privatizing numeric responses
library(diffpriv)
mech <- DPMechLaplace(target = target)

To run mech on a dataset X we must first determine the sensitivity of target to small changes in the input dataset. One avenue is to bound sensitivity analytically (on paper; see the vignette) and supply the bound via the sensitivity argument of mechanism construction. That is not hard in this case if we assume bounded data, but in general sensitivity can be highly non-trivial to calculate by hand. The other approach, which we follow in this example, is sensitivity sampling: repeated probing of target to estimate sensitivity automatically. We need only specify a distribution for generating random probe datasets; sensitivitySampler() takes care of the rest. The price we pay for this convenience is a weaker guarantee, random differential privacy.
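
For comparison, the analytic route mentioned above can be sketched for the mean of bounded data. This is a minimal illustration in base R, not package code: the bounds lower and upper are assumptions introduced here, and the Gaussian data used in the rest of this example are unbounded, so the bound would only apply after clipping records to that range.

```r
## Assumption for this sketch: every record lies in [lower, upper].
lower <- 0
upper <- 1
n <- 5

## Replacing one of n records moves the mean by at most (upper - lower) / n.
analytic_sensitivity <- (upper - lower) / n

## A worst-case neighbouring pair: one record moves from lower to upper.
X1 <- c(rep(0.5, n - 1), lower)
X2 <- c(rep(0.5, n - 1), upper)
worst_case_change <- abs(mean(X1) - mean(X2))

analytic_sensitivity
worst_case_change
```

Both values agree, which is what makes the bound tight. A bound obtained this way would then be passed through the sensitivity argument described above instead of calling the sampler.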

## set a dataset sampling distribution, then estimate target sensitivity with
## sufficient samples for subsequent mechanism responses to achieve random
## differential privacy with confidence 1-gamma
distr <- function(n) rnorm(n)
mech <- sensitivitySampler(mech, oracle = distr, n = 5, gamma = 0.1)
#> Sampling sensitivity with m=285 gamma=0.1 k=285
mech@sensitivity    ## DPMech and subclasses are S4: slots accessed via @
#> [1] 0.8089517
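
The idea behind sensitivity sampling can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the package's implementation: sample_sensitivity() is a hypothetical helper, neighbouring datasets are formed here by replacing the last record with a fresh draw, and the estimate is simply the largest change seen over m probes (the run above reports k = m = 285).

```r
## Illustrative sketch only; sample_sensitivity() is a hypothetical helper,
## not part of diffpriv.
sample_sensitivity <- function(target, oracle, n, m) {
  changes <- replicate(m, {
    X <- oracle(n)                 # random probe dataset of n records
    X_neighbour <- X
    X_neighbour[n] <- oracle(1)    # neighbouring dataset: one record replaced
    abs(target(X) - target(X_neighbour))
  })
  max(changes)                     # largest change observed over the m probes
}

set.seed(1)
est <- sample_sensitivity(function(X) mean(X), function(n) rnorm(n),
                          n = 5, m = 285)
est
```

The estimate depends on the random probes, so it will differ from the value printed above and from run to run.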

With a sensitivity-calibrated mechanism in hand, we can release private responses on a dataset X, displayed alongside the non-private response for comparison:

X <- c(0.328, -1.444, -0.511, 0.154, -2.062) # length matches the n passed to sensitivitySampler()
r <- releaseResponse(mech, privacyParams = DPParamsEps(epsilon = 1), X = X)
cat("Private response r$response:   ", r$response,
  "\nNon-private response target(X):", target(X))
#> Private response r$response:    -1.119506 
#> Non-private response target(X): -0.707
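
To see where the gap between the two responses comes from, the standard Laplace mechanism can be sketched in base R: it adds Laplace noise with scale sensitivity / epsilon to the non-private response. The helpers rlaplace_sketch() and laplace_release() are hypothetical names introduced for illustration; the package's internals may differ.

```r
## Hypothetical helper, not a diffpriv function: Laplace(0, scale) noise as the
## scaled difference of two independent standard exponential draws.
rlaplace_sketch <- function(n, scale) scale * (rexp(n) - rexp(n))

## Hypothetical helper: perturb the target's output with noise of scale
## sensitivity / epsilon, the standard Laplace-mechanism calibration.
laplace_release <- function(target, X, sensitivity, epsilon) {
  target(X) + rlaplace_sketch(1, scale = sensitivity / epsilon)
}

X <- c(0.328, -1.444, -0.511, 0.154, -2.062)
set.seed(42)
private <- laplace_release(mean, X, sensitivity = 0.8089517, epsilon = 1)
cat("Sketch private response:", private,
    "\nNon-private response:   ", mean(X), "\n")
```

A smaller epsilon or a larger sensitivity widens the noise, so the private response strays further from the true mean on average.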

Getting Started

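The above example demonstrates the main components of diffpriv: constructing a mechanism (DPMechLaplace()), calibrating its sensitivity (sensitivitySampler()), and releasing a private response (releaseResponse() with privacy parameters such as DPParamsEps()).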

Read the package vignette for more, or news for the latest release notes.

Citing the Package

diffpriv is an open-source package offered with a permissive MIT License. Please acknowledge use of diffpriv by citing the paper on the sensitivity sampler:

Benjamin I. P. Rubinstein and Francesco Aldà. “Pain-Free Random Differential Privacy with Sensitivity Sampling”, to appear in the 34th International Conference on Machine Learning (ICML’2017), 2017.

Other relevant references to cite depending on usage: