The goal of checkpoint is to solve the problem of package
reproducibility in R. Specifically, checkpoint solves the problems that
occur when you don’t have the correct versions of R packages. Since
packages get updated on CRAN all the time, it can be difficult to
recreate an environment where all your packages are consistent with
some earlier state.
To solve this, checkpoint allows you to install packages from a
specific snapshot date. In other words, checkpoint makes it possible to
install package versions from a specific date in the past, as if you
had a CRAN time machine.
Version 1.0 of checkpoint is a major refactoring/rewrite, aimed at resolving many long-standing issues. You can provide feedback by opening an issue or by contacting me.
With the checkpoint package, you can easily:
The checkpoint package has 3 main functions.
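The create_checkpoint function:
- Creates a checkpoint directory to install packages into. This directory is located under ~/.checkpoint by default, but you can change its location.
- Scans your project for the packages it uses, by searching for library() and require() calls, as well as the namespacing operators :: and :::.
- Installs these packages into the checkpoint directory, using the versions from your specified snapshot date.

The use_checkpoint function:
- Sets the package repository for your R session to the snapshot for your specified date, i.e. modifies options(repos).
- Sets your library search path to the directory created by create_checkpoint, i.e. modifies .libPaths().

This means the remainder of your script will run with the packages from your specified date.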
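The two session settings mentioned above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation; the snapshot URL and the library directory are hypothetical placeholders.

```r
# Hypothetical values for illustration only; checkpoint derives the real
# ones from the snapshot date and the checkpoint location.
snapshot_repo <- "https://example.com/snapshot/2020-01-01"
checkpoint_lib <- file.path(tempdir(), "checkpoint-lib")
dir.create(checkpoint_lib, showWarnings = FALSE)

# Remember the current settings so they can be restored later.
old_repos <- getOption("repos")
old_libs <- .libPaths()

# Point the session at the snapshot repository and library directory.
options(repos = c(CRAN = snapshot_repo))
.libPaths(checkpoint_lib)

current_repo <- getOption("repos")[["CRAN"]]
current_lib <- .libPaths()[1]

# Restore the original settings.
options(repos = old_repos)
.libPaths(old_libs)
```

After the first two assignments, install.packages() would download from the snapshot repository and library() would look in the checkpoint library first.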
Finally, the checkpoint function serves as a unified interface to
create_checkpoint and use_checkpoint. It looks for a pre-existing
checkpoint directory, and if not found, creates it with
create_checkpoint. It then calls use_checkpoint to put the checkpoint
into use.
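The two steps can also be written out explicitly. The sketch below assumes the checkpoint package is installed and that project_dir is the argument naming the project to scan; check ?create_checkpoint for the exact signature. The wrapper name run_with_checkpoint is made up for this example, and unlike checkpoint() it always calls create_checkpoint rather than first looking for an existing directory.

```r
# A sketch of the two steps, wrapped in a hypothetical helper function.
run_with_checkpoint <- function(snapshot_date, project_dir = ".") {
  # Step 1: create (or update) the checkpoint and install the packages.
  # `project_dir` is an assumed argument name; see ?create_checkpoint.
  checkpoint::create_checkpoint(snapshot_date, project_dir = project_dir)
  # Step 2: switch the current session over to the checkpoint.
  checkpoint::use_checkpoint(snapshot_date)
  invisible(snapshot_date)
}

# Example call (needs network access the first time it is run):
# run_with_checkpoint("2020-01-01")
```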
Sharing a script to be reproducible is as easy as placing the following snippet at the top:
library(checkpoint)
checkpoint("2020-01-01") # replace with desired date
Then send this script to your collaborators. When they run this script
on their machine for the first time, checkpoint will perform the same
steps of scanning for package dependencies, creating the checkpoint
directory, installing the necessary packages, and setting their session
to use the checkpoint. On subsequent runs, checkpoint will find and use
the created checkpoint, so the packages don’t have to be installed
again.
If you have more than one script in your project, you can place the above snippet in every standalone script. Alternatively, you can put it in a script of its own, and run it before running any other script.
The checkpoint package is designed to be used with projects, which are
directories that contain the R code and output associated with the
tasks you’re working on. If you use RStudio, you will probably be aware
of the concept, but the same applies for many other programming editors
and IDEs including Visual Studio Code, Notepad++ and Sublime Text.
When it is run, create_checkpoint scans all R files inside a given
project to determine what packages your code requires. The default
project is the current directory ".".
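To give a feel for what this scan looks for, here is a deliberately simplified sketch in base R. It is not checkpoint's scanner: it relies on regular expressions, so it can be fooled by comments and strings. The function name find_packages is made up for this example.

```r
# Simplified illustration of dependency scanning over lines of R code.
find_packages <- function(code) {
  code <- paste(code, collapse = "\n")
  # Find every match of `pattern`, then keep only its first capture group.
  grab <- function(pattern) {
    hits <- regmatches(code, gregexpr(pattern, code, perl = TRUE))[[1]]
    sub(pattern, "\\1", hits, perl = TRUE)
  }
  # library(pkg), require(pkg), library("pkg"), ...
  calls <- grab("\\b(?:library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)")
  # pkg::name and pkg:::name
  namespaced <- grab("\\b([A-Za-z][A-Za-z0-9.]*):::?")
  sort(unique(c(calls, namespaced)))
}

script <- c(
  "library(dplyr)",
  "require(\"ggplot2\")",
  "x <- data.table::fread(\"data.csv\")",
  "y <- stats:::Pillai(1, 2)"
)
pkgs <- find_packages(script)
print(pkgs)
```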
If you do not have an actual project open, this will usually expand to
your R user directory (~/<username> on Unix/Linux and macOS, or
C:\Users\<username>\Documents on Windows). For most people, this means
that the function will scan through all the projects they have on their
machine, which can lead to checkpointing a very large number of
packages. Because of this, you should ensure that you are not in your
user directory when you run checkpoint. A mitigating factor is that
this should happen only once, as long as the checkpoint directory
remains intact.
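One way to guard against this is to test the working directory before checkpointing. The sketch below uses only base R; is_user_dir is a hypothetical helper, not part of the checkpoint package.

```r
# Returns TRUE if `path` resolves to the user's home directory ("~").
is_user_dir <- function(path = ".") {
  normalizePath(path, mustWork = FALSE) == normalizePath("~", mustWork = FALSE)
}

# Warn before a scan that would cover the whole user directory.
if (is_user_dir(".")) {
  warning("The working directory is your user directory; ",
          "switch to a project folder before checkpointing.")
}
```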
For an even more stringent form of reproducibility, you can use the following:
library(checkpoint)
checkpoint("2020-01-01", r_version="3.6.2") # replace with desired date and R version
This requires anyone running the script to use the specified version of R. Changes in R over time can affect reproducibility just like changes in third-party packages, so restricting the script to a single R version removes another possible source of variation. However, R itself is usually very stable, and requiring a specific version can be excessively demanding, especially in locked-down IT environments. For this reason, specifying the R version is optional.
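The version comparison involved can be illustrated in base R. This is only a sketch of the idea, not how checkpoint enforces the requirement; the version number is the one used in the snippet above.

```r
required <- package_version("3.6.2")  # version the script asks for
running <- getRversion()              # version of the current session

# Version objects compare component by component, not as strings.
if (running != required) {
  message("This session runs R ", running,
          ", but the script expects R ", required, ".")
}
```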
checkpoint will automatically add the rmarkdown package as a dependency
if it finds any R Markdown-based files (those with extension .Rmd,
.Rpres or .Rhtml) in your project. This allows you to continue working
with such documents after checkpointing.
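To check in advance whether a project contains such files, a base R sketch like the following can be used. The helper has_rmarkdown_files is hypothetical, and the case-insensitive match is an assumption of this example rather than documented checkpoint behaviour.

```r
# Returns TRUE if the project contains any .Rmd, .Rpres or .Rhtml file.
has_rmarkdown_files <- function(project_dir = ".") {
  files <- list.files(project_dir, pattern = "\\.(Rmd|Rpres|Rhtml)$",
                      recursive = TRUE, ignore.case = TRUE)
  length(files) > 0
}

# Demonstration on a throwaway project directory.
demo_dir <- file.path(tempdir(), "rmd-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "report.Rmd"))
print(has_rmarkdown_files(demo_dir))
```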
To reset your session to the way it was before checkpointing, call
uncheckpoint(). Alternatively, you can simply restart R.
To update an existing checkpoint, for example if you need new packages
installed, call create_checkpoint() again. Any existing packages will
remain untouched.
The functions delete_checkpoint() and delete_all_checkpoints() allow
you to remove checkpoint directories that are no longer required. They
check that the checkpoint(s) in question are not actually in use before
deleting.
Each time create_checkpoint() is run, it saves a series of json files
in the main checkpoint directory. These are outputs from the pkgdepends
package, which checkpoint uses to perform the actual package
installation, and can help you debug any problems that may occur.
- <date>_<time>_refs.json: Packages to be installed into the checkpoint
- <date>_<time>_config.json: Configuration parameters for the checkpoint
- <date>_<time>_resolution.json: Dependency resolution result
- <date>_<time>_solution.json: Solution to package dependencies
- <date>_<time>_downloads.json: Download result
- <date>_<time>_install_plan.json: Package install plan
- <date>_<time>_installs.json: Final installation result

For more information, see the help for pkgdepends::pkg_installation_proposal.
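When debugging, it helps to locate the most recent file of a given kind. The sketch below uses base R only; latest_log is a hypothetical helper, and it assumes that the <date>_<time> prefix sorts chronologically. The file names in the demonstration are invented.

```r
# Returns the path of the newest log file of a given kind (e.g. "installs"),
# or NA if there is none. Assumes the <date>_<time> prefix sorts by time.
latest_log <- function(checkpoint_dir, kind = "installs") {
  pattern <- paste0("_", kind, "\\.json$")
  files <- sort(list.files(checkpoint_dir, pattern = pattern,
                           full.names = TRUE))
  if (length(files) == 0) NA_character_ else files[length(files)]
}

# Demonstration with invented file names in a temporary directory.
log_dir <- file.path(tempdir(), "checkpoint-logs")
dir.create(log_dir, showWarnings = FALSE)
file.create(file.path(log_dir, c("20200101_120000_installs.json",
                                 "20200102_090000_installs.json",
                                 "20200102_090000_install_plan.json")))
newest <- latest_log(log_dir)
print(basename(newest))
```

With the default checkpoint location, the call would be latest_log("~/.checkpoint").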
checkpoint is on CRAN:
install.packages("checkpoint")
The development version of checkpoint is on GitHub:
install.packages("devtools")
devtools::install_github("RevolutionAnalytics/checkpoint")
Project repository: https://github.com/RevolutionAnalytics/checkpoint
Post an issue on the Issue tracker at https://github.com/RevolutionAnalytics/checkpoint/issues
checkpoint-server repository: https://github.com/RevolutionAnalytics/checkpoint-server
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.