If you are distributing binary R packages (or any other binary) for Linux, it is important that you check and declare the run-time dependencies for your binaries. This can easily be automated, and prevents many problems and conflicts. Currently RSPM leaves the client guessing which system libraries the binaries are linked to, which results in users installing unnecessary build-time dependencies, sometimes even the wrong ones.
Once you distinguish between build-time and run-time system libraries in Linux distributions, the solution is obvious, and the system will become much simpler and more robust.
This is not a hack, Linux package managers have been designed to automatically
determine dependencies between system libraries. You should use the
same tools when providing binaries for R packages, even if they are not
distributed in a rpm
or deb
format.
Many R packages on Linux require external system libraries. When you
build the package from source, you need the build-time system library,
which includes header files and has many additional dependencies needed
at build-time. These build-time system libraries are always named with a
-dev
or -devel
postfix, for example libcurl4-openssl-dev
on Debian/Ubuntu, and curl-devel
on Fedora/RHEL.
But, here is the crucial part: once the R package has been compiled, you only need the run-time system library to use it! This is a different package which is much lighter, because the build-time package always depends on the run-time package, but not the other way around.
Run-time system libraries are:
ldd
on the R
package .so
fileFor example, if you build an R package against libcurl4-openssl-dev
,
then the run-time dependency is libcurl4
.
When you provide users with pre-compiled binaries on Linux, you really need to provide the metadata about the run-time dependencies of those binaries. You can easily automate this, and it would make RSPM dependency management much simpler and more reliable.
In a nutshell: After you have successfully built an R package on your
Linux server, run ldd
on the package .so
file
to list the shared libraries it links to. The operating system package
manager (e.g. yum
or dpkg
) can tell you which
system package each file belongs to. Simply add this information to the
binary package DESCRIPTION file that you are shipping. That’s it!
To make it even easier: the maketools
package has an
example function that shows the system dependencies for installed R
packages on Linux. For example, let’s have a look at the dependencies of
the sf
CRAN package. On Ubuntu 20.04 we see:
> maketools::package_sysdeps("sf")
shlib package headers source version
1 libproj.so.15.3.1 libproj15 libproj-dev proj 6.3.1-1
2 libgdal.so.26.0.4 libgdal26 libgdal-dev gdal 3.0.4+dfsg-1build3
3 libgeos_c.so.1.13.1 libgeos-c1v5 libgeos-dev geos 3.8.0-1build1
4 libstdc++.so.6.0.28 libstdc++6 <NA> gcc 10-20200411-0ubuntu1
And on Fedora 32 we get:
> maketools::package_sysdeps("sf")
shlib package headers source version
1 libproj.so.15.3.2 proj proj-devel proj 6.3.2
2 libgdal.so.26.0.4 gdal-libs gdal-devel gdal 3.0.4
3 libgeos_c.so.1.13.3 geos geos-devel geos 3.8.1
4 libstdc++.so.6.0.28 libstdc++ <NA> gcc 10.2.1
The first column shlib
tells you which shared libraries
the R package is linked to, i.e. the filenames of the .so
files. The second column shows which system package this file belongs
to. This is the (only) relevant piece of information when you are
distributing the binary, because these are exactly the system packages
the client needs to have installed for the binary R package to work.
Nothing more, nothing less!
A simple way to build R binary packages is on a server or container
that has all build-time libraries pre-installed (the per-package
build-time dependencies are really not relevant). For example you can
use the cranlike cran/debian
or cran/ubuntu
docker images for the latest version of Debian and Ubuntu.
docker run -it cran/ubuntu
After building and installing an R package, you check the package run-time dependencies, for example:
> install.packages("openssl")
## ...
## ...
## ** checking absolute paths in shared objects and dynamic libraries
## ** testing if installed package can be loaded from final location
## ** testing if installed package keeps a record of temporary installation path
## * DONE (openssl)
> maketools::package_sysdeps("openssl")
shlib package headers source version
1 libssl.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2
2 libcrypto.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2
For every R binary package you distribute, you should provide, at a
minimum, the information from the package
column. The best
way would be to add this to the DESCRIPTION file of the binary R
package, and ideally also expose this in the PACKAGES
repository index. Thereby clients can lookup the required system
dependencies needed for this binary R package, 100% reliably, without
guessing or conflicts.