One of the most important aspects of the R ecosystem is the ease with
which extensions and new features can be developed and distributed in
the form of packages. The main distribution channel is the
Comprehensive R Archive Network, from which packages can be
installed directly from R. Another popular option is using GitHub
repositories from which packages can also be painlessly installed,
e.g. using install_github
from the devtools
package.
The import
package provides an alternative approach to
using external functionality in R scripts; first, however, it is useful
to describe the standard approach to clarify how import
may
serve as an improvement. The most common way to include the functionality
provided by a package is to use the library
function:
library(PackageA)
library(PackageB)
value1 <- function_a(...) # Supposedly this comes from PackageA,
value2 <- function_b(...) # and this from PackageB, but who knows?!
...
In some situations this is fine; however, there are some subtle shortcomings:
1. both packages are attached in their entirety, so the search path is populated with all of their exported objects,
2. name clashes may occur if the packages export different objects with the same names (masking), and
3. it is not clear from the code itself which package provides a given function.
The problem with (1) is that the search path is populated with more
objects than are needed and it is not immediately clear whether name
clashes will occur. Problem (2) refers to the case where packages export
different objects with the same names, say if function_b
is
exported in both PackageA
and PackageB
above.
In this case the name will point to the object from the package attached
last. The earlier exposed objects are said to be masked.
Even if this is not a problem when writing the script, an
update of packages may cause this problem later on when
executing the script; and tracking down the resulting errors
may be tough and time consuming. Problem (3) may appear unimportant, but
it is not to be underestimated. Code snippets are very commonly shared,
and spending time figuring out where functionality comes from is neither
a satisfying nor a value-adding activity.
It is possible to unambiguously specify where a function comes from
by prefixing it with ::
every time it is used, but this is
often overly verbose and does not provide an easily accessible overview
of what external functionality is used in a script. One may also import
single exported objects, one at a time, using the (double) “colon
syntax”:
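# For instance, using the placeholder names from the example above:
function_a <- PackageA::function_a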
The downside of this approach is that the object is placed in
the user’s global workspace, rather than being encapsulated
somewhere else in the search path (when using library to load pkg,
an environment named package:pkg will be attached in the search
path, containing the exported functions from pkg). Another minor
point is that one can only import one object at a time using this
approach.
While packages form the backbone of code distribution, another option
comes in the form of scripts, but these are usually task
specific and not commonly used to “bundle” functionality for use in
other scripts. In particular, when source is used to include the
contents of one script in another, once again all objects produced by
the script are exposed in the working environment, where they may mask
other objects or simply add mental clutter. Scope management is
therefore awkward when splitting functionality across files in a
modular way.
The import
package sets out to improve the way external
functionality is included in your code, alleviating the concerns raised
above by providing an expressive way of importing objects from both
packages and scripts. The latter provides a bridge between the package
approach to distribution and simple stand-alone script files. This
allows scripts to be used as modules: collections of related object
definitions, each of which may be used in different places without
exposing more than necessary.
The package is inspired in part by Python’s
from some_module import some_function
syntax, and addresses the issues raised above. It is also similar to
roxygen2’s
@importFrom package function1 function2
for packages. While
import
will also work for package development, the intended
use case is when using external functions in R
scripts.
In addition to being able to import objects from packages,
import
also allows you to import objects from other
scripts (i.e. a kind of module). This allows a simple
way to distribute and use functionality without the need to write a full
package. One example is a Shiny app, where one can place definitions in
a script and import only the needed objects where they are used. This
avoids workspace clutter and name clashes. For more details see
below.
The most basic use case is to import a few functions from a package
(here the psych
package):
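# A minimal example, assuming the psych package is installed:
import::from(psych, geometric.mean, harmonic.mean)
geometric.mean(trees$Volume)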
The imported objects are placed in a separate entity in the search
path which by default is named “imports”. It is therefore also easy to
get rid of them again with detach("imports")
. The main
point is that it is clear which functions will be used and where
they come from. It’s noteworthy that there is nothing special
going on: the import::from
function is only a convenient
wrapper around getExportedValue
(as is ::
itself) and assign
.
The import
package itself should not be attached
(don’t include it via library
; you will get a warning).
Rather, it is designed to be expressive when using the colon syntax. To
import non-exported objects one must use triple-colon syntax:
import:::from(pkg, obj)
. If any of the import
functions is called without a preceding
import::
or import:::
, an error is raised. If
import
is attached, a startup message will inform you that
import
should not be attached.
If one of the function names conflicts with an existing function
(such as filter
from the dplyr
package), it is
simple to rename it:
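# select and arrange keep their names; dplyr's filter is imported as keep_when:
import::from(dplyr, select, arrange, keep_when = filter)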
This does pretty much what it says: three functions are imported from
dplyr
, two of which will keep their original name, and one
which is renamed, e.g. to avoid a name clash with
stats::filter
.
You can use .all=TRUE
to import all functions from a
package, but rename one of them:
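# Everything dplyr exports is imported, with filter renamed to keep_when:
import::from(dplyr, keep_when = filter, .all = TRUE)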
To omit a function from the import, use .except
(which
takes a character vector):
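# Import everything dplyr exports except filter and lag:
import::from(dplyr, .except = c("filter", "lag"))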
Note that import
tries to be smart about this and
assumes that if you are using the .except
parameter, you
probably want to import everything you are not explicitly
omitting, and sets the .all
parameter to TRUE
.
You can override this in exceptional cases, but you seldom need to.
Finally, a more complex example, combining a few different import statements:
import::from(magrittr, "%>%")
import::from(dplyr, starwars, select, mutate, keep_when = filter)
import::from(tidyr, unnest)
import::from(broom, tidy)
ready_data <-
starwars %>%
keep_when(mass < 100) %>%
select(name, height, mass, films) %>%
unnest(films) %>%
mutate(log_mass = log(mass), films = factor(films))
linear_model <-
lm(log_mass ~ height + films, data = ready_data) %>%
tidy
In the above, it is clear which package provides
which functions (one could e.g. otherwise be tempted to think
that tidy
belonged to tidyr
). Note that
ordering is irrelevant: even if a later update of tidyr were to export
a function named tidy, the code above would still use the tidy imported
from broom, as import is explicit about what it imports.
It also shows that one can import multiple objects in a single
statement, and even rename objects if desired; for example, in the above
one can imagine that filter
from stats
is
needed later on, and so dplyr
’s filter
is
renamed to avoid confusion. Sometimes, it is not at all clear what
purpose a package has; e.g. the name magrittr
does not
immediately reveal that its main purpose is to provide the pipe
operator, %>%
.
The import
package allows for importing objects defined
in script files, which we will here refer to as “modules”. The module
will be fully evaluated by import
when an import is
requested, after which objects such as functions or data can be
imported. Such modules should be side-effect free, but this is not
enforced.
Any attachments made while the module is evaluated (e.g. packages
attached by library) are detached afterwards, but loaded namespaces
remain loaded. This means that values created by functions from an
attached package will work with import, but functions to be exported
should not rely on such attachments (import the needed functions inside
the module instead, using import::here).
For example, the file sequence_module.R contains several functions calculating terms of mathematical sequences. It is possible to import from such files, just as one imports from packages:
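# The function names below are illustrative; use those the module defines:
import::from(sequence_module.R, fibonacci, triangular)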
Renaming, the .all
, and the .except
parameters work in the same way as for packages:
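# Again with illustrative names: rename fibonacci and import everything else too,
import::from(sequence_module.R, fib = fibonacci, .all = TRUE)
# or import everything the module defines except one function:
import::from(sequence_module.R, .except = "triangular")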
If a module is modified, import will detect this and reload the script
when further imports are executed or re-executed; otherwise, for
efficiency, additional imports will not cause the script to be reloaded.
As the script is loaded in its own environment (maintained by import),
dependencies are kept (except those exposed through attachment), as the
following small example shows.
Contents of “some_module.R”:
## Do not use library() inside a module. This results in a warning,
## and functions relying on ggplot2 will not work.
#library(ggplot2)
## This is also not recommended, because it is not clear whether recursively
## imported functions should be available after the module is imported
#import::from(ggplot2, qplot)
## This is the recommended way to recursively import functions on which
## module functions depend. The qplot function will be available to
## module functions, but will not itself be available after import
import::here(qplot, .from = ggplot2)
## Note this operator overload is not something you want to `source`!
`+` <- function(e1, e2)
  paste(e1, e2)
## Some function relying on the above overload:
a <- function(s1, s2)
  s1 + rep(s2, 3)
## Another value.
b <- head(iris, 10)
## A value created using a recursively imported function
p <- qplot(Sepal.Length, Sepal.Width, data = iris, color = Species)
## A function relying on a recursively imported function:
plot_it <- function()
  qplot(Sepal.Length, Sepal.Width, data = iris, color = Species)
Usage:
import::from(some_module.R, a, b, p, plot_it)
## Works:
a("cool", "import")
## The overloaded `+` is not exposed here, so this would not work:
# "cool" + "import"
## Also works:
b
p
plot_it()
Suppose that you have some related functionality that you wish to
bundle, and that authoring a full package seems excessive or
inappropriate for the specific task, for example bundling related user
interface components for a shiny
application. One option
with import
is to author a module (script), say as outlined
below:
# File: foo.R
# Desc: Functionality related to foos.
# Imports from other_resources.R
# When recursively importing from another module or package for use by
# your module functions, you should always use import::here() rather
# than import::from() or library()
import::here(fun_a, fun_b, .from = "other_resources.R")
internal_fun <- function(...) ...
fun_c <- function(...)
{
  ...
  a <- fun_a(...)
  i <- internal_fun(...)
  ...
}
fun_d <- function(...) ...
Then in another file we need fun_c
:
# File: bar.R
# Desc: Functionality related to bars.
# Imports from foo.R
import::here(fun_c, .from = "foo.R")
...
In the above, only fun_c
is visible inside
bar.R
. The functions on which it depends exist, but are not
exposed. Also, note that imported scripts may themselves import.
Since the desired effect of import::from inside a module script is
ambiguous, this results in a warning (but the functions will still be
imported into the local environment of the script, just as with
import::here, whose imports are exposed only to the module itself).
When importing from a module, it is sourced into an environment
managed by import
, and will not be sourced again upon
subsequent imports (unless the file has changed). For example, in a
shiny
application, importing some objects in
server.R
and others in ui.R
from the same
module will not cause it to be sourced twice.
The import
package will by default use the current set
of library paths, i.e. the result of .libPaths()
. It is,
however, possible to specify a different set of library paths using the
.library
argument in any of the import
functions, for example to import packages installed in a custom
location, or to remove any ambiguity as to where imports come from.
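For example (the library location below is purely illustrative):
import::from(psych, describe, .library = "/opt/R/site-library")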
Note that in versions up to and including 1.3.0
this
defaulted to using only the first entry in the library paths,
i.e. .library=.libPaths()[1L]
. We believe the new default
is applicable in a broader set of circumstances, but if this change
causes any issues, we would very much appreciate hearing about it.
When importing from a module (.R file), the directory where
import
looks for the module script can be specified with
the .directory
parameter. The default is
.
(the current working directory).
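For instance (the file and directory names here are illustrative):
import::from("utils_module.R", my_helper, .directory = "R/modules")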
One can also specify which names to use for the entries in the search
path, and use several entries to group imports. Names can be specified either as character
literals or as variables of type character
(for example if
the environment needs to be determined dynamically).
import::from(magrittr, "%>%", "%$%", .into = "operators")
import::from(dplyr, arrange, .into = "datatools")
import::from(psych, describe, .into=month.name[1]) # Uses env: "January"
The import::into
and import::from
functions accept
the same parameters and achieve the same result; the choice between them
is a matter of preference. If using custom search path entities
actively, one might prefer the alternative syntax (which does the same
but reverses the argument order):
import::into("operators", "%>%", "%$%", .from = magrittr)
import::into("datatools", arrange, .from = dplyr)
import::into(month.name[1], describe, .from=psych)
Be aware that beginning in version 1.3.0
hidden objects
(those with names prefixed by a period) are supported. Take care to
avoid name clashes between such hidden objects and the argument names of
the import functions.
If it is desired to import objects directly into the current
environment, this can be accomplished by import::here
. This
is particularly useful when importing inside a function definition, or
inside module scripts, as described above.
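For example, a small sketch of importing inside a function definition (the function itself is illustrative):
describe_sample <- function(x) {
  # The import lives only in the function's local environment
  import::here(describe, .from = psych)
  describe(x)
}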
Instead of specifying a named environment on the search path, by
passing a character
to the .into
parameter, it
is possible to directly specify an environment. The function
automatically determines which use case is involved, based on the
mode()
of the .into
parameter (either
character
or environment
).
Prior to version 1.3.0
, non-standard evaluation (NSE)
was applied to the .into
parameter, and it was necessary to
surround it with {}
, in order for it to be treated as an
environment
. This is no longer needed, although it is still
allowed (curly brackets are simply ignored).
Examples include:
# Import into the local environment
import::into(environment(), "%>%", .from = magrittr)
# Import into the global environment, curlies are optional
import::into({.GlobalEnv}, "%>%", "%$%", .from = magrittr)
# Import into a new environment, mainly useful for python-style imports
# (see below)
x = import::into(new.env(), "%<>%", .from = magrittr)
The import
package uses non-standard evaluation (NSE) on
the .from
and ...
parameters, allowing the
names of packages and functions to be listed without quoting them. This
makes some common use-cases very straightforward, but can get in the way
of more programmatic usages.
This is where the .character_only
parameter comes in
handy. By setting .character_only=TRUE
, the non-standard
evaluation of the .from
and the ...
parameters
is disabled. Instead, the parameters are processed as character vectors
containing the relevant values.
(Previously, NSE was also applied to the .into
parameter, but as of version 1.3.0
this is no longer the
case, and all parameters except .from
and ...
are always evaluated in a standard way.)
It is useful to examine some examples of how specifying
.character_only=TRUE
can be helpful.
It is not always known in advance which objects to import from a given
package. For example, if we have a vector of objects from the
broom
package that we need to import, we can do it as
follows:
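# Three broom functions, chosen here for illustration:
objects <- c("tidy", "glance", "augment")
import::from("broom", objects, .character_only = TRUE)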
This will import the three functions specified in the
objects
vector. It is worth noting that because
.character_only
disables non-standard evaluation on the .from and ... parameters, the
name of the package must now be quoted.
One common use case is when one wants to import all objects except
one or a few, because of conflicts with other packages. Should one, for
example, want to use the stats
versions of the
filter()
and lag()
functions, but import all
the other functions in the dplyr
package, one could do it
like this:
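# A sketch; the variable name is illustrative. Because .except takes a regular
# character vector, the excluded names can also be computed at run time:
exclusions <- c("filter", "lag")
import::from("dplyr", .except = exclusions, .character_only = TRUE)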
The same approach can be used when the directory of the source file for a module is not known in advance. This can be useful when the script is not always run from the same working directory and one does not want to hard-code an absolute path, but rather determine it at run time:
mymodule <- file.path(mypath, "module.R")
import::from(mymodule, "myfunction", .character_only=TRUE)
Again, note that the name of the function must now be quoted because non-standard evaluation is disabled on the .from and ... parameters.
The here
package is useful in many circumstances like
this; it allows a “root” directory to be set for a project, and the
here::here() function can then be used to construct the correct path,
regardless of the current working directory.
Alternatively, if the file name is always the same and it is only the
directory that differs, you could use the .directory
parameter, which is always evaluated in the standard way.
Note that here::here()
has no relation to
import::here()
despite the similarity in names.
Another case where .character_only
comes in handy is
when one wants to import some functions from a URL. While
import
does not allow direct importing from a URL (because of difficult
questions about when a URL target has changed, whether to download a
file, and other things), it is easy to achieve the desired result by
using the pins
package (whose main purpose is to resolve such difficult questions). A
simple example follows, which directly imports the myfunc()
function defined in plusone_module.R
:
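# A sketch using the legacy pins interface; the URL below is a placeholder
module_file <- pins::pin("https://example.com/plusone_module.R")
import::from(module_file, "myfunc", .character_only = TRUE)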
A frequent pattern in Python is to import a package under an alias (e.g. import numpy as np); all subsequent uses of the imported objects then explicitly include the alias.
In order to achieve this functionality with the import package, use
.into={new.env()}
, which assigns the imports to a new environment without
attaching it. import::from()
returns this environment, so
returns this environment, so
it can be assigned to a variable:
# Import into a new namespace, use $ to access
td <- import::from(tidyr, spread, pivot_wider, .into={new.env()})
dp <- import::from(dplyr, .all=TRUE, .into={new.env()})
dp$select(head(cars),dist)
#> dist
#> 1 2
#> 2 10
#> 3 4
#> 4 22
#> 5 16
#> 6 10
# Note that functions are not visible without dp$ prefix
select(head(cars),dist)
#> Error in select(head(cars), dist): could not find function "select"
S3 methods work well in a local context, but when a method is called
from a different environment, it must be registered (in packages, this
is done in the NAMESPACE
file). import
can now
register methods of the form generic.class
or
generic.class.name
automatically using the new
.S3
argument. By specifying .S3=TRUE
,
import
will automatically detect methods for existing or
new generics. No need to export and/or register them manually!
Consider the following script foo.r
with a generic and two
methods:
# foo.r
# functions with great foonctionality
foo <- function(x) {
  UseMethod("foo", x)
}
foo.numeric <- function(x) {
  x + 1
}
foo.character <- function(x) {
  paste0("_", x, "_")
}
Now, all we need is to import the foo
generic:
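# Registering the S3 methods via .S3 (assuming foo.r is in the working directory):
import::from("foo.r", foo, .S3 = TRUE)
foo(1)
#> [1] 2
foo("bar")
#> [1] "_bar_"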
This is an experimental feature. We think it should work well and you are encouraged to use it and report back – but the syntax and semantics may change in the future to improve the feature.