`wrapr` now supplies a name-based multiple assignment notation for R.

In R there are many functions that return named lists or other structures keyed by names. Let’s start with a simple example: `base::split()`.
First some example data.
d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)
knitr::kable(d)
x | group |
---|---|
1 | train |
2 | calibrate |
3 | test |
4 | train |
5 | calibrate |
6 | test |
7 | train |
8 | calibrate |
9 | test |
One way to use `base::split()` is to call it on a `data.frame` and then unpack the desired portions from the returned value.
parts <- split(d, d$group)
train_data <- parts$train
calibrate_data <- parts$calibrate
test_data <- parts$test
| | x | group |
|---|---|---|
| 1 | 1 | train |
| 4 | 4 | train |
| 7 | 7 | train |

| | x | group |
|---|---|---|
| 2 | 2 | calibrate |
| 5 | 5 | calibrate |
| 8 | 8 | calibrate |

| | x | group |
|---|---|---|
| 3 | 3 | test |
| 6 | 6 | test |
| 9 | 9 | test |
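One detail worth checking before unpacking: the list returned by `split()` is keyed by group label, and base R orders those keys by sorted factor level rather than by first appearance in the data. The following base R sketch (the names `first_by_position` and `train_by_name` are illustrative only, not part of the original example) shows why selecting elements by name is safer here than selecting by position.

```r
# Same example data as above.
d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

parts <- split(d, d$group)

# Keys are sorted group labels, not the order seen in the data.
print(names(parts))
## [1] "calibrate" "test"      "train"

# Positional access returns the 'calibrate' rows, even though
# 'train' is the first group to appear in d.
first_by_position <- parts[[1]]
print(unique(first_by_position$group))

# Named access does not depend on the ordering at all.
train_by_name <- parts[['train']]
print(train_by_name$x)
```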
If we use a multiple assignment notation we can collect some steps together, and avoid leaving a possibly large temporary variable such as `parts` in our environment.
Let’s clear out our earlier results.
And now let’s apply `split()` and unpack the results in one step.
library(wrapr)
to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
] <- split(d, d$group)
| | x | group |
|---|---|---|
| 1 | 1 | train |
| 4 | 4 | train |
| 7 | 7 | train |

| | x | group |
|---|---|---|
| 2 | 2 | calibrate |
| 5 | 5 | calibrate |
| 8 | 8 | calibrate |

| | x | group |
|---|---|---|
| 3 | 3 | test |
| 6 | 6 | test |
| 9 | 9 | test |
The semantics of `[]<-` imply that an object named “`to`” is left in our workspace as a side effect. However, this object is small, and if there is already an object named `to` in the workspace that is not of class `Unpacker`, the unpacking is aborted prior to overwriting anything. The unpacker has two modes: `unpack` (a function that needs a dot in pipes) and `to` (an eager function factory that does not require a dot in pipes). The side-effect can be avoided by using `:=` for assignment.
rm(list = c('train_data', 'calibrate_data', 'test_data', 'to'))
to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
] := split(d, d$group)
ls()
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "calibrate_data"
## [9] "d" "d2" "df" "f"
## [13] "inputs" "l" "plotb" "test_data"
## [17] "title" "train_data" "variable" "variable_name"
## [21] "variable_string" "x" "xvar" "xvariable"
## [25] "yvar" "yvariable"
The side-effect can also be avoided by using alternate non-array update notations. We will demonstrate a few of these. First is the pipe-to-array notation.
split(d, d$group) %.>% to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
]
ls()
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "calibrate_data"
## [9] "d" "d2" "df" "f"
## [13] "inputs" "l" "plotb" "test_data"
## [17] "title" "train_data" "variable" "variable_name"
## [21] "variable_string" "x" "xvar" "xvariable"
## [25] "yvar" "yvariable"
Note the above is the `wrapr` dot arrow pipe (which requires explicit dots to denote pipe targets). In this case it is dispatching on the class of the right-hand side argument to get the effect. This is a common feature of the `wrapr` dot arrow pipe. We could get a similar effect by using right-assignment “`->`” instead of the pipe.
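A minimal sketch of the right-assignment variant, assuming `wrapr` is installed and reusing the example data from above. Because R's parser rewrites `value -> target` as `target <- value`, this form should behave like the `to[...] <- ...` notation shown earlier, which also means it leaves a small `to` object in the workspace.

```r
library(wrapr)

d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

# Right-assignment: the value is on the left, the unpacking target on the right.
split(d, d$group) -> to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
]

print(train_data$x)
```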
We can also use a pipe function notation.
split(d, d$group) %.>% to(
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)
ls()
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "calibrate_data"
## [9] "d" "d2" "df" "f"
## [13] "inputs" "l" "plotb" "test_data"
## [17] "title" "train_data" "variable" "variable_name"
## [21] "variable_string" "x" "xvar" "xvariable"
## [25] "yvar" "yvariable"
Notice that piping to `to()` is like piping to `to[]`: no dot is needed.
We cannot currently use the `magrittr` pipe in the above, as in that case the unpacked results are lost in a temporary intermediate environment `magrittr` uses during execution.
A more conventional functional form is given in `unpack()`. `unpack()` requires a dot in `wrapr` pipelines.
split(d, d$group) %.>% unpack(
  .,
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)
ls()
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "calibrate_data"
## [9] "d" "d2" "df" "f"
## [13] "inputs" "l" "plotb" "test_data"
## [17] "title" "train_data" "variable" "variable_name"
## [21] "variable_string" "x" "xvar" "xvariable"
## [25] "yvar" "yvariable"
`unpack` also supports the pipe-to-array and assign-to-array notations. In addition, with `unpack()` we can use conventional function notation.
unpack(
  split(d, d$group),
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)
ls()
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "calibrate_data"
## [9] "d" "d2" "df" "f"
## [13] "inputs" "l" "plotb" "test_data"
## [17] "title" "train_data" "variable" "variable_name"
## [21] "variable_string" "x" "xvar" "xvariable"
## [25] "yvar" "yvariable"
`to()` cannot be directly used as a function. It is strongly suggested that the objects returned by `to[]`, `to()`, and `unpack[]` never be stored in variables, but instead only be produced, used, and discarded. The issue is that these are objects of class `"UnpackTarget"` and have the unpack destination names already bound in. This means that if one of these is used in code, a user reading the code cannot tell where the side-effects are going without examining the contents of the object.
The assignments in the unpacking block can be any of `<-`, `=`, `:=`, or even `->` (though the last one assigns left to right).
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "calibrate_data"
## [9] "d" "d2" "df" "f"
## [13] "inputs" "l" "plotb" "test_data"
## [17] "title" "train_data" "variable" "variable_name"
## [21] "variable_string" "x" "xvar" "xvariable"
## [25] "yvar" "yvariable"
unpack(
  split(d, d$group),
  train -> train_data,
  calibrate -> calibrate_data,
  test -> test_data
)
ls()
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "calibrate_data"
## [9] "d" "d2" "df" "f"
## [13] "inputs" "l" "plotb" "test_data"
## [17] "title" "train_data" "variable" "variable_name"
## [21] "variable_string" "x" "xvar" "xvariable"
## [25] "yvar" "yvariable"
It is a caught and signaled error to attempt to unpack an item that is not there.
unpack(
  split(d, d$group),
  train_data <- train,
  calibrate_data <- calibrate_misspelled,
  test_data <- test
)
## Error in write_values_into_env(unpack_environment = unpack_environment, : wrapr::unpack all source names must be in value, missing: 'calibrate_misspelled'.
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "d"
## [9] "d2" "df" "f" "inputs"
## [13] "l" "plotb" "title" "variable"
## [17] "variable_name" "variable_string" "x" "xvar"
## [21] "xvariable" "yvar" "yvariable"
The unpack attempts to be atomic: preferring to unpack all values or no values.
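To check the all-or-nothing behaviour directly, here is a minimal sketch, assuming `wrapr` is installed and the session starts without any of the destination variables defined. It catches the error with base R's `tryCatch()` and then tests whether any destination name was created. The variable `msg` is introduced here for illustration only.

```r
library(wrapr)

d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

# Attempt an unpack with one misspelled source name and keep the error message.
msg <- tryCatch(
  {
    unpack(
      split(d, d$group),
      train_data <- train,
      calibrate_data <- calibrate_misspelled,
      test_data <- test
    )
    NULL
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# If the unpack wrote nothing, none of the destinations exist.
print(exists("train_data"))
print(exists("test_data"))
```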
Also, one does not have to unpack all slots.
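The code for this step is not shown here; a minimal sketch of a partial unpack, assuming `wrapr` is installed and reusing the example data, simply omits the `calibrate` slot from the unpacking block.

```r
library(wrapr)

d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

# Only two of the three slots are requested; 'calibrate' is ignored.
unpack(
  split(d, d$group),
  train_data <- train,
  test_data <- test
)

print(train_data$x)
print(test_data$x)
```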
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "d"
## [9] "d2" "df" "f" "inputs"
## [13] "l" "plotb" "test_data" "title"
## [17] "train_data" "variable" "variable_name" "variable_string"
## [21] "x" "xvar" "xvariable" "yvar"
## [25] "yvariable"
We can use a name alone as shorthand for `name <- name` (i.e., unpacking to the same name as in the incoming object).
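The code for this step is not shown here; a minimal sketch of the shorthand, assuming `wrapr` is installed and reusing the example data, lists bare names so that each slot lands in a variable of the same name.

```r
library(wrapr)

d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

# Bare names: 'train' is read as train <- train, 'test' as test <- test.
unpack(
  split(d, d$group),
  train,
  test
)

print(train$x)
print(test$x)
```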
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "d"
## [9] "d2" "df" "f" "inputs"
## [13] "l" "plotb" "test" "title"
## [17] "train" "variable" "variable_name" "variable_string"
## [21] "x" "xvar" "xvariable" "yvar"
## [25] "yvariable"
We can also use `bquote` `.()` notation to use variables to specify where data is coming from.
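The code for this step is not shown here; the following is a sketch of how it might look, assuming `wrapr` is installed and that `unpack()` accepts `.()` on the source side as described. The names `train_source` and `train_result` are taken from the workspace listing for this step.

```r
library(wrapr)

d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

# The source slot name is held in a variable and spliced in with .().
train_source <- 'train'

unpack(
  split(d, d$group),
  train_result = .(train_source),
  test
)

print(train_result$x)
```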
## [1] "LEFT_NAME" "OTHER_SYMBOL" "X" "Y"
## [5] "a" "angle" "b" "d"
## [9] "d2" "df" "f" "inputs"
## [13] "l" "plotb" "test" "title"
## [17] "train_result" "train_source" "variable" "variable_name"
## [21] "variable_string" "x" "xvar" "xvariable"
## [25] "yvar" "yvariable"
In all cases the user explicitly documents the intended data sources and data destinations at the place of assignment. This means a later reader of the source code can see what the operation does, without having to know the values of additional variables.
Related work includes:

* The `zeallot::%<-%` arrow already supplies excellent positional or ordered unpacking. But we feel that style may be more appropriate in the Python world where many functions return un-named tuples of results. Python functions tend to have positional tuple return values because the Python language has had positional tuple unpacking as a core language feature for a very long time (thus positional structures have become “Pythonic”). R has not emphasized positional unpacking, so R functions tend to return named lists or named structures. For named lists or named structures it may not be safe to rely on value positions. So I feel it is more “R-like” to use named unpacking.
* `vadr::bind` supplies named unpacking, but appears to use a “`SOURCE = DESTINATION`” notation. That is the reverse of “`DESTINATION = SOURCE`”, which is how both R assignments and argument binding are already written.
* `base::attach`. `base::attach` adds items to the search path with names controlled by the object being attached (instead of by the user).
* `base::with()`. `unpack(list(a = 1, b = 2), x <- a, y <- b)` works a lot like `with(list(a = 1, b = 2), { x <<- a; y <<- b })`.
* `tidytidbits` supplies positional unpacking with a `%=%` notation.
* `wrapr::let()`. `wrapr::let()` re-maps names during code execution using a “`TARGET = NEWNAME`” target replacement scheme, where `TARGET` acts as if it had the name stored in `NEWNAME` for the duration of the let-block.
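The `base::with()` comparison in the list above can be tried in base R alone. This is a small sketch, run at top level, with the illustrative variable `values` standing in for any named list. Note that `<<-` writes to the first enclosing environment where the name already exists, or to the global environment otherwise, so the destination is less explicit than with `unpack()`.

```r
values <- list(a = 1, b = 2)

# Evaluate the block with the list's names visible, and push the
# results out of the temporary evaluation environment with <<-.
with(values, {
  x <<- a
  y <<- b
})

print(c(x = x, y = y))
```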