Multiple Assignment

John Mount

2023-08-19

wrapr now supplies a name-based multiple assignment notation for R.

In R there are many functions that return named lists or other structures keyed by names. Let’s start with a simple example: base::split().

First some example data.

d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

knitr::kable(d)

x	group
1	train
2	calibrate
3	test
4	train
5	calibrate
6	test
7	train
8	calibrate
9	test

One way to use base::split() is to call it on a data.frame and then unpack the desired portions from the returned value.

parts <- split(d, d$group)
train_data <- parts$train
calibrate_data <- parts$calibrate
test_data <- parts$test

knitr::kable(train_data)

	x	group
1	1	train
4	4	train
7	7	train

knitr::kable(calibrate_data)

	x	group
2	2	calibrate
5	5	calibrate
8	8	calibrate

knitr::kable(test_data)

	x	group
3	3	test
6	6	test
9	9	test

If we use a multiple assignment notation we can collect some steps together, and avoid leaving a possibly large temporary variable such as parts in our environment.

Let’s clear out our earlier results.

rm(list = c('train_data', 'calibrate_data', 'test_data', 'parts'))

And now let’s apply split() and unpack the results in one step.

library(wrapr)

to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
  ] <- split(d, d$group)

knitr::kable(train_data)

	x	group
1	1	train
4	4	train
7	7	train

knitr::kable(calibrate_data)

	x	group
2	2	calibrate
5	5	calibrate
8	8	calibrate

knitr::kable(test_data)

	x	group
3	3	test
6	6	test
9	9	test

The semantics of []<- imply that an object named “to” is left in our workspace as a side effect. However, this object is small, and if there is already an object named to in the workspace that is not of class Unpacker, the unpacking is aborted before anything is overwritten. The unpacker has two modes: unpack (a function that needs a dot in pipes) and to (an eager function factory that does not require a dot in pipes). The side effect can be avoided by using := for assignment.
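
The side effect follows from how R evaluates replacement calls in general: x[i] <- v is shorthand for x <- `[<-`(x, i, value = v), so x is always rebound in the calling environment. The following base-R sketch illustrates this with a hypothetical class demo_target (not part of wrapr). It is only an analogy for the behavior described above, not wrapr's implementation.

```r
# Hypothetical S3 class used only for illustration; not part of wrapr.
demo_target <- structure(list(), class = "demo_target")

# Replacement method that ignores the index and value and returns x unchanged.
`[<-.demo_target` <- function(x, ..., value) {
  x
}

run_demo <- function() {
  # No local binding yet: demo_target is only visible in the enclosing environment.
  before <- "demo_target" %in% ls()
  # Evaluated as: demo_target <- `[<-`(demo_target, 1, value = "some value")
  demo_target[1] <- "some value"
  # The replacement call has created a local binding as a side effect.
  after <- "demo_target" %in% ls()
  c(before = before, after = after)
}

demo_result <- run_demo()
print(demo_result)
```

The same mechanism would explain why an array-style assignment made at the top level leaves a copy of the assigned-to object in the workspace.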

rm(list = c('train_data', 'calibrate_data', 'test_data', 'to'))

to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
  ] := split(d, d$group)

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "calibrate_data" 
##  [9] "d"               "d2"              "df"              "f"              
## [13] "inputs"          "l"               "plotb"           "test_data"      
## [17] "title"           "train_data"      "variable"        "variable_name"  
## [21] "variable_string" "x"               "xvar"            "xvariable"      
## [25] "yvar"            "yvariable"

The side effect can also be avoided by using alternate non-array update notations.

We will demonstrate a few of these. First is pipe to array notation.

rm(list = c('train_data', 'calibrate_data', 'test_data'))

split(d, d$group) %.>% to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
  ]

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "calibrate_data" 
##  [9] "d"               "d2"              "df"              "f"              
## [13] "inputs"          "l"               "plotb"           "test_data"      
## [17] "title"           "train_data"      "variable"        "variable_name"  
## [21] "variable_string" "x"               "xvar"            "xvariable"      
## [25] "yvar"            "yvariable"

Note the above is the wrapr dot arrow pipe (which requires explicit dots to denote pipe targets). In this case it is dispatching on the class of the right-hand side argument to get the effect. This is a common feature of the wrapr dot arrow pipe. We could get a similar effect by using right-assignment “->” instead of the pipe.

We can also use a pipe function notation.

rm(list = c('train_data', 'calibrate_data', 'test_data'))

split(d, d$group) %.>% to(
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "calibrate_data" 
##  [9] "d"               "d2"              "df"              "f"              
## [13] "inputs"          "l"               "plotb"           "test_data"      
## [17] "title"           "train_data"      "variable"        "variable_name"  
## [21] "variable_string" "x"               "xvar"            "xvariable"      
## [25] "yvar"            "yvariable"

Notice that piping to to() is like piping to to[]: no dot is needed.

We cannot currently use the magrittr pipe in the above, because in that case the unpacked results are lost in a temporary intermediate environment magrittr uses during execution.
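
A rough base-R analogy for this failure mode, assuming only that an unpacker writes into the environment it is called from: if the call is evaluated inside a temporary environment, the assigned names vanish with it. The helper assign_in_caller() and the name demo_value are hypothetical and are not wrapr or magrittr code.

```r
# Hypothetical helper: writes a value into the environment of its caller.
assign_in_caller <- function(value) {
  caller <- parent.frame()
  assign("demo_value", value, envir = caller)
  invisible(NULL)
}

# Called inside a temporary environment (here created by local()),
# the assignment lands in that environment and is discarded with it.
local({
  assign_in_caller(42)
})
lost <- !exists("demo_value", inherits = FALSE)

# Called directly, the assignment lands in the current environment.
assign_in_caller(42)
kept <- exists("demo_value", inherits = FALSE)

print(c(lost = lost, kept = kept))
```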

A more conventional functional form is given in unpack(). unpack() requires a dot in wrapr pipelines.

rm(list = c('train_data', 'calibrate_data', 'test_data'))

split(d, d$group) %.>% unpack(
  .,
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "calibrate_data" 
##  [9] "d"               "d2"              "df"              "f"              
## [13] "inputs"          "l"               "plotb"           "test_data"      
## [17] "title"           "train_data"      "variable"        "variable_name"  
## [21] "variable_string" "x"               "xvar"            "xvariable"      
## [25] "yvar"            "yvariable"

unpack also supports the pipe to array and assign to array notations. In addition, with unpack() we could also use the conventional function notation.

rm(list = c('train_data', 'calibrate_data', 'test_data'))

unpack(
  split(d, d$group),
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "calibrate_data" 
##  [9] "d"               "d2"              "df"              "f"              
## [13] "inputs"          "l"               "plotb"           "test_data"      
## [17] "title"           "train_data"      "variable"        "variable_name"  
## [21] "variable_string" "x"               "xvar"            "xvariable"      
## [25] "yvar"            "yvariable"

to() cannot be used directly as a function. It is strongly suggested that the objects returned by to[], to(), and unpack[] never be stored in variables, but instead only be produced, used, and discarded. The issue is that these are objects of class "UnpackTarget" and have the unpack destination names already bound in. This means that if one of these is used in code, a user reading the code cannot tell where the side effects are going without examining the contents of the object.

The assignments in the unpacking block can be any of <-, =, :=, or even -> (though the last one assigns left to right).

rm(list = c('train_data', 'calibrate_data', 'test_data'))

unpack(
  split(d, d$group),
  train_data = train,
  calibrate_data = calibrate,
  test_data = test
)

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "calibrate_data" 
##  [9] "d"               "d2"              "df"              "f"              
## [13] "inputs"          "l"               "plotb"           "test_data"      
## [17] "title"           "train_data"      "variable"        "variable_name"  
## [21] "variable_string" "x"               "xvar"            "xvariable"      
## [25] "yvar"            "yvariable"

rm(list = c('train_data', 'calibrate_data', 'test_data'))

unpack(
  split(d, d$group),
  train -> train_data,
  calibrate -> calibrate_data,
  test -> test_data
)

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "calibrate_data" 
##  [9] "d"               "d2"              "df"              "f"              
## [13] "inputs"          "l"               "plotb"           "test_data"      
## [17] "title"           "train_data"      "variable"        "variable_name"  
## [21] "variable_string" "x"               "xvar"            "xvariable"      
## [25] "yvar"            "yvariable"
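
The right arrow works here because of how R parses it: the parser rewrites a -> b into b <- a before any function sees the expression. The base-R check below shows this; it does not depend on wrapr.

```r
# R's parser turns right assignment into ordinary left assignment,
# so code that inspects unevaluated arguments sees the same expression.
right_form <- quote(train -> train_data)
left_form <- quote(train_data <- train)

print(right_form)

same_expression <- identical(right_form, left_form)
print(same_expression)
```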

It is a caught and signaled error to attempt to unpack an item that is not there.

rm(list = c('train_data', 'calibrate_data', 'test_data'))

unpack(
  split(d, d$group),
  train_data <- train,
  calibrate_data <- calibrate_misspelled,
  test_data <- test
)

## Error in write_values_into_env(unpack_environment = unpack_environment, : wrapr::unpack all source names must be in value, missing: 'calibrate_misspelled'.

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "d"              
##  [9] "d2"              "df"              "f"               "inputs"         
## [13] "l"               "plotb"           "title"           "variable"       
## [17] "variable_name"   "variable_string" "x"               "xvar"           
## [21] "xvariable"       "yvar"            "yvariable"

The unpack attempts to be atomic: it prefers to unpack either all values or no values.
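
One way to get this all-or-nothing behavior, sketched in base R, is to validate every requested source name before writing anything. The helper unpack_checked() and its mapping argument are hypothetical and only illustrate the idea; this is not wrapr's implementation.

```r
# Hypothetical helper: `mapping` is a named character vector whose
# names are destinations and whose values are source names in `value`.
unpack_checked <- function(value, mapping, envir = parent.frame()) {
  missing_names <- setdiff(unname(mapping), names(value))
  if (length(missing_names) > 0) {
    # Fail before any assignment, so nothing is partially unpacked.
    stop("missing source names: ", paste(missing_names, collapse = ", "))
  }
  for (dest in names(mapping)) {
    assign(dest, value[[mapping[[dest]]]], envir = envir)
  }
  invisible(NULL)
}

d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)
parts <- split(d, d$group)

# A misspelled source name aborts the whole unpack.
demo_env <- new.env()
outcome <- tryCatch(
  unpack_checked(
    parts,
    c(train_data = "train", calibrate_data = "calibrate_misspelled"),
    envir = demo_env),
  error = function(e) "error")
print(outcome)
names_after_failure <- ls(demo_env)
print(names_after_failure)

# With valid names, every destination is written.
unpack_checked(
  parts,
  c(train_data = "train", test_data = "test"),
  envir = demo_env)
print(ls(demo_env))
```

Checking first and writing second keeps the destination environment untouched when any source name is wrong.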

Also, one does not have to unpack all slots.

unpack(
  split(d, d$group),
  train_data <- train,
  test_data <- test
)

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "d"              
##  [9] "d2"              "df"              "f"               "inputs"         
## [13] "l"               "plotb"           "test_data"       "title"          
## [17] "train_data"      "variable"        "variable_name"   "variable_string"
## [21] "x"               "xvar"            "xvariable"       "yvar"           
## [25] "yvariable"

We can use a name alone as shorthand for name <- name (i.e. unpacking to the same name as in the incoming object).

rm(list = c('train_data', 'test_data'))

split(d, d$group) %.>%
  to[
     train,
     test
     ]

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "d"              
##  [9] "d2"              "df"              "f"               "inputs"         
## [13] "l"               "plotb"           "test"            "title"          
## [17] "train"           "variable"        "variable_name"   "variable_string"
## [21] "x"               "xvar"            "xvariable"       "yvar"           
## [25] "yvariable"

We can also use bquote .() notation to use variables to specify where data is coming from.

rm(list = c('train', 'test'))

train_source <- 'train'

split(d, d$group) %.>%
  to[
     train_result = .(train_source),
     test
     ]

ls()

##  [1] "LEFT_NAME"       "OTHER_SYMBOL"    "X"               "Y"              
##  [5] "a"               "angle"           "b"               "d"              
##  [9] "d2"              "df"              "f"               "inputs"         
## [13] "l"               "plotb"           "test"            "title"          
## [17] "train_result"    "train_source"    "variable"        "variable_name"  
## [21] "variable_string" "x"               "xvar"            "xvariable"      
## [25] "yvar"            "yvariable"
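
The .() notation mirrors base R's bquote(), which substitutes the value of a variable into an otherwise unevaluated expression. The sketch below shows only the base-R mechanism; how wrapr processes the substituted expression internally is not shown here.

```r
train_source <- 'train'

# bquote() replaces .(train_source) with the value of train_source
# and leaves the rest of the expression unevaluated.
substituted <- bquote(train_result <- .(train_source))
print(substituted)

# Wrapping the value in as.name() substitutes a symbol instead of a string.
substituted_symbol <- bquote(train_result <- .(as.name(train_source)))
print(substituted_symbol)
```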

In all cases the user explicitly documents the intended data sources and data destinations at the place of assignment. This means a later reader of the source code can see what the operation does, without having to know the values of additional variables.

Related work includes:

The zeallot::%<-% arrow already supplies excellent positional or ordered unpacking. But we feel that style may be more appropriate in the Python world where many functions return un-named tuples of results. Python functions tend to have positional tuple return values because the Python language has had positional tuple unpacking as a core language feature for a very long time (thus positional structures have become “Pythonic”). R has not emphasized positional unpacking, so R functions tend to return named lists or named structures. For named lists or named structures it may not be safe to rely on value positions. So I feel it is more “R-like” to use named unpacking.
vadr::bind supplies named unpacking, but appears to use a “SOURCE = DESTINATION” notation. That is the reverse of a “DESTINATION = SOURCE” which is how both R assignments and argument binding are already written.
base::attach. base::attach adds items to the search path with names controlled by the object being attached (instead of by the user).
base::with(). unpack(list(a = 1, b = 2), x <- a, y <- b) works a lot like with(list(a = 1, b = 2), { x <<- a; y <<- b }).
tidytidbits supplies positional unpacking with a %=% notation.
wrapr::let(). wrapr::let() re-maps names during code execution using a “TARGET = NEWNAME” target replacement scheme, where TARGET acts as if it had the name stored in NEWNAME for the duration of the let-block.
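
To make the concern about relying on value positions concrete, here is a small base-R check using the example data from the start of this note. split() orders its result by the sorted group labels rather than by order of first appearance, so the first element is not the train group even though train is the first row of d.

```r
d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

parts <- split(d, d$group)

# Positions follow the sorted group labels, not the row order of d.
print(names(parts))

# Taking the first slot by position yields the calibrate rows.
first_by_position <- parts[[1]]
print(unique(first_by_position$group))

# Taking a slot by name is unaffected by ordering.
train_by_name <- parts[['train']]
print(train_by_name$x)
```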