Quoting Concatinate

John Mount

2023-08-19

In R data analysis code-capturing interfaces (or non-standard-evaluation/NSE interfaces are considered a fun convenience. These allow users to type column names in directly when working. However in functions and packages there can be a large interface price or code safety risk to pay for supplying the user the minor convenience of eliding a few quotation marks.

Our package wrapr has the sensible design principle: NSE can be convenient to the user, but we are going to isolate it to a few methods (for safe and simple code). Users then pass the objects that those methods create as values to other methods. Computation is meant to be over values, so this is a good trade-off.

The primary place code capturing shows up in our wrapr R package is in the qc() and qchar_frame() methods.

qc() stands for “quoting concatenate” it is much like R’s c() (combine), but it quotes its arguments before concatenating vectors or lists.

It lets you replace this:

c("Petal.Width", "Petal.Length")
## [1] "Petal.Width"  "Petal.Length"

with this:

library("wrapr")
   
qc(Petal.Width, Petal.Length)
## [1] "Petal.Width"  "Petal.Length"

This, in turn, lets you replace this:

library("dplyr")

iris %>%
  select(., Petal.Width, Petal.Length) %>%
  head(.)

with this:

iris[, qc(Petal.Width, Petal.Length)] %.>%
  head(.)
##   Petal.Width Petal.Length
## 1         0.2          1.4
## 2         0.2          1.4
## 3         0.2          1.3
## 4         0.2          1.5
## 5         0.2          1.4
## 6         0.4          1.7

We still skipped the quotes, and the NSE stuff is safely isolated from the rest of the system.

wrapr now incorporates bquote() based quasiquotation in a few of its interfaces.

Quasiquotation was introduced into R by Thomas Lumley in 2003, and allows users to signal they want to turn off quotation for portions of their code. The user indicates they do not wish a portion of their code to be quoted (but instead want it evaluated for its value) by surrounding that portion with the function-notation “.()”.

An example of this is given here.

OTHER_SYMBOL <- "Petal.Length"
  
qc(Petal.Width, OTHER_SYMBOL)
## [1] "Petal.Width"  "OTHER_SYMBOL"
qc(Petal.Width, .(OTHER_SYMBOL))
## [1] "Petal.Width"  "Petal.Length"

This should be familiar to data.table users, as data.table has supported related notations for quite some time.

Also, qc() is designed to have a simple “mutually recursive” relationship with c() (i.e. they call each other when they see one another). This means c() is also a quasiquotation escape-notation for qc():

qc(Petal.Width, c(OTHER_SYMBOL))
## [1] "Petal.Width"  "Petal.Length"

This escape notation arises as a natural consequence of a design of qc() that calls c() instead of quoting it (i.e. delegates c()-expressions to c()). The bquote()-.() should be the preferred notation for regularity, and to match any other bquote() quasiquotation interface (such as qchar_frame() or even a variation of dplyr).

qc() also takes some trouble to work with named vectors:

qc(a = b)
##   a 
## "b"

And we can even re-map left-hand sizes of (or names) if we use the alternate “:=” assignment notation.

LEFT_NAME = "a"

qc(.(LEFT_NAME) := b)
##   a 
## "b"

Notice syntactically qc() fills a general niche much like the specific function ggplot2::aes() fits in the ggplot2 package.

The qc() notation is very powerful and clearly indicates where which quoting rules are in effect when. We strongly suggest users look to it for code-capturing and package developers recommend it to their users.