Constant propagation is the process of substituting the values of known constants in expressions. Constant propagation eliminates cases in which values are copied from one location or variable to another, in order to simply assign their value to another variable.
For example, consider the following code:
x <- 14
y <- 7 - x / 2
z <- y * (28 / x + 2) - x
Here, x
is assigned a constant, and thus, can be propagated (three times). Propagating yields:
x <- 14
y <- 7 - 14 / 2
z <- y * (28 / 14 + 2) - 14
Constant propagation enables the code to assign static values, which is faster than looking up and copying the value of a variable, and also saves time by eliminating assigning a value to a variable that is itself subsequently used only to propagate that value throughout the code. In some cases, copy propagation itself may not provide direct optimizations, but simply facilitates other transformations, such as constant folding, code motion, and dead code elimination.
A simple example would be a code that converts the unit of many temporary samples, from hours to miliseconds miliseconds <- 1000 * 60 * 60 * hours
.
code <- paste(
"n <- 1000",
"hours_vector <- runif(n, 0, 24)",
"ms_vector <- numeric(n)",
"hs_to_mins <- 60",
"mins_to_secs <- 60",
"secs_to_ms <- 1000",
"# of course it would be much efficient to do vectorized operations xP",
"for (i in 1:n) {",
" ms_vector[i] <- secs_to_ms * mins_to_secs * hs_to_mins * hours_vector[i]",
"}",
sep = "\n"
)
cat(code)
## n <- 1000
## hours_vector <- runif(n, 0, 24)
## ms_vector <- numeric(n)
## hs_to_mins <- 60
## mins_to_secs <- 60
## secs_to_ms <- 1000
## # of course it would be much efficient to do vectorized operations xP
## for (i in 1:n) {
## ms_vector[i] <- secs_to_ms * mins_to_secs * hs_to_mins * hours_vector[i]
## }
Then, the automatically optimized code would be:
opt_code <- opt_constant_propagation(list(code))
cat(opt_code$codes[[1]])
## n <- 1000
## hours_vector <- runif(n, 0, 24)
## ms_vector <- numeric(n)
## hs_to_mins <- 60
## mins_to_secs <- 60
## secs_to_ms <- 1000
## # of course it would be much efficient to do vectorized operations xP
## for (i in 1:n) {
## ms_vector[i] <- 1000 * 60 * 60 * hours_vector[i]
## }
And if we measure the execution time of each one, and the speed-up:
bmark_res <- microbenchmark({
eval(parse(text = code))
}, {
eval(parse(text = opt_code))
})
autoplot(bmark_res)
speed_up(bmark_res)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## Expr_2 29.33842 29.0092 27.40792 27.9919 28.49016 29.41789
The opt_constant_propagation
optimizer analyzes the code from top to bottom. As it goes through the code, it performs two tasks:
Saves, in the values
vector, those variables that are assigned a constant value. I.e., if x <- -3
, then values$x <- -3
.
Each variable that is assigned an expression that includes a variable present in values
is given the corresponding constant value. I.e., y <- 7 * x + z
is transformed to y <- 7 * -3 + z
.
Depending on what expression it finds, it applies one of the following criteria:
VAR <- CONST
If it is an assignment of a constant value: then it keeps the same expression, and saves VAR = CONST
in values
.
This criteria also includes, multi-assign ( VAR <- VAR <- CONST
).
VAR * CONST + VAR
If it is only operators, variables, constants, and precedence operators: then it will check if any of the variables is stored in values
vector, and would replace them in the expression. It will return the modified expression.
FUN({FUN_PARAMS})
If it is a function call, and opt_constant_propagation
's in_fun_call
parameter is set to TRUE
: then it will try to constant propagate in {FUN_PARAMS}
. Consider the case that the function call is seq_len(x+30)
, it could be replaced by seq_len(-3+30)
.
Then, it empties the values
vector ( values <- c()
). It should be noted that in R, calling a function can modify the current environment, being the simplest example rm(list = ls())
, or assign("x", 4)
.
LOOP (COND) { BODY }
If it is a loop ( repeat
, while
, or for
): it will get which variables are being assigned in the loop and remove them from values
, and then propagate in COND
and BODY
.
In-loop assigned variables are removed from values
as, if not, it would try to propagate them, and these variables might change in next execution of BODY
. For instance:
x <- 3
y <- 1 + 1
while (y < 5) {
y <- x
x <- x + 1
}
Would try to propagate to:
x <- 3
y <- 1 + 1
while (y < 5) {
y <- 3
x <- 3 + 1
}
which is not equivalent.
IF (COND) { BODY } ELSE { BODY }
If it is an if
: it will propagate values
in COND
and BODY
, then it will get which variables are being assigned in the if/else BODY
and remove them from values
.
In-loop assigned variables are removed from values
, as it is possible that the BODY
is never executed (FALSE
condition), and thus, these variables would not be updated or assigned.
VAR <- EXPR
If it is an assignment (of a non constant value): it will keep VAR <-
and try to propagate on EXPR
. Moreover, it will remove VAR
from values
as it is assigned a non-constant value.
FUN ({FUN_PARAMS}) { BODY }
If it is a function definition: it will propagate on the BODY
but with a new empty values
vector (the global values
vector would not be changed). Note that in R, a function definition will have another environment, and thus it should not use the current values
.
In any other case, it should keep propagating on sub-expressions.
Recognize functions that modify the environment?
When opt_constant_propagation
finds a function call, it deletes the previously found constant-assigned values. This is mainly done to avoid having
x <- 1
assign("x", 3)
y <- x
Propagated to
x <- 1
assign("x", 3)
y <- 1
In R we can list which base
functions modify the environment, for example, assign
, rm
, etc. However, this would not be a solution, as we can define new functions that wrap the base
ones, for instance:
rm_all <- function() {
env <- .GlobalEnv
rm(list = ls(envir = env), envir = env)
}
A possible solution would be to let the user identify which functions edit the environment.
roxygen2
way:# o @edits_env rm_all
...
or
rm_all() # o @edits_env
opt_constant_propagation(list(code), env_edit = c("rm_all", ...))