Execute and Control System Processes
Tools to run system processes in the background, read their standard output and error and kill them.
processx can poll the standard output and error of a single process, or multiple processes, using the operating system’s polling and waiting facilities, with a timeout.
Install the stable version from CRAN:
install.packages("processx")
If you need the development version, install it from GitHub:
pak::pak("r-lib/processx")
library(processx)
Note: the following external commands are usually present in macOS and Linux systems, but not necessarily on Windows. We will also use the px command line tool (px.exe on Windows), which is a very simple program that can produce output to stdout and stderr, with the specified timings.
px <- paste0(
  system.file(package = "processx", "bin", "px"),
  system.file(package = "processx", "bin", .Platform$r_arch, "px.exe")
)
px
#> [1] "/Users/gaborcsardi/Library/R/arm64/4.2/library/processx/bin/px"
The run() function runs an external command. It requires a single command, and a character vector of arguments. You don’t need to quote the command or the arguments, as they are passed directly to the operating system, without an intermediate shell.
run("echo", "Hello R!")
#> $status
#> [1] 0
#>
#> $stdout
#> [1] "Hello R!\n"
#>
#> $stderr
#> [1] ""
#>
#> $timeout
#> [1] FALSE
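Because no shell is involved, characters that a shell would normally expand reach the program unchanged. The following is a minimal sketch of that behavior, assuming a Unix-like system where an echo executable is on the PATH; the argument values are made up for illustration.

```r
library(processx)

# Each element of the character vector is passed as one argument, verbatim.
# No shell sees them, so "$HOME" and "*" are not expanded, and
# "two words" stays a single argument without any extra quoting.
res <- run("echo", c("$HOME", "*", "two words"))
cat(res$stdout)
```

If you do want variable expansion or globbing, you would have to start a shell explicitly, with the cleanup caveats that apply to shells.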
Short summary of the px binary we are using extensively below:
result <- run(px, "--help", echo = TRUE)
#> Usage: px [command arg] [command arg] ...
#>
#> Commands:
#> sleep <seconds> -- sleep for a number os seconds
#> out <string> -- print string to stdout
#> err <string> -- print string to stderr
#> outln <string> -- print string to stdout, add newline
#> errln <string> -- print string to stderr, add newline
#> errflush -- flush stderr stream
#> cat <filename> -- print file to stdout
#> return <exitcode> -- return with exitcode
#> writefile <path> <string> -- write to file
#> write <fd> <string> -- write to file descriptor
#> echo <fd1> <fd2> <nbytes> -- echo from fd to another fd
#> getenv <var> -- environment variable to stdout
Note: From version 3.0.1, processx does not let you specify a full shell command line, as this involves starting a grandchild process from the child process, and it is difficult to clean up the grandchild process when the child process is killed. The user can still start a shell (sh or cmd.exe) directly of course, and then proper cleanup is the user’s responsibility.
By default run() throws an error if the process exits with a non-zero status code. To avoid this, specify error_on_status = FALSE:
run(px, c("out", "oh no!", "return", "2"), error_on_status = FALSE)
#> $status
#> [1] 2
#>
#> $stdout
#> [1] "oh no!"
#>
#> $stderr
#> [1] ""
#>
#> $timeout
#> [1] FALSE
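When you keep the default behavior instead, the failure arrives as a regular R error, so it can be handled with tryCatch(). This is a minimal sketch; the handler and the msg variable are illustrative choices, not part of processx.

```r
library(processx)

px <- paste0(
  system.file(package = "processx", "bin", "px"),
  system.file(package = "processx", "bin", .Platform$r_arch, "px.exe")
)

# With the default error_on_status = TRUE, a non-zero exit status
# is turned into an R error that tryCatch() can intercept.
msg <- tryCatch(
  {
    run(px, c("return", "2"))
    "no error"
  },
  error = function(e) "run() signalled an error"
)
msg
```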
To show the output of the process on the screen, use the echo argument. Note that the order of stdout and stderr lines may be incorrect, because they are coming from two different connections.
result <- run(px,
  c("outln", "out", "errln", "err", "outln", "out again"),
  echo = TRUE)
#> out
#> out again
#> err
If you have a terminal that supports ANSI colors, then the standard error output is shown in red.
The standard output and error are still included in the result of the run() call:
result
#> $status
#> [1] 0
#>
#> $stdout
#> [1] "out\nout again\n"
#>
#> $stderr
#> [1] "err\n"
#>
#> $timeout
#> [1] FALSE
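The timeout field of the result reports whether run() had to stop the process because a time limit expired. The sketch below assumes the timeout argument of run(), given in seconds, and uses error_on_status = FALSE so that the expired timeout is reported in the result rather than raised as an error.

```r
library(processx)

px <- paste0(
  system.file(package = "processx", "bin", "px"),
  system.file(package = "processx", "bin", .Platform$r_arch, "px.exe")
)

# The process wants to sleep for 5 seconds, but it only gets 1 second.
# run() kills it when the limit is reached and flags this in the result.
res <- run(px, c("sleep", "5"), timeout = 1, error_on_status = FALSE)
res$timeout
```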
Note that run() is different from system(), and it always shows the output of the process on R’s proper standard output, instead of writing to the terminal directly. This means for example that you can capture the output with capture.output() or use sink(), etc.:
out1 <- capture.output(r1 <- system("ls"))
out2 <- capture.output(r2 <- run("ls", echo = TRUE))
out1
#> character(0)
out2
#> [1] "CODE_OF_CONDUCT.md" "DESCRIPTION" "LICENSE"
#> [4] "LICENSE.md" "Makefile" "NAMESPACE"
#> [7] "NEWS.md" "R" "README.Rmd"
#> [10] "README.md" "_pkgdown.yml" "codecov.yml"
#> [13] "inst" "man" "processx.Rproj"
#> [16] "src" "tests"
The spinner option of run() puts a calming spinner on the terminal while the background program is running. The spinner is always shown in the first character of the last line, so you can make it work nicely with the regular output of the background process if you like. E.g. try this in your R terminal:
result <- run(px,
c("out", " foo",
"sleep", "1",
"out", "\r bar",
"sleep", "1",
"out", "\rX foobar\n"),
echo = TRUE, spinner = TRUE)
run() can call an R function for each line of the standard output or error of the process. To do this, supply the stdout_line_callback or the stderr_line_callback argument. The callback functions take two arguments: the first one is a character scalar, the output line, and the second one is the process object that represents the background process. (See more below about process objects.) You can manipulate this object in the callback, if you want. For example you can kill it in response to an error or some text on the standard output:
cb <- function(line, proc) {
  cat("Got:", line, "\n")
  if (line == "done") proc$kill()
}
result <- run(px,
  c("outln", "this", "outln", "that", "outln", "done",
    "outln", "still here", "sleep", "10", "outln", "dead by now"),
  stdout_line_callback = cb,
  error_on_status = FALSE
)
#> Got: this
#> Got: that
#> Got: done
#> Got: still here
result
#> $status
#> [1] -9
#>
#> $stdout
#> [1] "this\nthat\ndone\nstill here\n"
#>
#> $stderr
#> [1] ""
#>
#> $timeout
#> [1] FALSE
Keep in mind that the background process is not stopped while the R callback is running; both run at the same time. In the previous example, whether still here is printed or not depends on how the OS schedules the R process and the background process. Typically, it is printed, because the R callback takes a while to run.
In addition to the line-oriented callbacks, the stdout_callback and stderr_callback arguments can specify callback functions that are called with output chunks instead of single lines. A chunk may contain multiple lines (separated by \n or \r\n), or even incomplete lines.
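Since chunk boundaries are arbitrary, a chunk callback usually accumulates what it receives and leaves line splitting for later. A minimal sketch follows; the acc environment and the joined variable are illustrative names, not part of processx.

```r
library(processx)

px <- paste0(
  system.file(package = "processx", "bin", "px"),
  system.file(package = "processx", "bin", .Platform$r_arch, "px.exe")
)

# Collect chunks in an environment, so that the callback can update it.
acc <- new.env()
acc$chunks <- character()

res <- run(
  px, c("outln", "first", "out", "partial", "outln", " line"),
  stdout_callback = function(chunk, proc) {
    # A chunk may be several lines, or only part of one.
    acc$chunks <- c(acc$chunks, chunk)
  }
)

# However the output was split into chunks, pasting them back
# together recovers the full standard output.
joined <- paste(acc$chunks, collapse = "")
cat(joined)
```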
If you need better control over possibly multiple background processes, then you can use the R6 process class directly.
To start a new background process, create a new instance of the process class.
p <- process$new("sleep", "20")
A process can be killed via the kill() method.
p$is_alive()
#> [1] TRUE
p$kill()
#> [1] TRUE
p$is_alive()
#> [1] FALSE
Note that processes are finalized (and killed) automatically if the corresponding process object goes out of scope, as soon as the object is garbage collected by R:
p <- process$new("sleep", "20")
rm(p)
invisible(gc())
Here, the direct call to the garbage collector kills the sleep process as well. See the cleanup option if you want to avoid this behavior.
By default the standard output and error of the processes are ignored. You can set the stdout and stderr constructor arguments to a file name, and then they are redirected there, or to "|", and then processx creates connections to them. (Note that starting from processx 3.0.0 these connections are not regular R connections, because the public R connection API was retroactively removed from R.)
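As a minimal sketch of the file redirection case, the following code sends the standard output of a px process to a temporary file and reads it back once the process has finished. The out_file name is an illustrative choice.

```r
library(processx)

px <- paste0(
  system.file(package = "processx", "bin", "px"),
  system.file(package = "processx", "bin", .Platform$r_arch, "px.exe")
)

# Redirect standard output to a file instead of a pipe.
out_file <- tempfile(fileext = ".log")
p <- process$new(px, c("outln", "written to a file"), stdout = out_file)

# Wait for the process to finish, so that the file is complete.
p$wait()
readLines(out_file)
```

With a file there is no pipe buffer that R has to drain, so the process cannot stall while waiting for R to read its output.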
The read_output_lines() and read_error_lines() methods can be used to read complete lines from the standard output or error connections. They work similarly to the readLines() base R function.
Note that the connections have a buffer, which can fill up if R does not read the output. When that happens, the process stops until R reads from the connection and frees up space in the buffer.
Always make sure that you read the standard output and/or error from the pipes, otherwise the background process will stop running!
If you don’t need the standard output or error any more, you can also close it, like this:
close(p$get_output_connection())
close(p$get_error_connection())
Note that the connections used for reading the output and error streams are non-blocking, so the read functions will return immediately, even if there is no text to read from them. If you want to make sure that there is data available to read, you need to poll, see below.
p <- process$new(px,
  c("sleep", "1", "outln", "foo", "errln", "bar", "outln", "foobar"),
  stdout = "|", stderr = "|")
p$read_output_lines()
#> character(0)
p$read_error_lines()
#> character(0)
The standard R way to query the end of the stream for a non-blocking connection is to use the isIncomplete() function. After a read attempt, this function returns FALSE if the connection surely has no more data. (If the read attempt returns no data, but isIncomplete() returns TRUE, then the connection might deliver more data in the future.)
The is_incomplete_output() and is_incomplete_error() functions work similarly for process objects.
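The poll_io() method waits for data on the standard output and/or error of a process. It returns as soon as data is available on a connected standard output or standard error, or the process has finished and its connections were closed on the other end, or the specified timeout has expired.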
For example the following code waits about a second for output.
p <- process$new(px, c("sleep", "1", "outln", "kuku"), stdout = "|")

## No output yet
p$read_output_lines()
#> character(0)
## Wait at most 5 sec
p$poll_io(5000)
#> output error process
#> "ready" "nopipe" "nopipe"
## There is output now
p$read_output_lines()
#> [1] "kuku"
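Polling and reading are often combined into a loop that drains the output until the process is done. The following is a minimal sketch of such a loop, not the only possible pattern; the lines variable and the one second poll timeout are illustrative choices.

```r
library(processx)

px <- paste0(
  system.file(package = "processx", "bin", "px"),
  system.file(package = "processx", "bin", .Platform$r_arch, "px.exe")
)

p <- process$new(
  px, c("outln", "one", "sleep", "1", "outln", "two"),
  stdout = "|"
)

lines <- character()
# Keep going while the process runs, or while its output connection
# might still deliver data.
while (p$is_alive() || p$is_incomplete_output()) {
  p$poll_io(1000)                           # wait up to 1 second for an event
  lines <- c(lines, p$read_output_lines())  # read whatever is available now
}
lines
```

Reading on every iteration also keeps the pipe buffer from filling up, which would otherwise stall the background process.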
If you need to manage multiple background processes, and need to wait for output from all of them, processx defines a poll() function that does just that. It is similar to the poll_io() method, but it takes multiple process objects, and returns as soon as one of them has data on standard output or error, or a timeout expires. Here is an example:
p1 <- process$new(px, c("sleep", "1", "outln", "output"), stdout = "|")
p2 <- process$new(px, c("sleep", "2", "errln", "error"), stderr = "|")

## After 100ms no output yet
poll(list(p1 = p1, p2 = p2), 100)
#> $p1
#> output error process
#> "timeout" "nopipe" "nopipe"
#>
#> $p2
#> output error process
#> "nopipe" "timeout" "nopipe"
## But now we surely have something
poll(list(p1 = p1, p2 = p2), 1000)
#> $p1
#> output error process
#> "ready" "nopipe" "nopipe"
#>
#> $p2
#> output error process
#> "nopipe" "silent" "nopipe"
p1$read_output_lines()
#> [1] "output"
## Done with p1
close(p1$get_output_connection())
#> NULL
## The second process should have data on stderr soonish
poll(list(p1 = p1, p2 = p2), 5000)
#> $p1
#> output error process
#> "closed" "nopipe" "nopipe"
#>
#> $p2
#> output error process
#> "nopipe" "ready" "nopipe"
p2$read_error_lines()
#> [1] "error"
As seen before, is_alive() checks if a process is running. The wait() method can be used to wait until it has finished (or a specified timeout expires). E.g. in the following code wait() needs to wait about 2 seconds for the sleep px command to finish.
p <- process$new(px, c("sleep", "2"))
p$is_alive()
#> [1] TRUE
Sys.time()
#> [1] "2022-06-10 13:57:49 CEST"
p$wait()
Sys.time()
#> [1] "2022-06-10 13:57:51 CEST"
It is safe to call wait() multiple times:
p$wait() # already finished!
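The timeout of wait() is useful when you do not want to block indefinitely. The sketch below assumes that the timeout argument of wait() is given in milliseconds; the still_running variable is an illustrative name.

```r
library(processx)

px <- paste0(
  system.file(package = "processx", "bin", "px"),
  system.file(package = "processx", "bin", .Platform$r_arch, "px.exe")
)

p <- process$new(px, c("sleep", "5"))

# Wait at most half a second. The process needs 5 seconds,
# so it is still running when wait() returns.
p$wait(timeout = 500)
still_running <- p$is_alive()
still_running

# Clean up explicitly instead of waiting for the full 5 seconds.
p$kill()
p$wait()
```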
After a process has finished, its exit status can be queried via the get_exit_status() method. If the process is still running, then this method returns NULL.
p <- process$new(px, c("sleep", "2"))
p$get_exit_status()
#> NULL
p$wait()
p$get_exit_status()
#> [1] 0
In general, mixing processx (via callr or not) and parallel works fine. If you use parallel’s ‘fork’ clusters, e.g. via parallel::mcparallel(), then you might see two issues. One is that processx will not be able to determine the exit status of some processx processes. This is because the status is read out by parallel, and processx will set it to NA. The other one is that parallel might complain that it could not clean up some subprocesses. This is not an error, and it is harmless, but it does hold up R for about 10 seconds, before parallel gives up. To work around this, you can set the PROCESSX_NOTIFY_OLD_SIGCHLD environment variable to a non-empty value, before you load processx. This behavior might be the default in the future.
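A minimal sketch of the workaround from within R follows. The value "true" is an arbitrary choice, since any non-empty value should do according to the text above. The variable must be set before processx is loaded in the session.

```r
# Set the variable first, in a session where processx is not loaded yet ...
Sys.setenv(PROCESSX_NOTIFY_OLD_SIGCHLD = "true")

# ... and only then load processx.
library(processx)
```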
Errors are typically signalled via non-zero exit statuses. The processx constructor fails if the external program cannot be started, but it does not deal with errors that happen after the program has successfully started running.
p <- process$new("nonexistant-command-for-sure")
#> Error in c("process_initialize(self, private, command, args, stdin, stdout, ", : ! Native call to `processx_exec` failed
#> Caused by error in `chain_call(c_processx_exec, command, c(command, args), pty, pty_options, …` at initialize.R:138:3:
#> ! cannot start processx process 'nonexistant-command-for-sure' (system error 2, No such file or directory) @unix/processx.c:613 (processx_exec)
p2 <- process$new(px, c("sleep", "1", "command-does-not-exist"))
p2$wait()
p2$get_exit_status()
#> [1] 5
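In practice this means two separate checks: one around the constructor for startup failures, and one on the exit status for failures that happen later. A minimal sketch, where the started and failed_later variables are illustrative names:

```r
library(processx)

px <- paste0(
  system.file(package = "processx", "bin", "px"),
  system.file(package = "processx", "bin", .Platform$r_arch, "px.exe")
)

# 1. A startup failure is an R error thrown by the constructor.
p <- tryCatch(
  process$new("nonexistant-command-for-sure"),
  error = function(e) NULL
)
started <- !is.null(p)
started

# 2. A failure after startup only shows up in the exit status.
p2 <- process$new(px, c("return", "3"))
p2$wait()
failed_later <- p2$get_exit_status() != 0
failed_later
```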
The ps package can query, list, manipulate all system processes (not just subprocesses), and processx uses it internally for some of its functionality. You can also convert a processx::process object to a ps::ps_handle with the as_ps_handle() method.
The callr package uses processx to start another R process, and run R code in it, in the foreground or background.
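As a minimal sketch of the conversion, the following code creates a ps handle for a running px process and queries it with two functions from the ps package, ps_pid() and ps_is_running(). The same_pid and running variables are illustrative names.

```r
library(processx)

px <- paste0(
  system.file(package = "processx", "bin", "px"),
  system.file(package = "processx", "bin", .Platform$r_arch, "px.exe")
)

p <- process$new(px, c("sleep", "5"))

# Convert the processx object to a ps handle for the same process.
h <- p$as_ps_handle()
same_pid <- ps::ps_pid(h) == p$get_pid()
running <- ps::ps_is_running(h)

# Clean up the background process.
p$kill()
```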
Please note that the processx project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
MIT © Ascent Digital Services, RStudio, Gábor Csárdi