crew lets you write custom launchers for different types of workers that connect over the local network. This flexibility can extend crew to platforms like SLURM, AWS Batch, and Kubernetes. This vignette demonstrates how. It assumes prior familiarity with R6 classes and the computing platform of your plugin.
To create your own launcher plugin, write an R6 subclass of crew_class_launcher with a launch_worker() method analogous to the one in the callr launcher. launch_worker() must accept the same arguments as the callr launch_worker() method, generate a call to crew_worker(), and then submit a new job or process to run that call.
We recommend you implement an optional terminate_worker() method. Although mirai has its own way of terminating workers, it only works if the worker has already connected, and it cannot reach workers that fail to connect and hang in a crashed state. An optional terminate_worker() method in your crew launcher plugin is extra assurance that these workers will exit.
If you implement terminate_worker(), it must accept a handle that identifies the worker, and this handle must be the return value of the previous call to launch_worker(). A handle can be any kind of R object: a callr::r_bg() handle, a process ID, a job name, etc.
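When the handle is a job name, the token and name arguments of launch_worker() are natural ingredients for it. The sketch below is a hypothetical helper, not part of crew, that combines them into one string and replaces any character outside letters, digits, underscores, and hyphens. The allowed character set is an assumption standing in for whatever your scheduler actually accepts. The example input reuses the default controller name from the helper function later in this vignette, which contains spaces.

```r
# Hypothetical helper (not part of crew): build a job name from the
# launcher name and the worker token passed to launch_worker().
# The allowed characters are an assumption; check your platform's rules.
make_job_name <- function(name, token) {
  raw <- paste("crew", name, token, sep = "-")
  # Replace anything other than letters, digits, underscores, and hyphens.
  gsub("[^A-Za-z0-9_-]", "-", raw)
}

make_job_name(name = "custom controller name", token = "my_token")
#> [1] "crew-custom-controller-name-my_token"
```

Because token uniquely identifies the worker instance, keeping it intact in the job name makes the name unique as well. Avoid truncating it if your platform limits name length.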
Some additional considerations:

1. The token and name arguments of the launch_worker() method can help construct informative job names. token is a long text string that uniquely identifies the instance of the new worker, and name is the name of the current launcher object.
2. Worker launches happen asynchronously: launch_worker() should submit the job or process and return without waiting for the worker to connect to the mirai dispatcher over the local network and start accepting tasks.
3. If you implement a terminate_worker() method, each worker termination may also happen asynchronously. In rare cases when you do not trust the platform to terminate the worker on the first request, you can use crew_wait() to wait for the job to exit, but this may reduce efficiency.
4. The source code of the callr launcher is a helpful reference.

The following is a custom launcher class whose workers are local R processes.[1]
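custom_launcher_class <- R6::R6Class(
  classname = "custom_launcher_class",
  inherit = crew::crew_class_launcher,
  public = list(
    # Start a worker as a new local R process and return its handle.
    launch_worker = function(socket, host, port, token, name) {
      # Text string of R code that calls crew::crew_worker().
      call <- self$call(socket, host, port, token, name)
      # Run the call in a new R process. The processx handle is the
      # return value, which crew later passes to terminate_worker().
      processx::process$new(command = "R", args = c("-e", call))
    },
    # Terminate the worker process identified by the handle.
    terminate_worker = function(handle) {
      handle$kill()
    }
  )
)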
Above, launch_worker() begins by creating a call to crew_worker(). Later on, this call will run inside the worker process, connect back to crew and mirai over the local network, and accept the tasks you push to the controller.
The call object above is a text string of R code. To see what it looks like, you can try the call() method of a callr launcher object.
launcher <- crew::crew_launcher_callr()
launcher$call(
  socket = "ws://127.0.0.1:5000",
  host = "127.0.0.1",
  port = "5711",
  token = "my_token",
  name = "my_name"
)
#> [1] "crew::crew_worker(token = \"my_token\", host = \"127.0.0.1\", port = \"5711\", settings = list(url = \"ws://127.0.0.1:5000\", asyncdial = TRUE, maxtasks = Inf, idletime = Inf, walltime = Inf, timerstart = 0L, exitlinger = 100, cleanup = FALSE))"
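Since the call is ordinary R code stored in a string, you can sanity-check it before handing it to your platform, for example by parsing it without evaluating it. The sketch below copies the string printed above and uses base R's str2lang() (available since R 3.6.0). The particular checks are only an illustration of what you might inspect, not something crew requires.

```r
# The string printed above, copied verbatim.
worker_call <- "crew::crew_worker(token = \"my_token\", host = \"127.0.0.1\", port = \"5711\", settings = list(url = \"ws://127.0.0.1:5000\", asyncdial = TRUE, maxtasks = Inf, idletime = Inf, walltime = Inf, timerstart = 0L, exitlinger = 100, cleanup = FALSE))"

# Parse the text into an unevaluated call. Nothing runs, so crew is not needed.
expr <- str2lang(worker_call)

# The function the worker process will call.
deparse(expr[[1]])
#> [1] "crew::crew_worker"

# The names of the top-level arguments of that call.
names(as.list(expr))[-1]
```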
To create an external process that runs a worker, our custom launcher creates a new processx process to start R and run the crew_worker() call.

The return value is a handle that terminate_worker() will use to terminate the process later on.
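To see how a processx handle behaves on its own, the sketch below starts a throwaway R process that only sleeps, records its process ID, and then kills it the same way terminate_worker() does. It assumes the processx package is installed and a Unix-like system. It locates the R executable with R.home() rather than relying on R being on the PATH; that is a suggestion, not what the launcher above does.

```r
# Assumes processx is installed and a Unix-like system.
# Locate the R executable explicitly instead of relying on the PATH.
r_exe <- file.path(R.home("bin"), "R")

# Start a background R process that only sleeps, standing in for a worker.
handle <- processx::process$new(
  command = r_exe,
  args = c("--vanilla", "-e", "Sys.sleep(60)")
)
pid <- handle$get_pid()  # the ID to look for in htop or ps::ps()
handle$is_alive()        # TRUE while the process is running

# Terminate it the same way terminate_worker() does above.
handle$kill()
handle$is_alive()        # FALSE once the process has exited
```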
It is useful to have a helper function that creates controllers with your custom launcher. It should:

1. Accept all the same arguments as crew_controller_callr().
2. Create a router object using crew_router().
3. Create a launcher object with the new() method of your custom launcher class.
4. Create a new controller using crew_controller().
5. Scan the controller for obvious errors using the validate() method of the controller.

Feel free to borrow from the crew_controller_callr() source code. For packages, you can use the @inheritParams roxygen2 tag to inherit the documentation of all the arguments instead of writing it by hand. You may want to adjust the default arguments based on the specifics of your platform, especially seconds_launch if workers take a long time to launch.
#' @title Create a controller with the custom launcher.
#' @export
#' @description Create an `R6` object to submit tasks and
#'   launch workers.
#' @inheritParams crew::crew_controller_callr
crew_controller_custom <- function(
  name = "custom controller name",
  workers = 1L,
  host = NULL,
  port = NULL,
  seconds_launch = 30,
  seconds_interval = 0.001,
  seconds_timeout = 5,
  seconds_idle = Inf,
  seconds_wall = Inf,
  seconds_exit = 1,
  tasks_max = Inf,
  tasks_timers = 0L,
  async_dial = TRUE,
  cleanup = FALSE,
  auto_scale = "demand"
) {
  router <- crew::crew_router(
    name = name,
    workers = workers,
    host = host,
    port = port,
    seconds_interval = seconds_interval,
    seconds_timeout = seconds_timeout
  )
  launcher <- custom_launcher_class$new(
    name = name,
    seconds_launch = seconds_launch,
    seconds_interval = seconds_interval,
    seconds_timeout = seconds_timeout,
    seconds_idle = seconds_idle,
    seconds_wall = seconds_wall,
    seconds_exit = seconds_exit,
    tasks_max = tasks_max,
    tasks_timers = tasks_timers,
    async_dial = async_dial,
    cleanup = cleanup
  )
  controller <- crew::crew_controller(
    router = router,
    launcher = launcher,
    auto_scale = auto_scale
  )
  controller$validate()
  controller
}
Before you begin testing, please start monitoring local processes and remote jobs on your platform. For the custom launcher above, which only creates local processes, it is sufficient to start htop and filter for R processes, or to launch a new R session and monitor the process table with ps::ps(). However, for more ambitious launchers that submit workers to, for example, AWS Batch, you may need to open the CloudWatch dashboard during testing and then check the AWS billing dashboard afterwards.
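For the R-session approach, the sketch below lists local processes whose executable name starts with "R", using ps::ps(). It assumes the ps package is installed. The name pattern is an assumption: depending on your operating system, workers may appear as R, Rscript, Rterm, or similar, and the pattern will also catch unrelated R processes such as RStudio, so adjust it as needed.

```r
# Assumes the ps package is installed.
# List running processes whose executable name matches `pattern`.
list_r_processes <- function(pattern = "^R") {
  processes <- ps::ps()
  # grepl() returns FALSE for processes whose name is NA.
  is_r <- grepl(pattern, processes$name)
  processes[is_r, c("pid", "name", "created"), drop = FALSE]
}

# Call this from a separate R session before, during, and after testing.
list_r_processes()
```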
When you are ready to begin testing, try out the example in the README, but use your custom controller helper instead of crew_controller_callr().
Next, start a new crew session with crew_session_start().

Then, create a controller with your helper function and start it with controller$start(). You may wish to monitor local processes on your computer to make sure the mirai dispatcher starts.
Try pushing a task that gets the local IP address and process ID of the worker instance.
controller$push(
  name = "get worker IP address and process ID",
  command = paste(getip::getip(type = "local"), ps::ps_pid())
)
Wait for the task to complete with controller$wait(), then retrieve it with controller$pop() and look at the result.
Use the result to verify that the task really ran on a worker as intended. The process ID in the result should agree with the one from the worker handle. In addition, if the worker runs on a different computer, the worker IP address should differ from the local IP address. Since our custom launcher creates local processes, the IP addresses are the same in this case, but they should differ for a SLURM or AWS Batch launcher.
getip::getip(type = "local")
#> [1] "192.168.0.2"
controller$launcher$workers$handle[[1]]$get_pid()
#> [1] 27336
If you did not set any timeouts or task limits, the worker that ran the task should still be running and connected. The other worker had no tasks, so it did not need to start an instance.
controller$summary(columns = starts_with("worker"))
#> # A tibble: 2 × 5
#> worker_socket worker_connected worker_busy worker_launches worker_instances
#> <chr> <lgl> <lgl> <int> <int>
#> 1 ws://10.0.0.9:58805/1 TRUE FALSE 1 1
#> 2 ws://10.0.0.9:58805/2 FALSE FALSE 0 0
When you are done, terminate the controller with controller$terminate(). This terminates the mirai dispatcher process and the crew workers.
Finally, use the process monitoring interface of your computing platform or operating system to verify that all mirai dispatchers and crew workers are terminated.
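For local workers, one way to confirm this from R is to check the process IDs you recorded during testing (for example from controller$launcher$workers$handle[[1]]$get_pid()) against the live process table. The helper below is a hypothetical sketch using ps::ps_pids() and assumes the ps package is installed. Keep in mind that the operating system can reuse a process ID for an unrelated process, and that a process which has exited but not yet been reaped may still be listed. On a cluster or cloud platform, query the scheduler or service for job status instead.

```r
# Assumes the ps package is installed.
# Hypothetical helper: return the subset of `pids` still running on this machine.
pids_still_running <- function(pids) {
  pids[pids %in% ps::ps_pids()]
}

# An empty result means none of the given processes are running anymore.
# The current R session is always running, so its own ID is returned here.
pids_still_running(Sys.getpid())
```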
If the informal testing succeeded, we recommend you scale up testing to more ambitious scenarios. As one example, you can test that your workers can auto-scale and quickly churn through a large number of tasks.
library(crew)
crew_session_start()
controller <- crew_controller_custom(
  seconds_idle = 2L,
  workers = 2L
)
controller$start()
# Push 100 tasks.
for (index in seq_len(100L)) {
  name <- paste0("task_", index)
  controller$push(name = name, command = index, data = list(index = index))
  message(paste("push", name))
}
# Wait for the tasks to complete.
controller$wait()
# Wait for the workers to idle out and exit on their own.
crew_wait(
  ~all(controller$summary()$worker_connected == FALSE),
  seconds_interval = 1,
  seconds_timeout = 60
)
# Do the same for 100 more tasks.
for (index in (seq_len(100L) + 100L)) {
  name <- paste0("task_", index)
  controller$push(name = name, command = index, data = list(index = index))
  message(paste("push", name))
}
controller$wait()
crew_wait(
  ~all(controller$summary()$worker_connected == FALSE),
  seconds_interval = 1,
  seconds_timeout = 60
)
# Collect the results. pop() returns NULL once no results remain.
results <- NULL
while (!is.null(out <- controller$pop(scale = FALSE))) {
  results <- dplyr::bind_rows(results, out)
}
# Check the results.
all(sort(unlist(results$result)) == seq_len(200L))
#> [1] TRUE
length(unique(results$socket_session))
#> [1] 4
# View worker and task summaries.
View(controller$summary())
# Terminate the controller.
controller$terminate()
# Now outside crew, verify that the mirai dispatcher
# and crew workers successfully terminated.
[1] See tests/launchers/test-launcher-system2.R for an example launcher without a terminate_worker() method.