Using BioContainers on a multi-user server

2017-04-18

biocontainers · docker · rkt · rktrunner

Bioinformatics users are a demanding bunch. They need to be able to run a large variety of applications on powerful servers. These servers may be administered by a small team of sysadmins, who quickly become a bottleneck in packaging and installing applications for the local users. Using application packaging such as RPM can make the task more manageable, but there is still a per-application overhead on the sysadmin, and with a typical site using perhaps more than 300 applications, that overhead soon becomes unmanageable.

Enter BioContainers, an “open source and community-driven framework which provides system-agnostic executable environments for bioinformatics software”. It offers over 2000 bioinformatics applications, packaged as containers and ready to run.

There are two ways to run BioContainers: Docker or rkt. But it quickly becomes apparent that, as provided, both are usable only by privileged users.

Docker first. To run Docker, the user must be in the docker group on the system. Membership of this group enables the user to run any docker command, with any parameters. That includes downloading any container image, mounting any host directory into the container, and running it as any user, including root. Say goodbye to any idea of filesystem integrity on a multi-user server.
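To make that concrete, here is a minimal sketch that checks whether the current user is in the docker group. The helper name `has_group` is made up for this post, and the command in the opening comment only illustrates the kind of invocation such a user could run; it is not taken from any real incident.

```shell
#!/bin/sh
# A member of the docker group could run something like:
#   docker run --rm -u root -v /:/host <image> <command>
# which mounts the host's root filesystem into a container running as root.

# has_group GROUP "LIST": succeed if GROUP appears as a whole word in the
# space-separated LIST. Hypothetical helper, not part of Docker.
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the current user; `id -nG` prints the user's group names.
if has_group docker "$(id -nG)"; then
    echo "current user can run arbitrary docker commands"
else
    echo "current user is not in the docker group"
fi
```

Matching on whole words matters here: a group called, say, `dockerx` should not count as membership of `docker`.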

Rkt is similar. Images may be fetched by unprivileged users, but to actually run an image, it is necessary to be root.

After raising an issue about this on the BioContainers GitHub site, it emerged that the Go-Docker utility had been developed to address this issue for users of Docker. Go-Docker appears to be fairly featureful, and manages job submission for job scheduling systems such as Sun Grid Engine, Torque, etc. We considered using this, but decided the overhead of assimilating such a tool would be considerable.

So we decided to write rktrunner, a “rkt run front-end for unprivileged users”, that is, a simple wrapper for rkt run.

Rktrunner

Rktrunner provides the rkt-run command, which is installed setuid-root, and wraps the underlying rkt program, carefully managing its command-line parameters, as defined by the local sysadmin in a site-wide config file.

A simple configuration of rktrunner enables any user to run any container as themselves, with their home directory mounted in the container. The config file for that looks like this:

rkt = "/usr/bin/rkt"

[environment]
HOME = "/home/{{.Username}}"

[options.common]
general = ["--insecure-options=image"]
run = ["--net=host"]
image = [
    "--user={{.Uid}}",
    "--group={{.Gid}}",
]

[volume.home]
volume = "kind=host,source={{.HomeDir}}"
mount = "target=/home/{{.Username}}"

With this config file, the following rkt-run command by an unprivileged user:

$ rkt-run -e bwa quay.io/biocontainers/bwa:0.7.15--0

results in the following invocation of rkt:

/usr/bin/rkt --insecure-options=image run --set-env-file /tmp/rktrunner32678/env --net=host \
--volume home,kind=host,source=/home/guestsi quay.io/biocontainers/bwa:0.7.15--0 \
--mount volume=home,target=/home/guestsi --user=511 --group=511 --exec bwa --
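To see how the template values in the config file end up in that command line, here is a rough shell sketch. It is not how rktrunner is implemented; `build_rkt_cmd` is a hypothetical helper that simply substitutes the username, uid, gid and home directory into the same argument positions as the invocation above.

```shell
#!/bin/sh
# Sketch only: this mimics the expansion of {{.Username}}, {{.Uid}},
# {{.Gid}} and {{.HomeDir}} from the config above. It is not rktrunner.
# Usage: build_rkt_cmd USERNAME UID GID HOMEDIR ENVFILE EXEC IMAGE
build_rkt_cmd() {
    username=$1
    uid=$2
    gid=$3
    homedir=$4
    envfile=$5
    exec_name=$6
    image=$7
    # echo joins its arguments with single spaces, giving one command line.
    echo "/usr/bin/rkt --insecure-options=image run" \
        "--set-env-file $envfile --net=host" \
        "--volume home,kind=host,source=$homedir" \
        "$image" \
        "--mount volume=home,target=/home/$username" \
        "--user=$uid --group=$gid" \
        "--exec $exec_name --"
}

# Reproduce the example from this post for user guestsi (uid and gid 511).
build_rkt_cmd guestsi 511 511 /home/guestsi /tmp/rktrunner32678/env \
    bwa quay.io/biocontainers/bwa:0.7.15--0
```

The point to notice is that the user supplies only the image and the program to execute; everything else comes from the sysadmin's config and the caller's own identity.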

Aliases

While an improvement over the full rkt command, that rkt-run invocation is still fairly cumbersome. So rktrunner allows the sysadmin to define aliases, for example:

[alias.bwa_]
image = "quay.io/biocontainers/bwa:0.7.15--0"
exec = ["bwa"]

This means that the same rkt command may be run by typing:

$ rkt-run bwa

Note that aliases provide a simple way for the sysadmin to provide easily accessible default versions of standard programs like bwa.

Wrapper scripts

However, a project user may not want to be tied to what the sysadmin has provided. They could run their own version of bwa, say, via a wrapper script or shell function. Here’s a simple wrapper script, which would be saved in a file called bwa somewhere on the project path:

#!/bin/sh
exec rkt-run -e bwa quay.io/biocontainers/bwa:0.7.15--0 "$@"
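The shell-function alternative might look like the sketch below, which could live in a project setup file that users source. The `BWA_IMAGE` variable is our own invention for this example, not something rktrunner knows about; it just lets a user switch image without editing the function.

```shell
# Shell-function equivalent of the wrapper script above.
# BWA_IMAGE is a hypothetical override; when unset or empty, the image
# used elsewhere in this post is the default.
bwa() {
    rkt-run -e bwa "${BWA_IMAGE:-quay.io/biocontainers/bwa:0.7.15--0}" "$@"
}
```

With this defined, `bwa mem ref.fa reads.fq` runs bwa from the default image, and `BWA_IMAGE=<some-other-image> bwa mem ref.fa reads.fq` runs it from another one.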