Piping output between containerized applications

2017-05-05 · 402 words · 2 minute read

biocontainers · rkt · rktrunner

BioInformatics pipelines are commonly built by piping output between applications. If each application is itself a BioContainer, and we are running them using rkt, then we need to be able to pipe output between rkt pods. How can we do this?

Rkt supports running multiple applications within a pod, so it has no direct concept of a pod's standard input and standard output. By default, application output goes to the systemd journal within the pod, and from there appears on the standard output of the pod itself. It may be tempting to post-process this output to strip the journald prefixes and pipe the result, but this is a fragile solution at best, and it performs poorly for BioInformatics applications, which may generate hundreds of gigabytes of output. Journald very quickly becomes a bottleneck.
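To make the fragility concrete, here is a minimal sketch in Python of what such a post-processing filter might look like. The prefix format it matches (a bracketed timestamp, the application name, a PID in brackets, then a colon) is an assumption for illustration, and the names `PREFIX`, `strip_prefix` and `strip_stream` are hypothetical; the real prefix may differ between rkt and systemd versions.

```python
import re

# ASSUMPTION: each line printed by the pod looks like
#   "[ 1161.330635] myapp[5]: payload"
# The actual prefix is not guaranteed and may vary between versions.
PREFIX = re.compile(r"^\[\s*\d+\.\d+\] [^\[\]\s]+\[\d+\]: ")


def strip_prefix(line: str) -> str:
    """Return the line without its journal-style prefix, if one is present."""
    return PREFIX.sub("", line, count=1)


def strip_stream(lines):
    """Lazily strip the prefix from every line of an iterable of text lines."""
    for line in lines:
        yield strip_prefix(line)
```

Even in this small form the problems are visible: the filter depends on a prefix format that nobody promised to keep stable, it only makes sense for line-oriented text, and every line of output has to pass through both the journal and a regular expression before the next application sees it.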

Recent versions of rkt support an experimental feature to access application standard input/output using rkt attach. If a pod has been started using the experimental stream modes, a separate process may use rkt attach to connect to the standard input and/or output of an application running within the pod.

An early version of rktrunner made use of rkt attach to access application standard input and output within a running pod. However, management of the extra process required with this approach threatened to turn the simple rktrunner wrapper into an overly complex piece of software, and the experimental nature of the rkt attach feature soon manifested itself in various glitches. A simpler and more robust solution was required.

Rkt has a modular architecture. It is possible to run an alternate so-called fly stage1, which supports only single-application ACIs and provides lighter isolation than fully namespaced containers, using just a chroot. This simplification is in fact a perfect match for our use case of running single-application BioContainers in a BioInformatics pipeline, especially since application standard input/output trivially becomes pod standard input/output.

To switch rktrunner to the fly stage1, the following configuration file snippet is required. Take care that the version of stage1-fly matches the rkt version installed on the system.

[options.common]
image = [
    "--stage1-name=coreos.com/rkt/stage1-fly:1.27.0",
]

Alternatively, it is possible to use the fly stage1 installed locally, perhaps as part of the official rkt RPM, as follows.

[options.common]
image = [
    "--stage1-path=/usr/lib/rkt/stage1-images/stage1-fly.aci",
]
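
Once application standard input/output is pod standard input/output, each wrapped BioContainer can be chained like any ordinary process. The following is a minimal sketch in Python of such a chain, not part of rktrunner itself. The function `run_pipeline` and the stand-in stages `PRODUCER` and `CONSUMER` are hypothetical; in real use each argument list would be whatever command launches one BioContainer on your system.

```python
import subprocess
import sys

# Stand-in stages (ASSUMPTION: placeholders for real BioContainer commands).
# The producer emits two lower-case sequences; the consumer upper-cases stdin.
PRODUCER = [sys.executable, "-c", "print('acgt'); print('ttga')"]
CONSUMER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_pipeline(commands):
    """Run commands as a pipeline and return the last stage's stdout as bytes."""
    if not commands:
        raise ValueError("at least one command is required")
    procs = []
    for argv in commands:
        # The first stage reads nothing; each later stage reads the previous
        # stage's stdout directly, so the data never passes through Python.
        stdin = procs[-1].stdout if procs else subprocess.DEVNULL
        proc = subprocess.Popen(argv, stdin=stdin, stdout=subprocess.PIPE)
        if procs:
            # Drop our copy of the pipe so the upstream stage is notified
            # if the downstream stage exits early.
            procs[-1].stdout.close()
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for argv, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, argv)
    return output


if __name__ == "__main__":
    sys.stdout.write(run_pipeline([PRODUCER, CONSUMER]).decode())
```

The intermediate data flows through kernel pipes between the stages, which is what a shell pipeline does as well; only the final stage's output is collected here, and a real pipeline would more likely stream it to a file.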

With this in place, piping output between BioContainers for use in a larger BioInformatics pipeline is both straightforward and performant. Problem solved.