You probably already know that R is not the only language you can use in an Rmarkdown file. For example, if you had Python and the R `reticulate` package installed, you could write a chunk of Python code directly in your document.
But that probably doesn’t work because getting Python connected to R this way requires installing additional packages.
Besides that problem, this approach also runs into limitations if you have multiple R projects, each with its own special requirements for outside software. For example, if you have one project that requires Python v3.5 with package X installed, and another that requires Python v3.4 with package Y installed, you will very quickly find yourself managing a rat’s nest of dependencies.
Docker is a tool that helps with this problem. In short, Docker lets you create a separate environment for each of your projects, with different software installed in each one. The environments are isolated from each other, so your projects don’t collide.
This is similar to the virtualization done by VirtualBox, VMware Fusion, and similar software. However, Docker is structured in a way that integrates easily with Rmarkdown, making it a much better tool for bringing other software into your Rmarkdown documents.
To begin with, you need to install Docker from the official site. After you install it, make sure that it is working properly by running the following in your terminal:
The output you see shows Docker downloading a pre-made copy of Python 3 (regardless of which operating system you are on and which version of Python you already have installed outside of Docker) and then running some Python code in it to print “Python in Docker”.
If you run the same command a second time, Docker will reuse the already-downloaded Python image and just run your code:
Docker’s name for a packaged software environment is a Docker image. For example, the thing that got downloaded above when you ran Python in Docker was the Python 3 image. Images have tags of the form `software:version`; for example, `python:3` is the tag that we used above to tell Docker to download Python version 3.
All the images are isolated from each other — for example, Python version or Python packages available in one image have no bearing on those installed in another image.
Running a Docker image creates a new session called a Docker container. Just as you can have multiple RStudio sessions running at the same time on your computer, you can run multiple Docker containers at the same time (from the same Docker image, or from different images).
All the containers are also isolated from each other — for example, files created by one container are (by default) not visible to other containers.
In other words, you can think of a Docker image as a pre-built collection of software, and a Docker container as an isolated session in which you run that collection of software.
The actual thing we are interested in here is using Docker inside Rmarkdown. To do this, you first have to load the `docknitr` package. Doing this enables `docker` as an option inside Rmarkdown. Let’s run some Python code in Rmarkdown using Docker:
#> Python in Docker in Rmarkdown, version 3.8.1 (default, Jan 3 2020, 22:44:00)
#> [GCC 8.3.0]
What if we want to use Python v2 instead? Easy:
#> Python in Docker in Rmarkdown, version 2.7.17 (default, Dec 28 2019, 07:48:40)
#> [GCC 8.3.0]
If you’ve ever tried to install multiple versions of Python on one computer, you can appreciate how unexpectedly simple this was. (If you haven’t, lucky you.)
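For reference, a chunk body that prints a banner like the two above could be as short as the following sketch. The body is an assumption (only the output appears above), and `version_banner` is a hypothetical helper name. String concatenation is used so that the same code behaves identically under Python 2 and Python 3.

```python
import sys


def version_banner():
    # On many CPython builds, sys.version contains a newline before the
    # compiler information, which is why the rendered output spans two lines.
    return "Python in Docker in Rmarkdown, version " + sys.version


print(version_banner())
```

Only the image tag needs to change between the two runs; the chunk body can stay the same.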
Under the hood, `docknitr` uses `sys::exec_wait()` to run `docker run --interactive IMAGE`, and passes the code chunk on the standard input. The standard output is then returned in the Rmarkdown output.
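The same mechanism can be sketched outside of R. The Python helper below is an illustration of the idea, not the package's actual implementation: it starts a process, writes the chunk to its standard input, and collects its standard output. The name `run_chunk` and its `command` override are hypothetical; the override exists only so the sketch can be tried on a machine without Docker.

```python
import subprocess
import sys


def run_chunk(code, image="python:3", command=None):
    """Send `code` to a process on standard input and return its standard output."""
    if command is None:
        # The command line quoted in the text: docker run --interactive IMAGE
        command = ["docker", "run", "--interactive", image]
    result = subprocess.run(
        command,
        input=code,           # the chunk goes in on standard input
        capture_output=True,  # standard output (and error) are captured
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    chunk = 'print("Python in Docker")\n'
    # Stand-in for a container: a local interpreter reading its program from stdin.
    print(run_chunk(chunk, command=[sys.executable, "-"]), end="")
```

With Docker installed, calling `run_chunk(chunk)` without the override would send the same chunk to a `python:3` container instead.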
Normally, Docker containers are isolated from each other and from the rest of your computer. As a result, they don’t have access to files on your computer. For example, this is the list of files seen by Python in Docker:
#> ['lib', 'media', 'home', 'sbin', 'sys', 'var', 'root', 'run', 'boot', 'etc', 'opt', 'tmp', 'proc', 'srv', 'usr', 'bin', 'dev', 'lib64', 'mnt', '.dockerenv']
These files aren’t anywhere (obvious) on your computer — they are inside the Python 3 Docker image.
If you want your Rmarkdown Docker blocks to see the normal files on your computer, use the `share.files=TRUE` block option to share your RStudio working directory with the Docker container. (On Windows, you first have to share your drives with Docker in Docker settings.) For example:
That list of files is what’s on my computer; yours would probably be different.
Under the hood, `share.files` adds a bind-mount of the current working directory to `/workdir` on the Docker container, and sets `/workdir` as the working directory of the container.
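As a rough sketch of what that means on the command line, the Python helper below assembles a `docker run` argument list with a bind-mount and a working directory. The `--volume` and `--workdir` flags are standard `docker run` options, but whether the package passes exactly these flags is an assumption, and `docker_run_args` is a hypothetical name used only for illustration.

```python
import os


def docker_run_args(image, share_files=False, host_dir=None):
    """Build an argument list for `docker run`, optionally sharing a host directory."""
    args = ["docker", "run", "--interactive"]
    if share_files:
        # Default to the current working directory, as described in the text.
        host_dir = os.path.abspath(host_dir or os.getcwd())
        # Bind-mount the host directory at /workdir and start the container there.
        args += ["--volume", host_dir + ":/workdir", "--workdir", "/workdir"]
    args.append(image)
    return args


print(docker_run_args("python:3", share_files=True))
```

Because the mount is a bind-mount, files the container writes under `/workdir` land directly in the shared directory on your computer.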
Whereas some Docker images (such as `python`) contain a single piece of software, some others contain multiple tools, and therefore require you to specify which you want to run. This is common for images that contain an entire operating system (such as the `ubuntu` image for Ubuntu Linux), or images that contain a suite of related tools. For example, if you want to have access to all the tools built into Ubuntu, you would want to use the `ubuntu` image; if you want to run a particular Rmarkdown block through `bash` (which is one of the tools included in Ubuntu), you can use the `command` block option:
#> Linux aa3c7748fbd1 4.4.0-169-generic #198-Ubuntu SMP Tue Nov 12 10:38:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Let’s pause for a moment and appreciate what just happened: your computer, regardless of what operating system is installed on it, downloaded a copy of Ubuntu Linux, started it inside an isolated session, fed a chunk of your Rmarkdown file into a Linux command inside that session, and fed the output of that command back into your Rmarkdown file.
You will probably find yourself frequently using the same Docker images and commands over and over again. For example, you may have multiple Rmarkdown blocks that you want to run in Python, without having to repeat the Python Docker options every time.
To accomplish this, use `docknitr::docker_alias`. For example, run this to configure `python_docker` as shorthand for `docker engine='docker', image="python:3", share.files=TRUE`:
Your shorthand has to be recognizable by knitr; by default, this means that it can’t contain anything other than letters, numbers, and underscores.
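If you want to sanity-check a candidate name before using it, the rule as stated here is easy to express as a pattern. The sketch below encodes only the description above (ASCII letters, numbers, and underscores); it is not knitr's own check, and `is_simple_alias` is a hypothetical helper.

```python
import re

# One or more ASCII letters, digits, or underscores, and nothing else.
_ALIAS_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_simple_alias(name):
    """Return True if `name` uses only letters, numbers, and underscores."""
    return _ALIAS_PATTERN.fullmatch(name) is not None


print(is_simple_alias("python_docker"))  # True
print(is_simple_alias("python-docker"))  # False
```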
Now you can use `python_docker` as its own Rmarkdown chunk type:
import os
print(os.listdir())
#> ['pyseer-tutorial']
That covers the basics of getting up and running with Docker in Rmarkdown. This much will be useful to you if you want to run code through existing software environments, such as the plain install of Python 3. The next level of Docker power is making your own custom software environments. When you are ready for that, check out the custom images vignette.