Allocation Filesystems
Nomad creates a working directory for each allocation on a client. This directory can be found in the Nomad data_dir at ./alloc/«alloc_id». The allocation working directory is where Nomad creates task directories and directories shared between tasks, writes logs for tasks, and downloads artifacts or templates.
An allocation with two tasks (named task1 and task2) will have an allocation directory like the one below.
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   ├── task1.stdout.0
│   │   ├── task2.stderr.0
│   │   └── task2.stdout.0
│   └── tmp
├── task1
│   ├── local
│   ├── private
│   ├── secrets
│   └── tmp
└── task2
    ├── local
    ├── private
    ├── secrets
    └── tmp
- alloc/: This directory is shared across all tasks in an allocation and can be used to store data that multiple tasks need, such as files consumed by a log shipper. This is the directory that's provided to the task as the NOMAD_ALLOC_DIR. Note that this alloc/ directory is not the same as the "allocation working directory", which is the top-level directory. All tasks in a task group can read and write to the alloc/ directory. However, the full host path may differ depending on the task driver's filesystem isolation mode, so tasks should always use the NOMAD_ALLOC_DIR environment variable to find this path rather than relying on the specific implementation of the none, chroot, or image modes. Within the alloc/ directory are three standard directories:
  - alloc/data/: This directory is the location used by the ephemeral_disk block for shared data.
  - alloc/logs/: This directory is the location of the log files for every task within an allocation. The nomad alloc logs command streams these files to your terminal.
  - alloc/tmp/: A temporary directory used as scratch space by task drivers.
- «taskname»: Each task has a task working directory with the same name as the task. Tasks in a task group can't read each other's task working directory. Depending on the task driver's filesystem isolation mode, a task may not be able to access the task working directory. Within the «taskname»/ directory are four standard directories:
  - «taskname»/local/: This directory is the location provided to the task as the NOMAD_TASK_DIR. Note this is not the same as the "task working directory". This directory is private to the task.
  - «taskname»/private/: This directory is used by Nomad to store private files related to the allocation, such as Vault tokens, that are not shared with tasks when using image isolation. The contents of files in this directory cannot be read by the nomad alloc fs command or via Nomad's API.
  - «taskname»/secrets/: This directory is the location provided to the task as NOMAD_SECRETS_DIR. The contents of files in this directory cannot be read by the nomad alloc fs command. It can be used to store secret data that should not be visible outside the task. Where possible it is backed by an in-memory filesystem and mounted noexec.
  - «taskname»/tmp/: A temporary directory used as scratch space by task drivers.
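As an illustration of the shared alloc/ directory, the following is a minimal sketch of a job in which one task writes a file under NOMAD_ALLOC_DIR and a second task in the same group reads it. The job, group, and task names, the busybox image tag, and the heartbeat.txt file are assumptions made for this example and do not come from the layout above.

```hcl
# Hypothetical job: all names, the image, and the file path are illustrative.
job "shared-alloc-dir" {
  datacenters = ["dc1"]

  group "app" {
    # Appends a timestamp to a file in the shared alloc/data/ directory.
    task "writer" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "/bin/sh"
        args    = ["-c", "while true; do date >> ${NOMAD_ALLOC_DIR}/data/heartbeat.txt; sleep 5; done"]
      }
    }

    # Follows the same file through its own NOMAD_ALLOC_DIR.
    task "reader" {
      driver = "docker"

      config {
        image   = "busybox:1.36"
        command = "/bin/sh"
        args    = ["-c", "sleep 10; tail -f ${NOMAD_ALLOC_DIR}/data/heartbeat.txt"]
      }
    }
  }
}
```

Both tasks refer to the path only through NOMAD_ALLOC_DIR, so the sketch should not depend on where a particular driver places the directory.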
The allocation working directory is the directory you see when using the nomad alloc fs command. If you were to run nomad alloc fs against the allocation that made the working directory shown above, you'd see the following:
$ nomad alloc fs c0b2245f
Mode Size Modified Time Name
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:39Z alloc/
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:32Z task1/
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:39Z task2/
$ nomad alloc fs c0b2245f alloc/
Mode Size Modified Time Name
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:32Z data/
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:39Z logs/
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:32Z tmp/
$ nomad alloc fs c0b2245f task1/
Mode Size Modified Time Name
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:33Z local/
drwxrwxrwx 60 B 2020-10-27T18:00:32Z private/
drwxrwxrwx 60 B 2020-10-27T18:00:32Z secrets/
dtrwxrwxrwx 4.0 KiB 2020-10-27T18:00:32Z tmp/
Task Drivers and Filesystem Isolation Modes
Depending on the task driver, the task's working directory may also be the root directory for the running task. This is determined by the task driver's filesystem isolation capability.
image isolation
Task drivers like docker or qemu use image isolation, where the task driver isolates task filesystems as machine images. These filesystems are owned by the task driver's external process and not by Nomad itself. These filesystems will not typically be found anywhere in the allocation working directory. For example, Docker containers will have their overlay filesystem unpacked to /var/run/docker/containerd/«container_id» by default.
Nomad will provide the NOMAD_ALLOC_DIR, NOMAD_TASK_DIR, and NOMAD_SECRETS_DIR to tasks with image isolation, typically by bind-mounting them to the task driver's filesystem.
You can see an example of image isolation by running the following minimal job:
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
    }
  }
}
If you look at the allocation working directory from the host, you'll see a minimal filesystem tree:
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   └── task1.stdout.0
│   └── tmp
└── task1
    ├── local
    ├── private
    ├── secrets
    └── tmp
The nomad alloc fs command shows the same bare directory tree:
$ nomad alloc fs b0686b27
Mode Size Modified Time Name
drwxrwxrwx 4.0 KiB 2020-10-27T18:51:54Z alloc/
drwxrwxrwx 4.0 KiB 2020-10-27T18:51:54Z task1/
$ nomad alloc fs b0686b27 task1
Mode Size Modified Time Name
drwxrwxrwx 4.0 KiB 2020-10-27T18:51:54Z local/
drwxrwxrwx 60 B 2020-10-27T18:51:54Z private/
drwxrwxrwx 60 B 2020-10-27T18:51:54Z secrets/
dtrwxrwxrwx 4.0 KiB 2020-10-27T18:51:54Z tmp/
$ nomad alloc fs b0686b27 task1/local
Mode Size Modified Time Name
If you inspect the Docker container that's created, you'll see three directories bind-mounted into the container:
$ docker inspect 32e | jq '.[0].HostConfig.Binds'
[
"/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/alloc:/alloc",
"/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/local:/local",
"/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/secrets:/secrets"
]
The root filesystem inside the container can see these three mounts, along with the rest of the container filesystem:
$ docker exec -it 32e /bin/sh
# ls /
alloc boot dev home lib64 media opt root sbin srv tmp var
bin data etc lib local mnt proc run secrets sys usr
Note that because only these three directories are bind-mounted into the container filesystem, nothing written elsewhere in the allocation working directory will be accessible inside the container. This means templates, artifacts, and dispatch payloads for tasks with image isolation must be written into the NOMAD_ALLOC_DIR, NOMAD_TASK_DIR, or NOMAD_SECRETS_DIR.
To work around this limitation, you can use the task driver's mounting capabilities to mount one of the three directories to another location in the task. For example, with the Docker driver you can use the driver's mounts block to bind a secret, written by a template block to the NOMAD_SECRETS_DIR, into a configuration directory elsewhere in the task:
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"

      # Bind the task's secrets directory to a second path in the container.
      mounts = [{
        type     = "bind"
        source   = "secrets"
        target   = "/etc/redis.d"
        readonly = true
      }]
    }

    # Render the secret into the task's secrets directory.
    template {
      destination = "${NOMAD_SECRETS_DIR}/redis.conf"
      data        = <<EOT
{{ with secret "secrets/data/redispass" }}
requirepass {{ .Data.data.passwd }}{{ end }}
EOT
    }
  }
}
Note that relative mount source paths are relative to the task working directory, so to bind the NOMAD_ALLOC_DIR as a mount source, you will need to use a relative path that traverses up into the allocation working directory (for example, source = "../alloc").
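A minimal sketch of such a mount is shown below, assuming the same Docker mounts syntax used in the example above. The target path /srv/shared is a hypothetical location chosen for illustration, and whether the bind is accepted may depend on how the Docker driver is configured on the client.

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"

      # "../alloc" is resolved relative to the task working directory,
      # so it points at the shared alloc/ directory one level up.
      mounts = [{
        type   = "bind"
        source = "../alloc"
        target = "/srv/shared" # hypothetical target path
      }]
    }
  }
}
```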
chroot isolation
Task drivers like exec or java (on Linux) use chroot isolation, where the task driver isolates task filesystems with chroot or pivot_root. These isolated filesystems will be built inside the task working directory.
You can see an example of chroot isolation by running the following minimal job on Linux:
job "example" {
  datacenters = ["dc1"]

  task "task2" {
    driver = "exec"

    config {
      command = "/bin/sh"
      args    = ["-c", "sleep 600"]
    }
  }
}
If you look at the allocation working directory from the host, you'll see a filesystem tree that has been populated with the task driver's chroot contents, in addition to the NOMAD_ALLOC_DIR, NOMAD_TASK_DIR, and NOMAD_SECRETS_DIR:
.
├── alloc
│   ├── container
│   ├── data
│   ├── logs
│   └── tmp
└── task2
    ├── alloc
    ├── bin
    ├── dev
    ├── etc
    ├── executor.out
    ├── lib
    ├── lib32
    ├── lib64
    ├── local
    ├── private
    ├── proc
    ├── run
    ├── sbin
    ├── secrets
    ├── sys
    ├── tmp
    └── usr
Likewise, the root directory of the task is now available in the nomad alloc fs command output:
$ nomad alloc fs eebd13a7
Mode Size Modified Time Name
drwxrwxrwx 4.0 KiB 2020-10-27T19:05:24Z alloc/
drwxrwxrwx 4.0 KiB 2020-10-27T19:05:24Z task2/
$ nomad alloc fs eebd13a7 task2
Mode Size Modified Time Name
drwxrwxrwx 4.0 KiB 2020-10-27T19:05:24Z alloc/
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z bin/
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:24Z dev/
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z etc/
-rw-r--r-- 297 B 2020-10-27T19:05:24Z executor.out
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z lib/
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z lib32/
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z lib64/
drwxrwxrwx 4.0 KiB 2020-10-27T19:05:22Z local/
drwxrwxrwx 60 B 2020-10-27T19:05:22Z private/
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:24Z proc/
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z run/
drwxr-xr-x 12 KiB 2020-10-27T19:05:22Z sbin/
drwxrwxrwx 60 B 2020-10-27T19:05:22Z secrets/
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:24Z sys/
dtrwxrwxrwx 4.0 KiB 2020-10-27T19:05:22Z tmp/
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z usr/
Nomad will provide the NOMAD_ALLOC_DIR, NOMAD_TASK_DIR, and NOMAD_SECRETS_DIR to tasks with chroot isolation. But unlike with image isolation, Nomad does not need to bind-mount the NOMAD_TASK_DIR directory because it can be directly created inside the chroot.
$ nomad alloc exec eebd13a7 /bin/sh
$ mount
...
/dev/mapper/root on /alloc type ext4 (rw,relatime,errors=remount-ro,data=ordered)
tmpfs on /private type tmpfs (rw,noexec,relatime,size=1024k)
tmpfs on /secrets type tmpfs (rw,noexec,relatime,size=1024k)
...
none isolation
The raw_exec task driver (or the java task driver on Windows) uses the none filesystem isolation mode. This means the task driver does not isolate the filesystem for the task, and the task can read and write anywhere the user that's running Nomad can.
You can see an example of none isolation by running the following minimal raw_exec job on Linux or Unix:
job "example" {
  datacenters = ["dc1"]

  task "task3" {
    driver = "raw_exec"

    config {
      command = "/bin/sh"
      args    = ["-c", "sleep 600"]
    }
  }
}
If you look at the allocation working directory from the host, you'll see a minimal filesystem tree:
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task3.stderr.0
│   │   └── task3.stdout.0
│   └── tmp
└── task3
    ├── executor.out
    ├── local
    ├── private
    ├── secrets
    └── tmp
The nomad alloc fs command shows the same bare directory tree:
$ nomad alloc fs 87ec7d12 task3
Mode Size Modified Time Name
-rw-r--r-- 140 B 2020-10-27T19:15:33Z executor.out
drwxrwxrwx 4.0 KiB 2020-10-27T19:15:33Z local/
drwxrwxrwx 60 B 2020-10-27T19:15:33Z private/
drwxrwxrwx 60 B 2020-10-27T19:15:33Z secrets/
dtrwxrwxrwx 4.0 KiB 2020-10-27T19:15:33Z tmp/
But if you use nomad alloc exec to view the filesystem from inside the task, you'll see that the task has access to the entire root filesystem. The NOMAD_ALLOC_DIR, NOMAD_TASK_DIR, and NOMAD_SECRETS_DIR point to the filepath on the host, not a path anchored in the task working directory. And the task is running as root, because the Nomad client agent is running as root. This is why the raw_exec driver is disabled by default.
$ nomad alloc exec 87ec7d12 /bin/sh
# ls /
bin dev home lib lib64 lost+found mnt proc run snap sys usr vmlinuz
boot etc initrd.img lib32 libx32 media opt root sbin srv tmp var
# echo $NOMAD_SECRETS_DIR
/var/nomad/alloc/87ec7d12-5e35-8fba-96cc-09e5376be15a/task3/secrets
# whoami
root
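To compare how the three variables resolve under different isolation modes without opening a shell, a task can print them at startup. The following is a small sketch based on the raw_exec job above; only the echo command is added. The output should appear in alloc/logs/task3.stdout.0, which the nomad alloc logs command streams.

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task3" {
    driver = "raw_exec"

    config {
      command = "/bin/sh"

      # The shell expands the variables at runtime, so the log shows the
      # paths exactly as the task sees them.
      args = ["-c", "echo alloc=$NOMAD_ALLOC_DIR task=$NOMAD_TASK_DIR secrets=$NOMAD_SECRETS_DIR; sleep 600"]
    }
  }
}
```

Changing the driver to exec or docker (with a suitable config block) would be one way to see the chroot or image paths instead.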
Templates, Artifacts, and Dispatch Payloads
The other contents of the allocation working directory depend on what features the job specification uses. The allocation working directory is populated by other features in a specific order:
- The allocation working directory is created.
- The ephemeral disk data is migrated from any previous allocation.
- CSI volumes are staged.
- Then, for each task:
  - Task working directories are created.
  - Dispatch payloads are written.
  - Artifacts are downloaded.
  - Templates are rendered.
  - The task is started by the task driver, which includes all bind mounts and volume mounts.
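The ordering above matters when a file produced by one step is used by the task at startup. As a rough sketch, the hypothetical parameterized job below relies on the dispatch payload and the rendered template both being in place before the exec driver starts the task. The job, group, and file names are assumptions for illustration.

```hcl
# Hypothetical parameterized job; job, group, and file names are illustrative.
job "example-dispatch" {
  datacenters = ["dc1"]
  type        = "batch"

  parameterized {
    payload = "required"
  }

  group "worker" {
    task "task1" {
      driver = "exec"

      # Started last, after the files below already exist.
      config {
        command = "/bin/sh"
        args    = ["${NOMAD_TASK_DIR}/run.sh"]
      }

      # Written first: the payload lands in the task's local/ directory.
      dispatch_payload {
        file = "input.json"
      }

      # Rendered after payloads and artifacts, before the task starts.
      template {
        destination = "local/run.sh"
        data        = <<EOT
#!/bin/sh
wc -c "$NOMAD_TASK_DIR/input.json"
EOT
      }
    }
  }
}
```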
Dispatch payloads, artifacts, and templates are written to the task working directory before a task can start because the resulting files may be a binary or image run by the task. For example, an artifact can be used to download a Docker image or .jar file, or a template can be used to render a shell script that's run by exec.
The artifact and template blocks write their data to a destination relative to the task working directory, not the NOMAD_TASK_DIR. For task drivers with image filesystem isolation, this means the destination field path should be prefixed with either NOMAD_TASK_DIR or NOMAD_SECRETS_DIR. Otherwise, the file will not be visible from inside the resulting container. (The dispatch_payload block always writes its data to the NOMAD_TASK_DIR.)
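A minimal sketch of destinations that stay visible to a Docker task follows. The artifact URL is a placeholder, and the extras directory and app.env file are hypothetical names used only for this example.

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
    }

    # Lands in «taskname»/local/extras on the host, which is inside the
    # directory the task receives as NOMAD_TASK_DIR.
    artifact {
      source      = "https://example.com/redis-extras.tar.gz" # placeholder URL
      destination = "local/extras"
    }

    # Lands in the directory the task receives as NOMAD_SECRETS_DIR.
    template {
      destination = "${NOMAD_SECRETS_DIR}/app.env"
      data        = <<EOT
LOG_LEVEL=info
EOT
    }

    # A destination such as "config/app.env" would be written to the task
    # working directory but outside local/ and secrets/, so the container
    # would not see it.
  }
}
```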
For CSI volumes, the client will stage the volume before setting up the task working directory. Staging typically involves mounting the volume into the CSI plugin's task directory, sending commands to the plugin to format the volume as required, and making a volume claim to the Nomad server.
The behavior of the volume_mount block is controlled by the task driver. The client builds a mount configuration describing the host volume or CSI volume and passes it to the task driver to execute. Because the task driver mounts the volume, it is not possible to have artifact, template, or dispatch_payload blocks write to a volume.
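For reference, a volume_mount is declared as in the sketch below. It assumes a host volume named redis-data has already been configured on the client; that name, the group name, and the /data destination are illustrative.

```hcl
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    # Assumes the client configuration defines a host volume "redis-data".
    volume "data" {
      type   = "host"
      source = "redis-data"
    }

    task "task1" {
      driver = "docker"

      config {
        image = "redis:6.0"
      }

      # Mounted by the task driver when the task starts, after templates
      # and artifacts have been written, so those blocks cannot target /data.
      volume_mount {
        volume      = "data"
        destination = "/data"
      }
    }
  }
}
```

If a rendered file needs to end up on the volume, one option is to render it into NOMAD_TASK_DIR and have the task copy it at startup.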