Files on Kubernetes Secret and ConfigMap volumes work in peculiar and
undocumented ways when it comes to watching changes to these files with the
inotify(7) syscall. A typical file watch that works outside
Kubernetes might not behave as you expect when you run the same program on Kubernetes.
On a normal filesystem, you start a watch on a file on disk with a library and
expect to get an event like
IN_MODIFY (file modified) or
IN_CLOSE_WRITE (file opened for writing was closed) when the file is changed. But these filesystem
events never happen for files on Kubernetes Secret/ConfigMap volumes.
On Kubernetes, you only receive an
IN_DELETE_SELF event (as if the file is
deleted) on ConfigMap/Secret volumes. This deletion event breaks the
watch you initiated, and your code needs to handle re-establishing the monitor
every time the mounted file is updated.
Let’s talk about why this happens.
Resilient file reloads from disk
If you have ever attempted to write production-ready code that reloads files from disk when they're updated, you should have asked yourself these questions:
How do you make sure the file is completely written? Did the
IN_MODIFY event (indicating a
write(2) call) mean the file is fully written, or is the file only partially written and in an inconsistent state to be consumed?
When watching multiple files that are updated together (think of a
TLS key pair), when is the right time to re-read both files and start using them? One of the files might be updated, but not the other one yet (which you'd expect to happen if these files are written to disk by another process).
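For a single file, writers outside Kubernetes typically answer the first question with a write-to-temporary-file-then-rename pattern, since rename(2) atomically replaces the destination on POSIX filesystems. A minimal sketch (the atomic_write helper below is my own, not from any library):

```python
import os, tempfile

def atomic_write(path, data):
    """Write data so a reader sees either the old or the new content
    in full, never a partially written file."""
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # make sure the bytes are durable
        os.rename(tmp, path)           # rename(2) atomically replaces path
    except BaseException:
        os.unlink(tmp)
        raise

path = os.path.join(tempfile.mkdtemp(), "config.txt")
atomic_write(path, "v1")
atomic_write(path, "v2")
print(open(path).read())  # → v2
```

Note that this only covers one file at a time; it does not answer the second question about several files that must stay consistent with each other.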
Atomic file writes on Kubernetes
When you implement configuration or secrets reloading in your program from a Secret/ConfigMap volume on Kubernetes, you expect the files to be atomically written in full (no half-written files observed by the reader) and multiple files on the same volume to be consistent as they appear on the Secret/ConfigMap object.
kubelet, which is the host agent running your containers, is responsible
for watching updates to referenced Secret/ConfigMap objects and updating the
mounted files on these volumes in the atomic and consistent way described above.
So how does the
kubelet do this? With
AtomicWriter, which handles
atomically projecting content for a set of files into a target directory. The
algorithm is explained in detail here, but I'll illustrate it shortly.
In a nutshell, the AtomicWriter makes use of things you can do atomically on POSIX kernels (specifically symbolic links) to create the illusion of atomic updates to the entire Secret/ConfigMap volume.
If you mount a Secret with data fields
username.txt and password.txt into a
Pod, you'll see these entries in your mount point directory:
drwxrwxrwt 3 root root  120 Sep 22 15:29 .
drwxr-xr-x 1 root root 4096 Sep 22 15:29 ..
drwxr-xr-x 2 root root   80 Sep 22 15:29 ..2022_09_22_15_29_04.2914482033
lrwxrwxrwx 1 root root   32 Sep 22 15:29 ..data -> ..2022_09_22_15_29_04.2914482033
lrwxrwxrwx 1 root root   19 Sep 22 15:29 password.txt -> ..data/password.txt
lrwxrwxrwx 1 root root   19 Sep 22 15:29 username.txt -> ..data/username.txt
username.txt and password.txt are called "user-visible files", and
they are symlinked into a directory called
..data/. In this case,
..data itself is also a symbolic link to a timestamped directory named
..2022_09_22_15_29_04.2914482033. This is the actual directory where the
regular (real) files you're reading are located.
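You can reproduce this symlink chain yourself and resolve it; a short sketch (paths and file contents below are made up):

```python
import os, tempfile

# Rebuild the layout above in a throwaway directory.
vol = tempfile.mkdtemp()
ts = "..2022_09_22_15_29_04.2914482033"
os.mkdir(os.path.join(vol, ts))
with open(os.path.join(vol, ts, "password.txt"), "w") as f:
    f.write("hunter2")
os.symlink(ts, os.path.join(vol, "..data"))                    # ..data -> ..<ts>
os.symlink("..data/password.txt", os.path.join(vol, "password.txt"))

# Resolving the chain lands on the regular file in the timestamped dir:
# password.txt -> ..data/password.txt -> ..<ts>/password.txt
real = os.path.realpath(os.path.join(vol, "password.txt"))
print(real.endswith(os.path.join(ts, "password.txt")))  # → True
```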
So, when you start an
inotify monitor on one of the "user-visible files", the default
behavior of the system call is to follow the symbolic links recursively and
watch the regular file at the end of the symbolic link chain (as that's the file
that would normally get updated, but not on Kubernetes).
This is where the Kubernetes AtomicWriter implementation comes into the
picture: if there's an update to the Secret/ConfigMap, kubelet will create a new
timestamped directory, write the files to it, update the
..data symlink to point to the new
timestamped directory (remember, this is something you can do atomically),
and finally delete the old timestamped directory. This is how the files on a
Secret/ConfigMap volume are always complete and consistent with one another.
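The swap described above can be simulated in a few lines. This is a sketch of the approach, not kubelet's actual code; the project helper and directory names are made up:

```python
import os, tempfile

def project(vol, ts, files):
    """Write a new timestamped dir and atomically retarget ..data at it."""
    os.mkdir(os.path.join(vol, ts))
    for name, data in files.items():
        with open(os.path.join(vol, ts, name), "w") as f:
            f.write(data)
    os.symlink(ts, os.path.join(vol, "..data_tmp"))
    os.rename(os.path.join(vol, "..data_tmp"),  # atomic symlink swap
              os.path.join(vol, "..data"))
    # (kubelet would now delete the previous timestamped directory)

vol = tempfile.mkdtemp()
project(vol, "..ts1", {"username.txt": "alice", "password.txt": "hunter2"})
for name in ("username.txt", "password.txt"):
    os.symlink(os.path.join("..data", name), os.path.join(vol, name))

# An update lands as a whole new directory; readers following the
# user-visible symlinks see either the old set of files or the new set,
# never a mix of the two.
project(vol, "..ts2", {"username.txt": "bob", "password.txt": "s3cret"})
print(open(os.path.join(vol, "username.txt")).read())  # → bob
```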
This behavior is also why you only get an
IN_DELETE_SELF (file deleted)
event when the file you're watching on a Kubernetes Secret/ConfigMap volume is
updated. The inotify system call offers an
IN_DONT_FOLLOW option to not follow the
symbolic links on the specified path, but if you use this you will never get an
inotify event when the real file itself is updated. This is because the
"user-visible file" (like
password.txt) is a symlink to another symlink
(..data/password.txt), and the user-visible file itself never changes and
always links to the same path.
The behavior gets a little bit more complicated when you mount data fields in Secrets/ConfigMaps individually as files (as opposed to mounting it all as a directory), but you can easily test that yourself.
When you start an inotify watch on an individual file, the
IN_DELETE_SELF event indicates that the watch is now broken and needs to be re-established.
Most high-level libraries around inotify will not handle that for you, so you
may be surprised that you receive the deletion event only once and never again.
If you rely on reloading files on update while running on Kubernetes, I recommend that you:
- mount your ConfigMaps/Secrets as directories, not individual files
- start inotify watches on individual files, not directories (requires the point above)
- avoid using the
IN_DONT_FOLLOW option so that you can observe inotify events when a file is changed (user-visible files themselves are just symlinks, so they are not updated)
- handle inotify deletion events like
IN_DELETE_SELF, as they're the only events you'll receive
- re-establish inotify watches when you receive deletion events as they’re now broken (but you still should close the broken watch)
- test your file reloading logic on Kubernetes
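Putting the recommendations together, here is a minimal sketch of watching a user-visible file and re-establishing the watch after IN_DELETE_SELF. It assumes Linux; the volume layout and helpers (add_watch, read_events) are made up for illustration, and raw inotify is called through ctypes since Python's standard library has no binding:

```python
import ctypes, os, select, struct, tempfile

IN_DELETE_SELF = 0x00000400  # from <sys/inotify.h>

libc = ctypes.CDLL(None, use_errno=True)

def add_watch(fd, path, mask):
    # By default inotify_add_watch follows symlinks to the real file.
    wd = libc.inotify_add_watch(fd, path.encode(), mask)
    if wd < 0:
        raise OSError(ctypes.get_errno(), os.strerror(ctypes.get_errno()))
    return wd

def read_events(fd, timeout=2.0):
    """Return (wd, mask) pairs, waiting up to `timeout` seconds."""
    r, _, _ = select.select([fd], [], [], timeout)
    if not r:
        return []
    buf = os.read(fd, 4096)
    events, off = [], 0
    while off < len(buf):
        wd, mask, cookie, name_len = struct.unpack_from("iIII", buf, off)
        events.append((wd, mask))
        off += 16 + name_len  # 16-byte header plus optional name
    return events

# Build a volume mimicking kubelet's layout:
#   ..ts1/password.txt, ..data -> ..ts1, password.txt -> ..data/password.txt
vol = tempfile.mkdtemp()
os.mkdir(os.path.join(vol, "..ts1"))
with open(os.path.join(vol, "..ts1", "password.txt"), "w") as f:
    f.write("hunter2")
os.symlink("..ts1", os.path.join(vol, "..data"))
os.symlink("..data/password.txt", os.path.join(vol, "password.txt"))

fd = libc.inotify_init()
watched = os.path.join(vol, "password.txt")
add_watch(fd, watched, IN_DELETE_SELF)

# Simulate an update: new timestamped dir, atomic ..data swap, old dir gone.
os.mkdir(os.path.join(vol, "..ts2"))
with open(os.path.join(vol, "..ts2", "password.txt"), "w") as f:
    f.write("correct horse")
os.symlink("..ts2", os.path.join(vol, "..data_tmp"))
os.rename(os.path.join(vol, "..data_tmp"), os.path.join(vol, "..data"))
os.remove(os.path.join(vol, "..ts1", "password.txt"))
os.rmdir(os.path.join(vol, "..ts1"))

got_delete_self = any(m & IN_DELETE_SELF for _, m in read_events(fd))
# The old watch is dead now; re-establish it on the same path, which
# resolves to the new timestamped directory's file.
add_watch(fd, watched, IN_DELETE_SELF)
print(got_delete_self)  # → True
```

Note that the only event delivered for the update is the deletion of the old real file; re-adding the watch on the unchanged user-visible path is what picks up the new file.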
As far as I can tell, this is not widespread knowledge and the information is tucked away in some blog posts (1, 2, 3), so I’ve opened this issue to document this behavior in official Kubernetes documentation.
Thanks Shatil Rafiullah for reading drafts of this.