Files on Kubernetes Secret and ConfigMap volumes behave in peculiar and undocumented ways when you watch them for changes with the inotify(7) API. A file watch that works fine outside Kubernetes might not work as you expect when you run the same program on Kubernetes.

On a normal filesystem, you start a watch on a file on disk with a library and expect to get an event like IN_MODIFY (file modified) or IN_CLOSE_WRITE (file opened for writing was closed) when the file changes. But these filesystem events never arrive for files on Kubernetes Secret/ConfigMap volumes.

On Kubernetes, you only receive an IN_DELETE_SELF event (as if the file was deleted) on ConfigMap/Secret volumes. This deletion event also breaks the inotify watch you initiated, so your code needs to re-establish the watch every time the mounted file is updated.
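To observe this yourself, here is a minimal sketch (not production code) using the raw inotify API from golang.org/x/sys/unix; the mount path below is a hypothetical placeholder for wherever your volume is mounted. Pointed at a regular file, it prints IN_MODIFY/IN_CLOSE_WRITE masks on writes; pointed at a file on a Secret/ConfigMap volume, an update only ever produces IN_DELETE_SELF (followed by IN_IGNORED as the kernel drops the watch).

package main

import (
    "log"
    "unsafe"

    "golang.org/x/sys/unix"
)

func main() {
    const path = "/etc/mysecret/password.txt" // hypothetical mount path

    fd, err := unix.InotifyInit1(0)
    if err != nil {
        log.Fatal(err)
    }
    if _, err := unix.InotifyAddWatch(fd, path,
        unix.IN_MODIFY|unix.IN_CLOSE_WRITE|unix.IN_DELETE_SELF); err != nil {
        log.Fatal(err)
    }

    buf := make([]byte, 4096)
    for {
        n, err := unix.Read(fd, buf)
        if err != nil {
            log.Fatal(err)
        }
        for off := 0; off < n; {
            ev := (*unix.InotifyEvent)(unsafe.Pointer(&buf[off]))
            // On a regular file you'd see IN_MODIFY (0x2) and IN_CLOSE_WRITE
            // (0x8) here; on a Secret/ConfigMap volume an update only shows
            // IN_DELETE_SELF (0x400), then IN_IGNORED as the watch is dropped.
            log.Printf("event mask: %#x", ev.Mask)
            off += unix.SizeofInotifyEvent + int(ev.Len)
        }
    }
}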

Let’s talk about why this happens.

Resilient file reloads from disk

If you’ve ever attempted to write production-ready code that reloads files from disk when they’re updated, you’ve probably asked yourself these questions:

  1. How do you make sure the file is completely written? Did the IN_MODIFY event (indicating a write(2) call) mean the writer is done, or is the file only partially written and in an inconsistent state that shouldn’t be consumed yet?

  2. When watching multiple files that are updated together (think of a tls.crt/tls.key pair), when is the right time to re-read both files and start using them? One of the files might be updated while the other isn’t yet, which is exactly what you’d expect when another process writes them to disk one after the other.

Atomic file writes on Kubernetes

When you implement configuration or secrets reloading from a Secret/ConfigMap volume on Kubernetes, you expect the files to be written atomically and in full (no half-written files observed by the reader), and multiple files on the same volume to be consistent with one another, exactly as they appear on the Secret/ConfigMap object.

The kubelet, the host agent that runs your containers, is responsible for watching updates to the referenced Secret/ConfigMap objects and for updating the mounted files on these volumes in the atomic and consistent way described above.

So how does the kubelet do this? With AtomicWriter, which handles atomically projecting content for a set of files into a target directory. The algorithm is explained in detail here, but I’ll illustrate it briefly.

In a nutshell, AtomicWriter relies on operations that POSIX filesystems let you perform atomically (specifically, creating and renaming symbolic links) to create the illusion of atomic updates to the entire Secret/ConfigMap volume.

If you mount a Secret with data fields username.txt and password.txt into a Pod, you’ll see these entries in your mount point directory:

drwxrwxrwt 3 root root  120 Sep 22 15:29 .
drwxr-xr-x 1 root root 4096 Sep 22 15:29 ..
drwxr-xr-x 2 root root   80 Sep 22 15:29 ..2022_09_22_15_29_04.2914482033
lrwxrwxrwx 1 root root   32 Sep 22 15:29 ..data -> ..2022_09_22_15_29_04.2914482033
lrwxrwxrwx 1 root root   19 Sep 22 15:29 password.txt -> ..data/password.txt
lrwxrwxrwx 1 root root   19 Sep 22 15:29 username.txt -> ..data/username.txt

Here, username.txt and password.txt are called “user-visible files”, and they are symlinks into a directory called ..data/. In this case, ..data is itself a symbolic link to a timestamped directory named ..2022_09_22_15_29_04.2914482033. That is the actual directory containing the regular (real) files you’re reading.

So when you start an inotify watch on a “user-visible file”, the default behavior is to follow the symbolic links and watch the regular file at the end of the symlink chain, since that’s the file that would normally be updated (but not on Kubernetes).

This is where the Kubernetes AtomicWriter implementation comes into the picture: when there’s an update to the Secret/ConfigMap, the kubelet creates a new timestamped directory, writes the files into it, updates the ..data symlink to point to the new timestamped directory (remember, that’s something you can do atomically), and finally deletes the old timestamped directory. This is how the files on a Secret/ConfigMap volume are always complete and consistent with one another.
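To make that sequence concrete, here is a simplified sketch of the symlink swap in Go. This illustrates the technique, not the kubelet’s actual AtomicWriter code: the volume path, file contents, and timestamp format are made up, and it assumes the user-visible symlinks (username.txt, password.txt) already exist and point into ..data/.

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "time"
)

// atomicUpdate writes a new set of files into a fresh timestamped directory,
// then atomically swings the ..data symlink over to it.
func atomicUpdate(volumeDir string, files map[string][]byte) error {
    // 1. Write all files into a brand new timestamped directory.
    tsDir := filepath.Join(volumeDir,
        ".."+time.Now().Format("2006_01_02_15_04_05.000000000"))
    if err := os.Mkdir(tsDir, 0o755); err != nil {
        return err
    }
    for name, data := range files {
        if err := os.WriteFile(filepath.Join(tsDir, name), data, 0o644); err != nil {
            return err
        }
    }

    dataLink := filepath.Join(volumeDir, "..data")
    oldTarget, _ := os.Readlink(dataLink) // empty on first write

    // 2. Point a temporary symlink at the new directory, then rename(2) it
    //    over ..data. rename is atomic, so readers see either the old or the
    //    new set of files, never a mix.
    tmpLink := filepath.Join(volumeDir, "..data_tmp")
    if err := os.Symlink(filepath.Base(tsDir), tmpLink); err != nil {
        return err
    }
    if err := os.Rename(tmpLink, dataLink); err != nil {
        return err
    }

    // 3. Delete the old timestamped directory; this is the moment watchers of
    //    the old real files receive IN_DELETE_SELF.
    if oldTarget != "" {
        return os.RemoveAll(filepath.Join(volumeDir, oldTarget))
    }
    return nil
}

func main() {
    err := atomicUpdate("/tmp/demo-volume", map[string][]byte{
        "username.txt": []byte("alice"),
        "password.txt": []byte("hunter2"),
    })
    fmt.Println("update:", err)
}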

This behavior is also why you only get an IN_DELETE_SELF (file deleted) event when a file you’re watching on a Kubernetes Secret/ConfigMap volume is updated.

The inotify API offers an IN_DONT_FOLLOW option to not follow symbolic links on the specified path, but if you use it you will never get an inotify event when the real file is updated. That’s because a “user-visible file” like password.txt is a symlink pointing through another symlink (..data/password.txt), and the user-visible symlink itself never changes: it always points to the same path.
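For completeness, this is what the flag looks like when added to the earlier sketch (it reuses fd, path, and log from that program); since the user-visible symlink it ends up watching never changes, this watch stays silent across updates:

// IN_DONT_FOLLOW places the watch on the user-visible symlink (password.txt)
// itself instead of the real file it resolves to. That symlink never changes,
// so this watch never fires when the Secret/ConfigMap is updated.
if _, err := unix.InotifyAddWatch(fd, path,
    unix.IN_MODIFY|unix.IN_CLOSE_WRITE|unix.IN_DELETE_SELF|unix.IN_DONT_FOLLOW); err != nil {
    log.Fatal(err)
}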

The behavior gets a bit more complicated when you mount individual data fields of Secrets/ConfigMaps as single files (as opposed to mounting the whole object as a directory), but you can easily test that yourself.

Impact

When you start an inotify watch on an individual file, the IN_DELETE_SELF event indicates that the watch is now broken and needs to be re-established. Most high-level libraries around inotify will not handle that for you, so you may be surprised to receive the deletion event once and then never again.

If you rely on reloading files on update while running on Kubernetes, I recommend that you:

  1. mount your ConfigMaps/Secrets as directories, not individual files
  2. start inotify watches on individual files, not directories (requires the point above)
  3. avoid using the IN_DONT_FOLLOW option so that you can observe inotify events when a file is changed (the user-visible files themselves are just symlinks and are never updated)
  4. handle inotify deletion events like IN_DELETE_SELF as they’re the only events you’ll receive
  5. re-establish inotify watches when you receive deletion events, as they’re now broken (you should still close the broken watch; see the sketch after this list)
  6. test your file reloading logic on Kubernetes
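To tie points 4 and 5 together, here is a minimal sketch of the reload loop, again using golang.org/x/sys/unix; the mount path and the reload step are placeholders for your own configuration. With the raw API, the kernel removes the broken watch on its own after IN_DELETE_SELF (and sends a follow-up IN_IGNORED event), so re-adding the watch on the same path is all that’s left to do; a higher-level library may require you to explicitly close or remove its stale watch object.

package main

import (
    "log"
    "os"
    "unsafe"

    "golang.org/x/sys/unix"
)

func main() {
    const path = "/etc/mysecret/password.txt" // hypothetical mount path

    fd, err := unix.InotifyInit1(0)
    if err != nil {
        log.Fatal(err)
    }

    watch := func() {
        // The path resolves through the (now updated) ..data symlink, so
        // re-adding the watch points it at the new real file.
        if _, err := unix.InotifyAddWatch(fd, path, unix.IN_DELETE_SELF); err != nil {
            log.Fatal(err)
        }
    }
    reload := func() {
        data, err := os.ReadFile(path)
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("reloaded %d bytes", len(data)) // placeholder reload logic
    }

    watch()
    reload()

    buf := make([]byte, 4096)
    for {
        n, err := unix.Read(fd, buf)
        if err != nil {
            log.Fatal(err)
        }
        for off := 0; off < n; {
            ev := (*unix.InotifyEvent)(unsafe.Pointer(&buf[off]))
            if ev.Mask&unix.IN_DELETE_SELF != 0 {
                // The old real file is gone and the kernel has dropped the
                // watch (IN_IGNORED follows); reload the file and re-watch.
                reload()
                watch()
            }
            off += unix.SizeofInotifyEvent + int(ev.Len)
        }
    }
}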

As far as I can tell, this is not widespread knowledge and the information is tucked away in a few blog posts (1, 2, 3), so I’ve opened this issue to get this behavior documented in the official Kubernetes documentation.

Thanks to Shatil Rafiullah for reading drafts of this.