/home/ahmetb

Blog

27 February 2025

Every pod eviction in Kubernetes, explained

Anyone who is running Kubernetes in a large-scale production setting cares about having a predictable Pod lifecycle. Having unknown actors that can terminate your Pods is a scary thought, especially when you’re running stateful workloads or care about availability in general.

Kubernetes terminates workloads in many ways, each with non-trivial (and not always predictable) machinery, and no single page lists all eviction modes in one place. This article digs into Kubernetes internals to walk you through every eviction path that can terminate your Pods and explains why “kubelet restarts don’t impact running workloads” isn’t always true. I’ll leave you with a cheatsheet at the end. Read More →

22 January 2025

So you wanna write Kubernetes controllers?

Any company using Kubernetes eventually starts looking into developing custom controllers. After all, what’s not to like about being able to provision resources with declarative configuration? Control loops are fun, and Kubebuilder makes it extremely easy to get started with writing Kubernetes controllers. Next thing you know, customers in production are relying on the buggy controller you developed without understanding how to design idiomatic APIs and build reliable controllers.

A low barrier to entry combined with good intentions and the “illusion of working implementation” is not a recipe for success when developing production-grade controllers. I’ve seen the real-world consequences of controllers developed without adequate understanding of Kubernetes and the controller machinery at multiple large companies. We went back to the drawing board and rewrote nascent controller implementations a few times to observe which mistakes people new to controller development make. Read More →

18 November 2024

Notes on OpenAI Kubernetes outage

Last week, OpenAI suffered an outage lasting several hours and published a detailed postmortem about it. I highly recommend reading it. These technical reports are usually a gold mine for all large-scale Kubernetes users, as we all go through a similar set of reliability issues running Kubernetes in production. Read More →

15 November 2024

Tale of a Kubernetes node-feature-discovery incident

This is an analysis of a low-severity incident that took place in the Kubernetes clusters at the company I work at. It taught me a lot about how to think about the off-the-shelf components we bring from the ecosystem into the critical path and operate at a scale much larger than they were intended for. Read More →

10 September 2024

Kubernetes CRD generation pitfalls

A quick code search query reveals at least 7,000 Kubernetes Custom Resource Definitions in the open source corpus, most of which are likely generated with controller-gen, a tool that turns Go structs with comment-based markers into Kubernetes CRD manifests, which end up being custom APIs served by the Kubernetes API server.

At LinkedIn, we develop our fair share of custom Kubernetes APIs and controllers to run workloads or manage infrastructure. In doing so, we rely on the custom resource machinery and controller-gen heavily to generate our CRDs. Read More →

28 December 2022

Why Kubernetes secrets take so long to update?

I recently ran a Twitter poll, and only 20% of the participants accurately predicted that it takes Kubernetes 60-90 seconds to propagate changes to Secrets and ConfigMaps on the mounted volumes. So I want to take you on a journey through the codebase to show how the mechanics of these volume types work and why it takes so long. Before going on this journey, I would have answered the poll with “nearly instantly” (as the largest group, 40%, did). Read More →

22 September 2022

Pitfalls reloading files from Kubernetes Secret & ConfigMap volumes

Files on Kubernetes Secret and ConfigMap volumes work in peculiar and undocumented ways when it comes to watching changes to these files with the inotify(7) API. Your typical file watch that works outside Kubernetes might not work as you expect when you run the same program on Kubernetes.

On a normal filesystem, you start a watch on a file on disk with a library and expect to get an event like IN_MODIFY (file was modified) or IN_CLOSE_WRITE (file opened for writing was closed) when the file is changed. But these filesystem events never happen for files on Kubernetes Secret/ConfigMap volumes. Read More →

16 June 2021

Did we market Knative wrong?

It has been over two years since we announced Knative. While the project and its community are going strong, I think we made some mistakes in the early positioning and messaging of Knative that prevented the project from becoming a widely adopted, go-to add-on for Kubernetes. Because I have never been a decision-maker for the Knative project and its messaging at Google, I can provide an outsider’s perspective despite having worked on different aspects of Knative during this time. Read More →
