At my current employer, we use Kubernetes to run hundreds of thousands of
bare-metal servers, spread over hundreds of Kubernetes clusters. We use Kubernetes
beyond officially supported/tested scale limits by running more than 5,000
nodes and over a hundred thousand pods in a single cluster.1 In these
large-scale setups, expensive “list” calls on the Kubernetes API are the
Achilles' heel of control plane reliability and scalability. In this article,
I’ll explain which list call patterns pose the most risk, and how recent and
upcoming Kubernetes versions improve list API performance.