A quick code search query reveals at least 7,000 Kubernetes Custom Resource Definitions in the open source corpus [1], most of which are likely generated with controller-gen, a tool that turns Go structs with comment-based markers into Kubernetes CRD manifests, which in turn become custom APIs served by the Kubernetes API server.

At LinkedIn, we develop our fair share of custom Kubernetes APIs and controllers to run workloads or manage infrastructure. In doing so, we rely on the custom resource machinery and controller-gen heavily to generate our CRDs.

Validate religiously

As a controller developer, you should only admit validated and complete custom resources into your API server.

Any resource that has illegal values or missing fields is begging for trouble to happen down the line. Your controllers should not have implicit defaults for resources. As the Kubernetes API conventions recommend:

In general we want default values to be explicitly represented in our APIs, rather than asserting that “unspecified fields get the default behavior”.

You cannot reliably compensate for a missing field in your controller in the long term, nor should you have to deal with an illegal value during reconciliation.

Explicit +required or +optional on every field

controller-gen has many different ways of marking a field “optional”:

  1. The Go struct field has the omitempty JSON tag option:

    type Car struct {
        Brand string `json:"brand,omitempty"`
    }

  2. The struct field has the //+optional marker comment:

    type Car struct {
        //+optional
        Brand string `json:"brand"`
    }

  3. The struct field has the //+kubebuilder:validation:Optional marker comment:

    type Car struct {
        //+kubebuilder:validation:Optional
        Brand string `json:"brand"`
    }

    …and typically you might think that’s it but:

  4. You have a package-level marker on the Go package that makes all fields “optional by default” (a feature I’m yet to find a use case for):

    //+kubebuilder:validation:Optional
    package v1beta1
    

This is simply far too many different ways to achieve something, and it offers too many ways to open your API up to more relaxed validation due to misconfiguration.

Up until controller-tools v0.16 (released last month) it was not possible to reliably mark a field as required. (Even if you specified the +required marker on the field, using the omitempty tag would silently turn the field into optional.)

For this reason alone, I strongly recommend upgrading to controller-tools v0.16+ and explicitly specifying +required or +optional markers on every single field of your API. If you do this, you may find that your API was already making some fields optional by mistake.

I still recommend having the package-level //+kubebuilder:validation:Required marker so that all struct fields are required by default, as a safety net.

Field Validation

Zero vs null pitfalls

A major pitfall in understanding CRD validation is that the Go type system allows for zero values to pass the +required check: Empty strings (""), zero numerics (0, 0.0), empty slices ([]), or empty maps ({}) are all valid values for their respective types.

The OpenAPI schema validation only looks for the non-null presence of the field in the request payload. It does not care whether the value is the zero value of its type.

This mistake will usually fly under the radar because you’re probably writing your tests in Go. When the Go JSON serializer turns your test object into a JSON payload, the request body will have "field": "" (because you don’t have omitempty on the field), and the server will accept this as a valid resource. Your tests will not fail.
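To see this for yourself, here is a minimal sketch using only the standard library. The two struct types and the marshal helper are hypothetical names made up for this illustration; the point is just what ends up on the wire for a zero-valued string field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// carSpec has no omitempty option, like a field you intend to be required.
type carSpec struct {
	Brand string `json:"brand"`
}

// carSpecOmit is the same field with the omitempty option, for comparison.
type carSpecOmit struct {
	Brand string `json:"brand,omitempty"`
}

// marshal returns the JSON encoding of v as a string.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// The zero value is serialized, so the field is "present" on the wire.
	fmt.Println(marshal(carSpec{})) // {"brand":""}

	// With omitempty, the zero value is dropped from the payload entirely.
	fmt.Println(marshal(carSpecOmit{})) // {}
}
```

The first payload contains the field with an empty string, which is enough to satisfy a presence check.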

If you really want to disallow zero values or empty strings for a field, use markers like:

  • +kubebuilder:validation:MinLength=1 for strings
  • +kubebuilder:validation:Minimum=1 for integers.
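As a sketch, the two markers could be combined with +required like this. The Seats field is a hypothetical addition for illustration, and the main function exists only to keep the snippet runnable.

```go
package main

import "fmt"

// CarSpec is a hypothetical spec type used only for this illustration.
type CarSpec struct {
	// Brand must be present and must not be the empty string.
	//+kubebuilder:validation:MinLength=1
	//+required
	Brand string `json:"brand"`

	// Seats must be present and at least 1, so a zero value is rejected.
	//+kubebuilder:validation:Minimum=1
	//+required
	Seats int32 `json:"seats"`
}

func main() {
	// The markers only shape the generated CRD schema. Go itself still
	// happily constructs the zero value, so the server-side check is what
	// actually rejects it.
	fmt.Printf("%+v\n", CarSpec{})
}
```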

Nested fields are not always validated

Consider this Car custom resource type that has a required spec.brand field with an enum validation:

package v1beta1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

//+kubebuilder:object:root=true
type Car struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec              CarSpec `json:"spec,omitempty"`
}

type CarSpec struct {
    //+kubebuilder:validation:Enum=BMW;Porsche;McLaren
    //+required
    Brand string `json:"brand"`
}

It’s still possible to create a resource like the following that skips the API server validation:

apiVersion: example.com/v1beta1
kind: Car
metadata:
    name: my-car

This is a valid object as far as the API server is concerned, but it’s not what you wanted. The Brand field was not validated because of how OpenAPI schema validation works: the fields of a nested object are only validated if that object is specified in the request payload.

If the YAML payload listed above had included a spec: {} on the wire, it would have been validated and the request would have been rejected.
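The behavior is easier to reason about with a toy model. The following is a simplified sketch, not the API server's actual implementation: the schema type, validate, and carSchema are hypothetical names, and the only rule modeled is "required fields must be present, and nested objects are checked only when they are present".

```go
package main

import "fmt"

// schema is a tiny, hypothetical stand-in for an OpenAPI object schema.
type schema struct {
	Required   []string           // names of fields that must be present
	Properties map[string]*schema // schemas of nested object fields
}

// validate reports the first missing required field, or "" if there is none.
// It descends into a nested object only when that field is present.
func validate(s *schema, obj map[string]any, path string) string {
	for _, name := range s.Required {
		if _, ok := obj[name]; !ok {
			return path + "." + name + ": Required value"
		}
	}
	for name, child := range s.Properties {
		nested, ok := obj[name].(map[string]any)
		if !ok {
			continue // absent (or not an object): nothing below is checked
		}
		if msg := validate(child, nested, path+"."+name); msg != "" {
			return msg
		}
	}
	return ""
}

// carSchema models the Car type above: spec is not required, spec.brand is.
func carSchema() *schema {
	return &schema{
		Properties: map[string]*schema{
			"spec": {Required: []string{"brand"}},
		},
	}
}

func main() {
	noSpec := map[string]any{"metadata": map[string]any{"name": "my-car"}}
	emptySpec := map[string]any{"spec": map[string]any{}}

	fmt.Printf("no spec:    %q\n", validate(carSchema(), noSpec, ""))
	fmt.Printf("empty spec: %q\n", validate(carSchema(), emptySpec, ""))
}
```

In this model, the object without spec passes because nothing ever looks inside a field that isn't there, while the object with an empty spec fails on the missing brand.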

Markers aren’t always validated

While the controller-gen tool does extensive validation of Go structs you are authoring for small mistakes, it does not validate the markers it does not recognize:

type Car struct {
    //+kubebuilder:validation:enum=BMW;Porsche;McLaren
    Brand string `json:"brand,omitempty"`
}

If you can’t spot the error above, you’re one of the dozens of developers that thought controller-gen would’ve complained if you misspelled :Enum as :enum.

Do not rely on controller-gen to strictly validate every validation marker; inspect the generated CustomResourceDefinition manifests.
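Until better tooling exists, even a crude check can catch this class of typo. Below is a minimal sketch of such a check, assuming a hand-maintained allowlist: knownValidation is a partial list I typed up for this example, not controller-gen's own marker registry, and unknownMarkers is a hypothetical helper that scans source text line by line rather than parsing Go.

```go
package main

import (
	"fmt"
	"strings"
)

// knownValidation is a partial, hand-maintained list of validation marker
// names. It is an assumption for this sketch, not controller-gen's registry.
var knownValidation = map[string]bool{
	"Enum": true, "Minimum": true, "Maximum": true, "MinLength": true,
	"MaxLength": true, "MinItems": true, "MaxItems": true, "Pattern": true,
	"Format": true, "Required": true, "Optional": true,
}

const prefix = "+kubebuilder:validation:"

// unknownMarkers returns the validation marker names found in src that are
// not in knownValidation. Matching is case-sensitive on purpose.
func unknownMarkers(src string) []string {
	var unknown []string
	for _, line := range strings.Split(src, "\n") {
		comment := strings.TrimSpace(line)
		if !strings.HasPrefix(comment, "//") {
			continue
		}
		marker := strings.TrimSpace(strings.TrimPrefix(comment, "//"))
		if !strings.HasPrefix(marker, prefix) {
			continue
		}
		name := strings.TrimPrefix(marker, prefix)
		// Drop any "=value" or ":=value" argument after the marker name.
		if i := strings.IndexAny(name, ":="); i >= 0 {
			name = name[:i]
		}
		if !knownValidation[name] {
			unknown = append(unknown, name)
		}
	}
	return unknown
}

func main() {
	src := `type Car struct {
	//+kubebuilder:validation:enum=BMW;Porsche;McLaren
	Brand string
}`
	fmt.Println(unknownMarkers(src)) // [enum]
}
```

Because the allowlist is partial, a real marker that is missing from it would be flagged too, so treat the output as a prompt to double-check rather than a verdict.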

Field Defaulting

Defaulting on nested structs

Let’s extend the Car API to define a Transmission.Type field that’s not required, but defaults to Automatic using the +kubebuilder:default marker:

type CarSpec struct {
    //+kubebuilder:validation:Enum=BMW;Porsche;McLaren
    //+required
    Brand        string        `json:"brand"`

    //+optional
    Transmission Transmission  `json:"transmission,omitempty"`
}

type Transmission struct {
    //+kubebuilder:default:=Automatic
    Type string `json:"type,omitempty"`
}

If you craft a request to create a Car resource and the request body lacks a transmission field like this:

spec:
  brand: BMW

the Transmission.Type field won’t be defaulted to Automatic, for the same reason described above about how OpenAPI schema validation works on nested structs: members of the Transmission type are defaulted only if the field has a non-null value in the request payload.

To avoid this pitfall, you can set the default value of the Transmission field to an empty object ({}), which triggers the defaulting on its nested fields:

//+kubebuilder:default:={}
//+optional
Transmission Transmission `json:"transmission,omitempty"`

It is now possible to omit the spec.transmission field from the request body on the wire, and the resulting object will have transmission: {type: Automatic}.
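Here is a toy model of why the empty-object default helps. It is a simplified sketch and not the API server's real defaulting code: the schema type, applyDefaults, copyValue, and specSchema are hypothetical names, and the only rule modeled is "fill in an absent field that has a default, then descend into objects that are present".

```go
package main

import "fmt"

// schema is a tiny, hypothetical stand-in for a structural schema node.
type schema struct {
	Default    any                // default applied when the field is absent
	Properties map[string]*schema // nested object fields
}

// applyDefaults fills in absent fields that have a default, then descends
// into object fields that are present (including ones it just defaulted).
func applyDefaults(s *schema, obj map[string]any) {
	for name, child := range s.Properties {
		if _, ok := obj[name]; !ok && child.Default != nil {
			obj[name] = copyValue(child.Default)
		}
		if nested, ok := obj[name].(map[string]any); ok {
			applyDefaults(child, nested)
		}
	}
}

// copyValue copies map defaults so objects never share the schema's map.
func copyValue(v any) any {
	m, ok := v.(map[string]any)
	if !ok {
		return v
	}
	out := make(map[string]any, len(m))
	for k, val := range m {
		out[k] = copyValue(val)
	}
	return out
}

// specSchema models CarSpec. transmissionDefault is the default on the
// transmission field itself; nil models "no default marker on the field".
func specSchema(transmissionDefault any) *schema {
	return &schema{Properties: map[string]*schema{
		"transmission": {
			Default: transmissionDefault,
			Properties: map[string]*schema{
				"type": {Default: "Automatic"},
			},
		},
	}}
}

func main() {
	without := map[string]any{"brand": "BMW"}
	applyDefaults(specSchema(nil), without)
	fmt.Println(without) // transmission stays absent

	with := map[string]any{"brand": "BMW"}
	applyDefaults(specSchema(map[string]any{}), with)
	fmt.Println(with) // transmission is created, then its type is defaulted
}
```

In the first call nothing ever creates the transmission object, so the nested default is never reached. In the second, the empty-object default creates it, and the nested default is applied on the way down.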

Defaulting and validation at the same time

Let’s continue from the previous example, and make the spec.transmission.type field //+required like this:

type CarSpec struct {
    //+kubebuilder:default:={}
    //+optional
    Transmission Transmission `json:"transmission,omitempty"`
}

type Transmission struct {
    //+kubebuilder:default:=Automatic
    //+required
    Type string `json:"type"`
}

If you run controller-gen crd to generate a CRD from this and apply it to a cluster, you’ll see the API server explicitly rejecting it:

The CustomResourceDefinition "cars.example.com" is invalid: spec.validation.openAPIV3Schema.properties[spec].properties[transmission].default.type: Required value

In this error, the API server tells you that the default value transmission: {} is not a valid value for that field according to its OpenAPI schema validation, and it refuses to accept the CustomResourceDefinition.

You can fix this by providing a more complete default value on the parent struct:

type CarSpec struct {
    //+kubebuilder:default:={type:Automatic}
    Transmission Transmission `json:"transmission,omitempty"`
    ...

However, this is not ideal because the default value "Automatic" now appears twice: once on the CarSpec.Transmission.Type field, and once on CarSpec.Transmission. This is potentially a maintenance nightmare. Let me know if you have a better solution to this problem; this is the only way I know to make it work.

Explicit defaults for zeroable fields

Suppose you’re implementing the ReplicaSet API and its controller, and you have fields like .status.readyReplicas updated by the controller. Since a single controller is responsible for updating the status, you probably make a PATCH request.

However, if you calculate a patch when both the “before” and “after” objects have readyReplicas: 0, the resulting payload will not have a readyReplicas field.

As a result, the Kubernetes API machinery will not set a value for this field, and your status will be missing it (which is not what you want, because your clients will expect this field to be present even when the value is 0). The field will remain non-existent in the status object until the controller updates it to a non-zero value, which might never happen.
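A minimal sketch of how the field goes missing, using only the standard library. The status type, toMap, and shallowPatch are hypothetical names for this illustration, and shallowPatch is a toy top-level diff, not the patch computation your client library actually performs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// status is a hypothetical controller-managed status type.
type status struct {
	ReadyReplicas int32 `json:"readyReplicas"`
	Replicas      int32 `json:"replicas"`
}

// toMap round-trips a value through JSON to get its wire representation.
func toMap(v any) map[string]any {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	m := map[string]any{}
	if err := json.Unmarshal(b, &m); err != nil {
		panic(err)
	}
	return m
}

// shallowPatch returns only the top-level fields whose values differ between
// before and after. It is a toy diff, not a full JSON merge patch.
func shallowPatch(before, after any) map[string]any {
	b, a := toMap(before), toMap(after)
	patch := map[string]any{}
	for k, v := range a {
		if !reflect.DeepEqual(b[k], v) {
			patch[k] = v
		}
	}
	return patch
}

func main() {
	before := status{}           // in-memory copy: readyReplicas is 0
	after := status{Replicas: 3} // the controller still sees 0 ready replicas

	out, err := json.Marshal(shallowPatch(before, after))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"replicas":3}
}
```

Since readyReplicas is 0 on both sides, it never makes it into the patch, and the server has no reason to ever write it.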

That’s why you should consider explicitly configuring the default values on the controller-managed fields (where applicable) in case the controller sends a partial patch that never sets a value for the field:

type MyWorkloadStatus struct {
    //+kubebuilder:default:=0
    ReadyReplicas int32 `json:"readyReplicas"`
}

Functionally, this is not super critical, but when you see a mix of empty values and 0 values in your kubectl get output, you now know why it might be happening.

Conclusion

controller-gen has its fair share of quirks. Just like any weakly-typed system, controller-gen should ideally be accompanied by a more robust static analysis tool or linter that can catch these mistakes before they’re committed to the repository. I think we’re lacking tooling in this space (though some tools exist).

Comment-based markers pose a real risk of breaking the backwards compatibility on an API field (e.g. by making a field required, or removing an enum value). Human judgement goes only so far in getting these right.

controller-gen is maintained by a fairly small yet active developer community. If you use CRDs at your company, consider contributing to the project to make it better.

If you’re using controller-gen in your project, I hope this article helps you avoid some of the pitfalls we’ve learned. If you’re looking to make the ecosystem better, hopefully the problems inspire you to build static analysis tools that can catch these problems before they’re committed.


  1. This astounding number excludes most of the repos that also automatically generate the Go types (cloud providers, Crossplane, etc.). It’s fair to say we’re all probably overdoing CRDs a bit, but that’s a topic for another day.