Deploying to Cloud Run with Go

I’m continuing with these articles on the Cloud Run REST API that nobody really needs to read. This time, I’m back with a Go code walkthrough that shows how to deploy and manage services in Cloud Run through its Go API client library.

Cloud Run already offers deployment via mainstream methods like the CLI, web UI, IDEs and Terraform. So this article is dedicated to the <0.1% of Cloud Run users (a.k.a. the ones keeping it real) out there who need to use the Cloud Run API with Go. Let’s begin.

You can find the code examples below as a complete program in this repository.

# Agenda

Checking if a Cloud Run service exists
Deploying a new Cloud Run service
Waiting for the service to become “ready”
Making a service public (IAM)
Releasing a new revision and traffic splitting
Deleting a service

The package we will mainly use is google.golang.org/api/run/v1.

go get -u google.golang.org/api/run/v1

We will utilize a small utility function that gives us a regional API endpoint for Cloud Run, as the default endpoint run.googleapis.com does not offer the Knative endpoints for service management:

func client(region string) (*run.APIService, error) {
	return run.NewService(context.TODO(),
		option.WithEndpoint(fmt.Sprintf("https://%s-run.googleapis.com", region)))
}

# Checking if a service exists

This involves querying the service and looking for a 404 Not Found status code.

func serviceExists(c *run.APIService, region, project, name string) (bool, error) {
	_, err := c.Namespaces.Services.Get(fmt.Sprintf("namespaces/%s/services/%s", project, name)).Do()
	if err == nil {
		return true, nil
	}
	// not all errors indicate service does not exist, look for 404 status code
	// (errors.As also matches a *googleapi.Error that has been wrapped)
	var v *googleapi.Error
	if !errors.As(err, &v) {
		return false, fmt.Errorf("failed to query service: %w", err)
	}
	if v.Code == http.StatusNotFound {
		return false, nil
	}
	return false, fmt.Errorf("unexpected status code=%d from get service call: %w", v.Code, err)
}

# Deploying a new service

First, we need to initialize a Service object. This is the Go representation of the Knative Service YAML manifest you see in Cloud Console. It needs very few fields (e.g. it doesn’t need a namespace, as that is inferred by the API).

svc := &run.Service{
    ApiVersion: "serving.knative.dev/v1",
    Kind:       "Service",
    Metadata: &run.ObjectMeta{
        Name: name,
    },
    Spec: &run.ServiceSpec{
        Template: &run.RevisionTemplate{
            Metadata: &run.ObjectMeta{Name: name + "-v1"},
            Spec: &run.RevisionSpec{
                Containers: []*run.Container{
                    {
                        Image: "gcr.io/google-samples/hello-app:1.0",
                    },
                },
            },
        },
    },
}

Then we make an API call that returns a populated Service object, which is not all that useful because it is still missing many fields like status.url.

Note that this call succeeding does not mean the deployed application works fine. It only indicates that the API has accepted the object. Readiness happens asynchronously.

_, err = c.Namespaces.Services.Create("namespaces/"+project, svc).Do()
// TODO handle err

# Waiting for readiness

Knative services that have all their Revisions “ready” and the routes configured to serve traffic are shown on Kubernetes API as:

status:
  conditions:
  - lastTransitionTime: "2021-04-06..."
    status: "True"
    type: "Ready"
  - lastTransitionTime: "2021-04-06..."
    status: "True"
    type: "RoutesReady"

We need to wait for both the Ready and RoutesReady conditions to become True.
When something fails, status: False will be set, along with a message field containing the error.
While the deployment is in progress, the status will be Unknown (or the condition will be missing).

Let’s write a Go method for this that checks for the status every few seconds and quits with a timeout based on given context:

func waitForReady(ctx context.Context, c *run.APIService, region, project, name, condition string) error {
	t := time.NewTicker(time.Second * 5)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
			svc, err := getService(c, region, project, name)
			if err != nil {
				return fmt.Errorf("failed to query service for readiness: %w", err)
			}
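			if svc.Status == nil {
				continue // status not populated yet, keep polling
			}
			// cond (not c) so the loop variable does not shadow the API client
			for _, cond := range svc.Status.Conditions {
				if cond.Type == condition {
					if cond.Status == "True" {
						return nil
					} else if cond.Status == "False" {
						return fmt.Errorf("service could not become %q (status:%s) (reason:%s) %s",
							condition, cond.Status, cond.Reason, cond.Message)
					}
				}
			}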
		}
	}
}

You can use this method like:

err = waitForReady(ctx, c, region, project, name, "Ready")
// TODO handle err
err = waitForReady(ctx, c, region, project, name, "RoutesReady")
// TODO handle err
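
The per-condition decision inside waitForReady (True, False, or still in progress) can also be pulled out into a small pure function, which makes it easy to test without calling the API. This is only a sketch: the condition struct and the checkCondition name are hypothetical stand-ins I made up for illustration, not types from the client library.

```go
package main

import "fmt"

// condition is a hypothetical stand-in for the condition type returned by
// the API; only the fields used in this sketch are modeled.
type condition struct {
	Type, Status, Reason, Message string
}

// checkCondition looks up the named condition and returns:
//   - (true, nil) when its status is "True"
//   - (false, err) when its status is "False"
//   - (false, nil) when it is "Unknown" or not reported yet (keep polling)
func checkCondition(conds []condition, name string) (bool, error) {
	for _, c := range conds {
		if c.Type != name {
			continue
		}
		switch c.Status {
		case "True":
			return true, nil
		case "False":
			return false, fmt.Errorf("condition %q failed (reason:%s): %s",
				name, c.Reason, c.Message)
		}
		return false, nil // "Unknown": deployment still in progress
	}
	return false, nil // condition missing: deployment still in progress
}

func main() {
	conds := []condition{
		{Type: "Ready", Status: "Unknown"},
		{Type: "RoutesReady", Status: "True"},
	}
	for _, name := range []string{"Ready", "RoutesReady"} {
		ok, err := checkCondition(conds, name)
		fmt.Println(name, ok, err)
	}
}
```

With something like this in place, the polling loop only has to decide between returning, failing, and waiting for the next tick.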

# Configuring access

To make the application publicly accessible, or accessible only to some service accounts, we use the IAM endpoints, which are available on the global Run API endpoint:

gc, err := run.NewService(context.TODO())
// TODO handle err

Giving public access to all visitors looks like this:

_, err = gc.Projects.Locations.Services.SetIamPolicy(
    fmt.Sprintf("projects/%s/locations/%s/services/%s", project, region, name),
    &run.SetIamPolicyRequest{
        Policy: &run.Policy{Bindings: []*run.Binding{{
            Members: []string{"allUsers"},
            Role:    "roles/run.invoker",
        }}},
    },
).Do()
// TODO handle err
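
One thing to keep in mind: SetIamPolicy replaces the whole policy, so the call above drops any bindings that were already on the service. If that matters to you, a safer flow is to read the current policy first (the client has a GetIamPolicy call next to SetIamPolicy), merge the new member into it, and write the result back. Below is a minimal sketch of just the merge step. The binding struct and the addMember helper are hypothetical stand-ins for illustration, and the service account name is made up.

```go
package main

import "fmt"

// binding is a hypothetical stand-in for the library's Binding type.
type binding struct {
	Role    string
	Members []string
}

// addMember grants role to member. It reuses the existing binding for that
// role when there is one, skips members that are already present, and
// appends a new binding otherwise.
func addMember(bindings []binding, role, member string) []binding {
	for i := range bindings {
		if bindings[i].Role != role {
			continue
		}
		for _, m := range bindings[i].Members {
			if m == member {
				return bindings // already granted, nothing to do
			}
		}
		bindings[i].Members = append(bindings[i].Members, member)
		return bindings
	}
	return append(bindings, binding{Role: role, Members: []string{member}})
}

func main() {
	// Pretend this came back from a get-policy call (made-up account name).
	existing := []binding{{
		Role:    "roles/run.invoker",
		Members: []string{"serviceAccount:ci@example.iam.gserviceaccount.com"},
	}}
	merged := addMember(existing, "roles/run.invoker", "allUsers")
	fmt.Println(merged)
}
```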

It might take a few seconds for the IAM changes to take effect, so don’t be surprised if you immediately query the service URL and get an HTTP 403.

# Releasing a new Revision and splitting traffic

To make an update to the deployment, you just need to update the Service and save it with a different Revision name via spec.template.metadata.name.

Cloud Run creates a new Revision under the covers and starts sending all the traffic to the latest ready revision. But here we will do custom traffic splitting.

The caveat here is that we need to retrieve the Service from the API first, modify it in memory, and then save it. This takes advantage of the optimistic concurrency control built into the API, which prevents the update call from succeeding if somebody else has updated the object since you queried it. So it is ideal to add retries around this, which I omitted here.
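
If you do want retries, here is a rough sketch of what the loop could look like. It is not from the repository: retryOnConflict and errConflict are hypothetical names, and I am assuming you translate the API's conflict response (I would expect an HTTP 409 on the googleapi.Error) into errConflict yourself. The important part is that the callback redoes the whole get, modify, save cycle on every attempt, so each try starts from the latest version of the Service.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConflict is a hypothetical sentinel the caller returns when the API
// rejects an update because the Service changed since it was fetched.
var errConflict = errors.New("conflict: service was modified concurrently")

// retryOnConflict calls update up to attempts times, sleeping for backoff
// between tries. Only errConflict triggers a retry; success or any other
// error is returned immediately.
func retryOnConflict(attempts int, backoff time.Duration, update func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = update()
		if !errors.Is(err, errConflict) {
			return err // nil, or an error that retrying will not fix
		}
		if i < attempts-1 {
			time.Sleep(backoff)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryOnConflict(3, 10*time.Millisecond, func() error {
		calls++
		// Real code would fetch the Service, modify it and save it here.
		if calls < 3 {
			return errConflict // pretend somebody else updated it first
		}
		return nil
	})
	fmt.Println(calls, err)
}
```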

svc, err = getService(c, region, project, name)
// TODO handle err

svc.Spec.Template.Metadata.Name = name + "-v2"
svc.Spec.Template.Spec.Containers[0].Image = "gcr.io/google-samples/hello-app:2.0"
svc.Spec.Template.Spec.Containers[0].Env = []*run.EnvVar{{Name: "FOO", Value: "bar"}}
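// assign the whole Resources struct so this does not panic if it is nil
svc.Spec.Template.Spec.Containers[0].Resources = &run.ResourceRequirements{
    Limits: map[string]string{
        "cpu":    "2",
        "memory": "1Gi",
    }}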
// let's split traffic as v1=90% v2=10%
svc.Spec.Traffic = []*run.TrafficTarget{{
    RevisionName: name + "-v1",
    Percent:      90,
}, {
    RevisionName: name + "-v2",
    Percent:      10,
}}

_, err = c.Namespaces.Services.ReplaceService(
    fmt.Sprintf("namespaces/%s/services/%s", project, name), svc).Do()
// TODO handle err

// wait for the service to become ready and start serving the route changes
err = waitForReady(ctx, c, region, project, name, "Ready")
// TODO handle err
err = waitForReady(ctx, c, region, project, name, "RoutesReady")
// TODO handle err
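
Since Knative expects the traffic percentages to add up to 100, it can be handy to sanity-check the split before sending it, rather than finding out from an API error. Here is a small sketch of such a check. The trafficTarget struct is a hypothetical stand-in that mirrors only the two fields used above, and validateTraffic is a name I made up.

```go
package main

import "fmt"

// trafficTarget is a hypothetical stand-in mirroring the two fields of the
// traffic target used in this article.
type trafficTarget struct {
	RevisionName string
	Percent      int64
}

// validateTraffic checks that every percentage is within 0..100 and that
// all of them add up to exactly 100.
func validateTraffic(targets []trafficTarget) error {
	var total int64
	for _, target := range targets {
		if target.Percent < 0 || target.Percent > 100 {
			return fmt.Errorf("revision %q: percent %d is out of range",
				target.RevisionName, target.Percent)
		}
		total += target.Percent
	}
	if total != 100 {
		return fmt.Errorf("traffic percentages add up to %d, want 100", total)
	}
	return nil
}

func main() {
	split := []trafficTarget{
		{RevisionName: "hello-v1", Percent: 90},
		{RevisionName: "hello-v2", Percent: 10},
	}
	fmt.Println(validateTraffic(split))
}
```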

# Deleting the service

op, err := c.Namespaces.Services.Delete(
    fmt.Sprintf("namespaces/%s/services/%s", project, name)).Do()
// TODO handle err

Here, you can check whether op.Status is "Success" to see if the deletion was accepted. The deletion itself happens asynchronously, and the Service object will eventually disappear from the API. I’m not implementing that here for brevity.
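
If you do want to wait for the Service to be gone, one option is to poll an existence check until it reports false. The sketch below is an assumption on my part rather than code from the repository: waitForDeletion is a hypothetical helper, and the exists callback would typically wrap serviceExists from earlier. I used a maximum number of checks to keep it short; a context-based timeout in the style of waitForReady would work just as well.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForDeletion polls exists until it reports false, checking at most
// maxChecks times and sleeping for interval between checks.
func waitForDeletion(exists func() (bool, error), maxChecks int, interval time.Duration) error {
	for i := 0; i < maxChecks; i++ {
		found, err := exists()
		if err != nil {
			return fmt.Errorf("failed to query service: %w", err)
		}
		if !found {
			return nil // the Service is gone
		}
		time.Sleep(interval)
	}
	return errors.New("service still exists after waiting")
}

func main() {
	remaining := 2 // pretend the Service shows up in one more query
	err := waitForDeletion(func() (bool, error) {
		remaining--
		return remaining > 0, nil
	}, 5, 10*time.Millisecond)
	fmt.Println(err)
}
```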

That’s it! The most up-to-date code will be in the repository, as I probably won’t update this article.