Building a high-scale chat server on Cloud Run

In this post, I will show you how to use Cloud Run's WebSockets support to build a fleet of serverless containers that together form a chatroom server able to scale to a large number of concurrent connections (250,000 clients).

Architecture Diagram

The point of this article is to illustrate WebSockets on Cloud Run and the scale you can reach by using serverless. Typically, a chat server app would be a better fit for VM-based compute platforms and would cost much less to run there.

# Architecture

Cloud Run runs and scales any containerized service application. Based on the load (connected clients), it will add more container instances or shut down unused ones. Therefore, our chat server has to be stateless.

For a chatroom service to work, all users must see the same list of messages, even though they might be connected to different container instances on the backend, behind Cloud Run's load balancing.

To synchronize data between the dynamic fleet of container instances behind a Cloud Run service, we will use Redis Pub/Sub, simply because it pushes each new message to every subscribed client over a persistent TCP connection. That suits a dynamic environment like Cloud Run well, where instances come and go.
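To make the fan-out concrete, here is a minimal sketch of the pattern described above. It does not talk to Redis: `FakeBroker` is a hypothetical in-memory stand-in for a Pub/Sub channel, and `ChatInstance` is an illustrative name for one container instance. The point is only that each instance holds a single subscription, publishes every incoming message, and relays whatever the broker sends back to its own clients.

```python
from collections import defaultdict


class FakeBroker:
    """In-memory stand-in for Redis Pub/Sub (hypothetical, illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver the message to every subscriber of the channel.
        for callback in list(self._subscribers[channel]):
            callback(message)


class ChatInstance:
    """One container instance: tracks its own clients, keeps no shared state."""

    def __init__(self, broker, channel="chat"):
        self.broker = broker
        self.channel = channel
        self.clients = {}  # client id -> messages delivered to that client
        # A single subscription per instance, shared by all of its clients.
        broker.subscribe(channel, self._on_broker_message)

    def connect(self, client_id):
        self.clients[client_id] = []

    def receive_from_client(self, client_id, text):
        # Publish instead of delivering locally; the broker echoes the
        # message back to every instance, including this one.
        self.broker.publish(self.channel, f"{client_id}: {text}")

    def _on_broker_message(self, message):
        for inbox in self.clients.values():
            inbox.append(message)


broker = FakeBroker()
a, b = ChatInstance(broker), ChatInstance(broker)
a.connect("alice")
b.connect("bob")
a.receive_from_client("alice", "hello")
b.receive_from_client("bob", "hi")
print(a.clients["alice"])
print(b.clients["bob"])
```

Both clients end up with the same two messages in the same order even though they are attached to different instances, which is the property the chatroom needs.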

# Scale

Unlike most serverless platforms, Cloud Run can serve multiple clients concurrently from a single container instance. Currently, up to 250 clients can connect to a single container instance (as long as it can handle the load).

Any Cloud Run service, by default, can scale up to 1,000 instances. (You can raise this limit by requesting a quota increase in the Console.) At 250 clients per instance, this means we can support 250,000 clients simultaneously without having to worry about infrastructure and scaling!

NOTE

WebSockets are currently in public preview on Cloud Run. We’re working on increasing the concurrency limits per container instance.

In this architecture, Cloud Run caps each container instance at 250 client connections (and adds more instances as new connections arrive). On the backend, each container establishes only one connection to Redis, so the setup works as long as the Redis instance can handle one connection per container instance. This is not a problem for Cloud Memorystore, since it supports up to 65,000 connections per instance.
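The limits above can be combined into a quick capacity check. This is a rough sketch using only the figures quoted in this article; `plan_capacity` is an illustrative helper, not part of any Cloud Run tooling.

```python
CONCURRENCY_PER_INSTANCE = 250   # clients per container instance (from the article)
MAX_INSTANCES = 1_000            # default instance limit (from the article)
REDIS_MAX_CONNECTIONS = 65_000   # Memorystore connection limit (from the article)


def plan_capacity(clients):
    """Return (instances_needed, redis_connections, fits) for a client count."""
    # Integer ceiling division: a partially filled instance still counts.
    instances = (clients + CONCURRENCY_PER_INSTANCE - 1) // CONCURRENCY_PER_INSTANCE
    redis_connections = instances  # one Redis connection per instance
    fits = instances <= MAX_INSTANCES and redis_connections <= REDIS_MAX_CONNECTIONS
    return instances, redis_connections, fits


print(plan_capacity(250_000))  # (1000, 1000, True)
print(plan_capacity(250_001))  # (1001, 1001, False)
```

With the default limits, the instance cap is reached long before the Redis connection limit becomes a concern.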

You can also distribute your chat backends around the globe by deploying to multiple regions, which spreads the load across multiple Cloud Run services before any one of them hits the instance limit.

# Deployment

For illustration purposes, we will use go-websocket-chat-demo from Heroku.

1. Create a Redis instance on Cloud Memorystore. Make sure to choose the VPC network you will use. After it's created, note its IP address.
2. Create a VPC Connector. This will let our Cloud Run service connect to Redis over the VPC network. After it's created, note its name.
3. Deploy the application to Cloud Run, specifying the VPC connector name and the Redis IP address, by running the following command in the repository root:

```sh
gcloud beta run deploy chatservice --source=. \
    --vpc-connector=[VPC_CONNECTOR_NAME] \
    --set-env-vars REDIS_URL=redis://[REDIS_IP]:6379 \
    --max-instances=1000 \
    --concurrency=250 \
    --timeout=3600 \
    --allow-unauthenticated
```

After you deploy this, you will get the auto-scaling endpoint that your users can connect to.
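The `REDIS_URL` value set in the deploy command reaches the container as an environment variable. As a rough illustration of how a server might read it at startup, here is a small sketch; `redis_address` is a hypothetical helper, the demo application itself is written in Go and may parse the URL differently, and `10.0.0.3` is a made-up address standing in for `[REDIS_IP]`.

```python
import os
from urllib.parse import urlparse


def redis_address(environ=os.environ):
    """Extract (host, port) from REDIS_URL, defaulting the port to 6379."""
    raw = environ["REDIS_URL"]
    url = urlparse(raw)
    if url.scheme != "redis" or not url.hostname:
        raise ValueError(f"unexpected REDIS_URL: {raw!r}")
    return url.hostname, url.port or 6379


# Pass a plain dict instead of os.environ to try it without deploying.
print(redis_address({"REDIS_URL": "redis://10.0.0.3:6379"}))  # ('10.0.0.3', 6379)
```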

The example code shows you how to build a stateless WebSockets backend that synchronizes data between container instances using Redis Pub/Sub.

There are several things that we’ve left out in this example. For instance:

- Cloud Run requests are currently capped at 60 minutes. After the timeout, your clients (in this case, the browser) should automatically reconnect to the server instead of failing.
- In this architecture, any container instance that joins the fleet will receive only new messages. If you'd like to load prior messages when a new server starts, you need to use persistent storage, like a database.
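For the first point, one common way to reconnect is exponential backoff with jitter, so that many clients dropped at the same moment do not all reconnect at the same instant. The sketch below shows only the timing logic, in Python for readability; in the demo the client is a browser, so a real implementation would live in the page's JavaScript. `reconnect_delays` and its default values are illustrative assumptions, not something the demo ships.

```python
import random


def reconnect_delays(base=1.0, cap=30.0, attempts=6, rng=None):
    """Yield wait times in seconds for successive reconnect attempts.

    Attempt n waits a random time between 0 and min(cap, base * 2**n),
    which is often called exponential backoff with full jitter.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng.uniform(0, ceiling)


# A seeded generator makes the schedule repeatable for this demonstration.
delays = list(reconnect_delays(rng=random.Random(42)))
ceilings = [min(30.0, 1.0 * 2 ** n) for n in range(6)]
for delay, ceiling in zip(delays, ceilings):
    print(f"wait up to {ceiling:4.1f}s -> sleeping {delay:.2f}s")
```

A client would sleep for each yielded delay before retrying the WebSocket connection, and reset the schedule once a connection succeeds.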

# Price (back of the napkin estimation)

Serverless is by design more expensive than pre-provisioned VM-based compute. Cloud Run is no exception to that. Autoscaling, not having to predict the load, and not managing the infrastructure all come at a cost.

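If you deploy this app with 128MB RAM and 1 vCPU today, it will cost (0.00002400 + (0.00000250/8)) * 60 * 60 = $0.0875 per hour per instance: the per-second price of 1 vCPU plus one eighth of the per-second price of 1 GiB of memory (128MB is 1/8 GiB), multiplied by the 3,600 seconds in an hour. ¹ This means if you have 1,000 instances actively running and serving 250K clients, it will cost about $87/hour, which is roughly $62.6K/month.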
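The same estimate can be reproduced in a few lines, which makes it easy to plug in a different memory size or instance count. The two prices are the ones used in the formula above; the 30-day month is my assumption. Note that the monthly figure quoted above matches rounding the hourly cost down to $87 first; without that rounding the result comes out closer to $63K.

```python
VCPU_PRICE_PER_SECOND = 0.00002400   # USD per vCPU-second (from the formula above)
GIB_PRICE_PER_SECOND = 0.00000250    # USD per GiB-second (from the formula above)
MEMORY_GIB = 128 / 1024              # 128MB expressed in GiB, i.e. 1/8
INSTANCES = 1_000
HOURS_PER_MONTH = 24 * 30            # assumption: a 30-day month

# Cost of one instance with 1 vCPU for one hour (60 * 60 seconds).
per_instance_hour = (VCPU_PRICE_PER_SECOND + GIB_PRICE_PER_SECOND * MEMORY_GIB) * 60 * 60
fleet_hour = per_instance_hour * INSTANCES
fleet_month = fleet_hour * HOURS_PER_MONTH

print(f"per instance: ${per_instance_hour:.4f}/hour")
print(f"fleet:        ${fleet_hour:.2f}/hour")
print(f"fleet:        ${fleet_month:,.0f}/month")
```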

It is a steep price, so Cloud Run might be a better fit for workloads that don't need as many instances, or for short-term scaling needs such as an unanticipated launch or marketing event.

In the long term, as your load becomes more predictable, it makes more sense to move to VM-based compute (such as GCE or GKE) as several mid-size virtual machines can handle the same load, potentially 50x cheaper.

# Further reading

Here are some more resources if you want to learn more about WebSockets on Cloud Run:

¹ This excludes network egress costs, which I'm ignoring because chat messages are typically short and text-based. I'm also ignoring the per-request cost, as it doesn't contribute much here.