Google Cloud Next 2017 was a constant source of amusement for me this year. I want to reflect on several talks I watched that you might find interesting if you are into DevOps and SRE topics.

DevOps/Kubernetes

Reliability

  • Ten common causes of downtime and how to avoid them: Great talk on how to think about designing your infrastructure for high availability and scale.
  • Metrics that matter: A short talk by Ben Treynor, the founder of Google’s SRE team, about measuring the right metrics for user-facing services.
  • Designing reliable systems with cloud infrastructure: If you are new to the SRE mindset and have not read the free SRE book, check this talk out. Paul Newson talks about how to architect your infrastructure on Google Cloud to achieve higher availability. Great walkthrough of capacity planning for an example distributed service, with illustrations of traffic flow during various failure modes.

More

You can find videos for all 216 talks from the conference on YouTube.