A curated collection of SRE and DevOps resources, articles, and tools.
My Articles on Medium
- Five whys you can’t do SRE and why you can - Published on DevOps Links, 2019
- Day in the life of a Google Tech Lead Manager - 2015
Google Cloud Blog - CRE Life Lessons Series
- Using deemed SLIs to measure customer reliability
- Introducing a new Coursera course on Site Reliability Engineering
- Understanding error budget overspend
- SRE vs. DevOps: competing standards or close friends?
- Defining SLOs for services with dependencies
- Consequences of SLO violations
- Know thy enemy: how to prioritize and communicate risks
- How release canaries can save your bacon
- The practicalities of dark launching
Official Resources
- SRE @ Google
- CRE @ Google - Customer Reliability Engineering program
Educational & Reference Materials
- Getting Started with SRE
- SRE, A Simple Overview (O’Reilly)
- The Calculus of Service Availability (ACM Queue)
- Borg, Omega and Kubernetes (ACM Queue)
- Weathering the Unexpected (ACM Queue)
- Canary Analysis Service (ACM Queue)
Tools & Utilities
- SLO Calculator - Calculate service availability percentages
- Awesome SRE - Curated collection of SRE resources
- Awesome Bare Metal - Resources for bare metal infrastructure
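
For a rough idea of the arithmetic behind an SLO calculator, here is a minimal sketch that converts an availability target into a downtime budget. It is not taken from the linked tool: the function name `allowed_downtime_seconds` and the 30-day default window are assumptions made for illustration.

```python
def allowed_downtime_seconds(slo_percent: float, window_days: float = 30.0) -> float:
    """Return the downtime budget, in seconds, for an availability target.

    Hypothetical helper: the name and the 30-day default window are
    illustrative assumptions, not part of any tool listed above.
    """
    if not 0.0 <= slo_percent <= 100.0:
        raise ValueError("slo_percent must be between 0 and 100")
    window_seconds = window_days * 24 * 60 * 60
    # The budget is the fraction of the window that may be unavailable.
    return window_seconds * (1.0 - slo_percent / 100.0)


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43.2 minutes of downtime.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_seconds(target) / 60
        print(f"{target}% over 30 days: {minutes:.1f} minutes")
```

Real calculators may use a calendar month, a rolling window, or request-based rather than time-based SLIs, so treat the numbers as an approximation.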