top of page
Writer's pictureJonathan Weems

What is an SRE?

Updated: Mar 2, 2023

Put simply, an SRE is responsible for the reliability of a website or web service. They help define what reliability is with support of the business, and implement the changes needed for measurement. SRE's also act upon the measurements to help ensure the targets are met.


Here are some of the ways SRE supports the business:

  • Helps identify performance issues the company is unaware of.

  • Reduces customer abandonment.

  • Improves the customer experience.

  • Reduced lost revenue from services being unavailable.


SRE or Site Reliability Engineering is a relatively new term. In 2003 Ben Treynor Sloss at Google was the first person to found an SRE team, but it has taken a long time for the wider industry to adopt the term.


A 2021 report by the DevOps Institute, of the companies that participated in the report, that 22% have adopted SRE practices.


It has been more common for large companies to have many teams of SRE engineers for their many products. But SRE doesn't have to be limited to behemoths, and relatively small technology companies can benefit from the principles of SRE without needing to have a dedicated team or even a dedicated person.


Any company that has a website or a web service which they rely upon for customers or lead generation can benefit from SRE practices.


For instance, you could be:

  • E-commerce business.

  • Parts supplier or manufacturer.

  • Running a Web Service that collect geographical data from vehicle.

  • Running a dating site.

If you offer a product or service via the internet, you should be looking to adopt SRE practices.


Fundamentally, SRE is a set of principles.


  • Operations is a software problem Use software approaches to solve operation problems

  • Manage with SLO (Service Level Objectives) Appropriately set targets, agree with stakeholders. SLO violations are blameless during postmortems.

  • Work to minimize toil Don't spend large amounts of time doing the same manual process over and over. Automate it.

  • Automate this year's job away Introduce a cap on toil to free up time for automation.

  • Move fast by reducing the cost of failure Reduce MTTR (Mean Time to Repair) to increase developer velocity.

  • Share ownership with developers SREs are focused on production and hold a holistic view of the system, but share a skillset and code ownership with the developers.

  • Use the same tooling regardless of job function Developers, QA, Platforms, Architecture and SRE should all have access to the same tools, unified codebase and share the behaviour.

Behaviour is a key word here, this relates to DevOps culture, which I will cover in a future article. Be sure to sign up to my mailing list to get notified.


SRE as a job role

As mentioned, SRE can also be a Job that would be focused on production, specifically:

  • Availability

  • Latency

  • Performance

  • Efficiency

  • Change Management

  • Monitoring

  • Emergency response

  • Service capacity planning

SRE also hold a holistic view of the stack, including:

  • Frontend

  • Backend

  • Libraries

  • Kernels

  • Storage

  • Physical hardware

No matter the size of your organization, you can start benefitting from SRE practices today. Whether you choose to just familiarize your team with the principles or set up a dedicated team.


Remember, the main objective is to improve the stability of your internet presence, that will improve customer experience and reduce lost revenue.


I will be taking a deeper dive into each of the principles in future articles, if you are interested, subscribe to my mailing list or book a free 45-minute call with myself.

Recent Posts

See All

Comments


bottom of page