Catchpoint’s SRE Report 2020 – The Highlights

Published
June 24, 2020

Our 2020 SRE Report is ready! We launched the SRE survey 2020 this January with the goal of understanding the current state of SRE. The survey covered a range of topics including:

  • Distribution of responsibilities among SREs and related teams
  • The role of synthetic monitoring in the SRE playbook
  • The prevalence of automation in the SRE toolkit

As we neared the end of the survey period, the SRE community was in the midst of a sudden change. SRE teams were forced to migrate to all-remote IT. We realized we would not be able to provide an accurate analysis without considering this shift in how SRE teams were operating in this new environment. This led to the launch of an addendum survey we called the SRE from Home Survey that gives special consideration to a remote and distributed workforce.

The 2020 SRE report includes survey results and analysis covering both the “pre” work-from-home phase and the current scenario, in which SRE teams are working remotely. We received over 600 responses from SREs as well as individuals who identified as doing SRE-type work. The Catchpoint report offers one of the industry’s more distinctive perspectives on what it means to be an SRE in 2020. In this blog, we discuss some of the highlights from the report.

Survey Highlights

SRE teams are tasked with monitoring and managing application reliability, availability, performance, and system efficiency, and with providing quick incident response. They work toward designing observable systems that prevent service disruptions rather than reacting to them. Although the common goal is to uphold reliability, the practices and components involved (tool stack, organization structure, etc.) vary. The work-from-home environment brings new components into the mix, such as employee experience and human wellness.

The survey questions revolved around how SREs have evolved while keeping their focus on the core responsibility of maintaining reliability. Let’s look at some interesting findings from the survey results.

Observability Components Exist; Observability Does Not

We asked respondents which tool categories were relevant to the role of an SRE: 93 percent picked “Monitoring and Alerting,” compared with 53 percent who chose observability.

Additionally, when asked about their key responsibilities, the majority passed over those aligned with the observability pillars (events, metrics, and tracing), highlighting the lack of true observability. The responses show that although components of observability are present in the SRE spectrum, observability as a practice is lacking.

Heavy Ops Workload Comes at a Cost

There has always been a blurred line when it comes to DevOps in the SRE community. According to Google, ops work should be capped at 50% of an SRE’s time, leaving the remaining 50% for development, but this 50/50 split may just be a pipe dream. Based on the survey results, SRE work is dominated by operations-type activities.

We asked, “What percent of your work is spent on development?” and only 14% said more than 50%.

In the pre-COVID survey, 75 percent said they were spending more time on ops, resulting in an increased cost of owning and maintaining systems. Two and a half months into working from home, the results showed a net 10 percent increase in ops-related responsibilities.

Shift to Remote Creates Opportunities and Challenges

The addendum survey results highlight an important fact – the future of SRE is remote and bright!

One of the highlights of our 2018 SRE survey was: “If you’re looking to work remotely, the SRE role may not be the role for you. While some SREs work remotely, 81% of SREs state all or most of their team work in an office.”

Interestingly, the pandemic-era environment has resulted in a major shift in SRE work. Fifty percent of SREs believe they will be working remotely post COVID-19, compared with only 20 percent prior to the pandemic.

Many respondents noted they were dealing with a new set of challenges while working from home. More than half of the respondents considered staying focused and having a good work/life balance a major challenge.

Summary

With over 600 respondents across two surveys, the SRE Report 2020 offers a unique perspective on SRE life before and after the shift to nearly all-remote IT. We put together a report that takes an honest and humane look at the trends, status, and challenges facing today’s SRE pioneers.

This year’s SRE report highlights an objective that may be common among relevant practitioners, regardless of their title: designing observable systems to prevent service disruptions instead of reacting to them. The survey results provide insight into what observability means to the SRE community, the key metrics monitored, the alignment with DevOps, and how SREs are navigating the remote work environment.

Read more about the survey analysis, discover key recommendations, and view the complete survey results in the full report. Download the SRE Report 2020.
