Achieving stability with agility in your CI/CD pipeline

Published
July 30, 2024
After reading this blog and watching our recent webinar on the same topic, it’s our hope you will be in a position to build better software, collaborate more effectively across the software development lifecycle, and understand how monitoring your entire DevOps lifecycle using Internet Performance Monitoring (IPM) can aid you in doing so.  

Who does this apply to? DevOps and AppDev practitioners, in particular those at more mature companies.  

Let’s face it. At an early startup, everyone does everything. You must be agile to succeed. As a result, you don’t necessarily need to be stable. You’re trying to acquire customers and that requires experimentation. But those companies grow up. You have customers and you can’t fail them. You can’t go down, you can’t afford to have major features break or have bad data, but nor can you stop innovating. That’s who we’re aiming this at.

When we talk about how to achieve stability with agility, essentially that’s the crux: how to keep innovating and acquiring new customers while at the same time keeping your existing customers happy.

Something we all strive for, but it isn’t always easy.

Using IPM to achieve stability with agility in CI/CD

Admittedly, my role as VP of Engineering (Sergey here) involves being responsible for Catchpoint’s IPM platform, so of course I am biased, but at the same time, Catchpoint very much falls into the mature category I have just described. We have a large customer base whom we can’t let down. We have a large infrastructure that has thousands of components and many, many dependencies (we’ll get more into that momentarily). Nonetheless, one of my jobs is to keep innovating and delivering new functionality and features. Here at Catchpoint, we have to be both agile and stable.  

5 IPM-centered CI/CD tips

I’d like to share a few tips from my own hard-won experience on how we strive to achieve this delicate balance - with IPM as an anchor tool:

#1 - Set up the observability you need to achieve stability

There have been times when I made the wrong decision. Times I chose agility over stability. It can be a tough choice.  

As engineering leaders, we always want to move as fast as possible. We want to minimize toil for our teams, and we want them to work on the most interesting projects for the highest reward. But moving too fast without the right systems set up to observe what’s going on inside our applications and platforms can cause massive issues.

The way to reduce toil is not just to do CI/CD, but to do it properly. To do that, you need to understand your whole system and have a way to measure it accurately.

#2 - Analyze ALL your dependencies (internal and external)

In our recent Internet Resilience Report, 77% of respondents said that third-party dependencies were extremely or highly critical to their Internet Resilience success.

The different teams involved in maintaining or developing your product are likely to be analyzing your internal dependencies, such as your databases, servers, storage, etc., but they’re less likely to be assessing the external dependencies – DNS, CDN, email services, etc.  

Additionally, almost no one thinks about the customer side. What if all your customers are in a single location and there’s a major ISP outage in that region, meaning none of them can reach your site?

The question to repeatedly ask yourself: is your stability affected when any of these dependencies are impacted?

If so, you need to find a way to incorporate your third-party dependencies into your CI/CD playbooks and runbooks. IPM covers the entire Internet Stack, so it will help you implement resilience where it’s most important. It will help you understand the dependencies that affect both your internal customers (your employees) and your external customers, so that they are properly evaluated, considered, prioritized and monitored.

[Figure: diagram of different types of software]
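One way to make the dependency question concrete is to keep the inventory in a reviewable file and let the pipeline flag the gaps. The following is only a minimal sketch: the Dependency fields, the inventory entries and the unmonitored_critical helper are all hypothetical names chosen for illustration, not part of any Catchpoint product or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    kind: str        # "internal" or "external"
    critical: bool   # would an outage here affect your stability?
    monitored: bool  # is there a test or alert covering it today?


def unmonitored_critical(deps):
    """Return names of critical dependencies with no monitoring, external ones first."""
    gaps = [d for d in deps if d.critical and not d.monitored]
    # False sorts before True, so external dependencies come first, then by name.
    gaps.sort(key=lambda d: (d.kind != "external", d.name))
    return [d.name for d in gaps]


# Hypothetical inventory; replace the entries with your own systems.
INVENTORY = [
    Dependency("primary-database", "internal", critical=True, monitored=True),
    Dependency("object-storage", "internal", critical=True, monitored=False),
    Dependency("dns-provider", "external", critical=True, monitored=False),
    Dependency("cdn", "external", critical=True, monitored=True),
    Dependency("email-service", "external", critical=False, monitored=False),
    Dependency("customer-isp-region", "external", critical=True, monitored=False),
]

if __name__ == "__main__":
    for name in unmonitored_critical(INVENTORY):
        print(f"no monitoring for critical dependency: {name}")
```

A pipeline step could fail the build, or simply open a ticket, whenever this list is not empty. Which of the two is appropriate is a judgment call for each team.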

#3 - Shift wide (i.e. left and right)

What does shift wide mean? Well, to back up for a moment… Shift left is the concept that by moving testing and other checks earlier in the development process, you find problems faster, and fixing them therefore costs less.

However, that’s not always the case. For trivial changes, such as changing the color of a button, it’s usually OK to test in production, i.e. to shift right.

To shift wide (and encompass both), you need a ground truth, i.e. a shared set of standardized measurements that lets you react quickly and fix issues if and where there’s a problem… which is where IPM comes in.

#4 - Use IPM as a common language at the center of your CI/CD pipeline

Now, let’s get down to the CI/CD pipeline in more detail.  

[Figure: diagram of the phases of the CI/CD pipeline, with ‘Monitor’ in the middle]

You can look across all its different phases and notice that ‘Monitor’ is right in the middle.  

Indeed, IPM specifically is a crucial stage of the CI/CD pipeline because it creates a ground truth or a common language that all the different teams responsible for different stages of your software lifecycle can use to communicate.

Your product management team, for example, can use your IPM data to specify what the performance of the system needs to be. They can then work with your DevOps or Ops team to create a monitoring strategy – before your software exists. Your developers can then use the same data to make sure their code works properly. The testers can use it for their system testing, and by the time it gets to operations, everyone is sure they’re on the same page and the entire system is working as intended.  

I call this the observable software development lifecycle (OSDLC).  
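To illustrate what a common language might look like in practice, here is a minimal sketch in which the performance targets live in one place and the same check runs before release and against production data. The metric names, the threshold values and the evaluate function are assumptions made up for this example; they are not Catchpoint metric names or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    metric: str
    max_value: float  # upper bound, in the metric's own unit


# Hypothetical targets that product management and Ops would agree on up front.
TARGETS = [
    Target("dns_lookup_ms", 100.0),
    Target("time_to_first_byte_ms", 400.0),
    Target("page_load_ms", 2500.0),
]


def evaluate(measurements, targets=TARGETS):
    """Return human-readable violations; an empty list means every target is met."""
    violations = []
    for target in targets:
        value = measurements.get(target.metric)
        if value is None:
            # A missing measurement is reported rather than silently passed.
            violations.append(f"{target.metric}: no measurement")
        elif value > target.max_value:
            violations.append(f"{target.metric}: {value} exceeds {target.max_value}")
    return violations


if __name__ == "__main__":
    staging_run = {"dns_lookup_ms": 40.0, "time_to_first_byte_ms": 450.0, "page_load_ms": 1800.0}
    for line in evaluate(staging_run):
        print(line)
```

Because developers, testers and operations all call the same function against the same targets, a result from staging and a result from production can be compared directly.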

#5 - “As code” automation and integrations to reduce toil and save OpEx

Lastly, let’s talk about Observability as Code.  

After all, this is CI/CD: everything has to be automated into your workflow in order to truly be of benefit. Not only do we need automation, but it needs to be seamless and simple. To achieve that, you have to treat the observability you implement across your OSDLC as code.

This way, you reduce toil for your teams, particularly when working at scale, and potentially save on OpEx by removing the need to spend time and resources on manual, repetitive processes.

With Catchpoint, we have of course a number of ways to incorporate IPM into your CI/CD pipeline. Perhaps the most mature way I’ve seen customers run Observability as Code is to create new test configurations as new features are released using the REST API.  

This way, you can tag deployments, run ad hoc tests to ensure things are working, and trigger actions or other automations using Webhooks or emails. You can then send all your collected data to a data warehouse and combine it with other data sources to further your designated outcomes, such as business KPI reporting.
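As a rough sketch of the "as code" idea, a deploy step could generate a test definition for each newly released feature and tag it with the deployment version. Everything below is hypothetical: the build_test_definition function and the payload fields are invented for illustration and do not reflect the actual schema of Catchpoint's REST API, so check the API documentation before sending anything like this.

```python
import json


def build_test_definition(feature, url, version, interval_minutes=5):
    """Build a hypothetical monitoring test definition for a newly released feature."""
    if not url.startswith("https://"):
        raise ValueError("expected an https:// URL")
    return {
        "name": f"{feature} availability",
        "url": url,
        "interval_minutes": interval_minutes,
        # Tagging with the deploy version lets later analysis line up
        # performance changes with specific releases.
        "labels": {"feature": feature, "deploy_version": version},
    }


if __name__ == "__main__":
    definition = build_test_definition(
        "checkout", "https://example.com/checkout", "2024.07.30-1"
    )
    # In a real pipeline this JSON would be sent to your monitoring platform's API.
    print(json.dumps(definition, indent=2, sort_keys=True))
```

Keeping the definition in the same repository as the feature means it goes through the same review and rollback process as the code it monitors.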

It’s not if, but when

It all comes down to this: it’s not a question of if an incident will occur, but when. So the real question is, how much impact are you prepared to absorb?

In the Internet Resilience Report we mentioned earlier, we found that 43% of companies estimate a total economic impact or loss of more than $1 million monthly due to Internet outages or degradations. Get all your teams speaking the same language with IPM across the OSDLC and begin to get ahead of those outages before they impact your business.

[Figure: graph from the Internet Resilience Report 2024 (Catchpoint)]

To find out more about how Catchpoint’s IPM platform can help you, talk to a Solution Engineer.

Further resources

Watch the related webinar, which goes into these concepts in more depth (including a sample CI/CD workflow): https://www.catchpoint.com/webinar/how-to-achieve-agility-with-stability

Find out how Catchpoint can help you across the DevOps lifecycle: https://www.catchpoint.com/application-experience/devops-lifecycle
