Jared H
00:00 - 01:31
Good morning, good afternoon, or good evening, depending on where you are tuning in from today. We wanna thank you for joining this Techstrong learning experience with CatchPoint.
My name is Jared, and I'd like to welcome you to our Techstrong learning program. A few notes for today.
This session is gonna be recorded and will be available on demand.
We're gonna send you a link to the recording after we conclude as well. Be sure to check out the handouts in the resource section on the left side of your screen for all of the additional content that we're providing you today.
Also take note that we have a chat tab on the right side of your screen. Share any insights you want, tell us why you're tuning in today, why you joined, all of that good stuff.
And then any questions that you have throughout the presentation, put those in the Q&A section. We do want to engage with you.
We wanna hear where you're at, and for any questions that we don't get to today, we will do our best to follow up with you shortly after our program concludes.
So let's go ahead and get our topic kicked off today: what is required for modern observability in 2025? We're joined by Leo Vasiliou, former DevOps practitioner and author of the SRE Report.
He's here with us from CatchPoint. We've also got Sergey Katsev.
He's the VP of engineering at CatchPoint. We wanna thank Leo and Sergey for joining us today.
Leo, do you wanna go ahead and take it away, while I turn my camera and microphone off?
Leo Vasiliou
01:31 - 04:59
Yeah. Sounds like a plan, Jared.
Thank you for the introduction. And I'll also just take a moment to say thank you to everyone who decided to give us some of your precious time today.
So if we go ahead and hop right in, here's how this discussion will break down. We'll take a moment or two to set the stage for the session, the situations that led us to feel the need to have this talk in the first place.
Then we'll go on a little exploration of how we got into these situations. Then we'll get into some tooling and tool chains.
Hint: it goes way beyond just the application. And then we'll wrap up by discussing some of the things we're hearing from our customers, some of the tactics people use to help with the problems we will discuss.
So without further ado, I'll just mention that since Sergey and I scheduled this webinar, we've seen a lot of other assets on the same or similar topics from analysts and from other companies in the space. So why this one instead of those other ones? Well, a lot of them focus on specific tools, and they try to speak to specific audiences.
So Sergey and I will try to be as tool neutral, or as tool agnostic, as possible and talk about some techniques and concepts. But also, as authors and contributors of the SRE Report, we feel like we have a good idea of the pains, problems, and challenges that are top of mind for practitioners and leaders alike. And this year, in our research, we see a rising disconnect between practitioners and their leadership.
So instead of focusing on just one audience, that is, practitioners or leaders, we're gonna try to speak to both, the idea being that it will hopefully help catalyze better conversations. So why now? It comes down to two words: confusion and evolution.
People are confused as to what observability is, confusion between different tools, between different ranks, for example, between individual contributors, managers, directors. Even the other day, I was about to go into an event, and someone asked me a question that made me think, I'm not exactly sure we're on the same page.
But here's the kicker. What if I'm the one who's confused? So maybe as we go through the session, y'all can help me make that determination.
And then for evolution: we're talking about this because the contributing factors have evolved. They've changed so substantially. The things that matter to achieving a goal or getting to an outcome, for example, putting all the tech pieces together to deliver a great experience, and the things that can disrupt that from happening, we don't think about some of these things until there is an incident, until there is a problem.
And what's worse, if there is an incident and you don't know where to look, kinda like those unknown unknowns, then you might have to start accelerating how you think about a different approach. So, Serge?
Sergey Katsev
04:59 - 09:28
Yeah. So, I guess we'll start with the traditional definition of observability.
And please forgive me. I may go on a little bit of a rant because I don't think this definition makes any sense.
You can't... so the traditional definition, of course, as it says there on the slide, is metrics, events, logs, and traces. But all metrics? Some metrics? Which metrics? Which are the important ones? Does an event tell you everything that you need to know? Obviously, the answer is no.
Right? And so defining it like this is like defining a car company as a company that sells rubber and metal or defining a software company as a company that sells compute and storage. Technically true, but the reality is for the car company, the value is it gives you the freedom to travel from point a to point b.
For a software company, the value is it makes your life easier in some way. Right? So the same thing for observability.
What is the value that observability in 2025 should be bringing to you? It needs to help you quickly solve a problem, and it needs to help you understand different components and how they all interrelate with each other. And so, if you would go on to the next slide, Leo, as far as quickly solving a problem and understanding different components, the way I think about it is: what kind of questions can observability actually answer? And here are some examples.
Right? So what is our most popular product? Are our customers happy? Those are examples of business side questions traditionally. On the right side, is my system resilient? Why does my server return this error? Those are examples of technology side questions.
Traditionally, there's been a disconnect. Right? Your, CFO might be asking some of the questions on the left.
Your site reliability engineers might be answering or asking some of the questions on the right. Never the two shall meet.
But the reality is observability is the thing that is supposed to make them meet. And so one of the reasons that we need modern observability, of course, right, is that back in the day I'll get to the GeoCities example in just a second.
But back in the day, if you had MELT, you collected some metrics, and they were probably good enough to tell what's going on with the system. But that's not true anymore because now things are complex.
And with complexity, we talked about different roles within a team. You start having specialists.
You start having differences between a junior engineer that maybe is focused on a specific module versus a senior engineer or an architect that is focused on the entire system or maybe a subsystem. And the reality is you don't have very many of that latter category.
Right? You probably have one or two or maybe a small handful of people that understand how the entire system works, and so you need tools to help you with that. That's what observability provides.
And then the second piece is you are getting all of these business questions. Right? And so as somebody navigates their career, they go depth first, and then they go breadth in technology.
And so once they understand the full system, they start thinking more about the business objectives rather than the technology objectives. And so, again, observability helps with that.
And so back to GeoCities, if I may for a second. So this is a real website from GeoCities, which for anyone that doesn't know, was kind of one of the first web hosting companies on the Internet in the nineties.
And so I pulled this from the Internet archive. And, Leo, if you had to guess, what kind of dependencies did this website have?
Leo Vasiliou
09:28 - 09:38
I mean, besides, like, infrastructure? I don't know.
Maybe they were serving some ads or something, or I don't even remember if ads were...
Sergey Katsev
09:38 - 10:16
the reality is, yes, there was infrastructure, right, because, obviously, it had to run on the server. But here's the request map.
It's one dot. There were no external dependencies for the application itself.
Right? So you would load the images in. Everything would be served from that web server.
That's it. And now compare that to Visa, for example, or Netflix, or Macy's.
Right? And there's so much more complexity.
Leo Vasiliou
10:16 - 10:19
I feel like you're trying to make a point here.
Sergey Katsev
10:19 - 11:43
I'm trying to make a point. So first of all, none of these examples, the modern ones anyway, are good or bad.
Right? They're examples of how the real world works as far as web applications. Right? And so these are request maps that I pulled from CatchPoint to help illustrate the point.
But the reality is, as things get more complex, does your observability help you answer the right questions? Right? So with this Macy's example, you see that little purple circle next to the big purple circle? First of all, what is it? I don't know. But if that goes down, what happens to the rest of the application? Do you have any way of figuring that out? Right? If something is down with the application itself, can you tell which of those circles, you know, there's a hundred of them, actually caused the problem? And then my personal favorite is: if nothing is wrong, are you sure that nothing is wrong? How do you know? Right? And so all of those are questions that, again, observability in 2025 should be able to answer.
Leo Vasiliou
11:43 - 13:53
I mean, that's kinda crazy when you think about it, Serge. So, you know, talking about these systems, we don't build them just to build them.
We usually build them with some type of purpose, some type of goal. For example, that purpose may be to fulfill some business requirement.
Right? Sell to our customers, make a little bit of money selling your products and services. But the other thing that it makes me think about, Serge, is the sheer volume of data that is caused by that increased number of dots.
And, by the way, maybe "more dots" is a highly technical term we can start using when we have to explain this to a five-year-old. But anyways, there is so much data that it's becoming harder and harder to manage all that telemetry from an insights perspective, and we're kinda sorta talking about APM, right, application performance monitoring.
These problems are so real that Gartner, this year, for the first time ever, made a Magic Quadrant for digital experience monitoring, Internet performance monitoring, maybe better stated as, like, an outside-in perspective. And so, before we move to the next section, I'll just say, in addition to that rise in complexity leading to a rise in data and a rise in cost, when we get to exploring the factors, we'll get into some of the tactics that we're hearing from our customers on how they manage that.
So please stay tuned. Moving on to the next section, I think we did a decent enough job just to frame the conversation.
So now let's get back into it. What should observability do? What types of questions should it be able to answer? But then, well, why can't it do that? Why can't it answer those questions? Why are there competing views, the problems caused by this evolution? Before we can speak to that, we kinda need to take a step back and understand a foundational component, the idea, fundamentally, of that disconnect.
And there are three of them. Right? The age-old business versus IT, agility versus stability, usually...
Sergey Katsev
13:53 - 13:53
in the.
Leo Vasiliou
13:53 - 15:13
context of how much it will cost. The business wants to sail on a yacht, but they give IT only enough money to buy a dinghy.
Which one of those is gonna be more resilient and stay afloat on the open water? Seniority, as you mentioned, which we use to talk about the amount of experience, but also whether it's the correct experience. For example, junior engineers maybe focusing on specific components versus senior engineers and architects focusing on holistic systems and outputs.
And then the tool chains: it's application metrics, events, logs, traces, which we'll probably just say MELT from here on out. If MELT were in fact observability, we wouldn't even be having this webinar. So having said that, let's flash some of the data from the SRE Report that inspired us to have this talk in the first place.
Briefly reading the chart, what this is saying is individual contributors and team leads say they feel they have instrumented less observability than they should. Inversely, managers and directors say they have instrumented either the correct amount or more observability than they should.
So that's a quick read of what this semi-complex chart is trying to say.
Sergey Katsev
15:13 - 15:41
Yeah. And to me, there are two questions to ask here, of course, and they talk directly to the disconnect on the previous slide.
Number one, who is it that manages the budget? Number two, who is it that understands the level of complexity? Right? So, obviously, if you're managing the budget, you want less. However, if you actually understand how complex things are, you want more.
Leo Vasiliou
15:41 - 16:21
I know, well said. Well said.
Now if those points were not enough or, if the contrast in that previous chart was not stark enough, then take a look at this one. This question was essentially asking, do you do chaos engineering? Look at how sharp that contrast is.
Individual contributors saying, hell no. We don't do this.
Managers and directors saying, yeah. We do this because we're awesome.
So how can we even get better if we don't even agree on what our current state looks like?
Sergey Katsev
16:21 - 17:08
Right. And so to me, this speaks to one of two things.
I'm not sure which. I'm not going to guess.
Maybe some of the attendees want to, but it either speaks to companies that do checkbox experiments, meaning it's for compliance. Yeah.
We do a tabletop test, and so the manager knows that the tabletop test happens. But the individual contributors, who know that the company has all of this complexity, don't consider it to be a real test.
Or, of course, the flip side of that is in large companies, there may be specific teams responsible for it and maybe an individual contributor that is in a silo on a particular team just doesn't know about it.
Leo Vasiliou
17:08 - 17:20
You know, give me a good idea for potential research for the next report. Right? Like, trying to validate or invalidate some of these hypotheses.
Yeah. Sorry, Serge.
I didn't mean to...
Sergey Katsev
17:20 - 17:20
absolutely.
Leo Vasiliou
17:20 - 17:23
So back over to you.
Sergey Katsev
17:23 - 20:49
Yeah. So, again, just to kind of re-summarize: things are more complex, and there are disconnects about how complex they are.
And so, as we've been talking about, observability is the thing that, in our humble opinion, ties all of these things together and answers the question. Right? So I wanted to call out some of the other ways that things have gotten more complex.
Right? So this diagram here looks just like the web request diagrams that we were showing previously for different websites. This one has infrastructure and locations of clients.
Right? So to really put a fine point on it, if you look here, all those dots, that's where your customers are located. And by the way, now in a post-pandemic world with everybody, or lots of people, working remotely, it's also where your developers are located.
And with complex applications, guess what? You're not really connecting to the actual server anymore. You're connecting to some edge.
Right? Maybe it's a worker function. Maybe it's a CDN.
Maybe it's some other piece of infrastructure that is closest to you, and then it goes and talks to what's called the origin server. And so the big question is, how do you know what you need to think about when you're evaluating your observability strategy? Right? A lot of our customers, or even the engineers they work with, will come and say, well, we're just going to deploy this in the cloud.
Right? And they'll name one of the cloud providers, so, I don't need to worry about it. But the reality is, if it's in the cloud, there's still DNS in between, depending on the application.
Maybe it relies on NTP. Maybe it relies on email.
Right? There are still lots of different dependencies. And guess what? The cloud goes down a lot more often than anyone thinks.
And so I actually have my teams work through an exercise like this, occasionally with customers, to make sure that we understand what the full data flow is. And we're not gonna spend the time here, but here's an example.
Right? Internal applications are applications that will impact the productivity of your team if something happens to them. Right? So if you use, let's say, SharePoint or Salesforce and it goes down, well, your internal collaboration decreases, your sales team maybe can't make records of their sales conversations, etcetera.
Right? Not a good thing. Customer applications, well, that's what makes you money as a software company.
Right? So maybe it's your SaaS product. Maybe it's an integration with that product to provide support, for example, Zendesk, etcetera.
And then the largest category I see is dependencies, and people always forget about those: DNS, email, the CDN, NTP, etcetera.
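To make that exercise concrete, here is a minimal sketch of the kind of inventory it can produce, assuming the three-way split described above into internal applications, customer applications, and dependencies. The entries, names, and impact notes are illustrative assumptions, not a real environment or a CatchPoint template.

```python
# Illustrative only: a minimal inventory of the kind this exercise produces.
# Categories follow the three-way split discussed above; entries are examples.
data_flow_inventory = {
    "internal_applications": [
        # Outages here hurt workforce productivity.
        {"name": "SharePoint", "impact": "internal collaboration"},
        {"name": "Salesforce", "impact": "recording sales conversations"},
    ],
    "customer_applications": [
        # Outages here hurt revenue directly.
        {"name": "SaaS product", "impact": "customer-facing revenue"},
        {"name": "Zendesk integration", "impact": "customer support"},
    ],
    "dependencies": [
        # The largest and most easily forgotten category.
        {"name": "DNS", "impact": "name resolution for everything"},
        {"name": "CDN / edge", "impact": "content delivery"},
        {"name": "NTP", "impact": "time synchronization"},
        {"name": "Email", "impact": "notifications"},
    ],
}

# Walking the inventory is a quick way to ask, per item:
# "How would we know if this failed, and who would notice first?"
for category, items in data_flow_inventory.items():
    for item in items:
        print(f"{category}: {item['name']} -> {item['impact']}")
```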
Leo Vasiliou
20:49 - 23:51
And, Serge, I'd like to just take a quick moment and try to drive this a little deeper for the people watching right now. It's one thing to see a slide in a webinar that says, here's an exercise I go through. But I challenge you to actually go through it, in the form of a role play or a workshop or something. Just trust me when I say the value and impact you'll get from that is way more than what I think is possible by looking at a slide in a webinar.
So, now that I've got that off my chest, let's continue to move on to some of the nitty gritty: tooling and tool chains.
Now, in the SRE Report, we have researched the idea of tool sprawl for many years. And even though the questions were worded completely differently, the conclusion has been the same.
The majority of respondents say there is no tool sprawl problem, because they think about it as: is the value we receive from our tool chain greater than its cost? If that answer is yes, there is no tool sprawl problem. In other words, unlike how I've heard a lot of people think about it, it is not just a count of the number of tools.
Right? Oh, we have four tools. Oh, we don't have a tool sprawl problem.
Oh, we have six tools. Oh, we have a tool sprawl problem.
No. Please, it does not work that way.
And so if you use multiple tools, that is absolutely okay. Alright.
Now, one of the things we're hearing, and this is not from the research, this is just from our interactions with our customers, is that more and more of our customers are getting their infrastructure and network telemetry from their cloud providers.
And you, the people listening, very well may be at different points of this evolutionary journey. For example, you might still be in your own data center, or hybrid, any number of possible combinations.
So the main thing I would like to ask here is, ask yourselves: how much technical debt does your org have from old tooling? Is that taking away from the overall value because of the legacy cost? For example, it's more expensive, in the form of time, to maintain that old tooling or respond with it.
Maybe it's because it's not as efficient as modern tooling that has AI. And second, and maybe more importantly, what tools will you need to successfully, continually evolve? Which brings us to the tool chain disconnects.
So, Serge, I'd like to turn it over to you here for your thoughts on this third disconnect.
Sergey Katsev
23:51 - 26:20
Yeah. Absolutely.
And, yeah, the goal is, again, to answer the questions. Right? It's not to choose a specific tool.
There are a lot of acronyms and vocabulary out there. Right? APM, NPM, synthetics, RUM, infrastructure monitoring, OpenTelemetry.
To be honest with you, you need all of them if you have the right application that needs all of them. Right? So going through an exercise like what we presented a second ago is what's going to tell you which you need.
And the other way to look at it is that depending on which piece of the Internet stack you need to observe, the set of tools changes because it's not possible to use some of them to observe different pieces. Right? And so you really need to look at this as a holistic user experience journey, and that's what you are observing.
Right? If this lady here is on her cell phone in the middle of, let's say, Midtown Manhattan in New York City and is trying to get to your website, well, what is between her and having a good website experience? Right? There's, of course, the application itself, but there's so much more. Right? For example, her cell phone carrier in the middle of Midtown Manhattan may have congestion.
Right? A slightly different example: if you have an application for, let's say, a very, very small subset of clients that need to connect to it from their home Internet, and their home Internet is broken, guess what? They can't connect to your application. But if that is literally your business, maybe you need to monitor their home Internet.
Right? It's a canned example, but the point is you need to monitor what matters to the user experience, and that'll be everything from the application to all of the dependencies to the network.
Leo Vasiliou
26:20 - 26:26
Nice visual, by the way, Serge. I like how it captures the essence.
Sergey Katsev
26:26 - 28:43
And so, kind of tying it back to the exercise from earlier. Right? We kinda said, okay.
Some of these applications, you have a high level of control. Some of these applications, you have a low level of control.
Some of them, you have no control. It's somebody else's application.
Right? If you are using, I don't know, Salesforce, for example, maybe you have no control over it whatsoever, but it has an impact on you. If there's a problem with it, you need to know about it so that you can make alternate plans.
Right? Not to call out any particular vendor, just using it as an example. And so the point is, as you go from internal to external to dependencies, and as you go from high control to no control, you have fewer and fewer choices.
That's what I was saying before. So to kind of really drill down on it and get really tactical for a second: if you have a high control application, so let's say this is your baby, this is what you're developing, what you're selling to customers, you kind of have whatever tools at your disposal that you want.
Right? You can use APM. You can use NPM to monitor the network.
You can use Internet synthetics to see the end to end experience. You can use something like endpoint monitoring to see what a particular laptop's experience with that application is.
You can use real user monitoring (RUM) to collect telemetry from actual people connecting to your application. Right? The problem is your application isn't the only box in the Internet stack.
And so when you start thinking about all of the other boxes in the Internet stack, you start to see low control applications. And so you have an application here where you no longer have the ability to use APM.
You no longer have NPM because it doesn't run on your network. It runs on somebody else's.
You can't use endpoint monitoring. You can't use RUM.
The only option in a circumstance like this is Internet synthetics to monitor that full Internet stack.
Leo Vasiliou
28:43 - 30:01
You know, Serge, I just wanna take a second to make sure that I'm picking up what you're putting down. And so when I hear that talk track, it makes me think of, okay.
So how is that gonna manifest? Right? Especially if the people watching and listening are the actual strategists who have to implement, you know, these monitoring and observability frameworks. Well, I think here's an example.
High control versus low control manifests in your monitoring telemetry. So if you're monitoring, for example, right at the source where you're hosted, that's kinda that top half.
Things might look green. Things might look good.
Right? You're right next to the source in the cloud. But the experience of your users may be not so much, right, at the bottom.
And so what comes to mind, again, just as an example, is this idea of internal service level objectives, meaning those internal targets, versus external experience level objectives. Two totally separate things that require separate tooling.
And if you are using the wrong tooling, how the heck would you even know? And that's just one example of how I think it could manifest.
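As a rough sketch of that split, assume two separate telemetry feeds: origin-side measurements for the internal service level objective and outside-in measurements, from user vantage points, for the experience level objective. The thresholds, helper function, and numbers below are invented for illustration only.

```python
# Hypothetical example: the same service judged two different ways.
# origin_latencies_ms -> measured right next to the source (server-side / in-cloud)
# user_latencies_ms   -> measured outside-in, from user vantage points

def percentile(values, pct):
    """Simple nearest-rank percentile; good enough for an illustration."""
    ordered = sorted(values)
    index = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[index]

# Internal service level objective: p95 origin latency under 200 ms.
origin_latencies_ms = [42, 51, 48, 60, 55, 47, 66, 58, 49, 52]
internal_slo_met = percentile(origin_latencies_ms, 95) < 200

# External experience level objective: p95 user-perceived latency under 2000 ms.
user_latencies_ms = [900, 1200, 3100, 2800, 950, 4000, 1100, 2600, 3300, 1000]
experience_objective_met = percentile(user_latencies_ms, 95) < 2000

# The point: the two can disagree. "Green at the origin" does not imply a good
# experience at the edge, and you need separate tooling to see both.
print(f"Internal SLO met:         {internal_slo_met}")          # True with these numbers
print(f"Experience objective met: {experience_objective_met}")  # False with these numbers
```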
Sergey Katsev
30:01 - 31:33
Yeah. Absolutely.
And to go back to this visual for a second, another couple of examples, real world scenarios that we have heard time and time again from customers. They go and they implement APM, and their application is perfect.
They have a dashboard that says there are no errors, but there's no traffic coming into the application. Right? The application's great.
Everything's working exactly as it should, but there are dependencies in between, and they don't know about them. They're blind to them.
Right? Another example, and this is a real world example, really illustrates not only having to measure the individual, let's say, icons in this Internet stack, but actual user journeys. You have to think about what your customers are trying to accomplish, what value they are trying to get out of the thing.
And so we've seen an ecommerce website where everything was working perfectly.
Customers are coming in, adding things to the shopping cart, buying them. Everything's wonderful.
Except in a particular region in the world, everything works except the checkout button. So now customers are coming in.
They're adding things to the shopping cart. Guess what happens when they can't click the checkout button? They're going to your competitor.
Leo Vasiliou
31:33 - 32:05
Serge, there's a comment I think is interesting: all of my customers are B2B.
I'm not sure if that was said to mean it would not apply in that instance, or if it would apply. Yeah.
I think it would still apply, because you're not in the same cloud sitting next to each other. So that's just my reaction.
So, agree...
Sergey Katsev
32:05 - 32:23
with that. I agree with you.
You mentioned SLAs earlier. Right? With B2B, you're a lot more likely to have an SLA with your customer, or rather, your customer has an SLA with you.
So your service needs to work properly, and all of the things that you depend on.
Leo Vasiliou
32:23 - 32:37
You know what? Now that you said that, it might even be more important because now you're talking penalties. You got money on the line versus just somebody coming at you on, you know, Twitter, X, or whatever the heck it's called. Sorry.
So go ahead, Serge.
Sergey Katsev
32:37 - 34:11
Yeah. No.
So that's it for that one. The other thing I wanted to mention, as an engineer I feel like I have to, is that modern observability has to integrate with your CI/CD.
Right? In other words, you can't be doing this completely separately. You can't be doing it manually.
It can't be an afterthought. It needs to be there from the planning stage all the way through.
Right? And so we actually have a webinar on how to do this fully. Feel free to go look it up.
Maybe somebody can post a link to it. But here's an example.
Right? From day one, when you're defining the functional requirements, guess what? The observability requirements should be there also because that's what's going to answer those business questions as you go all the way through. And then that lets, for example, your QA team already know how to automate things and what things matter.
Right? It helps your operations team monitor because, obviously, everything is already defined. We've already said what's important and what's not.
It helps your development team automatically test release builds as they come up and then automatically promote them from environment to environment. Right? So this is something that really has evolved over the last couple of years, but now it's a very strong requirement.
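One hedged sketch of what observability requirements from day one can look like: the feature declares its signals once, and QA automation and the release pipeline both read from that same declaration. The feature name, SLI names, and thresholds below are hypothetical.

```python
# Hypothetical sketch: observability requirements declared with the feature at
# planning time, then reused by QA, operations, and the release pipeline.
checkout_feature = {
    "name": "checkout",
    "functional_requirements": [
        "user can add items to cart",
        "user can complete purchase",
    ],
    # Declared on day one, not bolted on after launch.
    "observability_requirements": {
        "business_signal": "completed checkouts per minute, per region",
        "slis": {
            "checkout_p95_latency_ms": 2000,
            "checkout_error_rate_pct": 0.5,
        },
        "synthetic_journeys": ["browse -> add to cart -> checkout"],
        "alert_owner": "payments-oncall",
    },
}

def qa_test_plan(feature):
    # QA automates exactly the journeys the requirements name.
    return [f"automate: {journey}"
            for journey in feature["observability_requirements"]["synthetic_journeys"]]

def can_promote(feature, measured):
    # The pipeline promotes a build only if every declared SLI is met.
    slis = feature["observability_requirements"]["slis"]
    return all(measured.get(name, float("inf")) <= limit
               for name, limit in slis.items())

print(qa_test_plan(checkout_feature))
print(can_promote(checkout_feature,
                  {"checkout_p95_latency_ms": 1500, "checkout_error_rate_pct": 0.2}))
```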
Leo Vasiliou
34:11 - 41:00
Alrighty. Alrighty.
Before we work to wrap this up, let me just take a moment to say thank you very much for listening to an extremely long joke, if you would. We're now getting to the climax or the crescendo or the punchline of the joke.
So let's hop right in. First, here's what I would ask.
Feel free to scan the code to get a copy of the full research. No form fills, no email addresses, anything like that. Or simply Google SRE Report 2025.
And it's also one of the links in the resources tab there. Second, think about which of these may help in your situation.
So this is us talking about what we hear from our customers. Again, the things we're hearing, some obvious, some not so much.
So number one, start with ensuring the reliability and resilience of your most critical apps. Don't try to boil the ocean.
This one may be one of the obvious ones, but has anyone ever said to you, here's your blank check? So focus on what makes the most money for your customers, or what impacts your workforce productivity the most if it's down or super slow.
Focus on a great experience. Now this one is not as obvious. The reason for this is that the volume of telemetry data from all of your internal stuff is growing exponentially, and you really should not capture all the telemetry just for the sake of capturing all the telemetry.
Instead, use an outside-in experience perspective to calibrate what internal data to collect and how much. That is, maybe if your internal stuff doesn't impact the experience when it goes into an incident, or if it doesn't impact it immediately, then maybe you can sample it less frequently.
The other benefit, I think, is that by focusing on the experience-based signals, you lessen the risk of being distracted by noise from trying to collect all the telemetry.
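One possible way to express that calibration, sketched with made-up component names and sampling rates: components whose failures show up in the outside-in experience keep full-fidelity telemetry, and the rest are sampled down. Treat the policy below as an assumption-laden illustration, not a recommendation of specific numbers.

```python
# Hypothetical sampling policy: calibrate internal telemetry volume by whether a
# component has been observed to affect the outside-in experience.
components = {
    "checkout-service":   {"impacts_experience": True,  "impact_is_immediate": True},
    "search-service":     {"impacts_experience": True,  "impact_is_immediate": False},
    "nightly-batch-jobs": {"impacts_experience": False, "impact_is_immediate": False},
}

def sampling_rate(component):
    """Return the fraction of telemetry to keep (made-up numbers)."""
    if component["impacts_experience"] and component["impact_is_immediate"]:
        return 1.0   # keep everything: failures show up in the experience right away
    if component["impacts_experience"]:
        return 0.25  # sample down: the impact is real but not immediate
    return 0.05      # minimal sampling: no observed effect on the experience

for name, component in components.items():
    print(f"{name}: keep {sampling_rate(component):.0%} of telemetry")
```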
Next, AI. Here's what I wanna say about AI, because we could do a whole webinar on this, and I'm sure there are many, many on this. What I wanna say about AI is, from this year's report, in an extremely rare alignment of all management levels, the desire to be trained on AI was universal.
So if AI happens to be in the training and enablement plans across the org, great. That's the main thing I'd like to mention on AI: rare alignment across all managerial levels.
Tools, tooling, tool chains. Remember, please, a high number of tools does not automatically mean tool sprawl.
Instead, whenever this comes up, maybe during budget time, it's always a cost-to-value ratio. If the cost is greater, then you have a tool sprawl problem.
If the value is greater, then you do not have a tool sprawl problem. Debt: remember that evolution slide? If you're not careful, you will accrue debt.
You probably already have some, so make sure it gets paid down. Yes.
It's hard, but it is worth it. The observability life cycle: shift wide.
None of that argument of left versus right. Right? We don't want the metric system in preproduction and feet and inches in production.
Right? We want the same signals to make sure the right people are responding to the right things. Center of excellence: please make it cross-functional, with cross-team members.
That is more than just IT folks, more than just IT, SRE, DevOps. Right? The people who sign the checks, the people who make the decisions, make sure they are part of that center of excellence.
And then last, really just an extension of the center of excellence: you cannot fix this IT-to-business gap if you don't acknowledge that IT-to-business gap. So go back to some of the data we were talking about earlier, when we were showing the contrast around, hey, what's the maturity of your reliability practices.
Right? So, an extension of your center of excellence, and the IT-to-business gap. Now here we go.
The first of two climax slides. Now, as a validation exercise, Sergey, when we were getting close to finishing up this deck,
I don't know what you wanna call it, we AI-ified what we, you know, wanted to talk about in this webinar.
Why does observability have so many definitions? That was the prompt. The term observability, you know, imagine a Siri voice or something, an Alexa voice.
I don't know. Stop.
Because it is used in different contexts. Right? So number one, there's the academic definition: inferring the state of a system by looking at its outputs.
Number two, there's your MELT definition. Right? Alright? Now when we get to three, four, and five, that's 2025 and beyond.
That's where the context comes in. Right? The evolution of technology requires more complex, sophisticated approaches.
Different folks with different goals need the ability to observe different things, and this is critical. Observability is still about understanding the relationship of a set of outputs mapped back to a set of inputs.
It's just that those outputs and inputs are within the context. They may have different boundaries.
For example, is my user having a bad experience because of my endpoints, my stuff, my third party's endpoints, my third party's stuff, or something in the Internet stack? And being able to see how they relate to each other. And then back to complexity, it's more than just the application, kinda tying in number four there.
It could be the business. Right? Business observability, Internet stack, the experience.
It doesn't matter. It is complex.
And so, team, Serge, any other comments on this before we move to the next slide?
Sergey Katsev
41:00 - 41:33
The only thing I will add is as far as complex systems, most teams have people that say, no. No.
No. I got this.
We don't need anything to explain it. We have a document from, you know, seven years ago that talks about how it works.
That, well, it never quite was good enough, but it really doesn't fly anymore. Right? Systems are evolving.
All of your dependencies are changing month to month, if not more frequently. And so you really need something that is evolving that viewpoint...
Jared H
41:33 - 41:34
as you go.
Leo Vasiliou
41:34 - 42:03
Agreed. Agreed.
And that brings us then to the answer to the question everybody's been waiting for. What's required for modern observability right now and beyond is context.
And if you were looking when we first had this slide, it was like, oh, Leo, you didn't have a black legend. Well, this is it.
So, again, thank you so very much for your time, everyone. Sergey, any closing comments for all of...
Sergey Katsev
42:03 - 42:04
us before we see...
Leo Vasiliou
42:04 - 42:09
if there are any questions or comments we need to address?
Sergey Katsev
42:09 - 42:31
Yeah. Just, what is context when it comes to observability? It's the ability to take different sources of data, telemetry, and complexity and give you that answer to those business questions or those technical questions quickly.
Leo Vasiliou
42:31 - 42:52
Thank you for that, Sergey. And I guess we'll go ahead and show the obligatory thank-you slide and then the obligatory questions and comments slide.
So, otherwise, thank you very much for your time.
Jared H
42:52 - 43:07
Well, we do have a few questions here if you guys want to take a moment and just answer those. One of them is for you, Leo.
On the slide that you talked about, the preventative one, can you elaborate more on what you meant by that?
Leo Vasiliou
43:07 - 44:34
Preventive. I think you're referring to the evolution.
Yep. Yep.
So the evolution slide. What we were trying to convey there was really to capture the essence of evolution, kind of like when your apps were monolithic on premise, then we started going to SaaS, and, you know, you did your internal monitoring.
You reacted to alarms. And then proactive, you know, right, you started to try to be like, how can we get ahead of these things? We bought APM.
We bought synthetics, whatever it is, tracing, to kinda troubleshoot issues. But then there's this idea of being preventive.
And, actually, that's a good question, because if this person didn't know what we meant, somebody else in the webinar also probably didn't know what we meant. What we mean when we talk about preventive is not that you can prevent a thing from actually happening.
That's not what we meant to say. What we're talking about is preventing the impact from that thing from being realized.
Sometimes you will be able to do that, sometimes you won't, depending on, you know, your tooling and which types of processes you have in place. If a tree falls in the woods, right, and you can fix it before the town in the valley floods, you're preventing the impact from that tree falling.
So that's what we mean when we talk about preventive: preventing the impact from an incident.
Sergey Katsev
44:34 - 44:44
It's increasing resilience. And in order to increase resilience, you need to understand the level of resilience of your system.
Jared H
44:44 - 44:53
That's a good point. Here's one I think both of you can answer.
So, is this covered with event-driven architecture?
Sergey Katsev
44:53 - 45:41
Yeah. So I'll take that one.
My take is no. So event-driven architecture defines a little bit better what a piece of software is going to do.
But remember, even if your component is fully defined, fully event-driven, and you understand all of the events, well, how do you understand all of the events? You probably have logs of the events to see what happened. Well, that's observability, or, I'm sorry.
That is a source of telemetry that can be added to your observability strategy, but you still have all of your third party dependencies.
Jared H
45:41 - 45:53
Okay. Well, Sergey, this one... Sergey, sorry, this is also for you then.
How do you actually tie an observability tool into your CI/CD or development process?
Sergey Katsev
45:53 - 46:53
Yeah. So, the short answer is integrations.
Right? At CatchPoint, we have tons of them, but a common thing we see people doing is you have a GitHub hook, for example, that deploys your changes to a particular testing server and automatically runs some performance tests against those changes. And if the performance does not meet certain criteria, the change is automatically rejected.
Right? That's one example. Another example is using a tool like Terraform. When you need to stand up a new service, right, let's say literally a new piece of infrastructure, our customers will use Terraform to spin it up as a service, let's say, in the cloud, add their DNS records, and then add CatchPoint tests so that from day one, everything is being monitored in the correct way.
Right? So that's what we mean by tying it into the CI/CD process.
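A stripped-down sketch of that first pattern, assuming the hook has already deployed the change to the testing server and run the performance tests: the script reads an invented results file and exits non-zero to reject the change when the criteria are missed. The CatchPoint- and Terraform-specific wiring is not shown, and the file name and thresholds are placeholders.

```python
#!/usr/bin/env python3
# Hypothetical CI gate: called after performance tests have run against a change.
# Exits non-zero to reject the change if the measured numbers miss the criteria.
import json
import sys

CRITERIA = {
    "p95_response_ms": 1500,  # must be at or below
    "error_rate_pct": 1.0,    # must be at or below
}

def main(results_path):
    with open(results_path) as f:
        results = json.load(f)  # e.g. {"p95_response_ms": 1320, "error_rate_pct": 0.4}

    failures = [
        f"{metric}={results.get(metric)} exceeds {limit}"
        for metric, limit in CRITERIA.items()
        if results.get(metric, float("inf")) > limit
    ]

    if failures:
        print("Change rejected:", "; ".join(failures))
        return 1
    print("Performance criteria met; change accepted.")
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "perf_results.json"))
```

In a setup like the one described, this would be the script the GitHub hook invokes after the test run, so a change that misses the criteria never gets promoted.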
Leo Vasiliou
46:53 - 47:55
And, Serge, if I could add on to that, specifically just the performance comments. I'm a performance analytics chart and graph nerd.
I can't help it. Forgive me.
That goes back to what I was saying about shifting wide, specifically my, you know, bad-humor comment when I was like, we don't want to measure in the metric system in pre-prod and in feet and inches in the production environment. I ostensibly was talking about performance, because it's very easy to use any number of tools to be like, oh, is it up or down? A billion tools can do that.
But performance, and having a consistent yardstick for measurement across environments, that is actually what I wanted to underscore, because that was essentially what I was saying: you know, shift wide so that your response times don't vary wildly because you're using different tools. So I felt that was important enough to add on.
Sergey Katsev
47:55 - 47:58
Thank you for that. Yep.
I agree.
Jared H
47:58 - 48:06
We don't have any more questions. So if you guys have any final thoughts or remarks before we close out, now is the time to share them.
Leo Vasiliou
48:06 - 48:17
I guess I'll go. I'll again say thank you very much for having us.
I hope everyone got a couple of nuggets out of the session. And, again, thank you.
Sergey Katsev
48:17 - 48:29
Yeah. Same.
I hope that it was useful, and we love feedback. So please connect on the regular channels.
Happy to answer questions offline, etcetera.