Consolidation and Modernization in Enterprise Observability

Published October 29, 2024

Organizations are seeing measurable benefits from investing in observability, including faster issue resolution, cost reduction, and improved business outcomes. However, challenges remain, including rising costs, tool fragmentation, and the need for more comprehensive monitoring of internet dependencies and user experience. Let’s explore these challenges and the best practices organizations are adopting to address them.

Rising concern over observability costs

Observability is too expensive. During an earnings call in 2022, Datadog revealed that a customer was spending $65 million on its observability tools. The news raised awareness of the rising costs of observability.

Jeremy Burton wrote, “Some observability tool vendors say organizations should allocate up to 30% of their total infrastructure cost to monitoring and understanding the state of their IT system. That’s just nuts.”

MELT myopia driving higher costs

One of the root causes of higher costs is the belief in the three pillars of observability (now MELT: infrastructure Metrics, Events, Logs, and code Traces) and the idea that everything needs to be monitored this way. Storage vendors, in particular, love logs as they drive significant storage needs.

I expect IT teams to abandon the tools-first approach to monitoring. Instead, I see teams starting by determining the best way to monitor each application or piece of infrastructure, based on the criticality of the application and the value of the monitoring data.
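To make the criticality-first idea concrete, here is a minimal sketch of how a team might decide which signals to collect per service. The tier names, the signal choices per tier, and the `high_value_logs` flag are all hypothetical, chosen for illustration rather than taken from any vendor or standard.

```python
from dataclasses import dataclass

# Hypothetical mapping from criticality tier to the signals worth paying for.
# The tiers and the choices per tier are illustrative assumptions.
SIGNALS_BY_TIER = {
    "critical": ("metrics", "events", "logs", "traces"),
    "important": ("metrics", "events", "traces"),
    "standard": ("metrics", "events"),
}


@dataclass(frozen=True)
class Service:
    name: str
    criticality: str               # "critical", "important", or "standard"
    high_value_logs: bool = False  # assumed flag: logs carry data the business needs


def plan_signals(service: Service) -> tuple[str, ...]:
    """Start from the criticality tier, then add logs only when they earn their cost."""
    signals = SIGNALS_BY_TIER[service.criticality]
    if service.high_value_logs and "logs" not in signals:
        signals = signals + ("logs",)
    return signals
```

The point of the sketch is the order of the decision: criticality and data value come first, and the tool choice follows from the resulting signal list instead of driving it.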

The challenge of multiple APM tools

Many IT leaders face the challenge of managing multiple APM tools, each with its own set of features, dashboards, and data sources. This fragmentation can lead to inefficiencies, as teams spend valuable time correlating data from different tools to identify and resolve issues.  

The 2024 SRE Report shows a significant percentage of large enterprises have more than five observability tools due to different needs and silos. The variety is often driven by the preferences of application teams. In my experience, most large enterprises have a good mix of observability tools, including most of the top five APM platforms.

2024 SRE Report, "How many monitoring or observability tools does your organization use?"

The future of APM: tool consolidation, OpenTelemetry, and cost savings

According to a recent survey by Elastic, a significant number of organizations are looking to consolidate their observability and monitoring tools to gain insights faster and improve collaboration between teams.

One of the most compelling reasons for consolidating APM tools is the potential for cost savings. By reducing the number of tools and standardizing on a single platform, organizations can lower licensing costs, reduce training expenses, and streamline maintenance efforts.

OpenTelemetry (OTel) has emerged as a game-changer in the world of observability. As an open-source project, it provides a unified set of APIs, libraries, agents, and instrumentation to capture and export telemetry data (metrics, logs, and traces) from applications. Adoption of OpenTelemetry is gaining momentum, with many organizations recognizing its potential to standardize data collection and reduce vendor lock-in.

This is observability trend #1: organizations are making OTel a requirement for all new observability teams. It has the potential to break vendor lock-in, enable the consolidation of monitoring data, and give teams more flexibility.

Some teams are considering open-source technologies, but as the saying goes, open-source is free as a puppy. While it is effective for many use cases, in the case of observability, many teams quickly realize that the effort to build, configure, integrate, and maintain a full open-source stack can easily add up to a higher cost than an optimized commercial platform option.

Centralized observability teams

The increased complexity of observability and the growing scrutiny of spend and platform adoption are resulting in the centralization of observability decisions. This is observability trend #2: enterprises are creating either a central operations team that owns observability or an architecture team that defines standards, processes, vendors, and governance.

These central operations teams start by completing an inventory of applications, vendors, tools, reports, and critical needs. From here, they can define broad priorities, standards, and best practices. I expect this trend to accelerate in the coming months.
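As a rough illustration of what that inventory step can produce, the sketch below counts how many applications depend on each tool and flags rarely used tools as candidates for consolidation. The application names, tool names, and the `max_apps` threshold are hypothetical placeholders, and a real inventory would also weigh cost, contracts, and coverage.

```python
from collections import Counter

# Hypothetical inventory: application name -> monitoring tools it uses.
example_inventory = {
    "checkout": ["tool-a", "tool-b"],
    "search": ["tool-a"],
    "billing": ["tool-a", "tool-c"],
}


def tool_usage(inventory: dict[str, list[str]]) -> Counter:
    """Count how many applications use each tool (each app counted once per tool)."""
    usage = Counter()
    for tools in inventory.values():
        usage.update(set(tools))
    return usage


def consolidation_candidates(inventory: dict[str, list[str]], max_apps: int = 1) -> list[str]:
    """Return tools used by at most `max_apps` applications, sorted by name."""
    usage = tool_usage(inventory)
    return sorted(tool for tool, count in usage.items() if count <= max_apps)
```

Calling `consolidation_candidates(example_inventory)` lists the tools that only a single application relies on, which gives the central team a starting point for conversations with the owning teams.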

The growing need to monitor distributed, internet-centric services and dependencies

While APM tools are an established component of observability, there is growing recognition that they are very systems-centric. They lack visibility into the dozens of factors that affect applications that are digital, cloud-centric, increasingly distributed, API-centric, and dependent on dozens of services, each of which depends on the performance and availability of the global Internet required to reach it.

This need is accentuated by the recognition that what really matters is not system uptime or code efficiency but the real-world digital user experience, wherever in the world the user happens to be. In the case of services exposed via APIs, what matters is the experience of the consuming system across the Internet, wherever it happens to be.

Modern enterprises have recognized that customer experience is digital experience, and digital experience is customer experience. Even most offline processes are supported by, and dependent on, digital systems that define the quality of the experience.

For instance, for a rental company, what matters is not that the cluster reservation system has an average CPU utilization of 72% but that the customer at the rental counter doesn’t keep hearing, “Sorry, my computer is slow,” after waiting 20 minutes in line.

As a result, we have observability trend #3: the requirement for Internet Performance Monitoring (IPM) as an essential component of an observability stack. EMA research recently published a report stating, “Internet Performance Monitoring tools have become just as important as application performance management, if not more so.”

With the growing number of outages caused by elements of the Internet Stack and Internet-centric dependencies, it’s no surprise GigaOM wrote, “IT decision-makers must choose between bearing the full weight of these outages or investing in an IPM solution to navigate them intelligently.”

The formula for complete observability

The combination of these trends has led to a common scenario in mature enterprises: consolidating to an APM platform (often among the top three) paired with an IPM platform (like Catchpoint or Cisco ThousandEyes). APM provides an inside-out view, while IPM delivers an outside-in perspective. APM provides a system-centric view, while IPM provides a view of global digital experience and Internet health.

In these mature ITOps organizations, Dynatrace and Catchpoint are often the preferred combination that provides the operational insights these companies require to pursue resilience and optimal performance. These teams are characterized by being in a relatively advanced observability maturity stage where they aim to be not only proactive but also value-led and business-impact oriented.

Modern organizations adopting this approach are achieving operational efficiencies, dramatically increased uptime, and an improvement in user experience that is clearly having a positive impact on the business. This also results in better alignment between IT and business strategy and recognition of the value IT operations teams bring to the table. Companies like SAP, IKEA, Akamai, and one of the leading financial services institutions in North America are examples of those finding success with this model.

Conclusion

As IT leaders navigate the complexities of IT operations, evolving monitoring and observability, consolidating APM tools, adopting OpenTelemetry, and implementing IPM with governance and best practices from a centralized team can provide significant advantages. By streamlining their observability strategies, organizations can achieve faster issue resolution, reduce costs, and drive better business outcomes.
