Blog Post

Adobe Experience Cloud Outage: The Impact of Relying on Third-party Services

Published
December 12, 2023

On December 8, 2023, Adobe's extensive customer base was impacted by a series of outages in the Adobe Experience Cloud, starting from 8:00 AM EST and continuing until 1:45 AM EST on December 9.  

We haven't seen a third-party outage of this magnitude since the DoubleClick outage of 2018.

According to Adobe, Data Collection (Segment Publishing), Data Processing (Cross-Device Analytics, Analytics Data Processing), and Reporting Applications (Analysis Workspace, Legacy Report Builder, Data Connectors, Data Feeds, Data Warehouse, Web Services API) were all affected by the outage.  

Adobe Analytics reports that within the Experience Cloud, multiple services were down for several hours. The outages in various services started and ended at different times, with varying outage durations. Note that these times do not reflect when Adobe updated its status page or informed its customers about the outage.  


The cost of such a service being down for 18 hours adds up quickly – for Adobe and its customers. Both are impacted by lost revenue due to service disruptions and damaged brand reputation. On top of that, Adobe risks incurring SLA violations for millions of customers.  
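The "18 hours" figure is a rounding of the start and end times quoted earlier in this post. As a quick check, here is a minimal sketch of that arithmetic, assuming both timestamps are in EST as stated:

```python
from datetime import datetime

# Start and end times as quoted in this post; both are EST,
# so no timezone conversion is needed for the subtraction.
outage_start = datetime(2023, 12, 8, 8, 0)
outage_end = datetime(2023, 12, 9, 1, 45)

duration = outage_end - outage_start
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours} h {minutes} min")  # 17 h 45 min
```

That is 17 hours and 45 minutes from the first to the last affected service, which rounds to the 18 hours used above.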

Catchpoint's Internet Sonar was the first and only tool to detect this outage, significantly outperforming others like ThousandEyes and Downdetector. This incident not only validates our claims about Internet Sonar but also underscores the importance of Catchpoint’s Internet Performance Monitoring (IPM) Platform in navigating the growing complexity and fragility of the Internet.

Now that we understand the 'what' and the 'how much,' let's dive into the incident review.  

How we detected the outage

Catchpoint’s IPM Platform spotted the outage in at least three different ways:

  • Internet Sonar (service outage detection and correlation)  
  • Catchpoint Synthetic tests run by customers who rely upon Adobe
  • Catchpoint Professional Services running analysis on behalf of major retail and e-commerce customers

Here are our observations for each of the three areas above.

Internet Sonar (service outage detection and correlation)  

Internet Sonar monitors Adobe Tag Manager, a service in the Adobe Experience Cloud. On Friday, December 8, at 8:03 AM EST, Internet Sonar detected timeout errors from a large number of cities globally. Internet Sonar alerted customers at 8:20 AM EST, once the failures were confirmed as a significant incident rather than short-term outliers.


Internet Sonar quickly detected outages to Adobe Tag Manager as shown above. Sites with this tag were painfully slow, with some taking 100-200 seconds to load.  
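The gap between first detection (8:03 AM) and the customer alert (8:20 AM) reflects a confirmation step that separates a real incident from a short-lived blip. The sketch below illustrates that general idea only. It is not Internet Sonar's actual logic, and the `Observation` type, the `should_alert` function, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    minute: int   # minutes since the first failed check
    city: str     # location of the monitoring node
    failed: bool  # True if the check timed out or errored


def should_alert(observations, min_cities=5, min_minutes=15):
    """Alert only when failures are both widespread and sustained.

    Hypothetical thresholds: failures must come from at least
    `min_cities` distinct cities and span at least `min_minutes`.
    """
    failures = [o for o in observations if o.failed]
    if not failures:
        return False
    cities = {o.city for o in failures}
    span = max(o.minute for o in failures) - min(o.minute for o in failures)
    return len(cities) >= min_cities and span >= min_minutes


# A short blip seen from two cities: not enough to alert.
blip = [Observation(0, "New York", True), Observation(1, "London", True)]

# Failures from six cities that persist for 17 minutes.
sustained = [
    Observation(m, city, True)
    for m in (0, 17)
    for city in ("New York", "London", "Tokyo", "Sydney", "Mumbai", "Paris")
]

print(should_alert(blip))       # False
print(should_alert(sustained))  # True
```

A real detector would also weigh how many checks succeeded in between. This version only looks at how widely and how long failures were seen.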

Internet Sonar also performed intelligent correlation for failing Synthetic tests run by customers. The screenshot below shows the records page of a Catchpoint Synthetic test for a customer's service using Adobe Tag Manager, where Internet Sonar correlated the test failure with the Adobe Tag Manager outage.  


Internet Sonar enables users to answer the question, "Is my service experiencing issues due to a problem in my application or infrastructure, or is it one of the third-party services in the Internet Stack I rely upon to deliver my service?"

Proactive monitoring of the Adobe Experience Platform

Many of Catchpoint's e-commerce customers who rely on Adobe also began experiencing multiple failures in the synthetic tests they run on the Catchpoint platform.    

Test failure #1: HTTP 404 Not Found

Root cause of the Journey Optimizer failures: the request to https://auth.services.adobe.com/signin returned an HTTP 404, so the login could not complete.

An e-commerce customer that depends on Adobe extensively monitors the Adobe Experience Platform. Their synthetic test for "Adobe Journey Optimizer," part of Experience Cloud, showed significant impact:


HTTP Response: {"errorCode":"invalid_resource_id","errorMessage":"Could not find resource for id v:2,s,f,bg:eclogin,..."}  
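To show how a synthetic login step might turn a response like this into a clear failure reason, here is a minimal sketch. The function name `classify_signin_response` is hypothetical, and treating only HTTP 200 as success is an assumption. The body is the truncated response quoted above, kept exactly as shown.

```python
import json


def classify_signin_response(status_code, body):
    """Return (ok, reason) for a synthetic login step.

    Assumption: only HTTP 200 counts as success. If the body is JSON
    with an "errorCode" field, it is included in the failure reason.
    """
    if status_code == 200:
        return True, "ok"

    error_code = None
    try:
        payload = json.loads(body)
        if isinstance(payload, dict):
            error_code = payload.get("errorCode")
    except (json.JSONDecodeError, TypeError):
        pass  # Non-JSON or missing body: report the status code alone.

    reason = f"HTTP {status_code}"
    if error_code:
        reason += f" ({error_code})"
    return False, reason


# The response body quoted in this post (truncated in the original).
body = (
    '{"errorCode":"invalid_resource_id",'
    '"errorMessage":"Could not find resource for id v:2,s,f,bg:eclogin,..."}'
)

result = classify_signin_response(404, body)
print(result)  # (False, 'HTTP 404 (invalid_resource_id)')
```

Surfacing the `errorCode` alongside the status code makes it easier to tell a provider-side error from a misconfigured test.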


Test failure #2: TCP Timeout Errors  

TCP timeout errors occurred on the Launch.js JavaScript request, which is initiated by DTM.js.

HTTP Request: https://assets.adobedtm.com/a7d65461e54e/6e9802a06173/launch-43baf8381f4b.min.js  


Incident Start time: Dec 08, 2023 - 05:04:37 PT  
Status: Ongoing
Regions Impacted: Global  
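The incident start time above is in Pacific Time, while the rest of this post uses EST. A small conversion sketch shows the two timelines line up, assuming "PT" here means Pacific Standard Time (UTC-8), which applies in December:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for December (standard time). Assumption: "PT" = PST.
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")

incident_start_pt = datetime(2023, 12, 8, 5, 4, 37, tzinfo=PST)
incident_start_est = incident_start_pt.astimezone(EST)

print(incident_start_est.strftime("%H:%M:%S %Z"))  # 08:04:37 EST
```

So the customer's test recorded its incident start about a minute after Internet Sonar's first detection at 8:03 AM EST.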

Catchpoint Professional Services running analysis on behalf of our customers

Catchpoint Professional Services, which monitors and analyzes websites for major retail and e-commerce customers, noticed several failures attributed to the Adobe outage.  

Test failure #1: Test timeout caused by high connect time

We observed failures across multiple tests: they timed out because of high connect times for assets.adobedtm.com.


We also observed no response from servers:  


Test failure #2: Increase in wait time

The host chart showed an increase in wait time for requests to "assets.adobedtm.com".


Waterfall data also showed a 503 Service Unavailable error for a specific Adobe request.


Test failure #3: Test failure and performance degradation  

We also noticed test failures and performance degradation due to Adobe request failures.  


We also reviewed WebPageTest (WPT) results. Note that the page content is displayed to users only after the Adobe assets time out.
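A simplified model helps explain why one unresponsive third-party asset can hold up an entire page. The sketch below assumes the blocking requests load one after another and that a hung request costs the full timeout. The function name, request names, and all timings are hypothetical and are not taken from the WPT results.

```python
def time_to_content_ms(blocking_requests, timeout_ms):
    """Estimate when content appears, given serial blocking requests.

    Each request is (name, response_ms), where response_ms is None if
    the server never responds. A hung or too-slow request costs the
    full timeout before the browser gives up and moves on.
    """
    total = 0
    for _name, response_ms in blocking_requests:
        if response_ms is None or response_ms > timeout_ms:
            total += timeout_ms
        else:
            total += response_ms
    return total


# Hypothetical timings, in milliseconds.
healthy = [("page.html", 200), ("launch.min.js", 150)]
outage = [("page.html", 200), ("launch.min.js", None)]

print(time_to_content_ms(healthy, 30_000))  # 350
print(time_to_content_ms(outage, 30_000))   # 30200
```

In this model the user sees nothing until the timeout expires, which matches the pattern described above.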


Real user monitoring (RUM) data revealed the impact on end users:  


Why you need Internet Sonar

Imagine waiting to find out your service or site is down through negative posts on social media. Now, you don't have to.  

When it comes to outages like these, it is extremely important to have a tool that helps you answer the question, "Is it me or something else?" You need a tool capable of pinpointing the source of Internet disruptions at a glance: no finger-pointing, no war rooms, just intelligent, trustworthy Internet health information to accelerate incident detection. That's the core concept behind Internet Sonar.

What sets Internet Sonar apart:

  • Unparalleled worldwide and regional visibility leveraging Catchpoint's Global Observability Network with over 2600 nodes from more than 300 providers in 94 countries – with more being added all the time.
  • Hundreds of the most popular Internet services monitored, including Internet Infrastructure (CDN, DNS, Cloud), SaaS (email, SaaS, UCaaS, SECaaS), and MarTech (Ad serving, Analytics, Video).
  • Real-time email alerts as well as webhook or API access for easy integration into any application.
  • Automatic, AI-powered data correlation with active monitoring for simple, real-time status information.  

As Steve McGhee, Reliability Advocate, SRE, Google Cloud, highlighted in his Conclusion for Catchpoint's 2023 SRE Report, there is a reason why experts never depend on a single solution, tool, or platform to accomplish their tasks in the best possible manner. "When it comes to skilled labor, or 'operations' perhaps," writes Steve, "you want teams to be able to reach for the right tool at the right time, not to be impeded by earlier decisions about what they think they might need in the future."

Watch this on-demand product demo video to learn more about Internet Sonar and look out for an upcoming blog post discussing best practices for monitoring the Adobe Product suite with Catchpoint.

