
When AI tools fail: How to map your AI dependencies for proactive visibility

Published
March 4, 2025

AI platforms have experienced several service interruptions over the past few months.

[Image: screenshot of a social media post]

We’ve all seen the memes fly when ChatGPT, Gemini, or Perplexity goes down. They’re funny at first, but then reality hits: if you rely on AI tools for work or business, these outages can grind your day to a halt. And it’s not just a glitch here or there. There’s a clear pattern of AI services failing across different platforms:

  • February 5-6, 2025: Google’s Gemini experienced a 23-hour disruption that affected the "Add File" and "Link File" functionalities within Gems. The outage prevented users from attaching files to their AI-driven workflows. Users had no workaround, leading to productivity loss for businesses relying on Gemini’s file-processing capabilities.
  • January 23, 2025: ChatGPT and several OpenAI APIs suffered elevated error rates, with users encountering "bad gateway" errors. Businesses relying on ChatGPT for automation, customer service, and content generation were left scrambling.
  • January 23, 2025: Perplexity’s API experienced a major outage, causing timeouts and disruptions for applications relying on its AI capabilities.
  • December 26, 2024: A host of OpenAI services (ChatGPT, Sora video creation, plus agents, realtime speech, batch, and DALL-E APIs) suffered error rates north of 90%.
  • June 4, 2024: Multiple AI platforms, including OpenAI's ChatGPT, Anthropic's Claude, and Perplexity, experienced simultaneous outages. Users worldwide reported disruptions, which prompted widespread discussion on social media.

Our dependence on AI is no longer hypothetical; it’s a given. When these systems fail, we’re left scrambling. The real question is: how do you stay ahead of the next failure?

The revenue impact of AI outages

The numbers are big and growing bigger: in 2025, global AI investments are set to exceed $500 billion. For many companies, AI apps like ChatGPT aren’t optional anymore—they’re mission-critical. Gartner reports that 70% of enterprises now use large language models (LLMs) for everyday tasks like automated customer service, marketing personalization, and real-time data crunching.

When these AI systems go offline, it’s not just a minor inconvenience. In finance, a few hours of AI downtime could mean millions lost due to missed trades or undetected fraud. In eCommerce, chatbots and recommendation engines going dark mean abandoned shopping carts and fewer conversions, which translates to real money left on the table.

But the damage doesn’t stop at lost revenue. Companies increasingly rely on AI-powered automation to streamline workflows, meaning that outages force employees to revert to manual processes, significantly slowing down productivity. This is particularly evident in customer support, where AI chatbots handle vast volumes of inquiries. If an outage forces companies to fall back on human agents, call center queues expand, increasing response times and leading to diminished customer satisfaction.

If you’re concerned about the impact of AI outages on your business, now is the time to evaluate your AI dependencies and invest in tools that can help you stay ahead of disruptions.

The need for visibility: Mapping your AI dependencies  

Even with the best monitoring strategies in place, AI outages present a unique challenge: you may know something is broken, but not necessarily where or why. To pinpoint issues accurately, you need tools that give you actionable insight into your AI dependencies, whether the problem originates in the application layer or the underlying Internet Stack.

eCommerce AI dependencies: A case study

Consider an eCommerce company that uses an AI-powered chatbot for customer support. The site relies on several key components to deliver a seamless shopping experience:

  • Front-end CDN: Ensures fast content delivery to users.
  • Distributed Hyperscaler: Acts as the origin server for dynamic content.
  • Search and Seller APIs: Retrieve relevant product data for users.
  • Chatbot Powered by OpenAI API: Handles customer inquiries and provides real-time support.

The chatbot is a critical part of the customer support workflow. When a shopper interacts with the chatbot, their request is forwarded to an external API, which then interacts with the OpenAI API to generate a response. This means the chatbot’s functionality is entirely dependent on the OpenAI API.

Flow diagram depicting the interaction between a user, an external API, and OpenAI's API in an e-commerce chatbot system

If the OpenAI API experiences an outage, the chatbot fails, leaving customers without support. This not only frustrates users but can also lead to lost sales and damaged customer relationships.

How to map AI dependencies and stay ahead of outages

In the eCommerce example above, the chatbot’s dependency on the OpenAI API shows why mapping AI dependencies matters. When an outage occurs, knowing exactly where the failure lies can mean the difference between minutes of downtime and hours of lost revenue. Here's how to build that map:

#1 Visualize your AI dependencies

Start by creating a map of all the services and APIs your AI tools rely on. For example, if your chatbot depends on OpenAI’s API, you need to include it in your dependency map. Tools like Internet Stack Map can help you visualize these connections, making it easier to pinpoint where failures occur when an outage happens.

Internet Stack Map view

In the example above, the Internet Stack Map view of our eCommerce case study shows all other services are working as expected, except for the OpenAI API (highlighted in red), which impacts chatbot interactions.
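
The same idea can be sketched as a small directed graph. This is only an illustration of the concept, not how Internet Stack Map works internally; the service names and the `DEPENDS_ON` structure are assumptions invented for this example.

```python
from collections import deque

# Hypothetical dependency map for the eCommerce case study.
# Each key depends directly on the services in its list.
DEPENDS_ON = {
    "storefront": ["cdn", "origin", "search_api", "seller_api", "chatbot"],
    "chatbot": ["external_api"],
    "external_api": ["openai_api"],
    "cdn": [],
    "origin": [],
    "search_api": [],
    "seller_api": [],
    "openai_api": [],
}


def impacted_by(failed, depends_on=DEPENDS_ON):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map: for each service, which services depend on it?
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk upward from the failed service.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return sorted(impacted)


print(impacted_by("openai_api"))  # ['chatbot', 'external_api', 'storefront']
```

Here "impacted" means at least partially degraded: the storefront shows up because its chat feature is affected, while the CDN, origin, and search and seller APIs are not listed.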

#2 Customize your workflow

Every AI system is unique, so your dependency map should reflect your specific architecture. Identify key components like CDNs, DNS providers, and origin servers, and ensure they’re included in your map. This customization ensures you’re prepared to troubleshoot issues that are specific to your setup.
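
One simple way to check that a map covers the component types listed above is a coverage check like the sketch below. The `Component` class, the category labels, and the extra `ai_provider` category are assumptions made for illustration; adapt them to your own architecture.

```python
from dataclasses import dataclass

# Categories named in the text, plus an assumed "ai_provider" category.
REQUIRED_KINDS = {"cdn", "dns", "origin", "ai_provider"}


@dataclass(frozen=True)
class Component:
    name: str  # hypothetical label for the component
    kind: str  # one of the categories above, or a custom one


def missing_kinds(components, required=REQUIRED_KINDS):
    """Return the required categories that no component in the map covers."""
    covered = {c.kind for c in components}
    return sorted(required - covered)


my_map = [
    Component("front-end CDN", "cdn"),
    Component("hyperscaler origin", "origin"),
    Component("OpenAI API", "ai_provider"),
]
print(missing_kinds(my_map))  # ['dns'] because no DNS provider is mapped yet
```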

#3 Correlate data for faster insights

Use monitoring tools that combine synthetic testing with real-time outage data. By correlating this data, you can quickly determine whether an issue is with your AI provider (e.g., OpenAI) or your own infrastructure. This reduces the time spent diagnosing problems, helps you avoid unnecessary war rooms, and saves you money.
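
As a rough sketch of that correlation step, the function below takes two sets of pass/fail results and reports where the problem most likely sits. The check names and the two-bucket split are assumptions for illustration; a real monitoring tool would weigh many more signals.

```python
def locate_issue(own_checks, provider_checks):
    """Classify an incident from two dicts of boolean check results.

    own_checks: synthetic test results for your own infrastructure (True = passing).
    provider_checks: signals about the AI provider, such as a synthetic call
    to its API and an external outage feed (True = healthy).
    """
    own_ok = all(own_checks.values())
    provider_ok = all(provider_checks.values())
    if own_ok and provider_ok:
        return "healthy"
    if own_ok:
        return "ai_provider"
    if provider_ok:
        return "own_infrastructure"
    return "both"


# Hypothetical results during an AI provider outage.
own = {"cdn": True, "origin": True, "search_api": True}
provider = {"openai_synthetic_test": False, "openai_outage_feed": False}
print(locate_issue(own, provider))  # ai_provider
```

If your own checks pass while the provider signals fail, the team can skip the internal war room and go straight to the provider's status and any fallback plan.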

Faster resolution, fewer disruptions

AI outages remind us how vulnerable we are in this interconnected world. When these systems fail, every minute counts, particularly if you’re losing revenue or driving customers away. That’s why Internet Stack Map, recently updated with a new user interface, is built for incident response. It shows what broke and where, shrinking your Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR).
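
To track whether those two metrics are actually improving, you can compute them from your own incident log. The sketch below uses made-up timestamps purely for illustration, and it assumes both metrics are measured from the start of the incident; some teams measure MTTR from detection instead.

```python
from datetime import datetime
from statistics import mean

# Made-up incident log for illustration only.
incidents = [
    {
        "start": datetime(2025, 1, 1, 9, 0),
        "identified": datetime(2025, 1, 1, 9, 40),
        "resolved": datetime(2025, 1, 1, 11, 0),
    },
    {
        "start": datetime(2025, 1, 2, 14, 0),
        "identified": datetime(2025, 1, 2, 14, 20),
        "resolved": datetime(2025, 1, 2, 15, 0),
    },
]


def mean_minutes(log, end_key):
    """Average minutes from incident start to the given milestone."""
    durations = [(i[end_key] - i["start"]).total_seconds() / 60 for i in log]
    return mean(durations)


mtti = mean_minutes(incidents, "identified")
mttr = mean_minutes(incidents, "resolved")
print(f"MTTI: {mtti} min, MTTR: {mttr} min")  # MTTI: 30.0 min, MTTR: 90.0 min
```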

See how Internet Stack Map can help you stay ahead of disruptions—schedule a demo today.


