
GenAI Benchmark

Introduction

Generative AI (GenAI) has become a ubiquitous part of modern life, seamlessly integrating into how we work, plan vacations, and manage our daily routines. Its rapid rise to necessity is undeniable, and with increasing competition, performance becomes critical. Catchpoint has benchmarked the performance of leading GenAI tools, evaluating key metrics like response times and user authentication speeds to identify the top performers across various regions.

Testing Methodology

The performance benchmark was conducted against the following popular GenAI platforms:

  • ChatGPT
  • Google Gemini
  • H2O.ai
  • watsonx.ai
  • Meta AI

To evaluate the end-to-end user experience, we monitored the following aspects for each AI tool:

  • Homepage load time
  • User authentication process
  • Time taken by each AI tool to respond to user questions      

Testing timeframe

The tests were conducted between July 30, 2024, 00:00 EST, and August 8, 2024, 23:59 EST.

The benchmark included measurements across multiple Internet Service Providers (ISPs) and cities in the following countries: Australia, Canada, India, Italy, Mexico, Singapore, South Africa, and the United States.


Key findings
  • H2O.ai consistently ranks as the fastest across all pages and in most countries.
  • watsonx.ai has the slowest response time at 691 ms, making it 3x slower than H2O.ai.
  • Meta AI's TCP connect time in Italy is 10x slower compared to other countries.
  • Google Gemini experiences the slowest wait time in South Africa at 1301 ms, which is 12x slower than in Singapore or the United States.
  • ChatGPT has the longest user authentication time across all countries.

The average response time (geometric mean) in milliseconds for each AI tool to answer user queries.

Network Performance (Response time)

Components of response time:

  • DNS Lookup Time: Time taken to resolve the domain name into an IP address.
  • Connect: Time to establish a connection to the server.
  • Secure Sockets Layer (SSL): Time taken to complete the SSL handshake with the server.
  • Wait: Time from when the request was sent to the server until the first response packet was received.
  • Page Load: Time to fully load the requested page.

These components together determine the overall response time, which is a key metric for evaluating network performance. Ensuring that the base request file is fetched promptly is critical for optimal performance.
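As an illustration, these phases can be timed by hand with nothing more than the Python standard library. The sketch below is not the benchmark's own tooling: it uses a placeholder host, records a timestamp after each phase, and turns consecutive timestamps into per-component durations.

```python
import socket
import ssl
import time

def measure(host, port=443):
    """Record a timestamp after each phase of a single HTTPS request."""
    events = {"start": time.perf_counter()}
    addr = socket.getaddrinfo(host, port)[0][4][0]              # DNS lookup
    events["dns"] = time.perf_counter()
    sock = socket.create_connection((addr, port), timeout=10)   # TCP connect
    events["connect"] = time.perf_counter()
    ctx = ssl.create_default_context()
    tls = ctx.wrap_socket(sock, server_hostname=host)           # SSL handshake
    events["ssl"] = time.perf_counter()
    req = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    tls.sendall(req.encode())
    tls.recv(1)                                                 # first response byte
    events["wait"] = time.perf_counter()
    tls.close()
    return events

def breakdown(events):
    """Turn consecutive timestamps into per-component durations in ms."""
    order = ["start", "dns", "connect", "ssl", "wait"]
    return {b: round((events[b] - events[a]) * 1000, 1)
            for a, b in zip(order, order[1:])}

# Example with synthetic timestamps (seconds); a real run would use
# breakdown(measure("example.com")) instead.
events = {"start": 0.0, "dns": 0.01, "connect": 0.03, "ssl": 0.07, "wait": 0.12}
print(breakdown(events))  # {'dns': 10.0, 'connect': 20.0, 'ssl': 40.0, 'wait': 50.0}
```

Production monitoring agents measure these phases at far higher fidelity, but the arithmetic is the same: each component is the delta between two adjacent events.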

Overall response time comparison

The chart above shows the geometric mean response time for the homepage across the tested GenAI platforms.

Key insights:

  • H2O.ai consistently demonstrates the fastest response time.
  • watsonx.ai has the slowest response time; high connect and SSL times, rather than wait time, drive its overall figure up.
  • For every other platform, wait time is the largest contributor to overall response time.
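As noted above, the aggregate figures in this report are geometric means. A quick illustration of why that choice matters, using made-up sample values rather than the benchmark data:

```python
from statistics import geometric_mean, mean

# Illustrative samples only: four typical runs plus one slow outlier.
samples_ms = [210, 220, 205, 215, 2400]
print(round(mean(samples_ms)))            # 650, heavily skewed by the outlier
print(round(geometric_mean(samples_ms)))  # roughly half that, far less skewed
```

The geometric mean damps the influence of occasional slow runs, so a handful of outliers does not dominate a platform's reported figure.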

Response times by country

The table below shows the average response time in milliseconds for the homepage across different countries and GenAI platforms.

Key findings by country:
  • watsonx.ai has the slowest network performance across multiple locations.
  • H2O.ai shows the fastest response times in the majority of locations.
  • Meta AI is three times slower in several countries compared to its performance in the U.S. and Canada.

Component analysis

The table below breaks down the response time into individual components (Connect, SSL, Wait, Load) to help identify which part affects the overall performance.

Key insights:
  • Despite having relatively high connect and SSL times, H2O.ai still manages the fastest overall response time (216 ms). This suggests that its efficiency in wait and load times compensates for the slower connect and SSL phases, making it highly optimized for user interactions.
  • watsonx.ai’s connect and SSL times are significantly higher than the other tools, contributing to its high overall response time (691 ms). Even though it has a very low load time (5 ms), this efficiency is outweighed by the delays in establishing a secure connection.
  • Meta AI has a very high wait time (260 ms), which is significantly greater than other platforms. This suggests that its server is slower in processing requests and sending the first byte of data. Despite relatively low connect and SSL times, the high wait time pushes Meta AI’s overall response time to 474 ms.

Connect time

Connect time measures the duration taken to establish a transmission control protocol (TCP) connection with the server. This metric is crucial for understanding network latency. A high connect time can create a ripple effect, negatively impacting other metrics and ultimately leading to a poor user experience.

Connect time by country

The table below shows the average connect time in milliseconds for the homepage across various countries.

Key insights:

watsonx.ai's high connect and SSL times
  • Despite watsonx.ai's high connect and SSL times, traceroute data shows no signs of network latency, indicating that the network path itself is functioning efficiently. The root cause of the delays lies elsewhere.
  • watsonx.ai uses a CDN, which should reduce latency by distributing content closer to users. However, no issues were found in how users were mapped to the CDN's edge servers.
  • Since the network path and CDN are working efficiently, the likely cause of the high connect and SSL times lies at the server level. watsonx.ai's servers might be employing advanced security features that add delay during the connect and SSL phases.

Meta AI's extremely high connect time in Italy
  • Meta AI's connect time in Italy was nearly 10 times higher than in other countries.
  • Upon further investigation, we discovered that during the testing timeframe, users from Italy were predominantly being mapped to servers located in the US. This geographical mismatch resulted in connect times up to 7 times higher than normal.

Wait Time

Wait time refers to the duration from when the request is sent to the server until the first response packet is received.

Wait time by country

The following table shows the average wait times in milliseconds for the homepage across various countries.

Key insights

Google Gemini – High wait time in South Africa

As shown in the screenshot above, Google Gemini exhibits a significantly higher wait time in South Africa, where it is 6x slower than in other countries.

  • Cause: The increased wait time can be traced to a redirect that occurs when users in South Africa access Google Gemini. Before the base HTML is fetched, users are redirected to googlegemini.com/?hl=en.
  • Impact: Although this redirect occurs across all countries, the server processing time for the redirect request from South Africa is notably higher compared to other regions, as reflected in the response headers.
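One way to see where such a redirect spends its time is to walk the chain hop by hop instead of letting the HTTP client follow it silently. The sketch below uses only the Python standard library; the URL passed in is whatever page is under test, and for simplicity it assumes absolute `Location` headers.

```python
import http.client
import time
from urllib.parse import urlsplit

def is_redirect(status):
    """True for the HTTP status codes that signal a redirect."""
    return status in (301, 302, 303, 307, 308)

def time_hops(url, max_hops=5):
    """Return [(status, elapsed_ms, next_location)] for each hop of a chain."""
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn = http.client.HTTPSConnection(parts.hostname, timeout=10)
        t0 = time.perf_counter()
        conn.request("GET", path)
        resp = conn.getresponse()
        elapsed = round((time.perf_counter() - t0) * 1000, 1)
        location = resp.getheader("Location")   # assumed absolute here
        hops.append((resp.status, elapsed, location))
        conn.close()
        if not is_redirect(resp.status) or not location:
            break
        url = location
    return hops

# time_hops("https://gemini.example/")  # placeholder URL for the page under test
```

Each tuple shows how long a given hop took and where it pointed next, so a slow 3xx response, like the one observed from South Africa, stands out immediately.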

User authentication performance

This analysis examines the login response times across various GenAI tools (except Meta AI, which was excluded due to challenges in replicating the login flow).

The average response time for user authentication

In contrast to ChatGPT, H2O.ai delivers the fastest login performance, reflecting an optimized process with fewer redirects and more efficient server-side handling.

Login response times by country

The above table shows the average response time for user authentication by country.

Key insights:
  • ChatGPT consistently has slower response times across all countries, with particularly high delays observed in Singapore and South Africa.
  • H2O.ai is the fastest across most regions, delivering significantly quicker login response times, especially in Canada, the United States, and Mexico.
Reasons for performance differences:
  • Multiple redirects: Platforms like ChatGPT experience slower login times due to multiple redirects in their authentication processes. These redirects add additional delays before the base request is fetched, contributing to the extended login times.
  • H2O.ai’s streamlined login process: In contrast, H2O.ai completes user authentication with a single call, avoiding the delays caused by multiple redirects and offering faster performance.
The above waterfall illustrates the high wait time experienced by ChatGPT due to multiple redirects in the user authentication process.
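To make redirect-heavy login flows visible, the hops can be counted explicitly. This sketch (Python standard library, with a placeholder endpoint rather than any real authentication URL) records every redirect that urllib follows:

```python
import urllib.request

class CountingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows redirects as usual, but records each hop along the way."""

    def __init__(self):
        self.redirects = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.redirects.append((code, newurl))   # record (status, target)
        return super().redirect_request(req, fp, code, msg, headers, newurl)

handler = CountingRedirectHandler()
opener = urllib.request.build_opener(handler)
# opener.open("https://auth.example.com/login")   # placeholder login endpoint
# print(len(handler.redirects), "redirects observed")
```

A login flow that completes in a single call leaves `handler.redirects` empty; one with several authentication redirects shows every extra round trip it paid for.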

GenAI Performance

In addition to network and user authentication tests, we measured the performance of the AI tools based on the time taken to generate a response to a query.

The above chart represents the time taken for each AI tool to generate a response to a query.
The above table represents the country-level breakdown of the time taken for the tested tools to generate a response.
  • H2O.ai consistently outperformed other platforms, showing the fastest response times across most of the monitored countries. With its strong performance, H2O.ai demonstrates high efficiency in generating responses.
  • watsonx.ai consistently showed the slowest response times across all countries, indicating a significant gap in its ability to quickly generate responses compared to other AI tools. Its performance is noticeably slower in regions like Mexico and Australia.
  • While not the fastest overall, Meta AI maintained consistent response times across regions, making it more predictable in performance compared to tools like ChatGPT or Google Gemini, which showed more variation.

How to get the most out of your GenAI tools

Our GenAI Platform Performance Benchmark Report underscores the critical role that performance plays in the success of GenAI tools. As companies increasingly integrate these platforms into their operations via APIs, ensuring optimal performance at every touchpoint becomes vital. That’s where Catchpoint’s Internet Performance Monitoring (IPM) comes in.

Why IPM is essential for comprehensive GenAI platform monitoring

As the demand for seamless AI integration grows, companies must monitor GenAI platforms to deliver fast, scalable, and reliable user experiences. IPM is uniquely positioned to meet this need by offering comprehensive visibility into all aspects of GenAI platform performance.

  • Proactive monitoring: Catchpoint IPM continuously measures the performance of your GenAI tools, providing crucial insights, and immediately alerts you if a platform goes down. This matters because many of these platforms are still evolving and prone to outages. Early alerts help you address issues before they impact users, ensuring continuity and reliability.
  • Full Internet Stack visibility: GenAI platforms rely on complex interactions between multiple components in the Internet Stack, including DNS, network connectivity, API calls, and third-party services. IPM provides deep and wide visibility across the Internet Stack, ensuring that all components involved in GenAI interactions function optimally.
  • In-depth analysis: With products like Tracing and WebPageTest, IPM enables in-depth analysis of GenAI platform performance, helping to pinpoint bottlenecks in API interactions, authentication flows, or query response times. By integrating various telemetry sources, such as Internet Sonar or the Internet Stack Map, IPM provides a unified view that allows IT teams to detect and address potential issues before they affect your business.
What is ECN?

Explicit Congestion Notification (ECN) is a longstanding mechanism in the IP stack that allows the network to help endpoints "foresee" congestion between them. The concept is straightforward: if a nearly congested piece of network equipment, such as an intermediate router, could tell the endpoints, "Hey, I'm almost congested! Can you two slow down your data transmission? Otherwise, I'm worried I will start to lose packets...", then the two endpoints could react in time to avoid the packet loss, paying only the price of a minor slowdown.

What is ECN bleaching?

ECN bleaching occurs when a network device at any point between the source and the endpoint clears or “bleaches” the ECN flags. Since you must arrive at your content via a transit provider or peering, it’s important to know if bleaching is occurring and to remove any instances.

With Catchpoint’s Pietrasanta Traceroute, we can send probes with IP-ECN values different from zero to check hop by hop what the IP-ECN value of the probe was when it expired. We may be able to tell you, for instance, that a domain is capable of supporting ECN, but an ISP in between the client and server is bleaching the ECN signal.
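At the socket level, marking probes as ECN-capable amounts to setting the low two bits of the IP ToS byte. Below is a minimal sketch of that idea in Python; it is not Pietrasanta Traceroute itself, which additionally inspects the IP-ECN value hop by hop as each probe expires.

```python
import socket

# ECN codepoints occupy the low two bits of the IP ToS byte:
# Not-ECT = 0b00, ECT(1) = 0b01, ECT(0) = 0b10, CE = 0b11.
ECT0 = 0b10

# Mark outgoing datagrams as ECN-capable, as an ECN probe would.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, ECT0)
print(s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS) & 0b11)  # ECN field
s.close()
```

A bleaching-detection tool compares the ECN field it sent with the ECN field of the probe as it expired at each hop; any hop where the value drops back to zero is a bleaching point.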

Why is ECN important to L4S?

ECN is an essential requirement for L4S since L4S uses an ECN mechanism to provide early warning of congestion at the bottleneck link by marking a Congestion Experienced (CE) codepoint in the IP header of packets. After receipt of the packets, the receiver echoes the congestion information to the sender via acknowledgement (ACK) packets of the transport protocol. The sender can use the congestion feedback provided by the ECN mechanism to reduce its sending rate and avoid delay at the detected bottleneck.

ECN and L4S need to be supported by the client and server but also by every device within the network path. It only takes one instance of bleaching to remove the benefit of ECN since if any network device between the source and endpoint clears the ECN bits, the sender and receiver won’t find out about the impending congestion. Our measurements examine how often ECN bleaching occurs and where in the network it happens.

Why are ECN and L4S in the news all of a sudden?

ECN has been around for a while, but with the growth in data volumes and rising user-experience expectations, particularly for streaming, ECN is vital for L4S to succeed, and major technology companies worldwide are making significant investments in it.

L4S aims to reduce packet loss (and hence the latency caused by retransmissions) and to make services as responsive as possible. We have also seen significant momentum from major companies lately, which always helps push a new protocol toward deployment.

What is the impact of ECN bleaching?

If ECN bleaching is found, this means that any methodology built on top of ECN to detect congestion will not work.

Thus, you cannot rely on the network to help you avoid congestion before it occurs: potential congestion is marked with the Congestion Experienced (CE = 3) codepoint when detected, and bleaching wipes out that information.

What are the causes behind ECN bleaching?

The causes behind ECN bleaching are multiple and hard to identify, from network equipment bugs to debatable traffic engineering choices and packet manipulations to human error.

For example, bleaching could occur from mistakes such as overwriting the whole ToS field when rewriting DSCP instead of changing only the DSCP bits (remember that DSCP and ECN together compose the ToS field in the IP header).
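That failure mode is easy to demonstrate with a few lines of bit arithmetic; the DSCP value 46 (Expedited Forwarding) is just an example:

```python
# ToS byte layout: top six bits are DSCP, low two bits are ECN.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

def split_tos(tos):
    """Decode a ToS byte into (DSCP value, ECN codepoint name)."""
    return tos >> 2, ECN_NAMES[tos & 0b11]

def set_dscp_correct(tos, dscp):
    """Rewrite only the DSCP bits, preserving the ECN field."""
    return (dscp << 2) | (tos & 0b11)

def set_dscp_buggy(tos, dscp):
    """Overwrite the whole ToS byte -- this bleaches the ECN field."""
    return dscp << 2

tos = (46 << 2) | 0b11                       # DSCP EF (46) with CE marked
print(split_tos(set_dscp_correct(tos, 0)))   # (0, 'CE')      -- ECN survives
print(split_tos(set_dscp_buggy(tos, 0)))     # (0, 'Not-ECT') -- ECN bleached
```

The buggy rewrite silently turns a Congestion Experienced mark back into Not-ECT, which is exactly the information loss described above.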

How can you debug ECN bleaching?

Nowadays, network operators have a good number of tools to debug ECN bleaching from their end (such as those listed here) – including Catchpoint’s Pietrasanta Traceroute. The large-scale measurement campaign presented here is an example of a worldwide campaign to validate ECN readiness. Individual network operators can run similar measurement campaigns across networks that are important to them (for example, customer or peering networks).

What is the testing methodology?

The findings presented here are based on running tests using Catchpoint’s enhanced traceroute, Pietrasanta Traceroute, through the Catchpoint IPM portal to collect data from over 500 nodes located in more than 80 countries all over the world. By running traceroutes on Catchpoint’s global node network, we are able to determine which ISPs, countries and/or specific cities are having issues when passing ECN marked traffic. The results demonstrate the view of ECN bleaching globally from Catchpoint’s unique, partial perspective. To our knowledge, this is one of the first measurement campaigns of its kind.

Beyond the scope of this campaign, Pietrasanta Traceroute can also be used to determine if there is incipient congestion and/or any other kind of alteration and the level of support for more accurate ECN feedback, including if the destination transport layer (either TCP or QUIC) supports more accurate ECN feedback.

The content of this page is Copyright 2024 by Catchpoint. Redistribution of this data must retain the above notice (i.e. Catchpoint copyrighted or similar language), and the following disclaimer.

THE DATA ABOVE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS OR INTELLECTUAL PROPERTY RIGHT OWNERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THIS DATA OR THE USE OR OTHER DEALINGS IN CONNECTION WITH THIS DATA.

We are happy to discuss or explain the results if more information is required. Further details per region can be released upon request.