True Availability of a Website

Availability, or uptime, is one of the most important metrics in web performance monitoring, yet it is often misunderstood and defined incorrectly.

Availability is simply the percentage of time your site, application, or service works successfully. The difficulty lies in deciding what counts as “success” and what counts as “failure”. For an image request to a web server, availability is clear: if the server responds with the image it is a success, and anything else is a failure. For a webpage, on the other hand, success can have a more complicated meaning.

Imagine you build a simple webpage that uses the jQuery library for a slideshow. You host the webpage at a web hosting company and you decide to host the jQuery library on a CDN provider. Your web hosting company promises 99.7% uptime (these promises are made in a ‘Service Level Agreement’ or SLA), and the CDN also promises 99.7% uptime, so it seems that using the CDN will have no impact on reliability.

Not so fast! Let’s take a close look at what happens when either vendor fails. If your web hosting company is down, the webpage is unavailable: an obvious failure. If the CDN is down or fails to deliver the jQuery library, your webpage is still reachable, but the slideshow will not work, and the webpage might be slow to render (browsers can wait several seconds or more for a response from a server before giving up on the request). In the eyes of an end user the webpage failed, therefore it is not available!

The webpage is truly available only when both providers are available. Assuming the CDN and the web host fail independently of each other, the True Availability of the webpage is 99.7% * 99.7% ≈ 99.4%, which means that about 0.6% of the time users will not be able to use your webpage!
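To make the arithmetic concrete, here is a minimal Python sketch of the calculation. The function name `serial_availability` is made up for this illustration, and the sketch assumes the providers fail independently of each other.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24


def serial_availability(availabilities):
    """Availability of a page that needs every listed provider to be up.

    Assumes the providers fail independently of one another.
    """
    return prod(availabilities)


web_host = 0.997  # SLA figure from the example above
cdn = 0.997       # SLA figure from the example above

page = serial_availability([web_host, cdn])
downtime_hours = (1 - page) * HOURS_PER_YEAR

print(f"True availability: {page:.4%}")
print(f"Expected downtime: {downtime_hours:.1f} hours per year")
```

Under these assumptions the page is available roughly 99.40% of the time, which works out to a little over 52 hours of expected downtime per year.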

You can clearly see that there is a tradeoff between complexity and availability. In fact, each time you introduce a new link into the chain of events to serve a web page, the availability will go down.
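As a rough illustration of how quickly the chain erodes availability, the sketch below counts how many dependencies a page can require before it falls under a target. The helper `dependencies_until_below` is hypothetical, and it assumes every dependency has the same availability and fails independently.

```python
def dependencies_until_below(target, per_dependency):
    """Count the required dependencies at which combined availability
    first drops below `target`.

    Assumes each dependency has the same availability and that they
    fail independently. Returns (count, combined_availability).
    """
    if not 0 < per_dependency < 1:
        raise ValueError("per_dependency must be strictly between 0 and 1")
    count = 0
    combined = 1.0
    while combined >= target:
        count += 1
        combined *= per_dependency
    return count, combined


count, combined = dependencies_until_below(target=0.99, per_dependency=0.997)
print(f"{count} dependencies at 99.7% give {combined:.4%}")
```

With 99.7% providers, the fourth required dependency is the one that pushes the page below 99%.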

So far it all seems nice and precise, and you are most likely thinking of determining the availability of your website by taking all the SLAs and multiplying their individual availabilities. It is not that simple!

Let’s move for a moment from the world of mathematics into the world of usability, perceived quality, and customer satisfaction. Most webpages do not require every request they reference to load properly. As a matter of fact, for quite a few of these requests the user might not even notice whether they loaded or not. For example, if Google Analytics tags are at the end of the page and they fail to load or to post the data collected, they will not impact the usability of the page. Therefore, which hosts count toward your availability will vary from page to page and company to company, depending on what each considers “unusable”.
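One way to picture this is to mark each dependency as critical or not and multiply only the critical ones. The sketch below is an illustration under assumptions: the `Dependency` class, the `critical` flag, and the 99.9% figure for analytics are all hypothetical, and failures are again assumed to be independent.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Dependency:
    name: str
    availability: float
    critical: bool  # True if its failure makes the page "unusable" for you


def perceived_availability(dependencies):
    """Multiply only the dependencies the user would actually miss."""
    return prod(d.availability for d in dependencies if d.critical)


deps = [
    Dependency("web host", 0.997, critical=True),
    Dependency("CDN (jQuery)", 0.997, critical=True),
    Dependency("analytics", 0.999, critical=False),  # hypothetical figure
]

naive = prod(d.availability for d in deps)
perceived = perceived_availability(deps)

print(f"All SLAs multiplied:  {naive:.4%}")
print(f"Critical hosts only:  {perceived:.4%}")
```

Which dependencies get `critical=True` is exactly the judgment call described above, so two companies serving the same page could reasonably arrive at different numbers.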

Besides availability, another major concern with third parties is their performance, or response speed. Speed-related issues are not counted as unavailability in SLAs and can occur more often than failures. Slowness might be less of a problem for image tags or even iframes, since they are not blocking calls: they do not block the rendering of the rest of the tags. On the other hand, external JavaScript and CSS tags will block the rest of the content until they are completely loaded and executed, so they can slow down the entire page, posing another problem for your webpage.

You probably want the benefits of a faster website by using more hosts or a CDN, and you may want lower operational costs by outsourcing certain tasks to vendors who can leverage economies of scale. You might also be getting revenue from vendors such as an ad network or content partner. So how can you deal with problems caused by third-party vendors?

We suggest the following:

Start by defining what you consider “unusable” for your website. Does a broken image make the page unusable? Does a form that fails to work or submit?
Define which content can make the page unusable: advertising, JavaScript libraries, CSS, etc.
Host high-impact content inline, or on the same server/host as the primary webpage, and leverage caching as much as possible through HTTP headers.
If content must reside on a third-party provider, opt first for iframes, if possible, to mitigate risks. Most ad server and widget solutions either provide iframe tags or can be nested in an iframe.
Develop your webpages so that:
All external JavaScript delivered by other providers resides at the bottom of the page (right before the closing </body> tag).
Develop the server side of the webpage so that any high-impact content served by third parties can be switched off (excluded from the output to the browser), or be served by the same host as the webpage as a backup.
Monitor your third parties 24×7 and, if they go down, switch them off or re-route to your own servers to mitigate the impact on your webpage!
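The last two suggestions can be sketched in a few lines of Python. This is only an illustration under assumptions, not a production design: `ThirdPartyScript`, `probe`, `script_tags`, and the example URLs are all hypothetical names, and a real setup would feed `script_tags` from cached monitoring results rather than probing on every page request.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Iterable, List, Optional
import urllib.request


@dataclass(frozen=True)
class ThirdPartyScript:
    name: str
    remote_url: str
    local_fallback_url: Optional[str] = None  # copy hosted with the page


def probe(url: str, timeout_seconds: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # Covers connection errors, timeouts, HTTP errors, and malformed URLs.
        return False


def script_tags(scripts: Iterable[ThirdPartyScript],
                is_up: Callable[[str], bool]) -> List[str]:
    """Build the <script> tags to emit at the bottom of the page.

    A script whose provider is down is re-routed to its local fallback,
    or switched off entirely when no fallback exists.
    """
    tags = []
    for script in scripts:
        if is_up(script.remote_url):
            src = script.remote_url
        elif script.local_fallback_url is not None:
            src = script.local_fallback_url
        else:
            continue  # switched off: excluded from the output to the browser
        tags.append(f'<script src="{escape(src, quote=True)}"></script>')
    return tags


scripts = [
    ThirdPartyScript("jquery", "https://cdn.example.com/jquery.js",
                     "/static/jquery.js"),
    ThirdPartyScript("ads", "https://ads.example.com/ad.js"),
]

# Simulated monitoring result: every provider is reported down.
tags_when_down = script_tags(scripts, is_up=lambda url: False)
print(tags_when_down)
```

In this simulated outage the jQuery tag is served from the page's own host and the ad script is left out. Passing `is_up=probe` would check the providers live, at the cost of a network round trip per script.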

Unfortunately, you cannot move all content to the bottom of the page, so you need to weigh the value of the service provided by each third party and work out whether it is worth the risk!

I hope this post sheds some light on the meaning of true availability and how to deal with third party content.

Drit

July 21, 2010

Catchpoint Team

