Identifying Outages with Real User Monitoring

Published
November 17, 2016

Understanding the user experience is important for anybody with an online presence. Many analytics solutions help organizations understand which pages yield greater user engagement, what paths users take through a site, which pieces of content are most popular, and how fast pages load. Real user monitoring (RUM) has grown in popularity in recent years, helped by browser specifications such as Navigation Timing, Resource Timing, and User Timing. These let organizations collect data from actual users, providing insight into the user experience.

RUM can provide information that isn’t always available with synthetic monitoring, because you cannot have synthetic agents in every city, on every ISP, with every browser version and every OS. What ISP is being used? What browser? What device? RUM provides a view into the user’s world, and the data obtained can help organizations improve the digital experience.

But RUM should not be the only solution used to measure the digital experience, as there are some things it can’t provide. RUM can only provide data for sites with active traffic. If your site has not launched yet, or if there is an outage, no information can be gathered. We believe synthetic monitoring and RUM should be used together to get a complete picture.

With RUM there can be a lot of noise. Have you ever been to a large sporting event or a concert and tried to access a mobile application or web page, only to have it take forever or fail to load completely? Most of the time these failures are not due to the application but to congestion on the network: too many people are trying to access data across the same route. Issues like these can skew RUM data and generate noise when you are trying to understand performance. Nobody wants to receive an alert only to determine it was caused by noise.

When there’s an outage, you want to know as soon as possible. Every second your site is unavailable means lost money, damage to your brand, and unhappy customers. The sooner you know something is going wrong, the better. Synthetic monitoring has traditionally been used to alert teams to outages and application issues and to expedite troubleshooting.

RUM can only provide information when users are on the site; if users can’t access the site, no information can be captured and analyzed. What can be analyzed, however, is historical trends and patterns. Applications are accessed in patterns, with peak traffic on certain days of the week or at certain times of day. Variations from these patterns can indicate a problem in a region where synthetic tests may not be running.
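
To make the idea of comparing current traffic against historical patterns concrete, here is a minimal sketch in Python. It assumes you already have hourly page-view counts from RUM, and it compares the current hour with the median of the same hour of the week in previous weeks. The function names, the sample counts, and the 0.5 threshold are all hypothetical choices for illustration, not a description of any particular product.

```python
from statistics import median


def traffic_drop_ratio(history, current):
    """Return current traffic as a fraction of the historical median.

    history: page-view counts for the same hour of the week in earlier weeks.
    current: page-view count for the hour being checked.
    """
    baseline = median(history)
    if baseline == 0:
        return None  # no historical traffic, so there is nothing to compare against
    return current / baseline


def looks_like_outage(history, current, threshold=0.5):
    """Flag the hour when traffic falls below `threshold` of the baseline.

    The 0.5 default is an arbitrary assumption; tune it to your own traffic.
    """
    ratio = traffic_drop_ratio(history, current)
    return ratio is not None and ratio < threshold


# Hypothetical counts for Tuesdays, 14:00-15:00, over the past six weeks.
tuesday_2pm = [980, 1010, 1050, 995, 1020, 1005]

print(looks_like_outage(tuesday_2pm, 970))  # False: close to the usual level
print(looks_like_outage(tuesday_2pm, 210))  # True: far below the usual level
```

Using the median rather than the mean keeps a single unusual week, such as a holiday or a marketing spike, from distorting the baseline.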

Catchpoint’s Outage Analyzer uses predictive models based on statistical analysis of historical data to identify regional outages. A color-coded map quickly reveals whether traffic levels are as expected, whether they have dropped compared to historical trends, and how widespread an outage is. For example, last month, when Dyn was under a DDoS attack, there would have been a noticeable drop in site visits because users weren’t able to resolve DNS. When DNS fails to resolve, the site is unreachable and no data is collected via RUM.

[Image: Outage Analyzer]
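
As a rough illustration of how a color-coded regional view could be derived from traffic data, the sketch below maps each region's observed traffic, relative to its expected level, to a status color and lists the regions that appear to be down. This is not Catchpoint's implementation: the region names, the counts, the colors, and the 0.8 and 0.5 thresholds are all assumptions made for the example.

```python
def classify_region(expected, observed, warn=0.8, critical=0.5):
    """Map observed/expected traffic to a status color.

    The warn and critical thresholds are illustrative assumptions.
    """
    if expected <= 0:
        return "grey"  # no baseline for this region
    ratio = observed / expected
    if ratio < critical:
        return "red"  # traffic has dropped sharply
    if ratio < warn:
        return "yellow"  # traffic is somewhat below expectations
    return "green"  # traffic is as expected


def outage_summary(regions):
    """Return a status per region and the sorted list of red regions."""
    statuses = {
        name: classify_region(expected, observed)
        for name, (expected, observed) in regions.items()
    }
    affected = sorted(name for name, status in statuses.items() if status == "red")
    return statuses, affected


# Hypothetical (expected, observed) page views per region for the current hour.
regions = {
    "us-east": (1200, 310),
    "us-west": (900, 870),
    "eu-west": (1100, 820),
}

statuses, affected = outage_summary(regions)
print(statuses)  # {'us-east': 'red', 'us-west': 'green', 'eu-west': 'yellow'}
print(affected)  # ['us-east']
```

The length of the affected list relative to the total number of regions gives a simple measure of how widespread a drop is.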

This information can help organizations analyze the impact of an outage and determine whether action needs to be taken. Regional outages may be related to Mother Nature, human error, or infrastructure issues. Sometimes there is nothing you can do to prevent a failure, but before you decide an outage was unavoidable, you need to know that there actually was an outage.
