Blog Post

Measuring Perceived Performance

Published
October 21, 2016
#
 mins read
By 

in this blog post

Back in 2012 Steve Souders published a blog post with the intention of helping to move away from using window.onload() as an indicator of website speed. While onload sometimes still has its place in indicating when a page is complete, we can all agree that onload is not useful for measuring user experience and perception of when a page is loaded. The web performance community has come up with various techniques without settling for a single solution.

In this article, we will cover the various options for measuring perceived performance, why there is no one-size-fits-all solution, and what we can do about it.

The Measurement Options

Various attempts have been made for measuring user perception of performance, some of which include:

First paint is reported by the browser and tells you when the page starts changing. It doesn’t indicate completeness, though, and sometimes measures when nothing visible is painted.

Render start is a synthetic test measurement that detects when the page first changes from blank to displaying visible content.

DOM interactive indicates when the browser finishes building the DOM and can be used to approximate when the user is able to interact.

Speed index measures the average time for pixels on the visible screen to reach a “complete” state. It can approximate user perception, but since it doesn’t account for which content is important to a user, the average can be skewed by incomplete content that isn’t relevant to the user experience. It also currently does not work well for indicating perceived performance of soft-navigations in single page applications and can’t easily be explained in laymen’s terms, making it difficult to understand for many people.

Speed Index comes in two main varieties:

Speed Index measured externally with synthetic testing using video/filmstrip analysis.

Speed Index measured client-side with the resource timing API for visible objects.

[Above the fold time](http://assets.en.oreilly.com/1/event/62/Above the Fold Time_ Measuring Web Page Performance Visually Presentation.pdf) (AFT) indicates when visible content is finished rendering, but can be unreliable since single pixel changes can skew results. AFT has a similar problem to speed index in that it assumes all content in the frame has the same importance to the end user’s perception of performance.

Object rendering timing (AKA Hero Image timing) uses a resource on the page as a proxy for determining completeness in the eyes of a user. This method solves the problem that Speed Index and AFT have since the results can’t be skewed by page content that isn’t relevant to an end user. This can be done using various methods yielding mixed results depending on the page. Unfortunately, this method can be unmanageable if the site content changes frequently, and this issue gets magnified for sites with a lot of different page templates.

Critical resources identifies render-blocking CSS and JS requests.

Various other “visual performance metrics” exist with the same goals and challenges in mind.

What Is Being Used Today?

Many different companies have various ways of measuring the user experience that differ from each other; and rightly so, considering the applications and users’ expectations of the applications differ.

For example:

Etsy uses Render Start time and Speed Index in synthetic and load times in RUM.

Netflix uses domInteractive from the Navigation Timing API, but that can be problematic at times as described by Steve Souders.

Many others use hero images or variations depending on what the primary content on the page is for example, “time to first product image.” Similarly, many teams rely on a measurement for when the user first interacted by using metrics like “time to first click.” For example, Facebook uses “time-to-interact” to indicate when the user first interacted with the page.

Why Isn’t There a Standard?

The goal to standardize the measurement of user experience is important for the industry because it makes benchmarking easier and more meaningful, enables better collaboration for performance-centric improvements within organizations and the industry, and it makes it easier to report and act on the data provided by passive and active monitoring solutions.

Since many websites and applications are different in nature and in how they are constructed, the UX measurement will be different across most sites and applications. The business motivation behind collecting their metrics can also vary greatly. This is why adoption of the techniques described above hasn’t been consistent – the results aren’t always accurate, reliable, relatable, or understandable depending on the application being measured. For example, after researching user behavior, one might find that conversions occur once a specific amount of the page is rendered, so optimizing everything, ala speed index or AFT, might focus efforts in the wrong place.

Here’s What We Can Do

First, and maybe most important, looking for a single metric may help, but it also may not. Without doing the proper research to understand your end-users, you won’t know what is important to measure and how it should be measured. Additionally, one must understand the application being measured well enough to make the decision on what will work. There are different goals when trying to understand the end-user experience, so measuring different aspects of their experience is necessary. For example, “did the content that matters load and how long did it take?”, “when could the user interact with the page?”, “what resources impacted UX?”

Since there is no one-size-fits-all metric or methodology for collecting them, standardizing how these metrics are reported is more practical than standardizing the metrics themselves. This is especially true considering that user expectations vary depending on what type of application they are using and even what portion of the application they are interacting with. For example, the expectation for speed when searching for flights on a travel site is different than it is for loading the checkout page.

Considering that the measurements boil down to some common goals that are somewhat methodologically agnostic, mapping the values to common names is viable with User Timing “Standard Mark Names”. These currently include “mark_fully_loaded”, “mark_fully_visible”, “mark_above_the_fold”, “mark_time_to_user_action”. Conforming to these standard names has the added benefit that reporting across tools, such as synthetic and RUM solutions, becomes much easier, thereby making correlations a more simple task. This leads to buy-in from more teams in supporting the goal to improve experience for the end-user.

Lastly, major browser vendors (Chrome, IE/Edge, Safari, Firefox) historically have helped tremendously with the effort to improve user experience. This assistance would be beneficial yet again if the major browsers added support for metrics like Speed Index, Critical Resources, AFT, and an interface for the developer to specify user-perceived page completeness directly. Additionally, support from the W3C can help facilitate this process. This wouldn’t eliminate the need to research and understand what is important to measure and how to measure it, but it would go a big way in helping with the implementation for site owners.

Back in 2012 Steve Souders published a blog post with the intention of helping to move away from using window.onload() as an indicator of website speed. While onload sometimes still has its place in indicating when a page is complete, we can all agree that onload is not useful for measuring user experience and perception of when a page is loaded. The web performance community has come up with various techniques without settling for a single solution.

In this article, we will cover the various options for measuring perceived performance, why there is no one-size-fits-all solution, and what we can do about it.

The Measurement Options

Various attempts have been made for measuring user perception of performance, some of which include:

First paint is reported by the browser and tells you when the page starts changing. It doesn’t indicate completeness, though, and sometimes measures when nothing visible is painted.

Render start is a synthetic test measurement that detects when the page first changes from blank to displaying visible content.

DOM interactive indicates when the browser finishes building the DOM and can be used to approximate when the user is able to interact.

Speed index measures the average time for pixels on the visible screen to reach a “complete” state. It can approximate user perception, but since it doesn’t account for which content is important to a user, the average can be skewed by incomplete content that isn’t relevant to the user experience. It also currently does not work well for indicating perceived performance of soft-navigations in single page applications and can’t easily be explained in laymen’s terms, making it difficult to understand for many people.

Speed Index comes in two main varieties:

Speed Index measured externally with synthetic testing using video/filmstrip analysis.

Speed Index measured client-side with the resource timing API for visible objects.

[Above the fold time](http://assets.en.oreilly.com/1/event/62/Above the Fold Time_ Measuring Web Page Performance Visually Presentation.pdf) (AFT) indicates when visible content is finished rendering, but can be unreliable since single pixel changes can skew results. AFT has a similar problem to speed index in that it assumes all content in the frame has the same importance to the end user’s perception of performance.

Object rendering timing (AKA Hero Image timing) uses a resource on the page as a proxy for determining completeness in the eyes of a user. This method solves the problem that Speed Index and AFT have since the results can’t be skewed by page content that isn’t relevant to an end user. This can be done using various methods yielding mixed results depending on the page. Unfortunately, this method can be unmanageable if the site content changes frequently, and this issue gets magnified for sites with a lot of different page templates.

Critical resources identifies render-blocking CSS and JS requests.

Various other “visual performance metrics” exist with the same goals and challenges in mind.

What Is Being Used Today?

Many different companies have various ways of measuring the user experience that differ from each other; and rightly so, considering the applications and users’ expectations of the applications differ.

For example:

Etsy uses Render Start time and Speed Index in synthetic and load times in RUM.

Netflix uses domInteractive from the Navigation Timing API, but that can be problematic at times as described by Steve Souders.

Many others use hero images or variations depending on what the primary content on the page is for example, “time to first product image.” Similarly, many teams rely on a measurement for when the user first interacted by using metrics like “time to first click.” For example, Facebook uses “time-to-interact” to indicate when the user first interacted with the page.

Why Isn’t There a Standard?

The goal to standardize the measurement of user experience is important for the industry because it makes benchmarking easier and more meaningful, enables better collaboration for performance-centric improvements within organizations and the industry, and it makes it easier to report and act on the data provided by passive and active monitoring solutions.

Since many websites and applications are different in nature and in how they are constructed, the UX measurement will be different across most sites and applications. The business motivation behind collecting their metrics can also vary greatly. This is why adoption of the techniques described above hasn’t been consistent – the results aren’t always accurate, reliable, relatable, or understandable depending on the application being measured. For example, after researching user behavior, one might find that conversions occur once a specific amount of the page is rendered, so optimizing everything, ala speed index or AFT, might focus efforts in the wrong place.

Here’s What We Can Do

First, and maybe most important, looking for a single metric may help, but it also may not. Without doing the proper research to understand your end-users, you won’t know what is important to measure and how it should be measured. Additionally, one must understand the application being measured well enough to make the decision on what will work. There are different goals when trying to understand the end-user experience, so measuring different aspects of their experience is necessary. For example, “did the content that matters load and how long did it take?”, “when could the user interact with the page?”, “what resources impacted UX?”

Since there is no one-size-fits-all metric or methodology for collecting them, standardizing how these metrics are reported is more practical than standardizing the metrics themselves. This is especially true considering that user expectations vary depending on what type of application they are using and even what portion of the application they are interacting with. For example, the expectation for speed when searching for flights on a travel site is different than it is for loading the checkout page.

Considering that the measurements boil down to some common goals that are somewhat methodologically agnostic, mapping the values to common names is viable with User Timing “Standard Mark Names”. These currently include “mark_fully_loaded”, “mark_fully_visible”, “mark_above_the_fold”, “mark_time_to_user_action”. Conforming to these standard names has the added benefit that reporting across tools, such as synthetic and RUM solutions, becomes much easier, thereby making correlations a more simple task. This leads to buy-in from more teams in supporting the goal to improve experience for the end-user.

Lastly, major browser vendors (Chrome, IE/Edge, Safari, Firefox) historically have helped tremendously with the effort to improve user experience. This assistance would be beneficial yet again if the major browsers added support for metrics like Speed Index, Critical Resources, AFT, and an interface for the developer to specify user-perceived page completeness directly. Additionally, support from the W3C can help facilitate this process. This wouldn’t eliminate the need to research and understand what is important to measure and how to measure it, but it would go a big way in helping with the implementation for site owners.

This is some text inside of a div block.

You might also like

Blog post

Catch frustration before it costs you: New tools for a better user experience

Blog post

Preparing for the unexpected: Lessons from the AJIO and Jio Outage

Blog post

Demystifying API Monitoring and Testing with IPM