Feedback wanted: An experimental responsiveness metric

An update on our plans for measuring responsiveness on the web.

Earlier this year, the Chrome Speed Metrics Team shared some of the ideas we were considering for a new responsiveness metric. We want to design a metric that better captures the end-to-end latency of individual events and offers a more holistic picture of the overall responsiveness of a page throughout its lifetime.

We've made a lot of progress on this metric in the last few months, and we wanted to share an update on how we plan to measure interaction latency as well as introduce a few specific aggregation options we're considering to quantify the overall responsiveness of a web page.

We'd love to get feedback from developers and site owners as to which of these options would be most representative of the overall input responsiveness of their pages.

Measure interaction latency #

As a review, the First Input Delay (FID) metric captures the delay portion of input latency. That is, the time between when the user interacts with the page to the time when the event handlers are able to run.

With this new metric we plan to expand that to capture the full event duration, from initial user input until the next frame is painted after all the event handlers have run.

We also plan to measure interactions rather than individual events. Interactions are groups of events that are dispatched as part of the same, logical user gesture (for example: pointerdown, click, pointerup).

To measure the total interaction latency from a group of individual event durations, we are considering two potential approaches:

  • Maximum event duration: the interaction latency is equal to the largest single event duration from any event in the interaction group.
  • Total event duration: the interaction latency is the sum of all event durations, with any overlapping time counted only once.

As an example, the diagram below shows a key press interaction that consists of a keydown and a keyup event. In this example there is a duration overlap between these two events. To measure the latency of the key press interaction, we could use max(keydown duration, keyup duration) or sum(keydown duration, keyup duration) - duration overlap:

A diagram showing interaction latency based on event durations

There are pros and cons of each approach, and we'd like to collect more data and feedback before finalizing a latency definition.
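To make the two candidate definitions concrete, here is a small sketch in JavaScript. The function names and the interval representation are hypothetical, purely for illustration; this is not part of any browser API. Total event duration is computed by merging overlapping intervals so overlapping time is counted only once.

```javascript
// Hypothetical helpers: compute both candidate interaction latencies
// from a list of event intervals {start, end} (times in ms).
function maxEventDuration(events) {
  // Largest single event duration in the interaction group.
  return Math.max(...events.map((e) => e.end - e.start));
}

function totalEventDuration(events) {
  // Merge overlapping intervals, then sum the merged lengths,
  // so overlapping time is only counted once.
  const sorted = [...events].sort((a, b) => a.start - b.start);
  let total = 0;
  let curStart = sorted[0].start;
  let curEnd = sorted[0].end;
  for (const e of sorted.slice(1)) {
    if (e.start <= curEnd) {
      curEnd = Math.max(curEnd, e.end);
    } else {
      total += curEnd - curStart;
      curStart = e.start;
      curEnd = e.end;
    }
  }
  total += curEnd - curStart;
  return total;
}

// Key press interaction: the keydown and keyup durations overlap by 10 ms.
const keyPress = [
  {start: 0, end: 50},   // keydown duration: 50 ms
  {start: 40, end: 80},  // keyup duration: 40 ms
];
maxEventDuration(keyPress);   // → 50
totalEventDuration(keyPress); // → 80 (50 + 40 - 10 ms overlap)
```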

The event duration is meant to be the time from the event's hardware timestamp to the time when the next paint is performed after the event is handled. If the event doesn't cause any update, the duration is the time from the event's hardware timestamp to the time when we are sure it will not cause any update.

For keyboard interactions, we usually measure the keydown and keyup events. But for input method editors (IMEs), such as those used for Chinese and Japanese, we measure the input events between a compositionstart and a compositionend.

Aggregate all interactions per page #

Once we're able to measure the end-to-end latency of all interactions, the next step is to define an aggregate score for a page visit, which may contain more than one interaction.

After exploring a number of options, we've narrowed our choices down to the strategies outlined in the following section, each of which we're currently collecting real-user data on in Chrome. We plan to publish the results of our findings once we've had time to collect sufficient data, but we're also looking for direct feedback from site owners as to which strategy would most accurately reflect the interaction patterns on their pages.

Aggregation strategy options #

To help explain each of the following strategies, consider an example page visit that consists of four interactions:

| Interaction | Latency |
| ----------- | ------- |
| Click       | 120 ms  |
| Click       | 20 ms   |
| Key press   | 60 ms   |
| Key press   | 80 ms   |

Worst interaction latency #

The largest, individual interaction latency that occurred on a page. Given the example interactions listed above, the worst interaction latency would be 120 ms.

Budgets strategies #

User experience research suggests that users may not perceive latencies below certain thresholds as negative. Informed by this research, we're considering several budget strategies that use the following thresholds for each interaction type:

| Interaction type | Budget threshold |
| ---------------- | ---------------- |
| Click/tap        | 100 ms           |
| Drag             | 100 ms           |
| Keyboard         | 50 ms            |

Each of these strategies considers only the portion of each interaction's latency that exceeds its budget threshold. Using the example page visit above, the over-budget amounts would be as follows:

| Interaction | Latency | Latency over budget |
| ----------- | ------- | ------------------- |
| Click       | 120 ms  | 20 ms               |
| Click       | 20 ms   | 0 ms                |
| Key press   | 60 ms   | 10 ms               |
| Key press   | 80 ms   | 30 ms               |

Worst interaction latency over budget #

The largest single interaction latency over budget. Using the above example, the score would be max(20, 0, 10, 30) = 30 ms.

Total interaction latency over budget #

The sum of all interaction latencies over budget. Using the above example, the score would be (20 + 0 + 10 + 30) = 60 ms.

Average interaction latency over budget #

The total over-budget interaction latency divided by the total number of interactions. Using the above example, the score would be (20 + 0 + 10 + 30) / 4 = 15 ms.
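The first four strategies above can be computed directly from the example table. The snippet below is an illustrative sketch (the variable names and the budget map are ours, not part of any API) that applies the budget thresholds to the example page visit:

```javascript
// Hypothetical sketch: apply the budget strategies to the example
// page visit above (budgets: 100 ms for click/tap, 50 ms for keyboard).
const BUDGETS = {click: 100, keypress: 50};

const interactions = [
  {type: 'click', latency: 120},
  {type: 'click', latency: 20},
  {type: 'keypress', latency: 60},
  {type: 'keypress', latency: 80},
];

// Latency over budget per interaction, clamped at 0.
const overBudget = interactions.map(
  (i) => Math.max(0, i.latency - BUDGETS[i.type]),
);

const worstLatency = Math.max(...interactions.map((i) => i.latency)); // → 120
const worstOverBudget = Math.max(...overBudget);                      // → 30
const totalOverBudget = overBudget.reduce((a, b) => a + b, 0);        // → 60
const averageOverBudget = totalOverBudget / interactions.length;      // → 15
```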

High quantile approximation #

As an alternative to computing the largest interaction latency over budget, we also considered using a high quantile approximation, which should be fairer to web pages that have a lot of interactions and may be more likely to have large outliers. We've identified two potential high-quantile approximation strategies we like:

  • Option 1: Keep track of the largest and second-largest interactions over budget. After every 50 new interactions, drop the largest interaction from the previous set of 50 and add the largest interaction from the current set of 50. The final value is the largest remaining interaction over budget.
  • Option 2: Compute the largest 10 interactions over budget and choose a value from that list depending on the total number of interactions. Given N total interactions, select the (N / 50 + 1)th largest value, or the 10th value for pages with more than 500 interactions.
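Option 2 can be sketched as a small selection function. This is an illustrative interpretation of the description above, with a hypothetical function name; it assumes the ten largest over-budget latencies are kept in descending order:

```javascript
// Hypothetical sketch of Option 2: given the 10 largest over-budget
// latencies (sorted descending) and the total interaction count n,
// select the (n / 50 + 1)th largest value, capped at the 10th value
// for pages with more than 500 interactions.
function highQuantileApproximation(largestTen, n) {
  const index = Math.min(Math.floor(n / 50), largestTen.length - 1, 9);
  return largestTen[index];
}

// With the example page's over-budget latencies and n = 4 interactions,
// this selects the 1st (largest) value.
highQuantileApproximation([30, 20, 10, 0], 4); // → 30

// With n = 120 interactions, it selects the 3rd largest value.
highQuantileApproximation([100, 90, 80, 70, 60, 50, 40, 30, 20, 10], 120); // → 80
```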

Measure these options in JavaScript #

The following code example can be used to determine the values of the first three strategies presented above. Note that it's not yet possible to measure the total number of interactions on a page in JavaScript, so this example doesn't include the average interaction over budget strategy or the high quantile approximation strategies.

const interactionMap = new Map();

let worstLatency = 0;
let worstLatencyOverBudget = 0;
let totalLatencyOverBudget = 0;

new PerformanceObserver((entries) => {
  for (const entry of entries.getEntries()) {
    // Ignore entries without an interaction ID.
    if (entry.interactionId > 0) {
      // Get the interaction for this entry, or create one if it doesn't exist.
      let interaction = interactionMap.get(entry.interactionId);
      if (!interaction) {
        interaction = {latency: 0, entries: []};
        interactionMap.set(entry.interactionId, interaction);
      }
      interaction.entries.push(entry);

      const latency = Math.max(entry.duration, interaction.latency);
      worstLatency = Math.max(worstLatency, latency);

      const budget = entry.name.includes('key') ? 50 : 100;
      const latencyOverBudget = Math.max(latency - budget, 0);
      worstLatencyOverBudget =
          Math.max(latencyOverBudget, worstLatencyOverBudget);

      // If this event pushes the interaction further over budget,
      // add only the new over-budget portion to the total.
      const prevLatencyOverBudget = Math.max(interaction.latency - budget, 0);
      if (latencyOverBudget > prevLatencyOverBudget) {
        totalLatencyOverBudget += latencyOverBudget - prevLatencyOverBudget;
      }

      // Set the latency on the interaction so future events can reference it.
      interaction.latency = latency;

      // Log the updated metric values.
      console.log({
        worstLatency,
        worstLatencyOverBudget,
        totalLatencyOverBudget,
      });
    }
  }
  // Set the `durationThreshold` to 50 to capture keyboard interactions
  // that are over budget (the default `durationThreshold` is 100).
}).observe({type: 'event', buffered: true, durationThreshold: 50});

Caution: There are currently a few bugs in Chrome that affect accuracy of the reported interaction timestamps. We are working to fix these bugs as soon as possible, and we recommend developers test these strategies in Chrome Canary to get the most accurate results.

Feedback #

We encourage developers to try out these new responsiveness metrics on their sites and let us know if you discover any issues.

Please also email any general feedback on the approaches outlined here to the web-vitals-feedback Google group with "[Responsiveness Metrics]" in the subject line. We're really looking forward to hearing what you think!
