Feedback wanted: An experimental responsiveness metric

An update on our plans for measuring responsiveness on the web.

Hongbo Song

Earlier this year, the Chrome Speed Metrics Team shared some of the ideas we were considering for a new responsiveness metric. We want to design a metric that better captures the end-to-end latency of individual events and offers a more holistic picture of the overall responsiveness of a page throughout its lifetime.

We've made a lot of progress on this metric in the last few months, and we wanted to share an update on how we plan to measure interaction latency as well as introduce a few specific aggregation options we're considering to quantify the overall responsiveness of a web page.

We'd love to get feedback from developers and site owners as to which of these options would be most representative of the overall input responsiveness of their pages.

Measure interaction latency

As a review, the First Input Delay (FID) metric captures the delay portion of input latency. That is, the time from when the user interacts with the page until the event handlers are able to run.

With this new metric we plan to expand that to capture the full event duration, from initial user input until the next frame is painted after all the event handlers have run.

We also plan to measure interactions rather than individual events. Interactions are groups of events that are dispatched as part of the same logical user gesture (for example: pointerdown, pointerup, click).

To measure the total interaction latency from a group of individual event durations, we are considering two potential approaches:

  • Maximum event duration: the interaction latency is equal to the largest single event duration from any event in the interaction group.
  • Total event duration: the interaction latency is the sum of all event durations, with any time where events overlap counted only once.

As an example, the diagram below shows a key press interaction that consists of a keydown and a keyup event. In this example there is a duration overlap between these two events. To measure the latency of the key press interaction, we could use max(keydown duration, keyup duration) or sum(keydown duration, keyup duration) - duration overlap:

[Figure: a diagram showing interaction latency based on event durations]

There are pros and cons of each approach, and we'd like to collect more data and feedback before finalizing a latency definition.
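
As a rough illustration, the two candidate definitions above could be sketched as follows. The function names and the {startTime, duration} event shape are assumptions made for this example, not part of any proposed API, and the timings are invented to produce an overlap.

```javascript
// Hypothetical helpers: each event is {startTime, duration}, in milliseconds.

// Approach 1: the largest single event duration in the interaction group.
function maxEventDuration(events) {
  return Math.max(0, ...events.map((e) => e.duration));
}

// Approach 2: the sum of all event durations, counting overlapping time once.
function totalEventDuration(events) {
  const sorted = [...events].sort((a, b) => a.startTime - b.startTime);
  let total = 0;
  let coveredUntil = -Infinity;
  for (const {startTime, duration} of sorted) {
    const end = startTime + duration;
    if (end > coveredUntil) {
      // Only add the part of this event not already covered by earlier ones.
      total += end - Math.max(startTime, coveredUntil);
      coveredUntil = end;
    }
  }
  return total;
}

// A key press whose keydown and keyup durations overlap by 20 ms.
const keyPress = [
  {name: 'keydown', startTime: 0, duration: 50},
  {name: 'keyup', startTime: 30, duration: 40},
];

console.log(maxEventDuration(keyPress)); // 50
console.log(totalEventDuration(keyPress)); // 70, that is 50 + 40 - 20
```

For two events this reduces to sum(keydown duration, keyup duration) - duration overlap; merging intervals simply extends the same idea to groups with more than two events.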

Aggregate all interactions per page

Once we're able to measure the end-to-end latency of all interactions, the next step is to define an aggregate score for a page visit, which may contain more than one interaction.

After exploring a number of options, we've narrowed our choices down to the strategies outlined in the following section, each of which we're currently collecting real-user data on in Chrome. We plan to publish the results of our findings once we've had time to collect sufficient data, but we're also looking for direct feedback from site owners as to which strategy would most accurately reflect the interaction patterns on their pages.

Aggregation strategy options

To help explain each of the following strategies, consider an example page visit that consists of four interactions:

Interaction    Latency
Click          120 ms
Click          20 ms
Key press      60 ms
Key press      80 ms

Worst interaction latency

The largest, individual interaction latency that occurred on a page. Given the example interactions listed above, the worst interaction latency would be 120 ms.

Budget strategies

User experience research suggests that users may not perceive latencies below certain thresholds as negative. Based on this research, we're considering several budget strategies that use the following thresholds for each interaction type:

Interaction type    Budget threshold
Click/tap           100 ms
Drag                100 ms
Keyboard            50 ms

Each of these strategies considers only the portion of an interaction's latency that exceeds its budget threshold. Using the example page visit above, the over-budget amounts would be as follows:

Interaction    Latency    Latency over budget
Click          120 ms     20 ms
Click          20 ms      0 ms
Key press      60 ms      10 ms
Key press      80 ms      30 ms
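
The over-budget amounts above can be reproduced with a small sketch like the one below. The BUDGETS lookup, the latencyOverBudget helper, and the type labels are hypothetical names chosen for this example; only the threshold values come from the budget table.

```javascript
// Budget thresholds in milliseconds, taken from the table of thresholds above.
// The keys are illustrative labels, not values reported by any browser API.
const BUDGETS = {click: 100, tap: 100, drag: 100, keyboard: 50};

// Hypothetical helper: the part of a latency that exceeds its budget.
function latencyOverBudget(type, latency) {
  const budget = BUDGETS[type];
  if (budget === undefined) {
    throw new RangeError(`Unknown interaction type: ${type}`);
  }
  return Math.max(latency - budget, 0);
}

// The example page visit with four interactions.
const visit = [
  {type: 'click', latency: 120},
  {type: 'click', latency: 20},
  {type: 'keyboard', latency: 60},
  {type: 'keyboard', latency: 80},
];

const overBudget = visit.map(({type, latency}) =>
  latencyOverBudget(type, latency),
);
console.log(overBudget); // [20, 0, 10, 30]
```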

Worst interaction latency over budget

The largest single interaction latency over budget. Using the above example, the score would be max(20, 0, 10, 30) = 30 ms.

Total interaction latency over budget

The sum of all interaction latencies over budget. Using the above example, the score would be (20 + 0 + 10 + 30) = 60 ms.

Average interaction latency over budget

The total over-budget interaction latency divided by the total number of interactions. Using the above example, the score would be (20 + 0 + 10 + 30) / 4 = 15 ms.
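
As a minimal sketch, the three budget-based scores can be computed side by side from a list of over-budget amounts. The aggregateOverBudget name is an assumption for this example, and the sketch presumes a complete list of interactions is available, which is not something a page can currently observe at runtime.

```javascript
// Hypothetical helper: takes one over-budget value (ms) per interaction.
function aggregateOverBudget(overBudgetValues) {
  const count = overBudgetValues.length;
  const worst = Math.max(0, ...overBudgetValues);
  const total = overBudgetValues.reduce((sum, value) => sum + value, 0);
  // Interactions that stay within budget still count toward the average.
  const average = count > 0 ? total / count : 0;
  return {worst, total, average};
}

// Over-budget amounts from the example page visit.
const scores = aggregateOverBudget([20, 0, 10, 30]);
console.log(scores); // {worst: 30, total: 60, average: 15}
```

Note how the three scores react differently to the same visit: the worst score depends on a single interaction, the total grows with every over-budget interaction, and the average is diluted by interactions that stay within budget.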

High quantile approximation

As an alternative to computing the largest interaction latency over budget, we also considered using a high quantile approximation, which should be fairer to web pages that have a lot of interactions and may be more likely to have large outliers. We've identified two potential high-quantile approximation strategies we like:

  • Option 1: Keep track of the largest and second-largest interactions over budget. After every 50 new interactions, drop the largest interaction from the previous set of 50 and add the largest interaction from the current set of 50. The final value will be the largest remaining interaction over budget.
  • Option 2: Compute the largest 10 interactions over budget and choose a value from that list depending on the total number of interactions. Given N total interactions, select the (N / 50 + 1)th largest value, or the 10th value for pages with more than 500 interactions.
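
To make Option 2 more concrete, here is one possible reading of it as code. This is a sketch under stated assumptions rather than Chrome's implementation: it assumes N / 50 is rounded down, it caps the selected rank at 10, and the highQuantileOverBudget name is hypothetical. It also presumes the over-budget value of every interaction is known.

```javascript
// Hypothetical helper: takes one over-budget value (ms) per interaction.
function highQuantileOverBudget(overBudgetValues) {
  const n = overBudgetValues.length;
  if (n === 0) return 0;

  // Keep only the 10 largest values, sorted from largest to smallest.
  const top10 = [...overBudgetValues].sort((a, b) => b - a).slice(0, 10);

  // Assumption: integer division, with the rank capped at the 10th value.
  const rank = Math.min(Math.floor(n / 50) + 1, 10);
  return top10[rank - 1];
}

// With only four interactions, the largest value is selected.
console.log(highQuantileOverBudget([20, 0, 10, 30])); // 30

// With 120 interactions valued 1..120, the third-largest value is selected.
const many = Array.from({length: 120}, (_, i) => i + 1);
console.log(highQuantileOverBudget(many)); // 118
```

Under this reading, a page needs at least 50 interactions before its single worst interaction stops determining the score.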

Measure these options in JavaScript

The following code example can be used to determine the values of the first three strategies presented above: worst interaction latency, worst interaction latency over budget, and total interaction latency over budget. Note that it's not yet possible to measure the total number of interactions on a page in JavaScript, so this example doesn't include the average interaction latency over budget strategy or the high quantile approximation strategies.

const interactionMap = new Map();

let worstLatency = 0;
let worstLatencyOverBudget = 0;
let totalLatencyOverBudget = 0;

new PerformanceObserver((entries) => {
  for (const entry of entries.getEntries()) {
    // Ignore entries without an interaction ID.
    if (entry.interactionId > 0) {
      // Get the interaction for this entry, or create one if it doesn't exist.
      let interaction = interactionMap.get(entry.interactionId);
      if (!interaction) {
        interaction = {latency: 0, entries: []};
        interactionMap.set(entry.interactionId, interaction);
      }
      interaction.entries.push(entry);

      const latency = Math.max(entry.duration, interaction.latency);
      worstLatency = Math.max(worstLatency, latency);

      const budget = entry.name.includes('key') ? 50 : 100;
      const latencyOverBudget = Math.max(latency - budget, 0);
      worstLatencyOverBudget = Math.max(
        latencyOverBudget,
        worstLatencyOverBudget,
      );

      if (latencyOverBudget) {
        const oldLatencyOverBudget = Math.max(interaction.latency - budget, 0);
        totalLatencyOverBudget += latencyOverBudget - oldLatencyOverBudget;
      }

      // Set the latency on the interaction so future events can reference.
      interaction.latency = latency;

      // Log the updated metric values.
      console.log({
        worstLatency,
        worstLatencyOverBudget,
        totalLatencyOverBudget,
      });
    }
  }
  // Set the `durationThreshold` to 50 to capture keyboard interactions
  // that are over-budget (the default `durationThreshold` is 104).
}).observe({type: 'event', buffered: true, durationThreshold: 50});

Feedback

We want to encourage developers to try out these new responsiveness metrics on their sites and to let us know if they discover any issues.

Email any general feedback on the approaches outlined here to the web-vitals-feedback Google group with "[Responsiveness Metrics]" in the subject line. We're really looking forward to hearing what you think!