Breaking Up with Long Tasks or: how I learned to group loops and wield the yield
https://rviscomi.dev/2025/01/breaking-up-with-long-tasks/
Thu, 02 Jan 2025

The post Breaking Up with Long Tasks or: how I learned to group loops and wield the yield appeared first on rviscomi.dev.

This post originally appeared in the Web Performance Calendar on December 31, 2024. Try the demo.

Everything, On the Main Thread, All at Once

Arrays are in every web developer’s toolbox, and there are a dozen ways to iterate over them. Choose wrong, though, and all of that processing time will happen synchronously in one long, blocking task. The thing is, the most natural ways are the wrong ways. A simple for..of loop that processes each array item is synchronous by default, while Array methods like forEach and map can ONLY run synchronously. You almost certainly have a loop like this waiting to be optimized right now.

What’s the problem with long tasks, anyway? Every long task is a liability for an unresponsive user experience. If the user interacts with the page at just the right (or wrong) time, the browser won’t be able to handle that interaction until the task completes, which contributes to its input delay and slow Interaction to Next Paint (INP) performance. You can think of them like potholes on a road, forcing drivers to dodge them or risk damaging their cars—an unpleasant experience either way. Likewise, long tasks create unresponsive UIs, which can frustrate users and impact business metrics. They’re especially problematic when they’re not just coinciding with a user interaction, but in response to one. It’s no longer a matter of poor timing, because every click necessarily becomes a slow click.

Synchronously processing large arrays is one of the easiest ways to introduce long tasks. Even if the unit of work performed on each item in the array is reasonably fast, that time scales up linearly with the number of items. For example, if a CPU can complete one unit of work in 0.25 ms, and there are 1,000 units, the total processing time will be 250 ms, creating a long task and exceeding the threshold for a fast and responsive interaction. The key to breaking up the long task is to use the repetition to your advantage: each iteration of the loop is an opportunity to interrupt the processing and update the UI as needed.

Optimizing interaction responsiveness

Interrupting a task to allow the event loop to continue turning is known as yielding. There are a few ways to yield, with the classic approach being setTimeout with a delay of 0 ms, or the more modern alternative: scheduler.yield. It’s not currently supported in all browsers, so production-ready use cases will need a polyfill or fall back to setTimeout. In both cases, the trick to making the loop asynchronous is to use async/await. But there’s a catch.
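As a sketch of that fallback (the helper name yieldToMain is my own, not from the post), a cross-browser yield can feature-detect scheduler.yield and fall back to setTimeout:

```javascript
// Yield control back to the main thread. Prefers scheduler.yield where
// supported; otherwise falls back to a zero-delay timeout. The helper
// name is illustrative.
function yieldToMain() {
  if (typeof scheduler !== 'undefined' && typeof scheduler.yield === 'function') {
    return scheduler.yield();
  }
  return new Promise((resolve) => setTimeout(resolve, 0));
}
```

An async loop can then `await yieldToMain()` between iterations without caring which primitive the browser supports.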

If you’re using an Array method like forEach or map, you’ll quickly realize that this doesn’t work:

function handleClick() {
  items.forEach(async (item) => {
    await scheduler.yield();
    process(item);
  });
}
A 917ms long task blocking a pointer interaction

forEach doesn’t care if your callback function is asynchronous; it will plow through every item in the array without awaiting the yield. And it doesn’t matter whether you use scheduler.yield or setTimeout. Apparently, this trips up a lot of developers, with this StackOverflow question having been viewed 2.4 million times since it was asked in 2016. The solution is in the top answer: switch to using a for..of loop instead.

async function handleClick() {
  for (const item of items) {
    await scheduler.yield();
    process(item);
  }
}
Short, broken-up tasks all together taking 1.2 seconds to complete, without blocking the interaction

Instead of a monolithic long task blocking the click handler, now we’ve spread the work out into smaller tasks, responding to the interaction instantly. Problem solved, right?

Before we get into the major problem with this approach, you might have noticed the third most upvoted answer on that StackOverflow question, which recommends using the reduce method. In case you were tempted to cling to your functional programming tendencies and use reduce to break up the long task, think again.

function handleClick() {
  items.reduce(async (promise, item) => {
    await promise;
    await scheduler.yield();
    process(item);
  }, Promise.resolve());
}
A 267ms long task blocking the pointer interaction followed by short tasks

This approach passes a promise along from one iteration to the next, which we can await before processing the next item. However, the issue with this is that reduce still plows through the entire array, synchronously queuing up each microtask. It’s not until the promises are fulfilled that it starts processing the items. In other words, even though the actual processing happens asynchronously, the amount of overhead is still enough to make the click handler slow.

Yielding within a for..of loop seems like the best way to achieve responsive interactions, but the problem is that we’re yielding on EVERY iteration of the loop. Let’s see what happens in browsers that don’t support scheduler.yield:

async function handleClick() {
  for (const item of items) {
    await new Promise(resolve => setTimeout(resolve, 0));
    process(item);
  }
}
Short tasks that don't block the interaction that cumulatively take 2.3 minutes to complete

With setTimeout, the job takes over 2 minutes to complete! Compare that with scheduler.yield, which completes in about 1 second. The huge disparity comes down to the fact that these are nested timeouts. Unlike tasks deferred with scheduler.yield, timeouts nested more than a few levels deep are clamped by the browser to a minimum 4 ms delay. But that’s not to say that using scheduler.yield on every iteration comes without a cost. Both approaches introduce some overhead, which can be mitigated with batching.

Optimizing total processing time

Batching is processing multiple iterations of the loop before yielding. The interesting problem is knowing when to yield. Let’s say you yield after processing every 100 items in the array. Did you solve the long task problem? Well, that depends on the CPU speed and how much time the average item takes to process, and both of those factors will vary depending on the client’s machine.

Rather than batching by number of items, a much better approach would be to batch items by the time it takes to process them. That way you can set a reasonable batch duration, say 50 ms, and yield only when it’s been at least that long since the last yield.

const BATCH_DURATION = 50;
let timeOfLastYield = performance.now();

function shouldYield() {
  const now = performance.now();
  if (now - timeOfLastYield > BATCH_DURATION) {
    timeOfLastYield = now;
    return true;
  }
  return false;
}

async function handleClick() {
  for (const item of items) {
    if (shouldYield()) {
      await scheduler.yield();
    }
    process(item);
  }
}
Short tasks that don't block the interaction that cumulatively take 872ms to complete, using scheduler.yield

And here are the results with setTimeout:

Short tasks that don't block the interaction that cumulatively take 1.3s to complete, using setTimeout

The choice of batch duration is a tradeoff between minimizing the amount of time a user would spend waiting if they interacted with the page during the batch processing and the total time to process everything in the array. If you chunk up the work into 100 ms batches, that’s fewer interruptions and faster throughput, but at worst that’s also 100 ms of possible input delay, which is already half the budget for a fast interaction. On the other hand, with 10 ms batches, the worst case input delay is almost negligible, but more interruptions and slower throughput.

Your primary goal should be to unblock the interaction so that it feels responsive. That could just mean yielding so that you can update the UI with the first few items, or kicking off a loading animation. How often you yield during the rest of the processing time will depend on what your second priority is. Maybe nothing can be shown to the user until the entire array is processed, so your secondary goal should be to finish as quickly as possible. In that case you’ll want to go with a higher batch duration. Or maybe it’s ok to do the work in the background, but the UI should remain as smooth and responsive as possible. That lends itself to a smaller batch duration. When in doubt, 50 ms can be a good compromise, but it’s always a good idea to profile different approaches and pick what works best for your app.

We could stop there, but there’s one more thing that you might want to consider: frame rate. If you look closely at the screenshots above, you’ll notice thin green markers roughly corresponding to the paint cycle. These are custom timings using performance.mark to show when a requestAnimationFrame callback runs. There’s a curious difference in the frame rates of scheduler.yield and setTimeout.

Optimizing smoothness

To reiterate, if the work needs to be completed as quickly as possible, you should minimize the number of yields. But there are plenty of instances where it’s more important to provide visual feedback to the user that something is happening, like a progress indicator. Even if you’re not showing any progress to the user, you might still want to keep the frame rate reasonably fast to avoid janky animations or scrolling behavior. That’s where the preferential priority of scheduler.yield starts getting in the way.

A line chart showing batch duration on the x-axis and frames per second on the y-axis, with two series: scheduler.yield and setTimeout. The line for scheduler.yield appears relatively flat around 10 FPS as the batch duration increases from 0 to 10, 50, and 100. However for setTimeout, the line is flat at 60 FPS for batch durations of 0 and 10, then falls to 20 FPS at 50ms, and 15 FPS at 100ms.

Surprisingly, with scheduler.yield the frame rate is relatively flat around 10 FPS for batch durations under 100 ms. setTimeout, however, follows the expected curve, where more frames are painted as the batch duration decreases, approaching 60 FPS. Tasks scheduled with scheduler.yield are given preferential treatment, so even if you don’t do any batching at all, the browser will prioritize them over the next paint, but only up to a point.

Highlighting the 120ms time span between requestAnimationFrame calls

With no batching, the average time between frames is 120 ms, far from the 16 ms you get with tasks scheduled with setTimeout. This means your frame rate will be a lame 8 FPS. If you’re cool with that, you can skip the rest of this section. But I know there are some people who can’t stand the thought of a laggy UI, so here are some tips.

const BATCH_DURATION = 1000 / 30; // 30 FPS
let timeOfLastYield = performance.now();

function shouldYield() {
  const now = performance.now();
  if (now - timeOfLastYield > BATCH_DURATION) {
    timeOfLastYield = now;
    return true;
  }
  return false;
}

async function handleClick() {
  for (const item of items) {
    if (shouldYield()) {
      await new Promise(requestAnimationFrame);
      await scheduler.yield();
    }
    process(item);
  }
}
Highlighting the 37ms time span between requestAnimationFrame calls

First, change the batch duration to align with your desired frame rate. When it’s time to yield, before calling scheduler.yield, await a promise that resolves in a requestAnimationFrame callback. This effectively prevents any more work from happening until a frame is painted, ensuring a much smoother UI.

One gotcha is that the rAF callback won’t be fired as long as the tab is in the background. We can make a few adjustments to handle this edge case.

const BATCH_DURATION = 1000 / 30; // 30 FPS
let timeOfLastYield = performance.now();

function shouldYield() {
  const now = performance.now();
  if (now - timeOfLastYield > (document.hidden ? 500 : BATCH_DURATION)) {
    timeOfLastYield = now;
    return true;
  }
  return false;
}

async function handleClick() {
  for (const item of items) {
    if (shouldYield()) {
      if (document.hidden) {
        await new Promise(resolve => setTimeout(resolve, 1));
        timeOfLastYield = performance.now();
      } else {
        await Promise.race([
          new Promise(resolve => setTimeout(resolve, 100)),
          new Promise(requestAnimationFrame)
        ]);
        timeOfLastYield = performance.now();
        await scheduler.yield();
      }
    }
    process(item);
  }
}
Short, non-blocking tasks with a large chunk of time in the middle while the tab was in the background during which tasks alternate between 500ms of processing and 500ms of rest.

The first change is to the shouldYield function, which now checks the page visibility. If the document is hidden, we can afford to yield in larger batches of 500 ms. Even though there is no user to experience a slow interaction, this still introduces a long task that could block the page from becoming visible if the user returns before the work is completed. document.hidden will continue to be true until the visibilitychange event can be handled, so we still need to yield periodically.

The second change is to the way we yield when the document is visible. We need to make sure that we’re not dependent on the rAF callback, so we can race it against a 100 ms timeout, borrowing from Vercel’s await-interaction-response approach. The 100 ms timeout will be throttled to 1000 ms while the tab is backgrounded, but after that, the timeout will fire and work can resume. Resetting the timeOfLastYield is good so that the first backgrounded batch can run for the full 500 ms.

The final change is to the way we yield when the document is hidden. We want the visibilitychange event to fire, but scheduler.yield will always preempt it, delaying the page from becoming visible until the work is completed. That might be worth more investigation because it feels like a bug, but we can work around it by switching to a timeout-based approach. As long as the document is hidden, work will be done in 500 ms batches with an additional 500 ms delay between each batch, adding up to the 1000 ms delay for throttled timeouts. That way, if the user returns before the work is completed, the visibility state will be updated and the regular batching logic will kick back in.

If all of this feels overly complicated, that’s probably because it is. If your application can withstand pausing array iteration while the tab is in the background, then you should skip this last part for the sake of simplicity. In any case, this was a fun exercise in pushing the limits of yielding.

Try it out

If you’d like to try out the different yielding strategies, you can use this demo. That’s also what I used to make the screenshots in this post.

Hopefully this was a useful overview of the “yield in a loop” problem and how I’d go about solving it. Feel free to let me know if I got something wrong, or if you know of a better way I’d love to hear about it. Good luck out there!

A faster web in 2024
https://rviscomi.dev/2023/11/a-faster-web-in-2024/
Fri, 10 Nov 2023

The post A faster web in 2024 appeared first on rviscomi.dev.

Note: This blog post is a companion to a presentation I gave at DevFest NYC on November 11, 2023.

The web is getting faster. In fact, according to HTTP Archive, more websites than ever before are passing the Core Web Vitals assessment, which looks at three metrics that represent different aspects of page performance: loading speed, interaction responsiveness, and layout stability.

Earlier this week, the Chrome team published a retrospective on the Web Vitals program that details some of the browser-level and ecosystem improvements that got us to this point. In the post, the Chrome team reported a savings of 10,000 years worth of waiting thanks to these improvements to Core Web Vitals.

So with 2024 around the corner, I wanted to take a closer look at what it’s going to take to carry this momentum forward and continue making the web even faster.

But there’s a catch. The metric we use to measure interaction responsiveness is changing in 2024. And this new metric is finding a lot of responsiveness issues that have been flying under the radar.

Will we be able to meet this new challenge? Will we be able to do so while keeping pace with the performance improvements of 2023? I think so, but we’re going to need to learn some new tricks.

Why care about web performance

This is a question I often take for granted. I’ve spent the last 11 years working on and advocating for web performance, and sometimes I naively assume that everyone—in my bubble, at least—gets it too.

If we’re going to continue making the web faster, we’re going to need more developers and business leaders to buy in to the idea that performance is a virtue worth doing something about.

So let’s talk about the “why” of web performance.

Last week, I had the chance to go to the performance.now() conference in Amsterdam. It’s become an annual pilgrimage for many of us in the web performance industry to convene and talk about pushing the web faster. One of the co-chairs and presenters at the conference was Tammy Everts, who perfectly summed up the answer to this question in the slide pictured above.

In 2016, Tammy published a book called Time is Money in which she lists a few reasons why a site owner might want to care about optimizing web performance:

  • Bounce rate
  • Cart size
  • Conversions
  • Revenue
  • Time on site
  • Page views
  • User satisfaction
  • User retention
  • Organic search traffic
  • Brand perception
  • Productivity
  • Bandwidth/CDN savings
  • Competitive advantage

Drawing from decades of experience and volumes of case studies and neuroscience research, Tammy makes the case that all of these things can be positively influenced by improving a site’s performance.

Tammy also worked with Tim Kadlec to create WPO stats, a site that catalogs years of web performance case studies directly linking web performance improvements to better business outcomes.

For example, in one case study, a Shopify site improved loading performance and layout stability by 25% and 61%, and saw a 4% decrease in bounce rate and 6% increase in conversions. In another case study, the Obama for America site improved performance by 60% and saw a corresponding increase in conversions of 14%. There are dozens of examples just like these.

Happy users make more money. If you think about the typical conversion funnel, fewer and fewer users make it deeper into the funnel. Optimizing performance effectively “greases the funnel” to drive conversions by giving users a more frictionless experience.

That’s the business impact, but even more fundamentally, performance is about the user experience.

How we’re doing

The modern web is the fastest it’s ever been, using Google’s measure of performance: Core Web Vitals. To put that in perspective, it’s helpful to look at how we got here.

At the start of 2023, 40.1% of websites passed the Core Web Vitals assessment for mobile user experiences. Since then, we’ve seen steady growth. As of September 2023, we’re at 42.5% of websites passing the Core Web Vitals assessment, an improvement of 2.4 percentage points, or 6.0%. This is a new high, representing an incredible amount of work by the entire web ecosystem.

This might seem like a glass half full / half empty situation. You could celebrate the positive story of nearly half of all websites having measurably good performance. Another equally valid way to look at it is that more than half of websites are not meeting the bar for performance.

We can have it both ways! It’s amazing that the web has improved so much, and at the same time we can push ourselves to continue this momentum into 2024.

Keeping pace

So, can we keep up the current pace of improvement and convert another 6% of websites to pass the assessment? I’d like to think that we can, but everything is about to change with the metric we use to assess page responsiveness.

Earlier this year, I wrote a blog post announcing that Interaction to Next Paint (INP) will become Google’s new responsiveness metric in the Core Web Vitals assessment, replacing First Input Delay (FID) in March 2024.

This is a very good change, as INP is much more effective at catching instances of poor responsiveness. As a result though, many fewer websites have good INP scores compared to FID, especially among mobile experiences.

In the Performance chapter of the 2022 Web Almanac, I wrote about what the Core Web Vitals pass rates would look like in a world with INP instead of FID.

For mobile experiences, only 31.2% of sites would pass the assessment, a drop of 8.4 percentage points (21.2%) from the FID standard. That was based on data from June 2022. How are we looking now?

Things are actually looking much better! The gap is all but closed on desktop, and mobile experiences are only trailing by 6 points (14.2%).

But the fact remains: pass rates will drop substantially once INP takes effect.

While it might seem like a step backwards at first, keep in mind that INP is giving us a much more accurate look at how real users are experiencing interaction responsiveness. Nothing about how the web is actually experienced is changing with INP—only our ability to measure it. In this case, a drop in the pass rates does not actually mean that the web is getting slower.

So I’m still optimistic that we’re going to see continued improvements in performance throughout 2024. We’re just going to have to recalibrate our expectations against the new baseline when INP hits the scene.

Regaining lost ground

FID is the oldest metric in the Core Web Vitals. It first appeared in the Chrome UX Report dataset in June 2018. As of today, only 5.8% of websites have any FID issues whatsoever on either desktop or mobile. So I think it’s fair to say that for the most part we haven’t really had to worry about interaction responsiveness.

INP challenges us to overcome five years of inertial complacency. To do it, we’re going to have to flex some web performance muscles we may not have used in a while, if ever. We’re going to have to break out some new tools.

A long task shown in the Chrome DevTools Performance panel
Source: Optimize Long Tasks on web.dev

We’re going to have to get very comfortable with this.

This is what a long task looks like in the Performance panel of Chrome DevTools. The red striping shows the amount of the task that exceeds the 50ms budget, making it a “long” task. If a user were to attempt to interact with the page at this time, the long task would block the page from responding, creating what the user (and the INP metric) would perceive to be a slow interaction.

The long task is broken up, as shown in the Chrome DevTools Performance panel
Source: Optimize Long Tasks on web.dev

The solution to this problem might require a web performance technique you’ve never tried before: breaking up long tasks. The same amount of work will get done eventually, but by adding yield points between major chunks of work, the page will be able to more quickly respond to any user interactions that happen during the task.
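As a minimal sketch of that idea (function names are illustrative, and a zero-delay timeout stands in for whichever yielding primitive you use):

```javascript
// Process an array of work items, yielding between chunks so any pending
// user input can be handled. A deliberately simplified sketch.
async function processWithYieldPoints(workItems) {
  for (const doWork of workItems) {
    doWork();
    // Yield point: let the event loop turn before the next chunk.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
}
```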

Chrome is experimenting with a couple of APIs in origin trials to help address problematic long tasks. The first is the scheduler.yield() API, which is designed to give developers more control over breaking up long tasks. It ensures that the work happens continuously, without other tasks cutting in.

Knowing which long tasks to break up is its own science. To help with this, Chrome is also experimenting with the Long Animation Frames API. Similar to the Long Tasks API, which reports when long tasks happen and for how long, the Long Animation Frames API reports on long rendering updates, which can comprise multiple tasks. Crucially, it also exposes much more actionable attribution info about the tasks, including the script source down to the character position in code.
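For illustration, attribution data shaped like Long Animation Frame entries could be aggregated per script source. The fields used here (scripts, sourceURL, duration) follow the API proposal, so treat them as assumptions and verify against the spec you target:

```javascript
// Rank script sources by total attributed duration, given entries shaped
// like Long Animation Frame entries: { scripts: [{ sourceURL, duration }] }.
function slowestScripts(entries) {
  const totals = new Map();
  for (const entry of entries) {
    for (const script of entry.scripts ?? []) {
      const key = script.sourceURL || '(inline)';
      totals.set(key, (totals.get(key) ?? 0) + (script.duration ?? 0));
    }
  }
  // Sort descending so the biggest offender comes first.
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
}
```

In aggregate, output like this is what lets you skip trial-and-error optimization and go straight to the scripts responsible.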

Similar to tracking INP performance in analytics tools, developers could use the Long Animation Frames API to track why the INP was slow. In aggregate, this data can narrow down the root causes of common performance issues, saving developers from optimizing by trial and error.

These APIs aren’t stable yet, but they offer powerful new functionality to complement the existing suite of tools to optimize responsiveness. Even though it might feel like we’re playing catch-up just getting the pass rates back to where they were in a FID-centric assessment, the web is actually getting faster in the process!

It might seem like responsiveness will be the new bottleneck when INP takes over, but that’s actually not the case. Loading performance, as measured by the Largest Contentful Paint (LCP) metric, is and will still be the weakest link in the Core Web Vitals assessment.

Passing the Core Web Vitals assessment requires a site to be fast in all three metrics. So in order to continue the pace of improvement, we need to be looking at the metrics that need the most help.

54.2%

This is the percentage of websites with good LCP on mobile, compared to 64.1% and 76.0% for INP and CLS, according to HTTP Archive as of the September 2023 dataset.

As long as web performance has been a thing, developers have been talking about loading performance. Since the days of simple HTML applications, we’ve built up a lot of institutional knowledge around traditional techniques like backend performance and image optimization. But web pages have evolved a lot since then. They’ve become ever more complex with an increasing number of third party dependencies, richer media, and sophisticated techniques to render content on the client. Modern problems require modern solutions.

In 2022, Philip Walton introduced a new way of breaking down the time spent in LCP: the time to start receiving content on the client (TTFB), the time to start loading the LCP image (resource load delay), the time to finish loading the LCP image (resource load time), and the time until the LCP element is rendered (element render delay). By measuring which of these diagnostic metrics are slowest, we could focus our efforts on the optimizations that would most effectively improve LCP performance.
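As a small illustrative helper (the names are mine, not from the post), ranking the four sub-parts makes the biggest optimization target obvious:

```javascript
// Given the four LCP sub-parts in milliseconds, return them sorted
// largest-first so the main optimization opportunity is at index 0.
function lcpBreakdown({ ttfb, loadDelay, loadDuration, renderDelay }) {
  const parts = [
    ['TTFB', ttfb],
    ['resource load delay', loadDelay],
    ['resource load duration', loadDuration],
    ['element render delay', renderDelay],
  ];
  return parts.sort((a, b) => b[1] - a[1]);
}
```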

Conventional wisdom says that if you want your LCP image to appear sooner, you should optimize the image itself. This includes things like using a more efficient image format, caching it longer, resizing it smaller, and so on. In terms of the LCP diagnostic metrics, these things would only improve the resource load time. What about the rest?

Earlier I mentioned that last week I was at the performance.now() conference. Another one of the presenters was Estela Franco, who I collaborated with to share some brand new data sourced from real Chrome users on where this LCP time is typically spent.

Estela Franco presenting Chrome data on where LCP time is spent (November 2023)
Photo credit: Rick Viscomi

The photo above shows Estela’s slide with the LCP diagnostics as a percentage of the mean LCP time. Here’s the same data presented in milliseconds:

LCP score         | Mean TTFB | Mean load delay | Mean load duration | Mean render delay
Good              | 410       | 400             | 80                 | 230
Needs improvement | 1,020     | 1,350           | 260                | 490
Poor              | 2,330     | 3,670           | 580                | 990
Breakdown of mean LCP diagnostic performance in milliseconds, grouped by LCP score (October 2023)
Source: Chrome 119 beta internal data

What’s perhaps most surprising about this data is that the resource load time (load duration) is already the fastest LCP diagnostic. The slowest part is actually the resource load delay. Therefore, the biggest opportunity to speed up slow LCP images is to load them sooner. To reiterate, the problem is less about how long the image takes to load, it’s that we’re not loading it soon enough.

Browsers are usually pretty good about discovering images in the markup and loading them reasonably quickly. So why is this an issue? Developers aren’t making LCP images discoverable.

I also wrote about the LCP discoverability problem in the 2022 Web Almanac. In it, I reported that 38.7% of mobile pages that have an image LCP are not making it statically discoverable. Even if we look at the latest data from HTTP Archive, this figure is still at 36.0%.

A big part of the problem continues to be lazy loading. I first wrote about the negative performance effects of LCP lazy loading in 2021. Lazy loading is more than just the native loading=lazy attribute; developers can also use JavaScript to dynamically set the image source. Last year I reported that 17.8% of pages with LCP images lazy load them in some way. According to the latest data from HTTP Archive, we’ve improved slightly to 16.8% of pages. It’s not impossible to have a fast LCP if you lazy load the image, but it definitely doesn’t help. LCP images should never be lazy loaded.

To be clear: lazy loading is good for performance, but only for non-critical content. Everything else, including LCP images, must be loaded as eagerly as possible.

A totally different problem is client-side rendering. If the only markup you’re sending to the client is a <div id="root"></div> container that gets rendered by JavaScript, the browser can’t load the LCP image until it’s eventually discovered in the DOM. A better (if controversial) solution is to switch to a server-side rendering model.

We also need to contend with LCP images declared in CSS background styles. For example, background-image: url("cat.gif"). These images will not be picked up by the browser’s preload scanner, and so they won’t get the benefit of loading as early as possible. Using a plain old <img src="cat.gif"> element will get the job done.

In each of these cases, it’s also possible to use declarative preloading to make the images explicitly discoverable. In its simplest form, the code looks like this:

<link rel="preload" as="image" href="cat.gif">

Browsers will start loading the image sooner, but as long as its rendering is dependent on JavaScript or CSS, you may just be shifting from a load delay problem to a render delay problem. Eliminating these dependencies by putting the <img> directly in HTML is the most straightforward way to avoid this delay.

New tricks

So far all of these LCP recommendations are basically to dismantle some of the complexities we’ve introduced into our applications: LCP lazy loading, client-side rendering, and LCP background images. There are also some relatively new, additive techniques we could use to improve performance or even to avoid these delays altogether.

In last year’s Web Almanac, I reported that 0.03% of pages use fetchpriority=high on their LCP images. This attribute hints to the browser that the image should be loaded at a higher priority than its default. Images in Chrome are typically low priority by default, so this can give them a meaningful boost.

A lot has changed since last year! In the most recent HTTP Archive dataset, 9.25% of pages are now using fetchpriority=high on their LCP images. This is a massive leap, primarily due to WordPress adopting fetchpriority in version 6.3.
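For reference, adopting it is a one-attribute change (the file name is hypothetical):

```html
<img src="hero.jpg" fetchpriority="high" alt="Hero image">
```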

There are also a couple of techniques you can use to effectively get instant navigations: leveraging the back/forward cache and speculative loading.

When a user hits the back or forward buttons, a previously visited page is resumed. If the page was stored in the browser’s in-memory back/forward cache (also referred to as the bfcache) then it would appear to be loaded instantly. That LCP image will already be loaded and any of the JavaScript needed to render it will have already run. But not all pages are eligible for the cache. Things like unload listeners or Cache-Control: no-store directives currently* make pages ineligible for Chrome’s cache, even if those event listeners are set by third parties.

In the year since I last reported on bfcache eligibility for the Web Almanac, unload usage dropped from 17% to 12% of pages, and no-store usage dropped less significantly from 22% to 21%. So more pages are becoming eligible for this instant loading cache, which benefits all Core Web Vitals metrics.
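One common fix is to replace unload listeners with the bfcache-safe pagehide and pageshow events. A minimal sketch (the analytics function is a hypothetical stand-in):

```html
<script>
  // An `unload` listener would make the page ineligible for the bfcache.
  // `pagehide` fires in the same cases, including when entering the bfcache.
  window.addEventListener('pagehide', () => {
    sendAnalyticsBeacon(); // hypothetical cleanup/reporting function
  });

  // Detect when the page is restored from the bfcache.
  window.addEventListener('pageshow', (event) => {
    if (event.persisted) {
      // The page was resumed from the bfcache; refresh any stale state here.
    }
  });
</script>
```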

The other instant navigation technique is called speculative loading. Using the experimental Speculation Rules API, developers can hint to the browser that an entire page should be prerendered if there’s a high likelihood that the user will navigate there next. The API also supports prefetching, which is a less aggressive way to improve loading performance. The drawback is that it only loads the document itself and none of its subresources, so it’s less likely to deliver on the “instant navigation” promise than prerender mode.

Here’s an example of speculative loading in action, from the MDN docs:

<script type="speculationrules">
  {
    "prerender": [
      {
        "source": "list",
        "urls": ["next3.html", "next4.html"]
      }
    ]
  }
</script>

Both of these optimizations leverage different kinds of prerendering. With the bfcache, previously visited pages are preserved in memory so that revisiting them from the history stack can happen instantly. With speculative loading, the user doesn’t need to have ever visited the page for it to be prerendered. The net effect is the same: instant navigations.

The way forward

As more and more developers become aware of the challenges and opportunities to improve performance, I’m hopeful that the growth in sites passing the Core Web Vitals assessment that we’ve seen in 2023 will continue.

The first hurdle to clear is to even know that your site has a performance problem. PageSpeed Insights is the easiest way to run an assessment of your site, using public Core Web Vitals data from the Chrome UX Report. Even if you’re currently passing the assessment, pay close attention to your Interaction to Next Paint (INP) performance, as that will become the new standard for responsiveness in March 2024. You can also monitor your site’s performance using the Core Web Vitals report in Google Search Console. An even better way to understand your site’s performance is to measure it yourself, which enables you to get more granular diagnostic information about why it may be slow.

The next hurdle is to be able to invest time, effort, and maybe some money in improving performance. To do this, your organization first needs to care about web performance.

If your site has poor INP performance, there’s probably going to be a learning curve to start making use of all of the unfamiliar documentation, techniques, and tools to optimize long tasks. First Input Delay (FID) has given us something of a false sense of security when it comes to interaction responsiveness, but now we have an opportunity to find and fix the issues that would have otherwise been frustrating our users.

And even though INP is new and shiny, we can’t forget that Largest Contentful Paint (LCP) is the weakest link in the Core Web Vitals assessment. More sites struggle with LCP than any other metric. The way we’ve been building web apps over the years has changed, and so we need to adapt our optimization techniques accordingly, looking beyond just making images faster.

In lieu of the 2023 edition of the Web Almanac, I hope this post helps to demonstrate some of the progress we’ve seen this year and the room for improvement. The web is 6% faster, and that’s certainly worth celebrating. But most sites are still not fast—yet.

If we maintain the current rate of change of 6% per year, in 2026 more than half of sites will have good Core Web Vitals on mobile. So here’s my challenge. Let’s continue pushing our sites, our CMSs, our JavaScript frameworks, and our third party dependencies faster. Let’s continue to be advocates for better (if not instant) performance best practices in the web community. Here’s to the next 6% in 2024!

This post draws from the work of many people in the web performance community to whom I owe my thanks, including: Tammy Everts, Tim Kadlec, Estela Franco, Philip Walton, Mateusz Krzeszowiak, Annie Sullivan, Addy Osmani, Patrick Meenan, Jeremy Wagner, Barry Pollard, Brendan Kenny, and Felix Arntz.

The post A faster web in 2024 appeared first on rviscomi.dev.

]]>
https://rviscomi.dev/2023/11/a-faster-web-in-2024/feed/ 1
You probably don’t need http-equiv meta tags https://rviscomi.dev/2023/07/you-probably-dont-need-http-equiv-meta-tags/ https://rviscomi.dev/2023/07/you-probably-dont-need-http-equiv-meta-tags/#comments Thu, 27 Jul 2023 04:03:41 +0000 https://rviscomi.dev/?p=239 Until recently, I just assumed you could put anything equivalent to an HTTP header in an http-equiv meta tag, and browsers would treat it like the header itself. Maybe you thought the same thing—why wouldn’t you, with a name like that. But as it turns out, there are actually very few standard values that you […]

The post You probably don’t need http-equiv meta tags appeared first on rviscomi.dev.

]]>
Until recently, I just assumed you could put anything equivalent to an HTTP header in an http-equiv meta tag, and browsers would treat it like the header itself. Maybe you thought the same thing—why wouldn’t you, with a name like that.

But as it turns out, there are actually very few standard values that you can set here. And some values don’t even behave the same way as their header equivalents! What’s going on here and how are we supposed to use this thing?

Let’s take this as an example:

<meta
    http-equiv="X-UA-Compatible"
    content="IE=edge">

SERIOUSLY, WHAT DOES THIS DO? Why is it that if you load up any three random websites, one of them is bound to have this? And what does Internet Explorer have to do with anything anymore?

I could go on:

<meta
    http-equiv="content-type"
    content="text/html; charset=UTF-8">

Is this even necessary? It sure looks important—I wouldn’t want my web page to not be parsed as text/html.

Look, I know http-equiv meta tags of all things are not what most people get too worried about. It’s easy to copy-paste boilerplate markup from one project to the next because of some unquestioned folklore about what meta tags all HTML documents need. And if it works, it works, right?

Sure, but I’d argue that having a deeper understanding of what our code does and how to use it properly and effectively makes us all better developers. We can save ourselves the trouble of reaching for the wrong tool at first, only to find out later after burning time on debugging that maybe the http-equiv meta tag doesn’t do what we thought it does after all.

After a lot of researching and testing, I think I’m finally starting to get it. In this post I’ll share what the HTML spec says about http-equiv, how sites are actually using it in the wild, and argue why you probably* don’t need http-equiv meta tags.

👉 If you’d like to skip right to the takeaways, I’ve put together a cheatsheet with all of my http-equiv keyword recommendations.

*Unless…

I’ll start by giving my best arguments for needing http-equiv. I can break it down into two use cases: the response headers are hard or impossible to configure, and there might be tags added at runtime.

The first argument is about simplicity. If you’re deploying a static site somewhere like GitHub Pages, you don’t have control over the server or its response headers. If you need to set a header, your only choice is to use http-equiv or to migrate your site somewhere else.

The other argument is more about flexibility. You might not know what you need until the page is already running on the client. Maybe a third party needs to add the http-equiv meta tag for some feature to work.

These reasons don’t apply equally to all http-equiv use cases, though. For example, some use cases unlock features that require server-side logic to work anyway, while others are only applicable when parsed directly from the static HTML.

You really need to understand what each value does in order to be sure that you’re using http-equiv correctly. So let’s go back and see where it all started and how it’s supposed to be used today.

A brief history of http-equiv

In 1994, Roy Fielding proposed a new HTML element:

HTTP-EQUIV

This attribute binds the element to an HTTP response header. It means that if you know the semantics of the HTTP response header named by this attribute, then you can process the contents based on a well-defined syntactic mapping, whether or not your DTD tells you anything about it. HTTP header names are not case sensitive. If not present, the attribute NAME should be used to identify this metainformation and it should not be used within an HTTP response header.

HTTP servers can read the content of the document HEAD to generate response headers corresponding to any elements defining a value for the attribute HTTP-EQUIV. This provides document authors a mechanism (not necessarily the preferred one) for identifying information which should be included in the response headers for an HTTP request.

One example of an inappropriate usage for the META element is to use it to define information that should be associated with an already existing HTML element, e.g.

<meta
  name="Title"
  content="The Etymology of Dunsel">

A second example of inappropriate usage is to name an HTTP-EQUIV equal to a response header that should normally only be generated by the HTTP server. Example names that are inappropriate include Server, Date, and Last-Modified—the exact list of inappropriate names is dependent on the particular server implementation. It is recommended that servers ignore any META elements which specify http-equivalents which are equal (case-insensitively) to their own reserved response headers.

https://www.w3.org/MarkUp/html-spec/Elements/META.html

This is useful context to understand what the original intent of http-equiv was and was not. It wasn’t meant to replace more semantic HTML elements like title. It also wasn’t for HTTP headers that would have otherwise been more appropriately set by the server.

Unlike their actual usage today, http-equiv meta tags were initially intended to be read by the server so that it can set the corresponding response headers. Nowadays though, they’re read by the user agent to parse and handle the document accordingly. The HTML spec calls these pragma directives.

Today, rather than permissively supporting any and all HTTP headers, the only standard keywords (specced values of the http-equiv attribute) are, in their entirety:

Keyword                  Standard  Conforming
content-language         ✅        ❌
content-type             ✅        ✅
default-style            ✅        ✅
refresh                  ✅        ✅
set-cookie               ✅        ❌
x-ua-compatible          ✅        ✅
content-security-policy  ✅        ✅
Standard http-equiv keywords, according to the HTML spec. (Source)

That’s a pretty short list! Not only that, but two of them are actually non-conforming, meaning that using them is actively discouraged or even completely ignored by the browser. That leaves us with only five conforming http-equiv keywords.

So, given that we’re all disciplined web developers, you wouldn’t expect to find anything improper in what people actually use this for, right?

Let’s look at the data.

For the rest of this post, I’ll be sharing stats from the June 2023 crawl of the public HTTP Archive dataset. Jump to the Methodology section for the queries and more info on the results.

Fun fact: the title used in Fielding’s example is “The Etymology of Dunsel”. Dunsel is a fictional word from the Star Trek universe meaning useless, superfluous, or unnecessary. It’s an ominously fitting description for a lot of today’s http-equiv usage, as you’ll see in the results below.

http-equiv adoption

Of the 17,389,897 websites in HTTP Archive’s June 2023 crawl, 11,722,086 (67%) of them contain an http-equiv meta tag. That’s a huge proportion of the web, on par with a behemoth third party resource like Google Analytics.

Visualization of all observable websites (gray) and all websites that contain an http-equiv meta tag (red). Each pixel represents 100 websites.

Let’s dig deeper and see what the most popular http-equiv keywords are:

The emoji in the last two columns indicate whether each value is standard and whether it’s conforming. It’s easy to see at a glance that there are a lot of non-standard values in use on thousands of websites.

The two most popular values do happen to be standard and conforming. But are they being used correctly? And are they actually necessary?

Let’s explore a few of the most interesting results.

Obsolete keywords

Rank  Keyword              Sites      Standard
1     x-ua-compatible      6,469,282  ✅
13    content-style-type   237,535    ❌
15    content-script-type  197,005    ❌
16    imagetoolbar         137,625    ❌
18    cleartype            99,083     ❌
22    page-enter           13,937     ❌
25    msthemecompatible    12,797     ❌
30    x-frame-options      7,173      ❌
A few of the obsolete keywords used in http-equiv.
(HTTP Archive, June 2023)

Many of the top http-equiv keywords are bygone features of the Internet Explorer era. Official support for Internet Explorer, whose latest major version (IE 11) was released in October 2013, ended in June 2022.

Internet Explorer adoption (Statcounter)

As of July 2023, IE adoption is at an all-time low. According to Statcounter, 0.2% of web traffic comes from users on IE.

So the question is, if Microsoft won’t even support IE users, why should you?

x-ua-compatible

The most popular keyword is x-ua-compatible. It’s standard, it conforms, but the spec is quite clear that it should have no effect in modern browsers:

In practice, this pragma encourages Internet Explorer to more closely follow the specifications.

For meta elements with an http-equiv attribute in the X-UA-Compatible state, the content attribute must have a value that is an ASCII case-insensitive match for the string "IE=edge".

User agents are required to ignore this pragma.

WHATWG HTML spec

The spec also requires that the value be exactly IE=edge, so let’s see if the sites abide:

Value                   Sites      Standard
ie=edge                 5,047,117  ✅
ie=edge,chrome=1        1,258,286  ❌
ie=emulateie7           27,752     ❌
ie=7;ie=9;ie=10;ie=11   23,192     ❌
ie=10                   22,676     ❌
ie=9                    21,842     ❌
chrome=1                18,033     ❌
ie=9,chrome=1           12,401     ❌
ie=9;ie=8;ie=7;ie=edge  12,045     ❌
ie=11                   10,671     ❌
ie=8                    9,953      ❌
ie=edge;chrome=1        7,666      ❌
Top content values for x-ua-compatible.
(HTTP Archive, June 2023. View query.)

The 12 content values listed above make up 99% of the x-ua-compatible usage. The most popular one is ie=edge, which is the only standard value.

What’s the point, though? Are you testing your website in IE? Are you gracefully degrading down to decades-old HTML, CSS, and JavaScript? Is your website even remotely presentable to the 1 out of every 500 users on IE? No, your site is almost certainly not IE-compatible, and this meta tag isn’t a magic wand to make it so.

Modern web pages should not need x-ua-compatible.

For a more detailed history of x-ua-compatible, check out Almost (Standards) Doesn’t Count by Jay Hoffmann.

content-style-type, content-script-type

Everyone knows <script> means JavaScript and <style> means CSS. But that wasn’t always the case.

Back in the day when the W3C’s HTML spec ruled the land, they wrote of the necessity to “specify the style sheet language of style information associated with an HTML document.” And similarly, everyone should “specify the default scripting language for all scripts in a document.”

Today though, none of this is necessary. You can write <script> and all modern browsers know you mean JavaScript.

You don’t need content-style-type or content-script-type in 2023.

x-frame-options

The thirtieth most used http-equiv keyword is x-frame-options, used by 7k sites. A site might set this keyword to prevent its page from being surreptitiously embedded in a malicious page, for example, for clickjacking purposes.

This one is obsoleted by the frame-ancestors directive of Content Security Policy (CSP), which provides much more flexibility for security controls. The CSP directive is supported by all modern browsers, so there’s really no reason to continue using x-frame-options.

Use CSP instead of x-frame-options.
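For reference, the equivalent CSP policy is sent as a response header. A sketch that mirrors x-frame-options: sameorigin would be:

```http
Content-Security-Policy: frame-ancestors 'self'
```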

content-type

The next most popular keyword is content-type, which, not to be confused with the script- and style-specific keywords above, is an alias for the charset meta tag.

content-type is used to declare the document’s character encoding and it’s used by about 4.6 million websites.

Note that the spec requires that pages must not contain both a http-equiv=content-type meta tag and a charset meta tag.

<meta
    http-equiv="content-type"
    content="text/html; charset=utf-8">

<meta
    charset="utf-8">

In other words, these two elements do exactly the same thing, but it’s invalid to have both of them.

Content-Type: text/html; charset=utf-8

Don’t forget about the Content-Type HTTP header, which is yet another way to declare the character encoding of the document (among other things). The spec is unclear if it’s also invalid to have both the header and the http-equiv versions, but I’m guessing it’s discouraged.

And there’s one other idiosyncrasy of the charset and content-type meta tags, which is that they must be included in the first 1024 bytes of the document. For example, this is one of the reasons why capo.js assigns the highest possible weight to charset meta tags, otherwise a late-discovered character encoding could screw up the parsing, causing the browser to have to start over.

Content-Type  charset  http-equiv  Sites       Valid
✔             ✔        ✔           801,404     ❌
✔             ✔                    11,382,946  ❓
✔                      ✔           3,115,266   ❓
✔                                  727,032     ✅
              ✔        ✔           98,656      ❌
              ✔                    1,228,396   ✅
                       ✔           731,908     ✅
                                   223,932     ❌

Totals: Content-Type 16,026,648; charset 13,511,402; http-equiv 4,747,234; all sites 18,309,540.
All of the combinations of ways to declare a character encoding.
(HTTP Archive, June 2023. View query.)

A few things surprised me about these results:

  • Despite so many pages using the http-equiv=content-type keyword, it’s a fraction of the usage that its alternatives get: Content-Type is 3.4x more popular, and charset is 2.8x more popular.
  • Most pages (79%) are in this ambiguous validity zone denoted by the ❓, having both a Content-Type header and either of the meta character encoding declarations.
  • 1 in 20 pages are clearly invalid and redundantly declare both the charset and http-equiv, or nothing at all.

Given the popularity of the alternatives, and the validation risks associated with the http-equiv approach—again, I have to ask—what’s the point? For the relatively few sites (732k or 4%) that wouldn’t have otherwise declared a content encoding, they’d be better off going with the alternatives.

The Content-Type header avoids the requirement for meta tags to be in the first 1024 bytes, and the charset declaration is much more concise than http-equiv. There’s no real advantage to using the content-type keyword.

All HTML pages should set a character encoding. Prefer the Content-Type HTTP header first, otherwise use the charset meta tag in the first 1024 bytes.
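If you can’t set the Content-Type header, a minimal sketch of the fallback looks like this, with the charset meta tag at the very top of the head, safely within the first 1024 bytes:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Example page</title>
  </head>
  <body></body>
</html>
```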

origin-trial

The third most popular value is used by 4.1 million pages and it’s technically non-standard according to the spec: origin-trial. I recently wrote a post called Origin trials and tribulations, which explores how they’re—often incorrectly—used to enable experimental web platform features. I recommend checking that out for a closer look at the stats behind individual origin trials.

The way they usually work is for a developer to sign up to use a particular experimental feature directly with a browser like Chrome, Edge, or Firefox. The browser gives the developer a token, which they serve from their website as a way to instruct the browser to enable that feature. The token can be served in one of two ways:

  1. As an HTTP header:
    Origin-Trial: [token]
  2. As a meta tag:
    <meta
      http-equiv="origin-trial"
      content="[token]">

Firefox began supporting origin trials in early 2022. Until then, only Chromium browsers had supported it. And for that reason, the WHATWG hadn’t considered adding it to the HTML spec. However, while researching this post and learning about the status of the origin-trial value, I left a comment on the spec issue recommending that they reconsider it. Now, having the support of at least two implementers, it meets the criteria and it sounds like they’re open to adding it.

Standardizing origin-trial is a good idea. Unlike the previous http-equiv keywords that we’ve looked at so far, in some cases it’s actually necessary to declare the origin trial token in the markup as opposed to the HTTP header equivalent. Third party origin trials can only be declared by dynamically injecting the meta tag into the main document. And, as discussed in my last blog post, third parties are responsible for 99% of origin trial usage.

Use either the Origin-Trial HTTP header or the origin-trial meta tag, whichever is more convenient. You must use the meta tag if you’re injecting a third party token into a page.
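For the third party case, a sketch of how a script might inject its token at runtime (the token value is a placeholder):

```html
<script>
  // Inject an origin trial token for this script's origin.
  const otMeta = document.createElement('meta');
  otMeta.httpEquiv = 'origin-trial';
  otMeta.content = 'TOKEN'; // placeholder for the token issued by the browser vendor
  document.head.appendChild(otMeta);
</script>
```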

Cache headers

The http-equiv keywords ranked 5, 6, 10, 11, and 21 are in a category of cache control headers.

Rank  Keyword        Sites    Standard
5     cache-control  441,570  ❌
6     etag           438,906  ❌
10    pragma         390,770  ❌
11    expires        387,003  ❌
21    last-modified  19,515   ❌
Cache control headers used in http-equiv.
(HTTP Archive, June 2023)

I’m going to skip over describing what each header does. See Prevent unnecessary network requests with the HTTP Cache for a broader overview of caching headers and strategies.

Recall that Fielding specifically called out last-modified as an example of what not to do. Servers are better able to tell when a file was last modified or what the current time is, and developers should definitely not be hard-coding those things in their HTML.

We’ve already established that the original intent of http-equiv was for servers to read the HTML and respond with the corresponding headers, so what’s wrong with declaring something like a cache-control policy in the HTML? Other than being non-standard, I’m not even sure if servers actually behave like that or if they ever did. It wouldn’t be a great idea to have to parse the HTML on the server, due to the complexity and performance. Also, if there are proxies between the origin server and the client, they too would need to support that behavior.

We’ve just seen an example of a keyword that is non-standard, yet browsers support it anyway: origin-trial. So maybe browsers support caching headers too? Spoiler: they don’t.

Amusingly, even the all-knowing AI gets this wrong.

None of these are standard http-equiv keywords and as far as I know all modern browsers ignore them.

Use HTTP headers for cache directives, not http-equiv.
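For reference, here’s a sketch of what those directives look like as proper response headers (the values are illustrative, not recommendations):

```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: max-age=3600, must-revalidate
ETag: "33a64df5"
```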

Wix metadata

The keywords ranked 7–9 are all prefixed with x-wix. And given that all three keywords are found on about 439k sites within a margin of ±1 site, I think it’s a safe bet that these are all set by the Wix CMS.

Rank  Keyword                        Sites
7     x-wix-published-version        438,666
8     x-wix-application-instance-id  438,665
9     x-wix-meta-site-id             438,665
Wix headers used in http-equiv.
(HTTP Archive, June 2023)

Just to be sure, the number of Wix websites in the dataset is very close, at 445,164 sites (view query), and the HTML on the Wix blog confirms this theory:

<meta
  http-equiv="X-Wix-Meta-Site-Id"
  content="058b1f4f-09cf-426f-bb00-cec48b9da4b0">
<meta
  http-equiv="X-Wix-Application-Instance-Id"
  content="7dbdad6e-27ef-44ac-8270-48f414db3dc8">
<meta
  http-equiv="X-Wix-Published-Version"
  content="3834"/>
<meta
  http-equiv="etag"
  content="bug"/>
<meta
  http-equiv="X-UA-Compatible"
  content="IE=edge">

As an added bonus, it even looks like Wix is responsible for 99.9% of the high etag usage and some (6.8%) of the x-ua-compatible usage.

So, is this “valid” HTML?

No, definitely not.

Is there any harm to it? Well, the default behavior for a browser that doesn’t recognize an http-equiv value is to ignore it, so, no, these are harmless.

But isn’t there a more semantic way to set metadata like these? Yes! It’s the <meta name=generator> tag. Wix pages already have one of these, and it looks like this:

<meta
  name="generator"
  content="Wix.com Website Builder">

There’s nothing wrong with having multiple generator tags so it’d be more appropriate for Wix to use those instead of http-equiv. That said, it’s needless work and there’s no real benefit to making this switch other than technical correctness, but hey that counts for something!

Use generator meta tags for page metadata, not http-equiv.

content-language

The eighth most popular http-equiv keyword is content-language, found on 487k sites. The HTML spec considers this keyword to be non-conforming and it recommends using the lang attribute instead.

The important thing to know about the lang attribute is that it can influence the UI of the page: fonts, pronunciation, dictionaries, date pickers, etc. For accessibility, the WCAG explicitly requires that all pages declare the document language, primarily using the lang attribute. Setting the lang attribute is one of those core web dev best practices, and you’ll find it baked into projects like HTML5 Boilerplate and create-react-app.

Curiously, the spec also includes this note:

This pragma is almost, but not quite, entirely unlike the HTTP Content-Language header of the same name.

As best I can tell, the semantic distinction between these two values is that the Content-Language header indicates what language the reader is expected to speak, while the content-language pragma indicates what language the page is written in.

Clients can also use the Accept-Language header to politely ask for the content in their preferred language(s). The server will ideally take the client’s preferences into consideration when serving the response, as indicated by the Content-Language header.

So to boil it down:

  • If your web page is meant to be read by everyone, you should omit the Content-Language header.
  • If you serve multiple translations of the same resource, you should serve it with the appropriate Content-Language header based on the client’s Accept-Language preference.
  • You should always set <html lang> to whatever language the document is written in. For example, this page is set to <html lang="en-US">.

The specification for the lang attribute describes the order of precedence for all three of these directives on a given HTML node:

  1. The lang attribute on the nearest ancestor
  2. The value set by the http-equiv meta tag
  3. The value set by HTTP headers

The spec doesn’t explicitly say that the language falls back to the Content-Language header, just a “higher-level protocol” like HTTP. I’m going to assume that means the Content-Language header as that’s really the entire point of the http-equiv attribute.

We know how frequently the http-equiv keyword is used, so let’s compare that with all of the other ways to set the content language:

Content-Language  lang  http-equiv  Sites       Valid
✔                 ✔     ✔           20,730      ❓
✔                 ✔                 1,573,439   ✅
✔                       ✔           7,009       ❌
✔                                   107,572     ❌
                  ✔     ✔           300,890     ❓
                  ✔                 12,913,992  ✅
                        ✔           172,239     ❌
                                    2,910,087   ❌

Totals: Content-Language 1,708,750; lang 14,809,051; http-equiv 500,868; all sites 18,005,958.
All of the combinations of ways to declare the content language.
(HTTP Archive, June 2023. View query.)

Most sites (14.8 million) opt to declare the content language by setting the lang attribute. And the most popular combination of the three is the lang attribute alone (12.9 million).

The second most popular combination of directives is to set no directives at all, as seen on 2.9 million sites. While it’s fine to omit the Content-Language header, all documents should set a lang attribute at a minimum. So this case is clearly invalid, but unlike the content-type keyword, the spec isn’t as clear if it’s invalid for both lang and content-language to be set on the same document.

Don’t use http-equiv=content-language. The spec recommends using the lang attribute instead. If you need to support a resource in multiple languages, negotiate using the Accept-Language request header and the Content-Language response header to serve it in the client’s preferred language.
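A sketch of that negotiation, with illustrative values:

```http
GET /article HTTP/1.1
Accept-Language: fr-CH, fr;q=0.9, en;q=0.8

HTTP/1.1 200 OK
Content-Language: fr
```

The served document would then also declare its own language, e.g. <html lang="fr">.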

Client hints

The keywords ranked 12 and 72 are accept-ch and delegate-ch. They’re part of the Client Hints API available in Chromium-based browsers. You can learn more about the performance and privacy benefits of client hints.

This API allows browsers to provide servers with specific, opt-in information about the client, such as device capabilities or network conditions, which servers can use to adapt their content delivery. Headers like accept-ch and delegate-ch control the negotiation and delegation of these hints, respectively.

Rank  Keyword      Sites
12    accept-ch    242,017
72    delegate-ch  492
Client hints headers used in http-equiv.
(HTTP Archive, June 2023)

While these http-equiv keywords are not standard as far as the HTML spec is concerned, the Chromium engine alone does support them.

As I was browsing through the list of sites that use accept-ch, I noticed that a lot of them are hosted on the Squarespace domain. Given that this keyword depends entirely on server support, it makes sense that CMS hosts would be among its biggest power users. It turns out that sites hosted by Squarespace account for 97% of accept-ch usage, specifically its http-equiv adoption.

So how are these (mostly Squarespace) sites actually using it? What policy directives are they promoting to clients?

Directive                            Type        Sites    Valid
sec-ch-ua-platform-version           user agent  235,454  ✅
sec-ch-ua-model                      user agent  235,442  ✅
dpr                                  device      6,073    ✅
width                                device      5,959    ✅
viewport-width                       device      5,724    ✅
device-memory                        device      511      ✅
downlink                             network     417      ✅
save-data                            network     295      ✅
ect                                  network     289      ✅
rtt                                  network     228      ✅
sec-ch-ua-platform                   user agent  139      ✅
sec-ch-ua                            user agent  120      ✅
sec-ch-ua-mobile                     user agent  111      ✅
sec-ch-ua-full-version-list          user agent  107      ✅
sec-ch-ua-arch                       user agent  15       ✅
sec-ch-ua-bitness                    user agent  10       ✅
sec-ch-ua-full-version               user agent  9        ✅
sec-ch-prefers-color-scheme          media       4        ✅
sec-ch-prefers-contrast              media       1        ✅
sec-ch-prefers-reduced-motion        media       1        ✅
sec-ch-prefers-reduced-transparency  media       1        ✅
sec-ch-forced-colors                 media       1        ✅
sec-ch-prefers-reduced-data          media       0        ✅
Popularity of all valid accept-ch directives.
(HTTP Archive, June 2023. View query.)

Note that “valid” is used loosely here to mean that they’re accepted by browsers that support client hints. They’re not necessarily supported from the HTML spec’s point of view. Also, keep in mind that these stats don’t take HTTP header adoption into account—these are only the sites that set it in http-equiv specifically. Feel free to remix the query if you’re interested in HTTP header adoption.

It seems like the 97% of Squarespace accept-ch usage is for two things: sec-ch-ua-platform-version and sec-ch-ua-model. These are part of the User-Agent Client Hints expansion pack that provide more secure access to the UA info. The rest of the UA directives are used much less frequently.

The second most popular pack of directives is the Device Client Hints, including: dpr, width, viewport-width, and device-memory. The first three are used on about 6k sites while device-memory is used almost as much as the third most popular pack, Network Client Hints. These include downlink, save-data, ect, and rtt.

The usage patterns of device and network hints kind of make sense if you think about the way these would be used. The three most popular device hints are all the visual ones that would be useful for responsive design. The network hints plus device-memory are useful for performance, either as a diagnostic for analytics or as a predicate for serving lighter-weight content.

The least popular category is the user preference media features. The most popular directive in this category is sec-ch-prefers-color-scheme, which gives websites the ability to serve pages in dark mode if that’s what the user prefers. I think it’s cool that this doesn’t have to be “figured out” by CSS on the client side, but I’m interested to see some real-world examples of it to understand how much more performance or simplicity it provides.

So now that we’ve seen how the http-equiv keywords are used, are they even needed at all? The purpose of client hints is for the client to give the server information that it can use to deliver a better experience. For that to work, the server would need to process the client’s request header detailing its hints and serve the alternate version of the page. So it seems to me that any server capable of handling client hints ought to be capable of setting these as response headers as opposed to http-equiv meta tags.

Maybe there are some valid reasons to dynamically inject these meta tags into the page? Well, as a security restriction, the WICG draft specifically calls out delegate-ch as being ineffective when injected by JavaScript. It’s possible there are use cases for injecting accept-ch but none come to mind.

If your server can handle client hints, it can declare its support using HTTP headers rather than relying on http-equiv.
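For example, a server that supports client hints might advertise them in its response headers like this. (The specific directives listed here are just an illustration; advertise only the hints your server actually handles.)

```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Accept-CH: Sec-CH-UA-Platform-Version, Sec-CH-UA-Model, DPR
```

Since the server has to process the hint request headers anyway, the response header is the more natural home for this declaration than a meta tag.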

x-dns-prefetch-control

The fourteenth most popular http-equiv keyword is x-dns-prefetch-control, which is not yet standardized. It’s used by about 200k sites and it serves one and only one useful purpose: turning off the browser’s default behavior of speculatively prefetching the DNS records of URLs that the client is likely to need soon.

This default behavior is very good for performance, especially for mobile users. It’s similar to the developer-controlled way of doing DNS prefetching, using a link tag in the document head:

<link
  rel="dns-prefetch"
  href="https://www.example.com">

The key difference is that the browser will perform additional prefetches automatically to further improve the user’s loading performance.

There might be privacy concerns with prefetching domains that the user hasn’t actually indicated an intention to visit yet. And there might also be some performance concerns with resolving domain names that never get visited, for example maybe for a site that includes lots of links that users rarely follow. If a site owner chooses, they can disable this behavior by setting x-dns-prefetch-control to off.

It doesn’t make sense to set it to anything else, because the default behavior is for it to be on. Can you guess what the data says?

Value | Sites | Valid
on | 199,095 | ✅
off | 1,688 | ✅
[null] | 736 | ❌
TRUE | 2 | ❌
[empty string] | 1 | ❌
yes | 1 | ❌
text/html;charset=utf-8 | 1 | ❌
no | 1 | ❌
ie=edge,chrome=1 | 1 | ❌
Usage of x-dns-prefetch-control.
(HTTP Archive, June 2023. View query.)

Note that “valid” is used loosely here to mean that the values are accepted by browsers that support x-dns-prefetch-control.

99% of sites that set this keyword are doing so unnecessarily by setting the value to on. Again, that’s the default!

1% of sites (1,688) are actually disabling DNS prefetching.

The rest of the sites are passing garbage values.

Do sites need this keyword? The vast majority of them certainly don’t. Even the few that do intentionally disable the feature could presumably set the corresponding HTTP response header instead.

Only use x-dns-prefetch-control if you have security or performance concerns with built-in DNS prefetching, in which case be sure to set the value to off.
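For reference, opting out looks like this; the same switch can also be set via the (equally non-standard) X-DNS-Prefetch-Control HTTP response header:

```html
<meta
  http-equiv="x-dns-prefetch-control"
  content="off">
```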

content-security-policy

Skipping to number 17, the content-security-policy (CSP) meta tag is found on 101k sites. CSP helps to lock down all of the ways external content can be added to a page, closing off vectors that could otherwise leave it vulnerable to attacks like cross-site scripting.

CSP is great and everyone should use a policy that works for them. The CSP meta tag is standard and conforming, but I would strongly discourage anyone from using it.

Meme showing two boxes on a tall shelf next to each other: "needles" and "poison tipped needles".

On a good day, placing a script tag before a meta CSP would disable Chrome’s preload scanner. Alas, today is not a good day. While researching how all of this works, I discovered a bug in Chrome that disables the preload scanner for all meta CSPs everywhere.

Even after the bug gets fixed, why risk it? The convenience of a meta tag is not worth the liability of losing one of the best performance optimizations that you can get for free. The Content-Security-Policy HTTP header is a much safer option that behaves exactly the same way, without the performance risks.

Don’t use content-security-policy. Use the HTTP header instead.
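For example, a minimal policy set as a response header might look like this. (The directives and CDN origin here are purely illustrative; craft a policy that fits your site.)

```http
Content-Security-Policy: default-src 'self'; script-src 'self' https://cdn.example.com
```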

content-security-policy-report-only

I know this is a top 100 list, but I had to tack number 137 on the end just to show how often this related value is used. Not only is the content-security-policy-report-only value non-standard, but Chromium browsers will actually warn you if you try to use it. I guess that’s been helpful to drive down adoption; it’s set on only 114 websites.

This is a nice reminder that http-equiv is not for arbitrary HTTP headers, despite what you might think. Even though the CSP header is a perfectly standard value for http-equiv, its friend the CSP Report-Only header is unsupported. Go figure!

The CSP report-to directive should be used with this header, otherwise this header will be an expensive no-op machine.

MDN

After taking a closer look at how sites are trying (and failing) to use this header, it looks like in every single case they mistook it for the CSP header. There’s one thing that distinguishes this from the CSP header, and that’s the report-to directive. When it’s set properly as an HTTP header, the Reporting API will check resources against the enclosed policy and report violations to a given URL. “Otherwise,” as MDN brilliantly puts it, “this header will be an expensive no-op machine.” All 114 of the sites that use the report-only meta tag omit the necessary report-to directive.

Don’t use content-security-policy-report-only. Use the HTTP header instead.
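If you do want violation reports, a sketch of the header-based setup looks like this (the endpoint name and URL are hypothetical):

```http
Reporting-Endpoints: csp-endpoint="https://example.com/csp-reports"
Content-Security-Policy-Report-Only: default-src 'self'; report-to csp-endpoint
```

The report-to directive points at an endpoint named in the Reporting-Endpoints header; the older report-uri directive is the deprecated predecessor to this setup.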

refresh

The next value on the list is refresh, found on only 28k sites. Get this—it refreshes the page 🤯

An example of this that you might be familiar with is the WebPageTest loading screen as a page is being tested:

<noscript>
  <meta
    http-equiv="refresh"
    content="30">
</noscript>

The noscript wrapper means that users with JavaScript enabled get a less disruptive progress update UX. For users with JavaScript disabled, this ancient directive is the only other way I can think of to get the page to automatically refresh on an interval.

And that’s the big downside to refresh: it’s disruptive. So much so that its redirection powers (refreshing to a different URL, technically) are discouraged by accessibility groups.
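For reference, the redirection flavor looks like this (the timeout and URL are illustrative), and it’s exactly the pattern accessibility guidelines advise against:

```html
<meta
  http-equiv="refresh"
  content="5; url=https://example.com/new-page/">
```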

Here are some fun facts about how refresh is used in the wild:

  • 31% of sites that use refresh are using it for redirection.
  • The most popular redirect timeout is 5 seconds, used by 18% of sites that redirect.
  • The most popular refresh timeout is 1800 seconds (30 minutes), used by 13% of sites that refresh.
  • 4% of sites that use refresh don’t set a valid timeout value at all.
  • The largest timeout is 30,000,000,000,000,000,000,000,000,000,000 seconds.

    To put that in perspective: you could visit this website and buy a Powerball ticket for every second you wait. And every time you hit the jackpot, you fill out a March Madness bracket. By the time the page refreshes, you will have correctly predicted 11,131 brackets.

There are much better alternatives for redirection than using the refresh directive, like the HTTP 3xx status codes, as recommended by the Google Search docs and WCAG. And unless you really need to fall back to the primitive behavior like in WebPageTest’s case, an asynchronous JavaScript solution would be much less disruptive.
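As a sketch of the less disruptive approach, a page could poll for updates and patch the DOM in place rather than reloading. The endpoint and element here are hypothetical:

```html
<script>
  // Poll a status endpoint every 5 seconds and update the page
  // in place, instead of triggering a full meta refresh.
  setInterval(async () => {
    const response = await fetch('/status.json');
    const { message } = await response.json();
    document.querySelector('#status').textContent = message;
  }, 5000);
</script>
```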

Only use refresh when you really need to reload the current page and there are no other, less disruptive options available.

default-style

Skipping ahead again, we have default-style at number 50, found on just 1k sites.

According to the CSSOM spec:

preferred CSS style sheet set name is a concept to determine which CSS style sheets need to have their disabled flag unset. Initially its value is the empty string.

How does a stylesheet get disabled? That’s a side effect of alternate stylesheets. If you set the rel attribute of your link tag to "alternate stylesheet", it’s disabled by default.

So how does a stylesheet get re-enabled? default-style, for one! You can kind of think of it like the way label and input elements relate to each other by way of the for/id attributes:

<label for="name">
  What's your name?
</label>
<input id="name">

The way to indicate a preferred stylesheet is to set the meta tag’s content attribute to the value of the stylesheet’s title attribute:

<meta
  http-equiv="default-style"
  content="green">
<link
  rel="alternate stylesheet"
  title="green"
  href="green.css">
<link
  rel="alternate stylesheet"
  title="red"
  href="red.css">

If you’d like to try it out for yourself, I built a little demo.

As you can see, this keyword only works so long as the content attribute refers to a valid title attribute value. How often do you reckon that happens?

Value | Pages | Valid
text/css | 176 | ❌
au normal contrast | 109 | ✅
styles_portal | 54 | ✅
ie=edge | 53 | ❌
default | 36 | ✅
text/javascript | 26 | ❌
main_style | 23 | ✅
text/html;charset=utf-8 | 14 | ❌
style.css | 13 | ❌
toppage | 8 | ✅
Top 10 preferred stylesheets.
(HTTP Archive, June 2023. View query.)

By my count, 31% of the values set in the default-style tag are invalid. For example, take the most popular value: text/css. That’s a perfectly valid CSS content type, but I highly doubt someone set the title attribute of their stylesheet to it.

The next one that jumps out at me is ie=edge. Look familiar? That’s the top value of the x-ua-compatible pragma. Not valid here.

<meta
    http-equiv="default-style"
    content="the document's preferred stylesheet">

Two sites took the spec a bit too literally 🤣

Update: this code seems to have been lifted directly from W3Schools!

There are a couple of other content types on the list: text/javascript and text/html;charset=utf-8. Why would anyone set the title attribute of a stylesheet to the MIME types for JavaScript and HTML? Nope, not valid either.

The last one that caught my eye is style.css. It seems they mistakenly set the value to the href of the stylesheet, not the title. So close. The intent was there, but not valid.

This might be unique to Chrome’s implementation, but when I test this out, I see a flash of unstyled content. The page renders with the default styles (black text), then the preferred styles (red text) kick in shortly after. It’s not a great user experience.

This is a tiny demo, so I can only imagine the flash being even more jarring on sites that use this for real.

So, is it worth it? I don’t think so. I just don’t see enough value added by this feature that you couldn’t get with modern CSS anyway. For example, MDN recommends using @media features instead.

Use modern CSS instead of default-style for alternative stylesheets.

set-cookie

The last value of interest in this list is set-cookie. It’s standard—it’s in the spec—but it’s non-conforming. A mere 317 sites use it.

The spec doesn’t have too much to say about it:

This pragma is non-conforming and has no effect.

User agents are required to ignore this pragma.

That’s it.

If you need to set cookies, only use the HTTP response header.
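For example (the cookie name, value, and attributes are illustrative):

```http
HTTP/1.1 200 OK
Set-Cookie: session_id=abc123; Path=/; Secure; HttpOnly; SameSite=Lax
```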

Chromium behavior

Šime Vidas responded to a related tweet of mine about standard http-equiv keywords, asking what non-standard keywords browsers do support.

To answer Šime’s question, I had to trawl through the Chromium source. Here are the implemented keywords that I found:

Keyword | Supported | Standard
default-style | ✅ | ✅
refresh | ✅ | ✅
set-cookie | ❌ | ✅
content-language | ✅ | ✅
x-dns-prefetch-control | ✅ | ❌
x-frame-options | ❌ | ❌
accept-ch | ✅ | ❌
delegate-ch | ✅ | ❌
content-security-policy | ✅ | ✅
content-security-policy-report-only | ❌ | ❌
origin-trial | ✅ | ❌
content-type (source) | ✅ | ✅
http-equiv keywords implemented by Chromium browsers and their levels of support and standardization. (Source)

So the supported, non-standard keywords are: x-dns-prefetch-control, accept-ch, delegate-ch, and origin-trial.

It’s interesting to see that some keywords are implemented, but only to warn developers when found:

  • set-cookie triggers an error
  • x-frame-options triggers an error
  • content-security-policy-report-only logs a friendlier message

Chromium is not the only engine, and other browsers may handle http-equiv keywords differently. If you’d like to contribute info about keyword support in other browsers, please reach out in the comments, and I’d be happy to include it here.

Cheatsheet

If you take away one thing from this post, have it be this cheatsheet with my condensed recommendations for each keyword. You can refer to this list if you’re ever unsure whether you need a given http-equiv meta tag.

Keyword | Recommendation
accept-ch | ❌ Use the Accept-CH HTTP header instead
cache-control | ❌ Use the Cache-Control HTTP header instead
cleartype | ❌ You don’t need it
content-language | ❌ Use the lang attribute instead
content-security-policy | ❌ Use the Content-Security-Policy HTTP header instead
content-security-policy-report-only | ❌ Use the Content-Security-Policy-Report-Only HTTP header instead
content-script-type | ❌ You don’t need it
content-style-type | ❌ You don’t need it
content-type | ❌ Use the Content-Type HTTP header instead, or the charset meta tag in the first 1024 bytes
default-style | ❌ Use modern CSS instead
delegate-ch | ❌ Use the Delegate-CH HTTP header instead
etag | ❌ Use the ETag HTTP header instead
expires | ❌ Use the Expires HTTP header instead
imagetoolbar | ❌ You don’t need it
last-modified | ❌ Use the Last-Modified HTTP header instead
msthemecompatible | ❌ You don’t need it
origin-trial | ✅ Prefer the HTTP header if you can, otherwise the meta tag is fine
page-enter | ❌ You don’t need it
pragma | ❌ Use the Cache-Control HTTP header instead
refresh | ❌ Use HTTP 3xx for redirects; ✅ Use it for reloads as a noscript fallback
set-cookie | ❌ Use the Set-Cookie HTTP header instead
x-dns-prefetch-control | ✅ Use it if you have legitimate security or performance concerns
x-frame-options | ❌ Use the Content-Security-Policy HTTP header instead
x-ua-compatible | ❌ You don’t need it
x-wix-application-instance-id | ❌ Use generator meta tags instead
x-wix-meta-site-id | ❌ Use generator meta tags instead
x-wix-published-version | ❌ Use generator meta tags instead
Cheatsheet of all the http-equiv keywords explored in this post and my recommended actions.

If you don’t see the keyword you’re looking for in this list, chances are you’re not gonna need it. You’re almost always better off setting the HTTP header directly where possible. But just to be sure, test it out in a modern browser. You can also check the HTML spec—it’s a rapidly evolving living standard—or your favorite web developer documentation site for more info.

Based on all of my reading of the spec, analysis of the data, and interpretation of the Chromium source code, it’s clear to me that there’s a lot of unnecessary usage of the http-equiv meta tag. I hope you’re convinced that you probably don’t need most of these tags anymore, and you can use this new knowledge to write cleaner, more modern HTML.

Please reach out to me in the comments if there’s anything in this post that I can improve. I’m eager to continue building my understanding of how this all works and I’d be happy to update this post accordingly.

Appendix: Methodology

For all queries, I used the June 2023 crawl of the public HTTP Archive dataset. The queries do not distinguish between client type or root/secondary pages. For example, if http-equiv is used only on a site’s mobile secondary page, I count that site as using http-equiv. If a site uses it on all four combinations of desktop/mobile and root/secondary pages, the site is counted once towards the overall stats.

Popular website builders like the WordPress CMS make up about a third of the dataset and have a disproportionate effect on the stats. This is ok, as I’m trying to measure adoption across the whole web, regardless of whether the site owner added the tags themselves or their CMS did it.

Warning: these queries process between 6 and 14 TB each. Run at your own expense.

Querying http-equiv adoption

Show query
WITH meta AS (
  SELECT
    root_page,
    LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv
  FROM
    `httparchive.all.pages`
  LEFT JOIN
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
  WHERE
    date = '2023-06-01'
)
SELECT
  COUNT(DISTINCT IF(http_equiv IS NOT NULL, root_page, NULL)) AS http_equiv,
  COUNT(DISTINCT root_page) AS total,
  COUNT(DISTINCT IF(http_equiv IS NOT NULL, root_page, NULL)) / COUNT(DISTINCT root_page) AS pct
FROM
  meta

Querying the top 100 http-equiv values

Show query
CREATE TEMP FUNCTION IS_VALID(value STRING) RETURNS BOOL AS (
  value IN (
    'content-language',
    'content-type',
    'default-style',
    'refresh',
    'set-cookie',
    'x-ua-compatible',
    'content-security-policy'
  )
);
CREATE TEMP FUNCTION IS_CONFORMING(value STRING) RETURNS BOOL AS (
  value IN (
    'content-type',
    'default-style',
    'refresh',
    'x-ua-compatible',
    'content-security-policy'
  )
);
WITH meta AS (
  SELECT
    root_page,
    LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
  WHERE
    date = '2023-06-01'
)
SELECT
  ROW_NUMBER() OVER (ORDER BY COUNT(DISTINCT root_page) DESC) AS rank,
  http_equiv AS value,
  COUNT(DISTINCT root_page) AS sites,
  IF(IS_VALID(http_equiv), '✅', '❌') AS valid,
  IF(IS_CONFORMING(http_equiv), '✅', '❌') AS conforming
FROM
  meta
WHERE
  http_equiv IS NOT NULL
GROUP BY
  http_equiv
ORDER BY
  sites DESC
LIMIT
  100

Querying x-ua-compatible usage

Show query
WITH meta AS (
  SELECT
    root_page,
    LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv,
    REGEXP_REPLACE(LOWER(JSON_VALUE(meta, '$.content')), r'\s', '') AS content
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
  WHERE
    date = '2023-06-01'
)
SELECT
  content,
  COUNT(DISTINCT root_page) AS sites
FROM
  meta
WHERE
  http_equiv = 'x-ua-compatible'
GROUP BY
  content
ORDER BY
  sites DESC

Querying content-type usage

Show query
CREATE TEMP FUNCTION IS_VALID(header STRING, charset STRING, http_equiv STRING) RETURNS STRING AS (
  CASE
    WHEN charset IS NOT NULL AND http_equiv IS NOT NULL THEN '❌'
    WHEN header IS NULL AND charset IS NULL AND http_equiv IS NULL THEN '❌'
    WHEN header IS NOT NULL AND charset IS NULL and http_equiv IS NULL THEN '✅'
    WHEN header IS NULL AND charset IS NOT NULL and http_equiv IS NULL THEN '✅'
    WHEN header IS NULL AND charset IS NULL and http_equiv IS NOT NULL THEN '✅'
    ELSE '❓'
  END
);
WITH all_sites AS (
  SELECT
    rank,
    root_page,
    page
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2023-06-01'
),
meta_charset AS (
  SELECT
    page,
    LOWER(JSON_VALUE(meta, '$.charset')) AS charset
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
  WHERE
    date = '2023-06-01' AND
    LOWER(JSON_VALUE(meta, '$.charset')) IS NOT NULL
),
meta_content_type AS (
  SELECT
    page,
    LOWER(JSON_VALUE(meta, '$.content')) AS content_type
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
  WHERE
    date = '2023-06-01' AND
    LOWER(JSON_VALUE(meta, '$.http-equiv')) = 'content-type' AND
    LOWER(JSON_VALUE(meta, '$.content')) IS NOT NULL
),
header AS (
  SELECT
    page,
    LOWER(REGEXP_EXTRACT(header.value, r'(?i)charset=([^;\s]*)')) AS http_content_type
  FROM
    `httparchive.all.requests`,
    UNNEST(response_headers) AS header
  WHERE
    date = '2023-06-01' AND
    is_main_document AND
    LOWER(header.name) = 'content-type' AND
    REGEXP_CONTAINS(header.value, r'(?i)charset=([^;\s]*)')
)
SELECT
  IF(http_content_type IS NOT NULL, '✔', '') AS has_http_header,
  IF(charset IS NOT NULL, '✔', '') AS has_meta_charset,
  IF(content_type IS NOT NULL, '✔', '') AS has_http_equiv,
  COUNT(DISTINCT root_page) AS sites,
  IS_VALID(http_content_type, charset, content_type) AS valid
FROM
  all_sites
LEFT JOIN
  meta_charset
USING
  (page)
FULL OUTER JOIN
  meta_content_type
USING
  (page)
FULL OUTER JOIN
  header
USING
  (page)
GROUP BY
  has_http_header,
  has_meta_charset,
  has_http_equiv,
  valid
ORDER BY
  has_http_header DESC,
  has_meta_charset DESC,
  has_http_equiv DESC

Querying Wix adoption

Show query
SELECT
  COUNT(DISTINCT root_page) AS wix_sites
FROM
  `httparchive.all.pages`,
  UNNEST(technologies) AS t
WHERE
  date = '2023-06-01' AND
  t.technology = 'Wix'

Querying content-language usage

Show query
CREATE TEMP FUNCTION IS_VALID(header STRING, lang STRING, http_equiv STRING) RETURNS STRING AS (
  CASE
    WHEN lang IS NOT NULL AND http_equiv IS NULL THEN '✅'
    WHEN lang IS NULL THEN '❌'
    ELSE '❓'
  END
);
WITH all_sites AS (
  SELECT
    root_page,
    page
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2023-06-01'
),
html_lang AS (
  SELECT
    page,
    LOWER(REGEXP_EXTRACT(response_body, r'(?i)<html[^>]*lang=[\'"]?([^\s\'"])')) AS lang
  FROM
    `httparchive.all.requests`
  WHERE
    date = '2023-06-01' AND
    is_main_document AND
    REGEXP_CONTAINS(response_body, r'(?i)<html[^>]*lang=[\'"]?([^\s\'"])')
),
meta_content_language AS (
  SELECT
    page,
    LOWER(JSON_VALUE(meta, '$.content')) AS content_language
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
  WHERE
    date = '2023-06-01' AND
    LOWER(JSON_VALUE(meta, '$.http-equiv')) = 'content-language' AND
    LOWER(JSON_VALUE(meta, '$.content')) IS NOT NULL
),
header AS (
  SELECT
    page,
    LOWER(header.value) AS http_content_language
  FROM
    `httparchive.all.requests`,
    UNNEST(response_headers) AS header
  WHERE
    date = '2023-06-01' AND
    is_main_document AND
    LOWER(header.name) = 'content-language'
)
SELECT
  IF(http_content_language IS NOT NULL, '✔', '') AS has_http_header,
  IF(lang IS NOT NULL, '✔', '') AS has_html_lang,
  IF(content_language IS NOT NULL, '✔', '') AS has_http_equiv,
  COUNT(DISTINCT root_page) AS sites,
  IS_VALID(http_content_language, lang, content_language) AS valid
FROM
  all_sites
LEFT JOIN
  html_lang
USING
  (page)
FULL OUTER JOIN
  meta_content_language
USING
  (page)
FULL OUTER JOIN
  header
USING
  (page)
GROUP BY
  has_http_header,
  has_html_lang,
  has_http_equiv,
  valid
ORDER BY
  has_http_header DESC,
  has_html_lang DESC,
  has_http_equiv DESC

Querying content-security-policy-report-only usage

Show query
WITH meta AS (
  SELECT
    rank,
    page,
    LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv,
    REGEXP_REPLACE(LOWER(JSON_VALUE(meta, '$.content')), r'\s', '') AS content
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
  WHERE
    date = '2023-06-01'
)
SELECT
  rank,
  page,
  content
FROM
  meta
WHERE
  http_equiv = 'content-security-policy-report-only'
ORDER BY
  rank

Querying refresh usage

Show query
WITH meta AS (
  SELECT
    root_page,
    LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv,
    REGEXP_REPLACE(LOWER(JSON_VALUE(meta, '$.content')), r'\s', '') AS content
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
  WHERE
    date = '2023-06-01'
)
SELECT
  content,
  COUNT(DISTINCT root_page) AS sites
FROM
  meta
WHERE
  http_equiv = 'refresh'
GROUP BY
  content
ORDER BY
  sites DESC

Querying default-style usage

Show query
WITH meta AS (
  SELECT
    rank,
    root_page,
    LOWER(JSON_VALUE(meta, '$.http-equiv')) AS http_equiv,
    REGEXP_REPLACE(LOWER(JSON_VALUE(meta, '$.content')), r';\s+', ';') AS content
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta
  WHERE
    date = '2023-06-01'
)
SELECT DISTINCT
  rank,
  root_page,
  content
FROM
  meta
WHERE
  http_equiv = 'default-style'
ORDER BY
  rank

Querying Squarespace accept-ch adoption

Show query
WITH accept_ch AS (
  SELECT
    root_page
  FROM
    `httparchive.scratchspace.http_equiv`
  WHERE
    http_equiv = 'accept-ch'
),
ss AS (
  SELECT
    root_page
  FROM
    `httparchive.all.pages`,
    UNNEST(technologies) AS t
  WHERE
    date = '2023-06-01' AND
    t.technology = 'Squarespace'
)
SELECT
  COUNT(DISTINCT IF(ss.root_page IS NOT NULL, root_page, NULL)) / COUNT(DISTINCT root_page) AS pct_ss
FROM
  accept_ch
LEFT JOIN
  ss
USING
  (root_page)

Querying accept-ch usage

Show query
SELECT
  TRIM(directive) AS directive,
  COUNT(DISTINCT root_page) AS sites
FROM
  `httparchive.scratchspace.http_equiv`,
  UNNEST(SPLIT(content)) AS directive
WHERE
  http_equiv = 'accept-ch'
GROUP BY
  directive
ORDER BY
  sites DESC

Querying x-dns-prefetch-control usage

Show query
SELECT
  content,
  COUNT(DISTINCT root_page) AS sites
FROM
  `httparchive.scratchspace.http_equiv`
WHERE
  http_equiv = 'x-dns-prefetch-control'
GROUP BY
  content
ORDER BY
  sites DESC

The post You probably don’t need http-equiv meta tags appeared first on rviscomi.dev.

Origin trials and tribulations https://rviscomi.dev/2023/07/origin-trials-and-tribulations/ https://rviscomi.dev/2023/07/origin-trials-and-tribulations/#respond Wed, 05 Jul 2023 04:18:13 +0000 https://rviscomi.dev/?p=188 Origin trials are a way for developers to get early access to experimental web platform features. They’re carefully controlled “beta tests” run by browsers to ensure that the feature works and is worth more time on implementation and standardization. Check out Getting started with origin trials to learn more. What’s interesting to me is seeing […]

The post Origin trials and tribulations appeared first on rviscomi.dev.

Origin trials are a way for developers to get early access to experimental web platform features. They’re carefully controlled “beta tests” run by browsers to ensure that the feature works and is worth more time on implementation and standardization. Check out Getting started with origin trials to learn more.

What’s interesting to me is seeing how sites are using origin trials across the web, with the help of the public HTTP Archive dataset. By detecting and extracting these origin trial tokens, we can decode them to understand more about: which experimental features the sites are enrolling in, when the trial expires, whether the functionality can be added by a third party, who that third party is, and whether the origin trial also extends to all subdomains of a site. There’s a lot of info packed into a token, and lots we can learn about how these origin trials are (mis)used on the web.
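To illustrate, here’s a minimal sketch of extracting a token’s payload, assuming the layout Chrome documents for its origin trial tokens: a 1-byte version, a 64-byte signature, a 4-byte big-endian payload length, and then a UTF-8 JSON payload. It skips signature verification entirely, so it’s only useful for inspection, not validation.

```javascript
// Sketch: decode the metadata payload of a Chrome origin trial token.
// Assumed layout: 1-byte version, 64-byte Ed25519 signature,
// 4-byte big-endian payload length, then a JSON payload.
// Signature verification is intentionally omitted.
function decodeOriginTrialToken(token) {
  const buf = Buffer.from(token, 'base64');
  const version = buf[0];
  const payloadLength = buf.readUInt32BE(1 + 64);
  const payloadStart = 1 + 64 + 4;
  const payload = buf
    .subarray(payloadStart, payloadStart + payloadLength)
    .toString('utf8');
  // Payload fields include origin, feature, expiry, and optionally
  // isSubdomain / isThirdParty flags.
  return { version, ...JSON.parse(payload) };
}
```

Running this over the tokens extracted from HTTP Archive response bodies is what makes the analysis below possible.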

Which origin trials are used the most?

The first question worth answering is to find out what the most popular origin trials are. The results will be extremely volatile over time because origin trials are ephemeral by nature and heavily influenced by third party adoption. With that in mind, it’s still useful to get a baseline understanding of what origin trials are out in the wild.

At any given time you can browse the complete list of active origin trials for Chrome, Edge, and Firefox. Safari doesn’t have a program for origin trials.

So let’s see what’s being used the most as of today.

Feature | Websites
PrivacySandboxAdsAPIs | 3,697,720
WebViewXRequestedWithDeprecation | 1,254,718
AttributionReportingCrossAppWeb | 1,191,638
InterestCohortAPI | 36,990
CoepCredentialless | 8,591
PendingBeaconAPI | 1,006
SendFullUserAgentAfterReduction | 577
FedCmAutoReauthn | 410
InstalledApp | 371
PermissionsPolicyUnload | 329
BackForwardCacheNotRestoredReasons | 324
Top 11 features used by mobile pages as of June 2023. (Source: HTTP Archive)

So, what do these features do? Let’s look at each of the top 11 one by one. And let’s count it down in reverse order—why not.

The eleventh most used origin trial is BackForwardCacheNotRestoredReasons, found on 324 sites. This feature lists the reasons why a user didn’t get a page served from the back/forward cache (bfcache). I’m particularly excited about this one, because the bfcache is very effective at giving users the feeling of instant navigations. But eligibility can vary by user, and it’s otherwise impossible for a site owner to understand why they’re not seeing the bfcache restores that they’d expect. Unfortunately, tokens on 302 sites are configured incorrectly (wrong origin). That plus the 6 tokens that have expired means that only 16 sites are actually able to successfully collect data from the API—yikes! It’s expiring next month, so time is running out.

The tenth most used origin trial is PermissionsPolicyUnload, found on 329 sites. This one lets site owners disallow all scripts from running unload event handlers. It’s related to the previous origin trial, because a page with an unload handler is ineligible for the bfcache. It recently expired in June, so it’s not working anymore anyway, but it had a similar configuration issue in which 304 sites had an invalid origin. So any performance A/B tests they were hoping to run should not have worked and, given that the trial is expired, it’s too late to rerun them.

Aside: Looking more closely at the previous two origin trials, it seems what happened was that many TLD variations of google.com (.ca, .cl, .co.in, .co.jp…) incorrectly reused the origin trial token that was explicitly activated for the google.com origin. Let this be a lesson: always validate your origin trial tokens!

The ninth most used origin trial feature is the InstalledApp feature, which is found on 371 sites. It allows sites to determine whether the user has installed their corresponding app, using getInstalledRelatedApps(). The trial ended in January 2020, so all of these sites can save a few bytes by removing the expired tokens from the markup.

The eighth most used origin trial is FedCmAutoReauthn, on 410 sites. This is part of the Federated Credentials Management API (think “Sign in with…”) and the experimental feature is a streamlined re-authentication UX. In all but 14 cases, sites are inheriting this origin trial via a third party accounts.google.com script.

Number seven is SendFullUserAgentAfterReduction coming in at 577 sites. This feature helps sites migrate any of their dependencies off of the full User Agent (UA) string by delaying the newer, reduced UA string format. The UA string is being discouraged for browser/feature detection in favor of the User-Agent Client Hints API.

At number six we have PendingBeaconAPI, found on 1,006 sites. Per the origin trial page, it “allows website authors to specify one or more beacons (HTTP requests) that should be sent reliably when the page is being unloaded.” Oddly, even though the trial is active until September 19 (Chrome 115), all but one site are inheriting an expired origin trial token from a third party ad.doubleclick.net script. The only other site? Also expired. Maybe that’s intentional though, as the PendingBeacon API explainer doc warns that the API is being replaced with fetchLater() after feedback.

The number five most used origin trial is CoepCredentialless, used on 8,591 sites. COEP, which stands for Cross-Origin-Embedder-Policy, enables cross-origin isolation. All but two instances are inherited from a third party itch.io script, and for the first time, 100% of the tokens pass validation checks!

Despite expiring nearly two years ago, the fourth most used origin trial is InterestCohortAPI, used on 36,990 sites. The major driver for its continued popularity seems to be a third party script from adroll.com, used by 36,607 sites, and a script by criteo.net on 321 sites. A wild airhorner.com also appears as one of the holdouts.

Ok now we’re getting into some serious levels of adoption. At number three we have AttributionReportingCrossAppWeb, on 1,191,638 sites. This is an extension of the Attribution Reporting API, which enables measurements like ad conversions in a privacy-preserving way without third-party cookies. This experiment allows attribution events on mobile web to be joinable with events in Android’s Privacy Sandbox. Only two third-party origins are responsible for its popularity: googletagmanager.com (1,184,189 sites), and googleadservices.com (9,380 sites), with some overlapping sites having both.

The second most used origin trial is WebViewXRequestedWithDeprecation, on 1,254,718 sites, which permits WebView requests to continue using the legacy X-Requested-With header while it’s being deprecated. Again, with lots of overlap, the top third-party origins to be using it are doubleclick.net (1,254,708 sites) and googlesyndication.com (1,253,735 sites).

And finally, the most used origin trial is PrivacySandboxAdsAPIs, used on 3,697,720 sites. It’s a collection of APIs to facilitate advertising: FLEDGE, Topics, Fenced Frames, and Attribution Reporting. A whopping 399,172 sites contain an expired token for this feature with the biggest offenders being criteo.com (225,711 sites), criteo.net (225,707 sites—hmm), and s.pinimg.com (171,817).

For reference, here’s the query used to generate all of these stats:

SELECT
  feature,
  COUNT(DISTINCT page) AS pages,
  COUNT(DISTINCT IF(is_expired_token, page, NULL)) AS expired,
  COUNT(DISTINCT IF(is_invalid_subdomain, page, NULL)) AS invalid_subdomain,
  COUNT(DISTINCT IF(is_invalid_third_party, page, NULL)) AS invalid_3p,
  COUNT(DISTINCT IF(source = 'meta', page, NULL)) AS meta_source,
  CAST(MAX(expiry) AS DATETIME) AS most_recent_expiry,
  APPROX_TOP_COUNT(origin, 1) AS top_origin
FROM
  `httparchive.scratchspace.origin_trials`
GROUP BY
  feature
ORDER BY
  pages DESC

Jump to the setup section below to see how the origin_trials scratch table was created.

How many pages directly or indirectly include an origin trial?

SELECT
  COUNT(DISTINCT page) AS pages
FROM
  `httparchive.scratchspace.origin_trials`

This is a quick and easy one to answer. 3,720,272 mobile sites include an origin trial as of June 2023. That’s about 22% of the 16,563,413 sites in the dataset.

SELECT
  COUNT(DISTINCT page) AS pages
FROM
  `httparchive.scratchspace.origin_trials`
WHERE
  NET.REG_DOMAIN(page) = NET.REG_DOMAIN(origin)

If we only look at sites that explicitly self-register for an origin trial, we find there are 10,427 of them. Since we’re only looking at sites under the same domain as the registrant on the origin trial, this will include lots of subdomain variants and “tenant” subdomains. For example, itch.io shows up 8,589 times in this list, facebook.com 155 times, and pinterest.com 68 times.

So what features are individual sites enabling for themselves?

SELECT
  feature,
  COUNT(DISTINCT page) AS pages,
  COUNT(DISTINCT IF(is_expired_token, page, NULL)) AS expired
FROM
  `httparchive.scratchspace.origin_trials`
WHERE
  NET.REG_DOMAIN(page) = NET.REG_DOMAIN(origin)
GROUP BY
  feature
ORDER BY
  pages DESC
| Feature | Websites |
| --- | --- |
| CoepCredentialless | 8,591 |
| PrivacySandboxAdsAPIs | 368 |
| SendFullUserAgentAfterReduction | 280 |
| PrivateNetworkAccessNonSecureContextsAllowed | 210 |
| UnrestrictedSharedArrayBuffer | 146 |
| InstalledApp | 161 |
| DisableDifferentOriginSubframeDialogSuppression | 116 |
| SmsReceiver | 103 |
| AllowSyncXHRInPageDismissal | 43 |
| Launch Handler | 42 |
| PriorityHintsAPI | 39 |

Top 11 features directly used by mobile pages as of June 2023. (Source: HTTP Archive)

As mentioned earlier, itch.io is responsible for much of the CoepCredentialless usage, so that’s really an outlier. For the rest, no more than a few hundred sites are enrolling in any given first-party origin trial.

I won’t go through each feature again, but I do want to call out that a few of them look quite old. That raises a related question: what percent of first-party sites include an expired token?

SELECT
  COUNT(DISTINCT IF(is_expired_token, page, NULL)) / COUNT(DISTINCT page) AS pct_expired
FROM
  `httparchive.scratchspace.origin_trials`
WHERE
  NET.REG_DOMAIN(page) = NET.REG_DOMAIN(origin)

17%, or about one in six sites, sign up for an origin trial token and keep it around past its expiration.

Which third parties are injecting the most invalid tokens?

For a third party to make use of an origin trial, it needs to dynamically inject the token in a meta[http-equiv="Origin-Trial"] tag. Two main things can go wrong with this:

  • The token is expired
  • The token doesn’t have the thirdParty flag set

Tokens are intentionally short-lived. When they expire, they should be removed along with any experimental functionality.

SELECT
  origin,
  COUNT(DISTINCT page) AS pages,
  CAST(APPROX_QUANTILES(expiry, 1000)[OFFSET(500)] AS DATETIME) AS median_expiry
FROM
  `httparchive.scratchspace.origin_trials`
WHERE
  is_expired_token
GROUP BY
  origin
ORDER BY
  pages DESC
| Origin | Websites |
| --- | --- |
| https://criteo.net:443 | 226,027 |
| https://criteo.com:443 | 225,711 |
| https://s.pinimg.com:443 | 171,825 |
| https://adroll.com:443 | 36,607 |
| https://teads.tv:443 | 7,935 |
| https://ladsp.com:443 | 4,079 |
| https://ad.doubleclick.net:443 | 1,005 |
| https://www.googletagmanager.com:443 | 848 |
| https://doubleclick.net:443 | 681 |
| https://googletagservices.com:443 | 671 |
| https://googlesyndication.com:443 | 664 |

Top 11 origins injecting expired tokens as of June 2023. (Source: HTTP Archive)

We saw earlier that Criteo, an adtech company, was responsible for the expired PrivacySandboxAdsAPIs tokens. So it’s no surprise to see it topping the list here. But it is interesting to note that half of their tokens have been expired since November 2022.

s.pinimg.com is an image sharing hostname from Pinterest. Again, almost all of its expired tokens are for the PrivacySandboxAdsAPIs feature, with a median expiration date of April 2023.

We also saw adroll.com earlier as the main driver of the expired InterestCohortAPI feature.

Injecting an expired token isn’t the worst thing. Presumably, it lived long enough in production to be useful. Invalid third party tokens are something else, though.

SELECT
  origin,
  COUNT(DISTINCT page) AS pages
FROM
  `httparchive.scratchspace.origin_trials`
WHERE
  is_invalid_third_party
GROUP BY
  origin
ORDER BY
  pages DESC
| Origin | Websites |
| --- | --- |
| https://doubleclick.net:443 | 1,254,708 |
| https://googlesyndication.com:443 | 1,253,735 |
| https://themoneytizer.com:443 | 5,501 |
| https://www.google.com:443 | 303 |
| https://facebook.com:443 | 203 |
| https://airbnb.com:443 | 87 |
| https://m.youtube.com:443 | 52 |
| https://pinterest.com:443 | 20 |
| https://brands-id.shortlyst.com:443 | 16 |
| https://m.redbus.in:443 | 15 |

Top 10 origins injecting invalid third party tokens as of June 2023. (Source: HTTP Archive)

These are the most prevalent third parties injecting origin trial tokens that will never work on a given site. Actually, let’s clarify one major assumption: the origin encoded in the token is assumed to be the same one injecting the token into its host page. It’s also possible that someone else (a fourth party?) is mishandling the origin’s token. For example, if I clone example.com onto my own site, all of their meta tag tokens will be invalidly served from rviscomi.dev.

Setting that aside, doubleclick.net (Google) and googlesyndication.com (also Google) are the two biggest origins that omit the thirdParty flag. In both cases, they’re missing the flag on their third party WebViewXRequestedWithDeprecation token.

Is that a big deal? I hope not. It means the X-Requested-With header would be unexpectedly stripped from WebView requests. Maybe in some cases that's a load-bearing header, but serious breakage seems unlikely.

I do worry about the origin trials more in my neck of the woods, like PermissionsPolicyUnload and BackForwardCacheNotRestoredReasons, which I highlighted earlier as being served by the wrong Google TLDs. At worst, someone might give up on them because they don’t seem to work or have any positive effect, all due to a misconfiguration.

Setting up the data table

Since I know I’ll be repeatedly querying the HTTP Archive dataset for the same origin trial info, the first thing I’ll do is preprocess the data and save it to a temporary table.

CREATE TEMP FUNCTION DECODE_ORIGIN_TRIAL(token STRING) RETURNS STRING DETERMINISTIC AS (
  REGEXP_EXTRACT(SAFE_CONVERT_BYTES_TO_STRING(SAFE.FROM_BASE64(token)), r'({".*)')
);

CREATE TEMP FUNCTION PARSE_ORIGIN_TRIAL(token STRING)
RETURNS STRUCT<
  token STRING,
  feature STRING,
  origin STRING,
  expiry TIMESTAMP,
  is_subdomain BOOL,
  is_third_party BOOL
> AS (
  STRUCT(
    token,
    JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.feature') AS feature,
    JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.origin') AS origin,
    TIMESTAMP_SECONDS(CAST(JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.expiry') AS INT64)) AS expiry,
    JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.isSubdomain') = 'true' AS is_subdomain,
    JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.isThirdParty') = 'true' AS is_third_party
  )
);


CREATE OR REPLACE TABLE `httparchive.scratchspace.origin_trials` AS

WITH valid_pages AS (
  SELECT
    page
  FROM
    `httparchive.all.requests`
  WHERE
    date = '2023-06-01' AND
    client = 'mobile' AND
    is_main_document AND
    NET.REG_DOMAIN(page) = NET.REG_DOMAIN(url)
),

ranks AS (
  SELECT
    rank,
    page
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2023-06-01' AND
    client = 'mobile'
),

origin_trials AS (
  SELECT
    rank,
    page,
    'meta' AS source,
    PARSE_ORIGIN_TRIAL(TRIM(token)).*
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta,
    UNNEST(SPLIT(JSON_VALUE(meta, '$.content'), ',')) AS token
  JOIN
    valid_pages
  USING
    (page)
  WHERE
    date = '2023-06-01' AND
    client = 'mobile' AND
    is_root_page AND
    LOWER(JSON_VALUE(meta, '$.http-equiv')) = 'origin-trial'
UNION ALL
  SELECT
    rank,
    page,
    'header' AS source,
    PARSE_ORIGIN_TRIAL(header.value).*
  FROM
    `httparchive.all.requests`,
    UNNEST(response_headers) AS header
  JOIN
    valid_pages
  USING
    (page)
  JOIN
    ranks
  USING
    (page)
  WHERE
    date = '2023-06-01' AND
    client = 'mobile' AND
    is_root_page AND
    is_main_document AND
    LOWER(header.name) = 'origin-trial'
)

SELECT
  *,
  feature IS NULL AS is_invalid_token,
  expiry < CURRENT_TIMESTAMP() AS is_expired_token,
  NET.REG_DOMAIN(page) != NET.REG_DOMAIN(origin) AND is_third_party IS NOT TRUE AS is_invalid_third_party,
  NET.REG_DOMAIN(page) = NET.REG_DOMAIN(origin) AND NET.HOST(page) != NET.HOST(origin) AND is_subdomain IS NOT TRUE AS is_invalid_subdomain
FROM
  origin_trials

Warning: this query processes 1.97 TB, which costs about $10.

I’d love for more people to get comfortable writing their own queries over the HTTP Archive dataset, so let’s walk through what this query does.

Custom functions

CREATE TEMP FUNCTION DECODE_ORIGIN_TRIAL(token STRING) RETURNS STRING DETERMINISTIC AS (
  REGEXP_EXTRACT(SAFE_CONVERT_BYTES_TO_STRING(SAFE.FROM_BASE64(token)), r'({".*)')
);

CREATE TEMP FUNCTION PARSE_ORIGIN_TRIAL(token STRING)
RETURNS STRUCT<
  token STRING,
  feature STRING,
  origin STRING,
  expiry TIMESTAMP,
  is_subdomain BOOL,
  is_third_party BOOL
> AS (
  STRUCT(
    token,
    JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.feature') AS feature,
    JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.origin') AS origin,
    TIMESTAMP_SECONDS(CAST(JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.expiry') AS INT64)) AS expiry,
    JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.isSubdomain') = 'true' AS is_subdomain,
    JSON_VALUE(DECODE_ORIGIN_TRIAL(token), '$.isThirdParty') = 'true' AS is_third_party
  )
);

I’m creating two BigQuery functions: DECODE_ORIGIN_TRIAL and PARSE_ORIGIN_TRIAL. These two functions were adapted from the ot-decode project by fellow Chromie Sam Dutton, which itself takes inspiration from Jack’s origin-trials-viewer project. Why didn’t I just use a JavaScript function in BigQuery? Some of the APIs weren’t available, and since it was relatively straightforward to port over to SQL, I opted to take advantage of the performance benefits.
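If you want to poke at tokens locally, here's a rough Node.js equivalent of those two SQL functions. Like the SQL above, it doesn't parse the token's binary layout properly; it just scans the decoded bytes for the first `{"` to find the JSON payload, so treat it as an approximation rather than a faithful decoder.

```javascript
// Rough Node.js port of DECODE_ORIGIN_TRIAL / PARSE_ORIGIN_TRIAL.
// Skips the token's binary signature prefix by scanning for the
// first '{"' in the base64-decoded bytes, same as the SQL regex.
function decodeOriginTrial(token) {
  const bytes = Buffer.from(token, 'base64').toString('utf8');
  const match = bytes.match(/({".*)/s);
  return match ? match[1] : null;
}

function parseOriginTrial(token) {
  const json = decodeOriginTrial(token);
  if (json === null) return { token, feature: null };
  let payload;
  try {
    payload = JSON.parse(json);
  } catch (e) {
    return { token, feature: null };
  }
  return {
    token,
    feature: payload.feature ?? null,
    origin: payload.origin ?? null,
    expiry: payload.expiry ? new Date(payload.expiry * 1000) : null,
    isSubdomain: payload.isSubdomain === true,
    isThirdParty: payload.isThirdParty === true,
  };
}
```

A malformed or truncated token simply comes back with `feature: null`, which is exactly the signal the query later uses to flag invalid tokens.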

Destination table

CREATE OR REPLACE TABLE httparchive.scratchspace.origin_trials AS

This takes the output of the query and saves it to a table in the httparchive project. This will only work if you have write access to the project, so feel free to comment it out or replace it with a table in your own BigQuery project if you’d like. The httparchive table is still publicly queryable, so feel free to explore it.

Subqueries

WITH valid_pages AS (
  SELECT
    page
  FROM
    `httparchive.all.requests`
  WHERE
    date = '2023-06-01' AND
    client = 'mobile' AND
    is_main_document AND
    NET.REG_DOMAIN(page) = NET.REG_DOMAIN(url)
),

The WITH clause aliases the output of the following subqueries so that I can reference them later. It’s not a temp table necessarily, but it makes the query a lot more readable.

The valid_pages subquery creates a verified subset of pages that don’t have a cross-domain redirect. If foo.com redirects to bar.com, we don’t want bar’s origin trials mistakenly attributed to foo. We’ll join the following queries with this one to ensure that we’re only looking at valid pages.

ranks AS (
  SELECT
    rank,
    page
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2023-06-01' AND
    client = 'mobile'
),

This next ranks subquery simply gets the rank for each page in the mobile dataset, which will be used later.

origin_trials AS (
  SELECT
    rank,
    page,
    'meta' AS source,
    PARSE_ORIGIN_TRIAL(TRIM(token)).*
  FROM
    `httparchive.all.pages`,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.almanac.meta-nodes.nodes')) AS meta,
    UNNEST(SPLIT(JSON_VALUE(meta, '$.content'), ',')) AS token
  JOIN
    valid_pages
  USING
    (page)
  WHERE
    date = '2023-06-01' AND
    client = 'mobile' AND
    is_root_page AND
    LOWER(JSON_VALUE(meta, '$.http-equiv')) = 'origin-trial'

The next subquery origin_trials extracts the origin trial metadata from all of the <meta> tags on the page.

If you’re wondering why I didn’t parse the HTML response body directly in BigQuery, that approach would have only yielded the static <meta> tags. Crucially, we need to also include dynamically injected tags from JavaScript. It’s possible (necessary, even) for third party scripts to inject an origin trial token into the <head> of the main page after it’s already loaded.

This query takes advantage of the almanac.meta-nodes custom metric, which runs document.querySelectorAll('head meta') on the page and returns the attributes of each tag. So we’re able to filter the results down to only the ones with http-equiv=origin-trial (case insensitive) and extract their content attributes, which contain the origin trial token.
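In plain JavaScript, the filter-and-split step the SQL performs on those meta nodes looks roughly like this. The node shape here (one object per `<meta>` tag, keyed by attribute name) mirrors what the custom metric returns, but take the exact property names as assumptions.

```javascript
// Given meta-node objects like { 'http-equiv': 'Origin-Trial', content: 'a,b' },
// pull out every origin trial token: case-insensitive http-equiv filter,
// comma-split of the content attribute, and trimming, as in the SQL.
function extractOriginTrialTokens(metaNodes) {
  return metaNodes
    .filter((meta) => (meta['http-equiv'] || '').toLowerCase() === 'origin-trial')
    .flatMap((meta) => (meta.content || '').split(','))
    .map((token) => token.trim())
    .filter((token) => token.length > 0);
}
```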

Also note that I’m only querying the June 2023 dataset (the latest at the time of writing) with is_root_page set, which limits the dataset to home pages only. We could remove this and include secondary pages, for example the article page on a news website, but I don’t suspect the overall results would be much different. Maybe that’s something you can check, if you’re up for it 🤠

The output of this query is the rank of the page, the URL of the page itself, the source of the token (which is meta in this case), and the parsed origin trial data from the custom function above.

UNION ALL
  SELECT
    rank,
    page,
    'header' AS source,
    PARSE_ORIGIN_TRIAL(header.value).*
  FROM
    `httparchive.all.requests`,
    UNNEST(response_headers) AS header
  JOIN
    valid_pages
  USING
    (page)
  JOIN
    ranks
  USING
    (page)
  WHERE
    date = '2023-06-01' AND
    client = 'mobile' AND
    is_root_page AND
    is_main_document AND
    LOWER(header.name) = 'origin-trial'
)

The origin_trials subquery continues with a UNION ALL clause to effectively mash together the results of the previous SELECT statement with this one.

The key difference here is that I’m looking at the requests table so that I can extract any Origin-Trial HTTP headers from the document response.

Amazingly, even though this table is 2.4 PB (yes, petabytes) with over 60 billion rows, this part of the query only processes about 12 GB of data. That’s thanks to the built-in partitioning and clustering, and the fact that the headers are directly accessible under the response_headers field rather than having to be parsed out of a massive JSON blob with other request metadata.

Also note that this is where the ranks data comes into play, because the requests table doesn’t annotate each page with its rank. Maybe it should!

There’s one thing missing from this query worth mentioning, and that is origin trial tokens set in HTTP headers outside of the main page. For example, ServiceWorkerBypassFetchHandlerForMainResource requires the token to be set on the response headers of the service worker itself. A quick check of the dataset found no instances of this particular origin trial, but it’s definitely possible that I’m overlooking some other valid tokens. For the sake of simplicity, this post only looks at origin trials set on the main page via first party headers or meta tags.

Validation

SELECT
  *,
  feature IS NULL AS is_invalid_token,
  expiry < CURRENT_TIMESTAMP() AS is_expired_token,
  NET.REG_DOMAIN(page) != NET.REG_DOMAIN(origin) AND is_third_party IS NOT TRUE AS is_invalid_third_party,
  NET.REG_DOMAIN(page) = NET.REG_DOMAIN(origin) AND NET.HOST(page) != NET.HOST(origin) AND is_subdomain IS NOT TRUE AS is_invalid_subdomain
FROM
  origin_trials

The rest of the query is where it all comes together. This is what determines what actually gets written to the output table.

I’m piping everything out of the origin_trials pseudo-table and adding a few validation flags:

  • If the feature field is null, there must have been a decoding/parsing error, so mark the token as invalid.
  • If the token’s expiry field is in the past, mark it as expired.
  • If the origin of the token doesn’t match up with the origin of the page, and there is no thirdParty flag set, mark it as invalid.
  • If the domain of the token does match up with the domain of the page, but the hosts don’t match (eg foo.example.com and bar.example.com), and there is no subdomain flag set, mark it as invalid.
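Those four checks translate to JavaScript along these lines. Note that `registrableDomain` below is a deliberately naive stand-in for BigQuery's NET.REG_DOMAIN: the real function consults the Public Suffix List (so it handles suffixes like co.uk), while this sketch just takes the last two host labels.

```javascript
// Naive stand-in for NET.REG_DOMAIN: last two host labels only.
// A faithful port would need the Public Suffix List.
function registrableDomain(url) {
  return new URL(url).hostname.split('.').slice(-2).join('.');
}

function hostOf(url) {
  return new URL(url).hostname; // drops the port, like NET.HOST
}

// Mirror the query's four validation flags for one parsed token.
function validateToken(page, t) {
  const sameRegDomain = t.origin != null &&
    registrableDomain(page) === registrableDomain(t.origin);
  const sameHost = t.origin != null && hostOf(page) === hostOf(t.origin);
  return {
    isInvalidToken: t.feature == null,
    isExpiredToken: t.expiry != null && t.expiry.getTime() < Date.now(),
    isInvalidThirdParty: t.origin != null && !sameRegDomain && t.isThirdParty !== true,
    isInvalidSubdomain: sameRegDomain && !sameHost && t.isSubdomain !== true,
  };
}
```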

The output table contains 20,000,950 rows and is 7.22 GB. So it’s definitely affordable for anyone to query and stay well under the 1 TB/month free tier on GCP. (Even still, always set up cost controls!)

The table is publicly accessible at httparchive.scratchspace.origin_trials. Feel free to run your own queries to explore the data, and share what you find with me on Twitter or down in the comments.

Be aware that this table—and every other table in the scratchspace dataset—is provided with no guarantees of uptime/correctness/maintenance. So it won’t be updated regularly with the latest month’s data, it may contain inaccurate or missing data, and it might even be deleted without notice.

The post Origin trials and tribulations appeared first on rviscomi.dev.

]]>
https://rviscomi.dev/2023/07/origin-trials-and-tribulations/feed/ 0
Querying parsed HTML in BigQuery https://rviscomi.dev/2023/05/querying-parsed-html-in-bigquery/ https://rviscomi.dev/2023/05/querying-parsed-html-in-bigquery/#respond Wed, 24 May 2023 20:26:30 +0000 https://rviscomi.dev/?p=86 A longstanding problem in the HTTP Archive dataset has been extracting insights from blobs of HTML in BigQuery. For example, take the source code of example.com: If you wanted to extract the link text in the last paragraph, you could do something relatively straightforward like this: But in BigQuery, we don’t have the luxury of […]

The post Querying parsed HTML in BigQuery appeared first on rviscomi.dev.

]]>
A longstanding problem in the HTTP Archive dataset has been extracting insights from blobs of HTML in BigQuery. For example, take the source code of example.com:

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">...</style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>

If you wanted to extract the link text in the last paragraph, you could do something relatively straightforward like this:

// 'More information...'
document.querySelector('p:last-child a').textContent;

But in BigQuery, we don’t have the luxury of the document object, querySelector, or textContent.

Instead, we’ve had to resort to unwieldy regular expressions like this:

# 'More information...'
SELECT
  REGEXP_EXTRACT(html, r'<p><a[^>]*>([^<]*)</a></p>') AS link_text
FROM
  body

It looks like it works, but it’s brittle.

  • What if there’s text or whitespace between the elements?
  • What if there are attributes on the paragraph?
  • What if there’s another p>a element pair earlier in the page?
  • What if the page uses uppercase tag names?

It goes on and on.

Using regular expressions for parsing HTML seems like a good idea at first, but it becomes a nightmare as you need to ramp it up to increasingly unpredictable inputs.

To avoid this headache in HTTP Archive analyses, we’ve resorted to custom metrics. These are executed on each page at runtime, and it’s been really effective. It enables us to analyze both the fully rendered page as well as the static HTML. But one big limitation with custom metrics is that they only work at runtime. So if we want to change the code or analyze an older dataset, we’re out of luck.

Cheerio

While looking for a way to implement capo.js in BigQuery to understand how pages in HTTP Archive are ordered, I came across the Cheerio library, which is a jQuery-like interface over an HTML parser.

It works beautifully.

Screenshot of a BigQuery query and result showing example.com being analyzed with the CAPO custom function.

To be able to use Cheerio in BigQuery, I first needed to build a JavaScript binary that I could load into a UDF. The post How To Use NPM Library in Google BigQuery UDF was a big help. I installed the Cheerio library locally and built it into a script with an exposed cheerio global variable using Webpack.
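For the curious, the Webpack side of this mostly comes down to exposing the bundle's export as a global variable. A minimal config might look like the following sketch; the entry point and file names are illustrative, not the exact build I used.

```javascript
// webpack.config.js — bundle Cheerio into one script that exposes a
// global `cheerio` variable, which the BigQuery UDF runtime can see.
// Entry point assumed to be: module.exports = require('cheerio');
module.exports = {
  mode: 'production',
  entry: './index.js',
  output: {
    filename: 'cheerio.js',
    library: 'cheerio',   // name of the exposed global
    libraryTarget: 'var', // emit it as `var cheerio = ...`
  },
};
```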

I uploaded the script to HTTP Archive’s Google Cloud Storage bucket. Then in BigQuery, I was able to side-load the script into the UDF with OPTIONS:

OPTIONS (library = 'gs://httparchive/lib/cheerio.js')

From there, the UDF was able to reference the cheerio object to parse the HTML input and generate the results. You can see it in action at capo.sql.

Querying HTML in BigQuery

Here’s a full demo of the example.com link text solution in action:

DECLARE example_html STRING;
SET example_html = '''
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">...</style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
''';

CREATE TEMP FUNCTION getLinkText(html STRING)
RETURNS STRING LANGUAGE js
OPTIONS (library = 'gs://httparchive/lib/cheerio.js') AS '''
try {
  const $ = cheerio.load(html);
  return $('p:last-child a').text();
} catch (e) {
  return null;
}
''';

SELECT getLinkText(example_html) AS link_text

🔗 Try it on BigQuery

The results show it working as expected.

Limitations

Cheerio is marketed as fast and efficient, but if you try to parse every HTML response body in HTTP Archive, the query will fail.

Fully built, the library is 331 KB. And because the entire HTML blob has to be held in memory to be parsed, large documents consume a lot of memory.

To minimize the chances of OOM errors and speed up the query, one thing you can do is pare down the HTML to the area of interest using only the most basic regular expressions. Since the capo script is only concerned with the <head> element, I grabbed everything up to the closing </head> tag:

httparchive.fn.CAPO(
  REGEXP_EXTRACT(
    response_body,
    r'(?i)(.*</head>)'
  )
)
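The same pruning step is easy to prototype locally in JavaScript before spending query budget. This sketch grabs everything up to the first closing `</head>` tag (the SQL uses a greedy match, but for capo's purposes the first `</head>` is the one that matters):

```javascript
// Trim an HTML blob down to everything through the first </head>,
// case-insensitively, as a cheap pre-parse step before Cheerio.
// Returns null when no </head> is present, like REGEXP_EXTRACT.
function extractThroughHead(html) {
  const match = html.match(/[\s\S]*?<\/head>/i);
  return match ? match[0] : null;
}
```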

If there are no natural “breakpoints” in the document for your use case, you could also consider restricting the input to a certain character length like WHERE LENGTH(response_body) < 1000. The query will work and it’ll run more quickly, but the results will be biased towards smaller pages.

Also, some documents may not be able to be parsed at all, resulting in exceptions. I added try/catch blocks to the UDF to intercept any exceptions and return null instead.

That also means that your query needs to be able to handle null values instead. For example, to get the first <head> element from the results, I needed to use SAFE_OFFSET instead of plain old OFFSET to avoid breaking the query on null values: elements[SAFE_OFFSET(0)].

Wrapping up

Cheerio is a really powerful new tool in the HTTP Archive toolbox. It unlocks new types of analysis that used to be prohibitively complex. In the capo.sql use case, I was able to extract insights about pages’ <head> elements that would have only been possible with custom metrics on future datasets.

I’m really interested to see what new insights are possible with this approach. Let me know your thoughts in the comments and how you plan to use it.

The post Querying parsed HTML in BigQuery appeared first on rviscomi.dev.

]]>
https://rviscomi.dev/2023/05/querying-parsed-html-in-bigquery/feed/ 0