The Privacy Sandbox is a series of proposals to satisfy third-party use cases without third-party cookies or other tracking mechanisms.
However, when you visit a website you may not be aware of the third parties involved and what they're doing with your data. Even publishers and web developers may not understand the entire third-party supply chain.
Ad selection, conversion measurement, and other use cases currently rely on establishing stable cross-site user identity. Historically this has been done with third-party cookies, but browsers have begun to restrict access to these cookies. There has also been an increase in the use of other mechanisms for cross-site user tracking, such as covert browser storage, device fingerprinting, and requests for personal information such as email addresses.
This is a dilemma for the web. How can legitimate third-party use cases be supported without enabling users to be tracked across sites?
In particular, how can websites fund content by enabling third parties to show ads and measure ad performance—but not allow individual users to be profiled? How can advertisers and site owners evaluate a user's authenticity without resorting to dark patterns such as device fingerprinting?
The way things work at the moment can be problematic for the entire web ecosystem, not just users. For publishers and advertisers, tracking identity and using a variety of non-standard third-party solutions can add to technical debt, code complexity, and data risk. Users, developers, publishers, and advertisers should be confident that the web is protecting user privacy choices.
Advertising is a core business model for the web, but advertising has to work for everyone. Which brings us to the Privacy Sandbox's mission: to create a thriving web ecosystem that is respectful of users and private by default.
The Privacy Sandbox introduces a set of privacy-preserving APIs to support business models that fund the open web in the absence of tracking mechanisms like third-party cookies.
The Privacy Sandbox APIs require web browsers to take on a new role. Rather than working with limited tools and protections, the APIs enable the user's browser to act on the user's behalf—locally, on their device—to protect the user's identifying information as they navigate the web. The APIs enable use cases such as ad selection and conversion measurement, without revealing individual private and personal information. In engineering terms a sandbox is a protected environment; a key principle of the Privacy Sandbox is that a user's personal information should be protected and not shared in a way that lets the user be identified across sites.
This is a shift in direction for browsers. The Privacy Sandbox's vision of the future has browsers providing specific tools to satisfy specific use cases, while preserving user privacy. A Potential Privacy Model for the Web sets out the core principles behind the APIs.
In order to successfully transition away from third-party cookies the Privacy Sandbox initiative needs your support. The proposal explainers need feedback from developers as well as publishers, advertisers, and ad technology companies, to suggest missing use cases and share information about how to accomplish their goals in a privacy-safe way.
You can comment on the proposal explainers by filing issues against each proposal's repository.
You can dive into the API proposal explainers right away, and over the coming months we'll be publishing posts about each proposal individually.
Goal: Enable advertisers to measure ad performance.
There are two proposals for APIs in which information about ad impressions and conversions stays inside the browser, and reporting back to advertisers happens only in carefully controlled, privacy-safe ways: ways that don't enable linking identities across sites or collecting a user's browsing history.
Other companies have been investigating similar ideas, such as Facebook's Cross-Browser Anonymous Conversion Reporting, Apple's Ad Click Attribution API and Brave's ad conversion attribution.
Goal: Enable advertisers to display ads relevant to users.
Relevant ads are more favorable to users and more profitable for publishers (the people running ad-supported websites). Third-party ad selection tools make ad space more valuable to advertisers (the people who purchase ad space on websites), which in turn increases revenue for ad-supported websites and enables more content to be created and published.
There are many ways to make ads relevant to the user, including the following:
First-party data and contextual ad selection can be achieved without knowing anything about the user other than their activity within a single site. These techniques don't require cross-site tracking.
Remarketing is usually done by using cookies or some other way to recognize people across websites: adding users to lists and then selecting specific ads to show them.
TURTLEDOVE: for remarketing.
FLoC: for interest-based audiences.
The API generates clusters of similar people, known as "cohorts". Cohort data is generated locally in the user's browser, not by a third party. The browser shares only the cohort, which cannot be used to identify or track individual users. This enables companies to select ads based on the browsing behavior of similar people, while preserving individual privacy.
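FLoC's clustering was based on SimHash, a locality-sensitive hash: users with similar browsing histories tend to land in the same cohort. The following is a toy Python sketch of the idea, not Chrome's actual implementation; the domain names and the 8-bit cohort size are invented for illustration.

```python
import hashlib

def domain_bits(domain: str, n_bits: int = 8) -> list:
    # Map a domain to a stable pseudo-random vector of +1/-1 values.
    digest = hashlib.sha256(domain.encode()).digest()
    return [1 if (digest[i // 8] >> (i % 8)) & 1 else -1 for i in range(n_bits)]

def simhash_cohort(history: set, n_bits: int = 8) -> int:
    # Sum the +1/-1 vectors of all visited domains; the sign of each
    # coordinate becomes one bit of the cohort ID. Similar histories
    # produce similar sums, so they tend to share cohort bits.
    sums = [0] * n_bits
    for domain in history:
        for i, b in enumerate(domain_bits(domain, n_bits)):
            sums[i] += b
    return sum((1 << i) for i, s in enumerate(sums) if s > 0)

# Users with identical histories always share a cohort; only the
# cohort ID, not the history itself, would ever leave the browser.
a = simhash_cohort({"news.example", "knitting.example", "travel.example"})
b = simhash_cohort({"news.example", "knitting.example", "travel.example"})
```

The key property is that the cohort ID reveals only coarse, shared behaviour: many users map to the same 8-bit value, so the ID alone can't identify anyone.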
Goal: Reduce the amount of potentially identifiable data revealed by APIs, and make any access to such data controllable by users and measurable.
Browsers have taken steps to deprecate third-party cookies, but techniques to identify and track the behavior of individual users, known as fingerprinting, have continued to evolve. Fingerprinting uses mechanisms that users aren't aware of and can't control.
Fingerprinting surfaces such as the User-Agent header will be reduced in scope, and the data made available by alternative mechanisms such as Client Hints will be subject to Privacy Budget limits. Other surfaces, such as the device orientation and battery-level APIs, will be updated to keep the information exposed to a minimum.
Goal: Control access to IP addresses to reduce covert fingerprinting, and allow sites to opt out of seeing IP addresses in order to not consume privacy budget.
A user's IP address is the public 'address' of their computer on the internet, which in most cases is dynamically assigned by the network through which they connect to the internet. However, even dynamic IP addresses may remain stable over a significant period of time. Not surprisingly, this means that IP addresses are a significant source of fingerprint data.
Goal: Verify user authenticity without fingerprinting.
Anti-fraud protection is crucial for keeping users safe and for ensuring that advertisers and site owners get accurate ad performance measurements. Advertisers and site owners must be able to distinguish between malicious bots and authentic users. If advertisers can't reliably tell which ad clicks are from real humans, they spend less, so site publishers get less revenue. Many third-party services currently use techniques such as device fingerprinting to combat fraud.
Unfortunately, the techniques used to identify legitimate users and block spammers, fraudsters, and bots work in ways similar to fingerprinting techniques that damage privacy.
Goal: Enable entities to declare that related domain names are owned by the same first party.
Many organizations own sites across multiple domains. This can become a problem if restrictions are imposed on tracking user identity across sites that are seen as 'third-party' but actually belong to the same organization.
The Privacy Sandbox initiative needs your support. The API proposal explainers need feedback, in particular to suggest missing use cases and more-private ways to accomplish their goals.
A Potential Privacy Model for the Web sets out the core principles underlying the APIs.
The proportion of users who click an ad after seeing it: the number of ad clicks divided by the number of ad impressions. (See also impression.)
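As a calculation, with illustrative numbers:

```python
clicks = 12
impressions = 1000

# Click-through rate: clicks divided by impressions (ad views).
ctr = clicks / impressions
print(f"CTR: {ctr:.1%}")  # CTR: 1.2%
```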
A conversion attributed to an ad that was 'clicked'.
The completion of an action on an advertiser's website by a user who has previously interacted with an ad from that advertiser. For example, purchase of a product or sign-up for a newsletter after clicking an ad that links to the advertiser's site.
Share information about a dataset to reveal patterns of behaviour without revealing private information about individuals or whether they belong to the dataset.
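One common differential-privacy mechanism is to add calibrated random noise to aggregate statistics before releasing them, so that no individual's presence or absence can be inferred. A minimal sketch using Laplace noise; the epsilon value and the count are illustrative, and real systems layer more machinery on top of this.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Sample the Laplace distribution via inverse-CDF transform.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Adding Laplace(1/epsilon) noise to a count gives epsilon-differential
    # privacy when one individual can change the count by at most 1.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
reported = noisy_count(1000, epsilon=0.5, rng=rng)
# `reported` is close to 1000, but any single user could plausibly
# be present or absent, since the noise masks a +/-1 change.
```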
See Top-Level Domain and eTLD.
'Effective' top-level domains are defined by the Public Suffix List. For example, appspot.com is an effective TLD even though the actual TLD is com.
Effective TLDs are what enable foo.appspot.com to be a different site from bar.appspot.com. The effective top-level domain (eTLD) in this case is appspot.com, and the whole site name (foo.appspot.com, bar.appspot.com) is known as the eTLD+1.
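Computing the eTLD+1 amounts to finding the longest matching public suffix and keeping one more label to its left. A minimal Python sketch using a tiny hardcoded subset of suffixes; a real implementation would consume the full Public Suffix List via a maintained library.

```python
# Tiny illustrative subset of the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "org", "co.uk", "appspot.com"}

def etld_plus_one(hostname: str) -> str:
    labels = hostname.split(".")
    # Find the longest suffix of the hostname that is a public suffix,
    # then keep one extra label to its left.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else suffix
    return hostname

print(etld_plus_one("foo.appspot.com"))  # foo.appspot.com
print(etld_plus_one("maps.google.com"))  # google.com
```

Because appspot.com is itself a public suffix, foo.appspot.com and bar.appspot.com each count as a whole separate site.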
See also Top-Level Domain.
A measure of how much an item of data reveals individual identity.
Data entropy is measured in bits. The more that data reveals identity, the higher its entropy value.
Data can be combined to identify an individual, but it can be difficult to work out whether a new piece of data adds to entropy. For example, learning that a person is from Australia adds nothing if you already know the person is from Kangaroo Island.
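The bit count can be computed directly: learning a fact that narrows someone down from a population of N to a group of n reveals log2(N/n) bits. A sketch with illustrative, approximate population figures:

```python
import math

def identity_bits(population: int, group: int) -> float:
    # Bits of identifying information revealed by learning that a
    # person belongs to `group` out of `population`.
    return math.log2(population / group)

world = 8_000_000_000
australia = 26_000_000       # approximate
kangaroo_island = 5_000      # approximate

print(f"from Australia:       {identity_bits(world, australia):.1f} bits")
print(f"from Kangaroo Island: {identity_bits(world, kangaroo_island):.1f} bits")
# Once you know "Kangaroo Island", learning "Australia" adds zero bits:
# every Kangaroo Island resident is already in Australia.
```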
Techniques to identify and track the behavior of individual users. Fingerprinting uses mechanisms that users aren't aware of and can't control. Sites such as Panopticlick and amiunique.org show how fingerprint data can be combined to identify you as an individual.
Something that can be used (probably in combination with other surfaces) to identify a particular user or device. For example, the User-Agent HTTP request header provides access to a fingerprinting surface (the user agent string).
Resources from the site you're visiting. For example, the page you're reading is on the site web.dev and includes resources from that site. See also Third-party.
View of an ad. (See also click-through rate.)
A measure of anonymity within a data set. If you have k-anonymity, you can't be distinguished from at least k-1 other individuals in the data set. In other words, at least k individuals share the same information (including you).
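A dataset's k value for a set of quasi-identifying columns is the size of its smallest group of identical rows. A minimal sketch; the records and columns are made up for illustration.

```python
from collections import Counter

def k_anonymity(records: list) -> int:
    # k is the size of the smallest equivalence class: the fewest
    # records sharing the same combination of quasi-identifiers.
    return min(Counter(records).values())

# (age bracket, postcode prefix) as the quasi-identifiers.
records = [
    ("30-39", "SW1"),
    ("30-39", "SW1"),
    ("30-39", "SW1"),
    ("40-49", "N1"),
    ("40-49", "N1"),
]
print(k_anonymity(records))  # 2 -- the smallest group has 2 records
```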
Arbitrary number used once only in cryptographic communication.
The origin of a request, including the scheme and server name but no path information. For example, the origin of https://web.dev/some-page is https://web.dev.
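An origin can be derived from a full URL with the standard library; the path, query, and fragment are discarded:

```python
from urllib.parse import urlsplit

def origin(url: str) -> str:
    # Scheme plus network location (host and any explicit port);
    # the path, query, and fragment are not part of the origin.
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"

print(origin("https://web.dev/some-page?x=1"))  # https://web.dev
print(origin("https://example.com:8080/path"))  # https://example.com:8080
```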
Some fingerprinting surfaces, such as user agent strings, IP addresses, and Accept-Language headers, are available to every website whether the site asks for them or not. That means passive surfaces can easily consume a site's privacy budget.
The Privacy Sandbox initiative proposes replacing passive surfaces with active ways to get specific information, for example using Client Hints a single time to get the user's language rather than sending an Accept-Language header with every request to every server.
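The budget idea can be sketched as a per-site running total of entropy costs: each surface a site reads is charged some bits, and further reads are refused once the cap would be exceeded. This is a toy model; the surface costs and the cap are invented for illustration, and the real proposal's accounting is more sophisticated.

```python
# Illustrative entropy costs, in bits, per fingerprinting surface.
SURFACE_COST = {"user-agent": 10.0, "screen-size": 4.7, "language": 5.3}
PRIVACY_BUDGET = 12.0  # invented cap for this sketch

def request_surface(spent: float, surface: str):
    # Grant access only if the site's running total stays under budget.
    cost = SURFACE_COST[surface]
    if spent + cost > PRIVACY_BUDGET:
        return spent, False    # denied: budget would be exceeded
    return spent + cost, True  # granted: cost is charged to the site

spent = 0.0
spent, ok1 = request_surface(spent, "user-agent")   # granted: 10.0 <= 12.0
spent, ok2 = request_surface(spent, "screen-size")  # denied: 14.7 > 12.0
```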
The Privacy Sandbox proposal explainers are mostly about ads, so the kinds of publishers referred to are ones that put ads on their websites.
The total number of people who see an ad.
Advertising to people who've already visited your site. For example, an online store could show ads for a toy sale to people who previously viewed toys on their site.
See Top-Level Domain and eTLD.
See Fingerprinting surface and Passive surface.
Top-level domains such as .com and .org are listed in the Root Zone Database.
Note that some 'sites' are actually just subdomains. For example, translate.google.com and maps.google.com are just subdomains of google.com (which is the eTLD + 1).
It can be useful to access policy or other information about a host before making a request. For example, robots.txt tells web crawlers which pages to visit and which pages to ignore. IETF RFC8615 outlines a standardized way to make site-wide metadata accessible in standard locations in a /.well-known/ subdirectory. You can see a list of these at iana.org/assignments/well-known-uris/well-known-uris.xhtml.
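Well-known URLs resolve against the site's root, not the current page. For example, with the standard library (the page URL here is made up):

```python
from urllib.parse import urljoin

# A well-known URI lives at a fixed path under the site's root,
# regardless of which page you start from.
page = "https://example.com/deep/nested/page.html"
print(urljoin(page, "/robots.txt"))
# https://example.com/robots.txt
print(urljoin(page, "/.well-known/change-password"))
# https://example.com/.well-known/change-password
```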
Thanks to all those who helped with writing and reviewing this post.
Photo by Pierre Bamin on Unsplash.