Safe DOM manipulation with the Sanitizer API

The new Sanitizer API aims to build a robust processor for arbitrary strings to be safely inserted into a page.

Jack J

Applications deal with untrusted strings all the time, but safely rendering that content as part of an HTML document can be tricky. Without sufficient care, it's easy to accidentally create opportunities for cross-site scripting (XSS) that malicious attackers may exploit.

To mitigate that risk, the new Sanitizer API proposal aims to build a robust processor for arbitrary strings to be safely inserted into a page. This article introduces the API, and explains its usage.

// Expanded safely
$div.setHTML(`<em>hello world</em><img src="" onerror=alert(0)>`, { sanitizer: new Sanitizer() })

Escaping user input

When inserting user input, query strings, cookie contents, and so on, into the DOM, the strings must be escaped properly. Particular attention should be paid to DOM manipulation via .innerHTML, where unescaped strings are a typical source of XSS.

const user_input = `<em>hello world</em><img src="" onerror=alert(0)>`
$div.innerHTML = user_input

If you escape HTML special characters in the input string above, or insert it using .textContent, alert(0) will not be executed. However, the <em> added by the user is then also rendered as literal text, so this approach cannot preserve the text decoration in HTML.
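As a rough illustration of the escaping approach, the sketch below replaces HTML special characters with entities. The escapeHtml helper is a hypothetical name used only for this example, not part of any browser API, and it covers only the five characters most commonly escaped.

```javascript
// Hypothetical helper for illustration; not a browser API.
function escapeHtml(input) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  // Replace each special character with its entity.
  return String(input).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const user_input = `<em>hello world</em><img src="" onerror=alert(0)>`;
const escaped = escapeHtml(user_input);
// escaped is now:
// &lt;em&gt;hello world&lt;/em&gt;&lt;img src=&quot;&quot; onerror=alert(0)&gt;
```

The onerror handler can no longer run, but the <em> tag has been neutralized as well, which is the limitation described above.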

The best thing to do here is not escaping, but sanitizing.

Sanitizing user input

The difference between escaping and sanitizing

Escaping refers to replacing special HTML characters with HTML Entities.

Sanitizing refers to removing semantically harmful parts (such as script execution) from HTML strings.

Example

In the previous example, <img onerror> causes the error handler to be executed, but if the onerror handler was removed, it would be possible to safely expand it in the DOM while leaving <em> intact.

// XSS 🧨
$div.innerHTML = `<em>hello world</em><img src="" onerror=alert(0)>`
// Sanitized ⛑
$div.innerHTML = `<em>hello world</em><img src="">`

To sanitize correctly, it is necessary to parse the input string as HTML, omit tags and attributes that are considered harmful, and keep the harmless ones.

The proposed Sanitizer API specification aims to provide such processing as a standard API for browsers.

Sanitizer API

The Sanitizer API is used in the following way:

const $div = document.querySelector('div')
const user_input = `<em>hello world</em><img src="" onerror=alert(0)>`
$div.setHTML(user_input, { sanitizer: new Sanitizer() }) // <div><em>hello world</em><img src=""></div>

However, { sanitizer: new Sanitizer() } is the default argument, so the call can be shortened as shown below.

$div.setHTML(user_input) // <div><em>hello world</em><img src=""></div>

It is worth noting that setHTML() is defined on Element. Because it is a method of Element, the parsing context is unambiguous (<div> in this case), the parsing is done once internally, and the result is inserted directly into the DOM.

To get the result of sanitization as a string, read .innerHTML from the element after calling setHTML().

const $div = document.createElement('div')
$div.setHTML(user_input)
$div.innerHTML // <em>hello world</em><img src="">

Customize via configuration

The Sanitizer API is configured by default to remove strings that would trigger script execution. However, you can also add your own customizations to the sanitization process via a configuration object.

const config = {
  allowElements: [],
  blockElements: [],
  dropElements: [],
  allowAttributes: {},
  dropAttributes: {},
  allowCustomElements: true,
  allowComments: true
};
// sanitized result is customized by configuration
new Sanitizer(config)

The following options specify how the sanitization result should treat the specified element.

allowElements: Names of elements that the sanitizer should retain.

blockElements: Names of elements the sanitizer should remove, while retaining their children.

dropElements: Names of elements the sanitizer should remove, along with their children.

const str = `hello <b><i>world</i></b>`

$div.setHTML(str)
// <div>hello <b><i>world</i></b></div>

$div.setHTML(str, { sanitizer: new Sanitizer({allowElements: [ "b" ]}) })
// <div>hello <b>world</b></div>

$div.setHTML(str, { sanitizer: new Sanitizer({blockElements: [ "b" ]}) })
// <div>hello <i>world</i></div>

$div.setHTML(str, { sanitizer: new Sanitizer({allowElements: []}) })
// <div>hello world</div>
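To make the difference between the three element options easier to follow, here is a toy model that applies them to a tiny hand-built tree instead of real DOM nodes. It is only a sketch inferred from the examples above, not the browser's algorithm: applyElementRules and serialize are hypothetical names, and the model assumes that every element is allowed when allowElements is omitted, whereas the real API starts from a built-in default list.

```javascript
// Toy model: a node is either a text string or { name, children }.
function applyElementRules(nodes, config = {}) {
  const { allowElements, blockElements = [], dropElements = [] } = config;
  const out = [];
  for (const node of nodes) {
    if (typeof node === 'string') {
      out.push(node); // text is always kept
      continue;
    }
    if (dropElements.includes(node.name)) {
      continue; // dropped: the element and its children disappear
    }
    const children = applyElementRules(node.children, config);
    const blocked =
      blockElements.includes(node.name) ||
      (allowElements !== undefined && !allowElements.includes(node.name));
    if (blocked) {
      out.push(...children); // blocked: element removed, children kept
    } else {
      out.push({ name: node.name, children });
    }
  }
  return out;
}

// Turn the toy tree back into a string for inspection.
function serialize(nodes) {
  return nodes
    .map((n) =>
      typeof n === 'string' ? n : `<${n.name}>${serialize(n.children)}</${n.name}>`
    )
    .join('');
}

// Same structure as `hello <b><i>world</i></b>`.
const tree = ['hello ', { name: 'b', children: [{ name: 'i', children: ['world'] }] }];

const allowB = serialize(applyElementRules(tree, { allowElements: ['b'] }));
// hello <b>world</b>
const blockB = serialize(applyElementRules(tree, { blockElements: ['b'] }));
// hello <i>world</i>
const dropB = serialize(applyElementRules(tree, { dropElements: ['b'] }));
// hello
```

In this model, an element missing from allowElements is treated like a blocked element, which matches the outputs shown in the examples above.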

You can also control whether the sanitizer will allow or deny specified attributes with the following options:

  • allowAttributes
  • dropAttributes

The allowAttributes and dropAttributes properties expect attribute match lists: objects whose keys are attribute names and whose values are lists of target elements or the * wildcard.

const str = `<span id=foo class=bar style="color: red">hello</span>`

$div.setHTML(str)
// <div><span id="foo" class="bar" style="color: red">hello</span></div>

$div.setHTML(str, { sanitizer: new Sanitizer({allowAttributes: {"style": ["span"]}}) })
// <div><span style="color: red">hello</span></div>

$div.setHTML(str, { sanitizer: new Sanitizer({allowAttributes: {"style": ["p"]}}) })
// <div><span>hello</span></div>

$div.setHTML(str, { sanitizer: new Sanitizer({allowAttributes: {"style": ["*"]}}) })
// <div><span style="color: red">hello</span></div>

$div.setHTML(str, { sanitizer: new Sanitizer({dropAttributes: {"id": ["span"]}}) })
// <div><span class="bar" style="color: red">hello</span></div>

$div.setHTML(str, { sanitizer: new Sanitizer({allowAttributes: {}}) })
// <div><span>hello</span></div>
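The matching rule for attribute match lists can be sketched as a small pure function. This is a simplified model based on the description above, not the browser's implementation: matchesList and keepsAttribute are hypothetical names, and the model assumes every attribute is kept when no list is configured, whereas the real default configuration has its own built-in list.

```javascript
// True when the match list names `attr` for `element` or for the "*" wildcard.
function matchesList(matchList, attr, element) {
  const targets = matchList[attr];
  return Array.isArray(targets) && (targets.includes('*') || targets.includes(element));
}

// Decide whether an attribute on a given element survives in this toy model.
function keepsAttribute(config, attr, element) {
  if (config.dropAttributes && matchesList(config.dropAttributes, attr, element)) {
    return false; // explicitly dropped
  }
  if (config.allowAttributes) {
    // With an allow list present, only listed attributes survive.
    return matchesList(config.allowAttributes, attr, element);
  }
  return true; // simplification: no list configured, keep the attribute
}

const styleOnSpan = keepsAttribute({ allowAttributes: { style: ['span'] } }, 'style', 'span'); // true
const styleOnP = keepsAttribute({ allowAttributes: { style: ['p'] } }, 'style', 'span'); // false
const styleOnAny = keepsAttribute({ allowAttributes: { style: ['*'] } }, 'style', 'span'); // true
const idDropped = keepsAttribute({ dropAttributes: { id: ['span'] } }, 'id', 'span'); // false
```

An empty allowAttributes object matches nothing, so in this model every attribute is removed while the element itself is left alone.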

allowCustomElements is the option to allow or deny custom elements. If they're allowed, other configurations for elements and attributes still apply.

const str = `<custom-elem>hello</custom-elem>`

$div.setHTML(str)
// <div></div>

const sanitizer = new Sanitizer({
  allowCustomElements: true,
  allowElements: ["div", "custom-elem"]
})
$div.setHTML(str, { sanitizer })
// <div><custom-elem>hello</custom-elem></div>

API surface

Comparison with DOMPurify

DOMPurify is a well-known library that offers sanitization functionality. The main difference between the Sanitizer API and DOMPurify is that DOMPurify returns the result of the sanitization as a string, which you need to write into a DOM element via .innerHTML.

const user_input = `<em>hello world</em><img src="" onerror=alert(0)>`
const sanitized = DOMPurify.sanitize(user_input)
$div.innerHTML = sanitized
// `<em>hello world</em><img src="">`

DOMPurify can serve as a fallback when the Sanitizer API is not implemented in the browser.
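One way such a fallback could be wired up is sketched below. The setUntrustedHTML helper and its fallbackSanitize parameter are hypothetical names introduced for this example; the helper checks for setHTML() on the element itself and otherwise writes the output of a caller-supplied sanitizer through .innerHTML.

```javascript
// Hypothetical helper: prefer the Sanitizer API, fall back to a string sanitizer.
// Returns a label for the path that was taken, to make testing easier.
function setUntrustedHTML(element, html, fallbackSanitize) {
  if (typeof element.setHTML === 'function') {
    // Parsed once, in the element's own context, with the default sanitizer.
    element.setHTML(html);
    return 'sanitizer-api';
  }
  // Fallback path: sanitize to a string, then let .innerHTML parse it again.
  element.innerHTML = fallbackSanitize(html);
  return 'fallback';
}

// In a page that has loaded DOMPurify, a call might look like this:
//   setUntrustedHTML($div, user_input, (html) => DOMPurify.sanitize(html))
```

Passing the fallback as a function keeps the helper independent of how DOMPurify is loaded, and it means the fallback path still has the double-parsing characteristics discussed in this section.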

The DOMPurify approach has a couple of downsides. Because a string is returned, the input is parsed twice: once by DOMPurify and again by .innerHTML. This double parsing not only wastes processing time, but can also lead to vulnerabilities in cases where the result of the second parse differs from the first.

HTML also needs context to be parsed. For example, <td> makes sense in <table>, but not in <div>. Since DOMPurify.sanitize() only takes a string as an argument, the parsing context has to be guessed.

The Sanitizer API improves upon the DOMPurify approach and is designed to eliminate the need for double parsing and to clarify the parsing context.

API status and browser support

The Sanitizer API is under discussion in the standardization process and Chrome is in the process of implementing it.

Step                                        Status
1. Create explainer                         Complete
2. Create specification draft               Complete
3. Gather feedback and iterate on design    Complete
4. Chrome origin trial                      Complete
5. Launch                                   Intent to Ship on M105

Mozilla: Considers this proposal worth prototyping, and is actively implementing it.

WebKit: See the response on the WebKit mailing list.

How to enable the Sanitizer API

Enabling via about://flags or CLI option

Chrome

Chrome is in the process of implementing the Sanitizer API. In Chrome 93 or later, you can try out the behavior by enabling the about://flags/#enable-experimental-web-platform-features flag. In earlier versions of Chrome Canary and Dev channel, you can enable it with the --enable-blink-features=SanitizerAPI command-line option. Check out the instructions for how to run Chrome with flags.

Firefox

Firefox also implements the Sanitizer API as an experimental feature. To enable it, set the dom.security.sanitizer.enabled flag to true in about:config.

Feature detection

if (window.Sanitizer) {
  // Sanitizer API is enabled
}

Feedback

If you try this API and have some feedback, we'd love to hear it. Share your thoughts on the Sanitizer API GitHub issues and discuss with the spec authors and folks interested in this API.

If you find any bugs or unexpected behavior in Chrome's implementation, file a bug to report it. Select the Blink>SecurityFeature>SanitizerAPI components and share details to help implementers track the problem.

Demo

To see the Sanitizer API in action, check out the Sanitizer API Playground by Mike West.

Photo by Towfiqu barbhuiya on Unsplash.