URLPattern brings routing to the web platform

An approach to standardizing common pattern matching use cases.

Background #

Routing is a key piece of every web application. At its heart, routing involves taking a URL, applying some pattern matching or other app-specific logic to it, and then, usually, displaying web content based on the result. Routing might be implemented in a number of ways: it's sometimes code running on a server that maps a path to files on disk, or logic in a single-page app that waits for changes to the current location and creates a corresponding piece of DOM to display.

While there is no one definitive standard, web developers have gravitated towards a common syntax for expressing URL routing patterns. That syntax shares a lot with regular expressions, but adds some domain-specific features like tokens for matching path segments. Popular server-side frameworks like Express and Ruby on Rails use this syntax (or something very close to it), and JavaScript developers can use modules like path-to-regexp or regexpparam to add that logic to their own code.

URLPattern is an addition to the web platform that builds on the foundation created by these frameworks. Its goal is to standardize a routing pattern syntax, including support for wildcards, named token groups, regular expression groups, and group modifiers. URLPattern instances created with this syntax can perform common routing tasks, like matching against full URLs or a URL pathname, and returning information about the token and group matches.
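
As a rough illustration of those four syntax features, the sketch below builds one pattern per feature. The helper name `syntaxExamples` and the `/books` and `/products` paths are invented for this example; only the pattern syntax itself comes from URLPattern.

```javascript
// Hypothetical helper: returns one URLPattern per syntax feature.
// Only the pathname is specified, so every other component is a wildcard.
function syntaxExamples() {
  return {
    // Wildcard: * matches any remaining characters, including slashes.
    wildcard: new URLPattern({pathname: '/books/*'}),
    // Named token group: :id matches a single path segment.
    named: new URLPattern({pathname: '/books/:id'}),
    // Regular expression group: restricts :id to one or more digits.
    regexp: new URLPattern({pathname: '/books/:id(\\d+)'}),
    // Group modifier: ? makes the preceding group optional.
    optional: new URLPattern({pathname: '/products/:id?'}),
  };
}
```

Under these patterns, `/books/123` matches `named` and `regexp`, `/books/abc` matches `named` but not `regexp`, and both `/products` and `/products/42` match `optional`.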

Another benefit to providing URL matching directly in the web platform is that a common syntax can then be shared with other APIs that also need to match against URLs.

Syntax compatibility #

A guiding philosophy for URLPattern is to avoid reinvention. If you're already familiar with the routing syntax used in Express or Ruby on Rails, you shouldn't have to learn anything new. But given the slight divergences between syntaxes in popular routing libraries, something had to be chosen as the base syntax, and the designers of URLPattern decided to use the pattern syntax from path-to-regexp (though not its API surface) as the starting point.

This decision was made after close consultation with the current maintainer of path-to-regexp.

The best way to familiarize yourself with the core of the supported syntax is to refer to the documentation for path-to-regexp. You can read the documentation intended for publication on MDN in its current home on GitHub.

Additional features #

The syntax of URLPattern is a superset of what path-to-regexp supports, as URLPattern supports an uncommon feature among routing libraries: matching origins, including wildcards in hostnames. Most other routing libraries deal only with the pathname, and occasionally the search or hash portion of a URL. They never have to check the origin portion of a URL, as they're only used for same-origin routing within a self-contained web app.

Taking origins into account opens the door for additional use cases, like routing cross-origin requests inside of a service worker's fetch event handler. If you're only routing same-origin URLs, you can effectively ignore this additional feature and use URLPattern like other libraries.
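
A minimal sketch of origin-aware matching might look like the following. The function name `isCdnImageRequest`, the `example.com` domain, and the `/images/` path are assumptions made up for illustration.

```javascript
// Hypothetical check for image requests served from any subdomain
// of example.com over HTTPS.
function isCdnImageRequest(url) {
  const cdnImages = new URLPattern({
    protocol: 'https',
    // The * wildcard stands in for one or more subdomain labels.
    hostname: '*.example.com',
    pathname: '/images/*',
  });
  return cdnImages.test(url);
}
```

Here `https://cdn.example.com/images/cat.jpg` matches, while the same path on `https://cdn.example.org` or on plain `http://` does not, because the hostname and protocol are part of the pattern.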

Examples #

Constructing the pattern #

To create a URLPattern, pass its constructor either strings or an object whose properties contain info about the pattern to match against.

Passing an object offers the most explicit control over what pattern to use for matching each URL component. At its most verbose, this can look like

const p = new URLPattern({
  protocol: 'https',
  username: '',
  password: '',
  hostname: 'example.com',
  port: '',
  pathname: '/foo/:image.jpg',
  search: '*',
  hash: '*',
});

Providing an empty string for a property will only match if the corresponding part of the URL is not set. The wildcard * will match any value for a given portion of the URL.
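
To make the difference between an empty string and a wildcard concrete, here is a small sketch that compares two patterns differing only in their `search` property. The helper name `compareSearchPatterns` is hypothetical.

```javascript
// Both patterns match the same pathname; they differ only in `search`.
function compareSearchPatterns(url) {
  // '' matches only when the URL has no query string.
  const noQuery = new URLPattern({pathname: '/foo', search: ''});
  // '*' matches any query string, including none.
  const anyQuery = new URLPattern({pathname: '/foo', search: '*'});
  return {noQuery: noQuery.test(url), anyQuery: anyQuery.test(url)};
}
```

For `https://example.com/foo`, both results are `true`. For `https://example.com/foo?x=1`, `noQuery` is `false` and `anyQuery` is `true`.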

The constructor offers several shortcuts for simpler usage. Completely omitting search and hash, or any other properties, is equivalent to setting them to the '*' wildcard. The above example could be simplified to

const p = new URLPattern({
  protocol: 'https',
  username: '',
  password: '',
  hostname: 'example.com',
  port: '',
  pathname: '/foo/:image.jpg',
});

As an additional shortcut, all of the information about the origin can be provided in a single property, baseURL, leading to

const p = new URLPattern({
  pathname: '/foo/:image.jpg',
  baseURL: 'https://example.com',
});

When passing a baseURL to the constructor, any aspects of the URL that are not provided are treated as if they were set to an empty string, not as a '*' wildcard. For instance, new URLPattern({baseURL: 'https://example.com/'}) will create a URLPattern with search, hash, and other unspecified properties set to ''.

All of these examples assume that your use case involves matching origins. If you're only interested in matching on the other portions of the URL, excluding the origin (as is the case for many "traditional" single-origin routing scenarios), then you can omit the origin information entirely, and just provide some combination of the pathname, search, and hash properties. As before, omitted properties will be treated as if they were set to the * wildcard pattern.

const p = new URLPattern({pathname: '/foo/:image.jpg'});

As an alternative to passing in an object to the constructor, you can provide either one or two strings. If one string is provided, it should represent a full URL pattern, including pattern information used to match the origin. If you provide two strings, the second string is used as a baseURL, and the first string is considered relative to that base.

Whether one string or two are provided, the URLPattern constructor will parse the full URL pattern, breaking it up into URL components, and map each portion of the larger pattern to the corresponding component. This means that under the hood, each URLPattern created with strings ends up being represented the same as an equivalent URLPattern created with an object. The strings constructor is just a shortcut, for those who prefer a less verbose interface.

const p = new URLPattern('https://example.com/foo/:image.jpg?*#*');
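
One way to see how a string pattern was split up is to read the component properties back from the constructed instance. The sketch below assumes a hypothetical helper named `describePattern`; the properties it reads (`protocol`, `hostname`, `pathname`, `search`, `hash`) are exposed on every URLPattern instance.

```javascript
// Hypothetical helper: reports how a full string pattern was mapped
// onto individual URL components.
function describePattern(patternString) {
  const p = new URLPattern(patternString);
  return {
    protocol: p.protocol,
    hostname: p.hostname,
    pathname: p.pathname,
    search: p.search,
    hash: p.hash,
  };
}
```

With the pattern string from the example above, `describePattern('https://example.com/foo/:image.jpg?*#*')` should report `https` for the protocol, `example.com` for the hostname, and `*` for both search and hash.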

When using strings to create a URLPattern, there are a few caveats to keep in mind.

Leaving a property out when using an object to construct URLPattern is equivalent to providing a * wildcard for that property. When the full URL string pattern is parsed, if one of the URL components is missing a value, it's treated as if the component's property were set to '', which will only match when that component is empty.

When using strings, you need to explicitly include the wildcards if you want them to be used in the constructed URLPattern.

// p1 and p2 are equivalent.
const p1 = new URLPattern('/foo', location.origin);
const p2 = new URLPattern({
  protocol: location.protocol,
  hostname: location.hostname,
  pathname: '/foo',
  search: '',
  hash: '',
});

// p3 and p4 are equivalent.
const p3 = new URLPattern('/foo?*#*', location.origin);
const p4 = new URLPattern({
  protocol: location.protocol,
  hostname: location.hostname,
  pathname: '/foo',
});

You should also be aware that parsing a string pattern into its components is potentially ambiguous. There are characters, like :, that are found in URLs but also have special meaning in the pattern matching syntax. To avoid this ambiguity, the URLPattern constructor assumes that any of those special characters are part of a pattern, not part of the URL. If you want an ambiguous character to be interpreted as part of the URL, make sure to escape it with a \ character. For example, the literal URL about:blank should be escaped as 'about\\:blank' when provided as a string.

Using the pattern #

After constructing a URLPattern, you have two options for using it. The test() and exec() methods both take the same input and use the same algorithm to check for a match, and only differ in their return value. test() returns true when there's a match for the given input, and false otherwise. exec() returns detailed information about the match along with capture groups, or null if there is no match. The following examples demonstrate using exec(), but you could swap in test() for any of them if you only want a simple boolean return value.
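
The difference in return values can be sketched side by side. The helper name `matchImage` is made up for this example; it runs both methods against the same input.

```javascript
// Hypothetical helper: runs test() and exec() on the same URL.
function matchImage(url) {
  const pattern = new URLPattern({pathname: '/foo/:image.jpg'});
  return {
    // test() only answers whether the URL matches.
    matched: pattern.test(url),
    // exec() returns a result object on a match, or null otherwise.
    details: pattern.exec(url),
  };
}
```

For `https://example.com/foo/cat.jpg`, `matched` is `true` and `details.pathname.groups.image` is `'cat'`. For `https://example.com/bar`, `matched` is `false` and `details` is `null`.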

One way to use the test() and exec() methods is by passing in strings. Similar to what the constructor supports, if a single string is provided, it should be a full URL, including the origin. If two strings are provided, the second string is treated as a baseURL value, and the first string is evaluated as relative to that base.

const p = new URLPattern({
  pathname: '/foo/:image.jpg',
  baseURL: 'https://example.com',
});

const result = p.exec('https://example.com/foo/cat.jpg');
// result will contain info about the successful match.
// const result = p.exec('/foo/cat.jpg', 'https://example.com')
// is equivalent, using the baseURL syntax.

const noMatchResult = p.exec('https://example.com/bar');
// noMatchResult will be null.

Alternatively, you can pass the same sort of object that the constructor supports, with properties that are set to just the portions of the URL you care about matching.

const p = new URLPattern({pathname: '/foo/:image.jpg'});

const result = p.exec({pathname: '/foo/cat.jpg'});
// result will contain info about the successful match.

When using exec() on a URLPattern that contains wildcards or tokens, the return value will give you information about what the corresponding values were in the input URL. This can save you the trouble of having to parse out those values yourself.

const p = new URLPattern({
  hostname: ':subdomain.example.com',
  pathname: '/*/:image.jpg',
});

const result = p.exec('https://imagecdn1.example.com/foo/cat.jpg');
// result.hostname.groups.subdomain will be 'imagecdn1'
// result.pathname.groups[0] will be 'foo', corresponding to *
// result.pathname.groups.image will be 'cat'

Anonymous and named groups #

When you pass a URL string to exec(), you get back a value telling you which portions of the URL matched each of the pattern's groups.

The return value has properties that correspond to the components of the URLPattern, like pathname. So if a group was defined as part of the pathname portion of the URLPattern, then the matches can be found in the return value's pathname.groups. The matches are represented differently depending on whether the corresponding pattern was an anonymous or named group.

You can use array indices to access the values for an anonymous pattern match. If there are multiple anonymous patterns, index 0 will represent the matching value for the left-most one, with 1 and further indices used for subsequent patterns.

When using named groups in a pattern, the matches will be exposed as properties whose names correspond to each group name.

If you're familiar with the exec() method of a RegExp object, then the return values from the URLPattern exec() method will look very similar when it comes to representing anonymous and named groups.
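
As a sketch of how anonymous and named groups sit next to each other in one result, the following uses a pattern with two wildcards and one named token. The helper name `extractGroups` and the `/items/` path are invented for illustration.

```javascript
// Hypothetical helper: pulls anonymous and named matches out of a pathname.
function extractGroups(url) {
  const pattern = new URLPattern({pathname: '/*/items/:id/*'});
  const result = pattern.exec(url);
  if (result === null) {
    return null;
  }
  const groups = result.pathname.groups;
  return {
    first: groups[0], // Left-most anonymous group.
    id: groups.id,    // Named group.
    rest: groups[1],  // Second anonymous group.
  };
}
```

Given `https://example.com/shop/items/42/reviews/latest`, this returns `'shop'` for `first`, `'42'` for `id`, and `'reviews/latest'` for `rest`.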

Unicode support and normalization #

URLPattern supports Unicode characters in a few different ways.

  • Named groups, like :café, can contain Unicode characters. The rules used for valid JavaScript identifiers apply to named groups.

  • Text within a pattern will be automatically encoded according to the same rules used for URL encoding of that particular component. Unicode characters within pathname will be percent-encoded, so a pathname pattern like /café is normalized to /caf%C3%A9 automatically. Unicode characters in the hostname are automatically encoded using Punycode, rather than percent-encoding.

  • Regular expression groups must contain only ASCII characters. Regular expression syntax makes it difficult and unsafe to automatically encode Unicode characters in these groups. If you want to match a Unicode character in a regular expression group, you need to percent encode it manually, like (caf%C3%A9) to match café.

In addition to encoding Unicode characters, URLPattern also performs URL normalization. For example, /foo/./bar in the pathname component is collapsed to the equivalent /foo/bar.
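
The encoding and normalization described above can be observed by reading a pattern's `pathname` back after construction. This is a small sketch with a hypothetical helper name, `normalizedPathname`.

```javascript
// Hypothetical helper: returns the pathname pattern as stored by
// URLPattern after encoding and normalization have been applied.
function normalizedPathname(pathname) {
  return new URLPattern({pathname}).pathname;
}
```

Calling `normalizedPathname('/café')` returns `'/caf%C3%A9'`, and `normalizedPathname('/foo/./bar')` returns `'/foo/bar'`.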

When in doubt about how a given input pattern has been normalized, inspect the constructed URLPattern instance using your browser's DevTools.

Putting it all together #

The Glitch demo embedded below illustrates a core use case of URLPattern inside of a service worker's fetch event handler, mapping specific patterns to asynchronous functions that could generate a response to network requests. The concepts in this example could be applied to other routing scenarios as well, either server-side or client-side.
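
Independently of that demo, the general idea can be sketched as a small route table. Everything named here (`createRouter`, `installRouter`, the shape of the `routes` array, and the handler signature) is an assumption made for illustration, not part of URLPattern itself.

```javascript
// Hypothetical route table: each entry pairs a URLPattern with a handler
// that produces a response (or a promise for one).
function createRouter(routes) {
  return function findResponse(request) {
    for (const {pattern, handler} of routes) {
      const match = pattern.exec(request.url);
      if (match !== null) {
        // Pass the match along so the handler can read group values.
        return handler({request, match});
      }
    }
    return null; // No route matched.
  };
}

// Wires the router into a service worker scope. Inside a real service
// worker, `scope` would be `self`.
function installRouter(scope, routes) {
  const findResponse = createRouter(routes);
  scope.addEventListener('fetch', (event) => {
    const response = findResponse(event.request);
    if (response !== null) {
      event.respondWith(response);
    }
    // Otherwise, fall through to the browser's default network handling.
  });
}
```

The first matching pattern wins, so more specific routes would need to be listed before more general ones.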

Browser support and polyfills #

URLPattern is enabled by default in Chrome and Edge version 95 and above.

The urlpattern-polyfill library provides a way to use the URLPattern interface in browsers or environments like Node which lack built-in support. If you use the polyfill, make sure that you use feature detection to ensure that you're only loading it if the current environment lacks support. Otherwise, you'll lose one of the key benefits of URLPattern: the fact that supporting environments don't have to download and parse additional code in order to use it.

if (!(globalThis && 'URLPattern' in globalThis)) {
  // URLPattern is not available, so the polyfill is needed.
}
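
One possible way to act on that check is to load the polyfill lazily. In the sketch below, `needsURLPatternPolyfill` and `ensureURLPattern` are hypothetical helper names, and `loadPolyfill` is a function you supply; how the polyfill is actually imported depends on your bundler or module setup.

```javascript
// Returns true when the given global scope lacks a URLPattern implementation.
function needsURLPatternPolyfill(scope = globalThis) {
  return !('URLPattern' in scope);
}

// `loadPolyfill` is caller-supplied, for example
// () => import('urlpattern-polyfill') in a module environment (an assumption
// about your setup). It is only invoked when built-in support is missing.
async function ensureURLPattern(loadPolyfill, scope = globalThis) {
  if (needsURLPatternPolyfill(scope)) {
    await loadPolyfill();
  }
}
```

Environments with built-in support skip the loader entirely, so they never download or parse the extra code.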

Feedback and future plans #

While the basic functionality for URLPattern has made it to Chrome and Edge, there are additions planned. Some aspects of URLPattern are still being developed, and there are a number of open questions about specific behaviors that may still be refined. We encourage you to try out URLPattern and provide any feedback via a GitHub issue.

Support for templating #

The path-to-regexp library provides a compile() function that effectively reverses the routing behavior. compile() takes a pattern and values for the token placeholders, and returns a string for a URL path with those values substituted in.
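
To illustrate the idea of reversing a pattern, here is a deliberately simplified, hand-rolled sketch. It is not the path-to-regexp compile() implementation and not a URLPattern API: the `fillTemplate` name is invented, and it only handles plain `:name` tokens, with no modifiers, wildcards, or regular expression groups.

```javascript
// Hypothetical helper, not part of URLPattern or path-to-regexp: fills
// simple :name tokens in a pathname template with URL-encoded values.
function fillTemplate(template, values) {
  return template.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (token, name) => {
    if (!(name in values)) {
      throw new TypeError(`Missing value for "${name}"`);
    }
    return encodeURIComponent(String(values[name]));
  });
}
```

For example, `fillTemplate('/foo/:image.jpg', {image: 'cat'})` returns `'/foo/cat.jpg'`.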

We hope to add this to URLPattern in the future, but it's not within scope for the initial release.

Enabling future web platform features #

Assuming URLPattern becomes an established part of the web platform, other features that could benefit from routing or pattern matching can build on top of it as a primitive.

There are ongoing discussions about using URLPattern for proposed features like service worker scope pattern matching, PWAs as file handlers, and speculative prefetching.

Acknowledgements #

See the original explainer document for a full list of acknowledgements.

The thumbnail image for this post is by Vic on Unsplash.
