In this guide, aimed at web developers who want to benefit from WebAssembly, you'll learn how to make use of Wasm to outsource CPU-intensive tasks with the help of a running example. The guide covers everything from best practices for loading Wasm modules to optimizing their compilation and instantiation. It further discusses shifting the CPU-intensive tasks to Web Workers and looks into implementation decisions you'll be confronted with, such as when to create the Web Worker and whether to keep it permanently alive or spin it up when needed. The guide develops the approach iteratively, introducing one performance pattern at a time, and ends by suggesting the best solution to the problem.
Assumptions
Assume you have a very CPU-intensive task that you want to outsource to
WebAssembly (Wasm) for its close-to-native performance. The CPU-intensive task
used as an example in this guide calculates the factorial of a number. The
factorial of a non-negative integer is the product of that integer and all the
positive integers below it. For example, the factorial of four (written as 4!)
is equal to 24 (that is, 4 * 3 * 2 * 1). The numbers get big quickly. For
example, 16! is 20,922,789,888,000. A more realistic example of a CPU-intensive
task could be scanning a barcode or tracing a raster image.
A performant iterative (rather than recursive) implementation of a factorial()
function is shown in the following code sample written in C++.
#include <stdint.h>
extern "C" {
// Calculates the factorial of a non-negative integer n.
uint64_t factorial(unsigned int n) {
  uint64_t result = 1;
  for (unsigned int i = 2; i <= n; ++i) {
    result *= i;
  }
  return result;
}
}
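To sanity-check the values a compiled module returns, it can help to have a plain JavaScript reference at hand. The following is a minimal sketch using BigInt. The names factorialReference and factorialAsUint64 are hypothetical and not part of the original sample. The second function mimics the wraparound of the C++ uint64_t return type, which can no longer hold the exact result for inputs above 20.

```javascript
// Hypothetical reference implementation, not part of the original sample.
// Uses BigInt so the result stays exact for any non-negative integer n.
function factorialReference(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError('n must be a non-negative integer');
  }
  let result = 1n;
  for (let i = 2n; i <= BigInt(n); ++i) {
    result *= i;
  }
  return result;
}

// Mimics the C++ `uint64_t` return type by wrapping modulo 2^64.
function factorialAsUint64(n) {
  return BigInt.asUintN(64, factorialReference(n));
}

console.log(factorialReference(4)); // 24n
console.log(factorialReference(16)); // 20922789888000n
// 20! still fits into 64 bits, 21! does not.
console.log(factorialAsUint64(20) === factorialReference(20)); // true
console.log(factorialAsUint64(21) === factorialReference(21)); // false
```

In an app, a check like this suggests rejecting inputs above 20 before they reach the Wasm function.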
For the rest of the article, assume there's a Wasm module based on compiling
this factorial() function with Emscripten in a file called factorial.wasm using
all code optimization best practices. For a refresher on how to do this, read
Calling compiled C functions from JavaScript using ccall/cwrap. The following
command was used to compile factorial.wasm as standalone Wasm.
emcc -O3 factorial.cpp -o factorial.wasm -s WASM_BIGINT -s EXPORTED_FUNCTIONS='["_factorial"]' --no-entry
In HTML, there's a form with an input paired with an output and a submit
button. These elements are referenced from JavaScript by their tag names.
<form>
  <label>The factorial of <input type="text" value="12" /></label> is
  <output>479001600</output>.
  <button type="submit">Calculate</button>
</form>
const input = document.querySelector('input');
const output = document.querySelector('output');
const button = document.querySelector('button');
Loading, compilation, and instantiation of the module
Before you can use a Wasm module, you need to load it. On the web, this happens
through the fetch() API. Because you know that your web app depends on the Wasm
module for the CPU-intensive task, you should preload the Wasm file as early as
possible. You do this with a CORS-enabled fetch in the <head> section of your
app.
<link rel="preload" as="fetch" href="factorial.wasm" crossorigin />
In reality, the fetch() API is asynchronous and you need to await the result.
await fetch('factorial.wasm');
Next, compile and instantiate the Wasm module. There are temptingly named
functions called WebAssembly.compile() (plus WebAssembly.compileStreaming())
and WebAssembly.instantiate() for these tasks. Instead of calling them one
after the other, use the WebAssembly.instantiateStreaming() method, which
compiles and instantiates a Wasm module directly from a streamed underlying
source like fetch(), so you don't need to await the fetch() call first. This is
the most efficient and optimized way to load Wasm code. Assuming the Wasm
module exports a factorial() function, you can then use it straight away.
const importObject = {};
const resultObject = await WebAssembly.instantiateStreaming(
  fetch('factorial.wasm'),
  importObject,
);
const factorial = resultObject.instance.exports.factorial;
button.addEventListener('click', (e) => {
  e.preventDefault();
  output.textContent = factorial(parseInt(input.value, 10));
});
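One caveat with the streaming methods is that WebAssembly.instantiateStreaming() rejects if the response isn't served with the application/wasm MIME type. The following is a hedged sketch of a fallback for servers you can't configure. The helper name instantiateWithFallback is hypothetical and not part of the original sample. It inspects the Content-Type header and, if streaming isn't possible, reads the whole response into an ArrayBuffer and calls WebAssembly.instantiate() instead.

```javascript
// Hypothetical helper, not part of the original sample.
// Resolves to `{ module, instance }` on both code paths.
async function instantiateWithFallback(responsePromise, importObject = {}) {
  const response = await responsePromise;
  const contentType = response.headers.get('Content-Type') || '';
  const canStream =
    typeof WebAssembly.instantiateStreaming === 'function' &&
    contentType.trim().toLowerCase() === 'application/wasm';
  if (canStream) {
    // Preferred path: compile while the bytes are still arriving.
    return WebAssembly.instantiateStreaming(response, importObject);
  }
  // Fallback path: download everything first, then compile and instantiate.
  const bytes = await response.arrayBuffer();
  return WebAssembly.instantiate(bytes, importObject);
}

// Usage in the browser, assuming factorial.wasm is reachable:
// const { instance } = await instantiateWithFallback(fetch('factorial.wasm'));
// const factorial = instance.exports.factorial;
```

The fallback gives up the benefit of compiling during the download, so fixing the server's MIME type remains the better option where that's possible.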
Shift the task to a Web Worker
If you execute this on the main thread, with truly CPU-intensive tasks, you risk blocking the entire app. A common practice is to shift such tasks to a Web Worker.
Restructure of the main thread
To move the CPU-intensive task to a Web Worker, the first step is to restructure
the application. The main thread now creates a Worker, and, apart from that,
only deals with sending the input to the Web Worker and then receiving the
output and displaying it.
/* Main thread. */
let worker = null;
// When the button is clicked, submit the input value
// to the Web Worker.
button.addEventListener('click', (e) => {
  e.preventDefault();
  // Create the Web Worker lazily on-demand.
  if (!worker) {
    worker = new Worker('worker.js');
    // Listen for incoming messages and display the result.
    // The payload posted by the Web Worker arrives in `e.data`.
    worker.addEventListener('message', (e) => {
      output.textContent = e.data.result;
    });
  }
  worker.postMessage({ integer: parseInt(input.value, 10) });
});
Bad: Task runs in Web Worker, but code is racy
The Web Worker instantiates the Wasm module and, upon receiving a message,
performs the CPU-intensive task and sends the result back to the main thread.
The problem with this approach is that instantiating a Wasm module with
WebAssembly.instantiateStreaming() is an asynchronous operation. This means
that the code is racy. In the worst case, the main thread sends data when the
Web Worker isn't ready yet, and the Web Worker never receives the message.
/* Worker thread. */
// Instantiate the Wasm module.
// 🚫 This code is racy! If a message comes in while
// the promise is still being awaited, it's lost.
const importObject = {};
const resultObject = await WebAssembly.instantiateStreaming(
  fetch('factorial.wasm'),
  importObject,
);
const factorial = resultObject.instance.exports.factorial;
// Listen for incoming messages, run the task,
// and post the result.
self.addEventListener('message', (e) => {
  const { integer } = e.data;
  self.postMessage({ result: factorial(integer) });
});
Better: Task runs in Web Worker, but with possibly redundant loading and compiling
One workaround to the problem of asynchronous Wasm module instantiation is to move the Wasm module loading, compilation, and instantiation all into the event listener, but this would mean that this work needs to happen on every received message. Because HTTP caching applies and the browser may also be able to cache the compiled Wasm code, this is not the worst solution, but there's a better way.
By moving the asynchronous code to the beginning of the Web Worker and not actually waiting for the promise to fulfill, but rather storing the promise in a variable, the program immediately moves on to the event listener part of the code, and no message from the main thread will be lost. Inside of the event listener, the promise can then be awaited.
/* Worker thread. */
const importObject = {};
// Instantiate the Wasm module.
// 🚫 If the `Worker` is spun up frequently, the loading,
// compiling, and instantiating work will happen every time.
const wasmPromise = WebAssembly.instantiateStreaming(
  fetch('factorial.wasm'),
  importObject,
);
// Listen for incoming messages.
self.addEventListener('message', async (e) => {
  const { integer } = e.data;
  const resultObject = await wasmPromise;
  const factorial = resultObject.instance.exports.factorial;
  const result = factorial(integer);
  self.postMessage({ result });
});
Good: Task runs in Web Worker, and loads and compiles only once
The result of the static WebAssembly.compileStreaming() method is a promise
that resolves to a WebAssembly.Module. One nice feature of this object is that
it can be sent to another context using postMessage(). This means the Wasm
module can be loaded and compiled just once in the main thread (or even another
Web Worker purely concerned with loading and compiling), and then be passed to
the Web Worker responsible for the CPU-intensive task. The following code shows
this flow.
/* Main thread. */
const modulePromise = WebAssembly.compileStreaming(fetch('factorial.wasm'));
let worker = null;
// When the button is clicked, submit the input value
// and the Wasm module to the Web Worker.
button.addEventListener('click', async (e) => {
  e.preventDefault();
  // Create the Web Worker lazily on-demand.
  if (!worker) {
    worker = new Worker('worker.js');
    // Listen for incoming messages and display the result.
    // The payload posted by the Web Worker arrives in `e.data`.
    worker.addEventListener('message', (e) => {
      output.textContent = e.data.result;
    });
  }
  worker.postMessage({
    integer: parseInt(input.value, 10),
    module: await modulePromise,
  });
});
On the Web Worker side, all that remains is to extract the WebAssembly.Module
object and instantiate it. Since the message with the WebAssembly.Module isn't
streamed, the code in the Web Worker now uses WebAssembly.instantiate() rather
than the instantiateStreaming() variant from before. The instantiated module is
cached in a variable, so the instantiation work only needs to happen once upon
spinning up the Web Worker.
/* Worker thread. */
let instance = null;
// Listen for incoming messages.
self.addEventListener('message', async (e) => {
  // Extract the `WebAssembly.Module` from the message.
  const { integer, module } = e.data;
  const importObject = {};
  // Instantiate the Wasm module that came via `postMessage()`.
  instance = instance || (await WebAssembly.instantiate(module, importObject));
  const factorial = instance.exports.factorial;
  const result = factorial(integer);
  self.postMessage({ result });
});
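The worker code above reads instance.exports directly, whereas the earlier samples went through resultObject.instance. The reason is that WebAssembly.instantiate() resolves to different shapes depending on its first argument. The following sketch illustrates this outside of a Web Worker. To stay self-contained it uses the eight-byte Wasm header, which is the smallest valid module, as a stand-in for factorial.wasm. The function name compileOnceInstantiateTwice is hypothetical.

```javascript
// The eight-byte header (magic number plus version) is the smallest valid
// Wasm module. It stands in for factorial.wasm here and exports nothing.
const emptyModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
]);

// Hypothetical demo function, not part of the original sample.
async function compileOnceInstantiateTwice(bytes) {
  // From bytes: resolves to an object holding both `module` and `instance`.
  const fromBytes = await WebAssembly.instantiate(bytes, {});
  // From a compiled `WebAssembly.Module`: resolves to the instance directly.
  const module = await WebAssembly.compile(bytes);
  const instanceA = await WebAssembly.instantiate(module, {});
  const instanceB = await WebAssembly.instantiate(module, {});
  return { fromBytes, module, instanceA, instanceB };
}

compileOnceInstantiateTwice(emptyModuleBytes).then((r) => {
  console.log(r.fromBytes.instance instanceof WebAssembly.Instance); // true
  console.log(r.instanceA instanceof WebAssembly.Instance); // true
  // One compilation, two independent instances.
  console.log(r.instanceA !== r.instanceB); // true
});
```

This is what makes the pattern cheap: the compilation happens once, and each Web Worker only pays for instantiating the module it was handed.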
Perfect: Task runs in inline Web Worker, and loads and compiles only once
Even with HTTP caching, obtaining the (ideally) cached Web Worker code and
potentially hitting the network is expensive. A common performance trick is to
inline the Web Worker and load it as a blob: URL. This still requires the
compiled Wasm module to be passed to the Web Worker for instantiation, as the
contexts of the Web Worker and the main thread are different, even if they're
based on the same JavaScript source file.
/* Main thread. */
const modulePromise = WebAssembly.compileStreaming(fetch('factorial.wasm'));
let worker = null;
const blobURL = URL.createObjectURL(
  new Blob(
    [
      `
let instance = null;
self.addEventListener('message', async (e) => {
  // Extract the \`WebAssembly.Module\` from the message.
  const {integer, module} = e.data;
  const importObject = {};
  // Instantiate the Wasm module that came via \`postMessage()\`.
  instance = instance || await WebAssembly.instantiate(module, importObject);
  const factorial = instance.exports.factorial;
  const result = factorial(integer);
  self.postMessage({result});
});
`,
    ],
    { type: 'text/javascript' },
  ),
);
button.addEventListener('click', async (e) => {
  e.preventDefault();
  // Create the Web Worker lazily on-demand.
  if (!worker) {
    worker = new Worker(blobURL);
    // Listen for incoming messages and display the result.
    // The payload posted by the Web Worker arrives in `e.data`.
    worker.addEventListener('message', (e) => {
      output.textContent = e.data.result;
    });
  }
  worker.postMessage({
    integer: parseInt(input.value, 10),
    module: await modulePromise,
  });
});
Lazy or eager Web Worker creation
So far, all the code samples spun up the Web Worker lazily on-demand, that is, when the button was pressed. Depending on your application, it can make sense to create the Web Worker more eagerly, for example, when the app is idle or even as part of the app's bootstrapping process. To do so, move the Web Worker creation code outside of the button's event listener.
const worker = new Worker(blobURL);
// Listen for incoming messages and display the result.
// The payload posted by the Web Worker arrives in `e.data`.
worker.addEventListener('message', (e) => {
  output.textContent = e.data.result;
});
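For the "when the app is idle" variant, one possible approach is to defer the creation with requestIdleCallback(), which isn't available in every browser, and to fall back to a timer otherwise. The following is a sketch under that assumption. The helper names whenIdle and createLazyWorker are hypothetical and not part of the original sample.

```javascript
// Hypothetical helpers, not part of the original sample.

// Runs `callback` when the browser is idle, or shortly afterwards if
// `requestIdleCallback()` is unavailable.
function whenIdle(callback, timeout = 2000) {
  if (typeof globalThis.requestIdleCallback === 'function') {
    globalThis.requestIdleCallback(() => callback(), { timeout });
  } else {
    setTimeout(() => callback(), 0);
  }
}

// Wraps a factory so that the Web Worker is created at most once,
// no matter whether the idle callback or a click asks for it first.
function createLazyWorker(createWorker) {
  let worker = null;
  return () => {
    if (!worker) {
      worker = createWorker();
    }
    return worker;
  };
}

// Usage in the browser, assuming `blobURL` from the earlier sample:
// const getWorker = createLazyWorker(() => new Worker(blobURL));
// whenIdle(getWorker); // Warm up the Web Worker during idle time.
// button.addEventListener('click', () => getWorker().postMessage(/* … */));
```

With this shape, a click that arrives before the idle callback still gets a Web Worker, and the idle callback later becomes a no-op.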
Keep the Web Worker around or not
One question that you may ask yourself is whether you should keep the Web Worker permanently around, or recreate it whenever you need it. Both approaches are possible and have their advantages and disadvantages. For example, keeping a Web Worker permanently around may increase your app's memory footprint and make dealing with concurrent tasks harder, since you somehow need to map results coming from the Web Worker back to the requests. On the other hand, your Web Worker's bootstrapping code might be rather complex, so there could be a lot of overhead if you create a new one each time. Luckily this is something you can measure with the User Timing API.
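Mapping results back to requests can be sketched with a request ID that the Web Worker echoes back in its reply. The following is a minimal sketch, not the approach used by the samples in this guide. The helper name createWorkerClient and the id field are hypothetical, and the Web Worker would need to reply with self.postMessage({ id, result }) for this to work.

```javascript
// Hypothetical helper, not part of the original sample. It assumes the
// Web Worker echoes the `id` it received back in its reply message.
function createWorkerClient(worker) {
  let nextId = 0;
  // Maps each request ID to the `resolve` function of its promise.
  const pending = new Map();
  worker.addEventListener('message', (e) => {
    const { id, result } = e.data;
    const resolve = pending.get(id);
    if (resolve) {
      pending.delete(id);
      resolve(result);
    }
  });
  // Returns a function that sends one request and resolves with its result.
  return (payload) =>
    new Promise((resolve) => {
      const id = nextId++;
      pending.set(id, resolve);
      worker.postMessage({ id, ...payload });
    });
}

// Usage in the browser, assuming `worker` and `module` from the samples:
// const compute = createWorkerClient(worker);
// const result = await compute({ integer: 12, module });
```

Each caller then awaits its own promise, so replies that arrive out of order still end up with the right request.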
The code samples so far have kept one permanent Web Worker around. The following code sample creates a new Web Worker ad hoc whenever needed. Note that you need to keep track of terminating the Web Worker yourself. (The code snippet skips error handling, but in case something goes wrong, be sure to terminate in all cases, success or failure.)
/* Main thread. */
let worker = null;
const modulePromise = WebAssembly.compileStreaming(fetch('factorial.wasm'));
const blobURL = URL.createObjectURL(
  new Blob(
    [
      `
// Caching the instance means you can switch between
// throw-away and permanent Web Worker freely.
let instance = null;
self.addEventListener('message', async (e) => {
  // Extract the \`WebAssembly.Module\` from the message.
  const {integer, module} = e.data;
  const importObject = {};
  // Instantiate the Wasm module that came via \`postMessage()\`.
  instance = instance || await WebAssembly.instantiate(module, importObject);
  const factorial = instance.exports.factorial;
  const result = factorial(integer);
  self.postMessage({result});
});
`,
    ],
    { type: 'text/javascript' },
  ),
);
button.addEventListener('click', async (e) => {
  e.preventDefault();
  // Terminate a potentially running Web Worker.
  if (worker) {
    worker.terminate();
  }
  // Create the Web Worker lazily on-demand.
  worker = new Worker(blobURL);
  worker.addEventListener('message', (e) => {
    worker.terminate();
    worker = null;
    output.textContent = e.data.result;
  });
  worker.postMessage({
    integer: parseInt(input.value, 10),
    module: await modulePromise,
  });
});
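The error handling that the sample skips could look roughly like the following sketch, which wraps one ad hoc Web Worker run in a promise and terminates the Web Worker on success and on failure alike. The helper name runInAdHocWorker is hypothetical and not part of the original sample, and it only covers errors that reach the main thread through the Web Worker's error event.

```javascript
// Hypothetical helper, not part of the original sample.
// `createWorker` is a factory such as `() => new Worker(blobURL)`.
function runInAdHocWorker(createWorker, message) {
  const worker = createWorker();
  return new Promise((resolve, reject) => {
    worker.addEventListener('message', (e) => resolve(e.data.result));
    worker.addEventListener('error', (e) =>
      reject(new Error(e.message || 'Web Worker error')),
    );
    worker.postMessage(message);
  }).finally(() => {
    // Runs whether the promise was fulfilled or rejected.
    worker.terminate();
  });
}

// Usage in the browser, assuming `blobURL` and `modulePromise` from above:
// const result = await runInAdHocWorker(() => new Worker(blobURL), {
//   integer: 12,
//   module: await modulePromise,
// });
```

Because postMessage() is called inside the promise executor, an exception it throws also rejects the promise and therefore still leads to termination.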
Demos
There are two demos for you to play with: one with an ad hoc Web Worker
(source code) and one with a permanent Web Worker (source code). If you open
the Chrome DevTools and check the Console, you can see the User Timing API logs
that measure the time it takes from the button click to the displayed result on
the screen. The Network tab shows the blob: URL request(s). In this example,
the timing difference between ad hoc and permanent is about 3×. In practice, to
the human eye, both are indistinguishable in this case. The results for your
own real-life app will most likely vary.
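A measurement of this kind can be sketched with performance.mark() and performance.measure() from the User Timing API. The following is a minimal sketch and not the demos' actual code. The helper name measureTask and the mark names are hypothetical.

```javascript
// Hypothetical helper, not part of the original demos.
// Measures how long the asynchronous `task` takes and returns both
// the task's result and the measured duration in milliseconds.
async function measureTask(name, task) {
  const startMark = `${name}-start`;
  const endMark = `${name}-end`;
  performance.mark(startMark);
  const result = await task();
  performance.mark(endMark);
  const { duration } = performance.measure(name, startMark, endMark);
  return { result, duration };
}

// Usage in the browser, assuming a hypothetical `computeInWorker()` function
// that resolves once the Web Worker has replied:
// const { result, duration } = await measureTask('factorial', () =>
//   computeInWorker(12),
// );
// console.log(`factorial took ${duration.toFixed(1)} ms`);
```

Running the same measurement against the ad hoc and the permanent variant gives numbers for your own app rather than for the demo.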
Conclusions
This post has explored some performance patterns for dealing with Wasm.
- As a general rule, prefer the streaming methods
  (WebAssembly.compileStreaming() and WebAssembly.instantiateStreaming()) over
  their non-streaming counterparts (WebAssembly.compile() and
  WebAssembly.instantiate()).
- If you can, outsource performance-heavy tasks to a Web Worker, and do the Wasm
  loading and compiling work only once outside of the Web Worker. This way, the
  Web Worker only needs to instantiate, with WebAssembly.instantiate(), the Wasm
  module it receives from the main thread, where the loading and compiling
  happened. The instance can then be cached if you keep the Web Worker around
  permanently.
- Measure carefully whether it makes sense to keep one permanent Web Worker
  around forever, or to create ad hoc Web Workers whenever they are needed. Also
  think about when the best time to create the Web Worker is. Things to take
  into consideration are memory consumption, the Web Worker instantiation
  duration, and the complexity of possibly having to deal with concurrent
  requests.
If you take these patterns into account, you're on the right track to optimal Wasm performance.
Acknowledgements
This guide was reviewed by Andreas Haas, Jakob Kummerow, Deepti Gandluri, Alon Zakai, Francis McCabe, François Beaufort, and Rachel Andrew.