Sometimes you want to use a library that is only available as C or C++ code. Traditionally, this is where you give up. Well, not anymore, because now we have Emscripten and WebAssembly (or Wasm)!
The toolchain
I set myself the goal of working out how to compile some existing C code to Wasm. There's been some noise around LLVM's Wasm backend, so I started digging into that. While you can get simple programs to compile this way, the second you want to use C's standard library or even compile multiple files, you will probably run into problems. This led me to the major lesson I learned:
Use Emscripten. While it used to be a C-to-asm.js compiler, it has since matured to target Wasm and is in the process of switching to the official LLVM backend internally. Emscripten also provides a Wasm-compatible implementation of C's standard library. It carries a lot of hidden work: it emulates a file system, provides memory management, and wraps OpenGL with WebGL. These are all things that you really don't need to experience developing for yourself.
While that might sound like you have to worry about bloat — I certainly worried — the Emscripten compiler removes everything that's not needed. In my experiments, the resulting Wasm modules are appropriately sized for the logic that they contain and the Emscripten and WebAssembly teams are working on making them even smaller in the future.
You can get Emscripten by following the instructions on their website or using Homebrew. If you are a fan of dockerized commands like me and don't want to install things on your system just to have a play with WebAssembly, there is a well-maintained Docker image that you can use instead:
$ docker pull trzeci/emscripten
$ docker run --rm -v $(pwd):/src trzeci/emscripten emcc <emcc options here>
Compiling something simple
Let's take the almost canonical example of writing a function in C that calculates the nth Fibonacci number:
#include <emscripten.h>

EMSCRIPTEN_KEEPALIVE
int fib(int n) {
  if (n <= 0) {
    return 0;
  }
  int i, t, a = 0, b = 1;
  for (i = 1; i < n; i++) {
    t = a + b;
    a = b;
    b = t;
  }
  return b;
}
If you know C, the function itself shouldn't be too surprising. Even if you don't know C but know JavaScript, you will hopefully be able to understand what's going on here.
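For readers coming from JavaScript, here is a line-by-line mirror of the C function above. It is only a reading aid and a way to cross-check results later; the name fibReference is my own and not part of the Emscripten output. One caveat: the C version uses a 32-bit int, so the two are only expected to agree up to n = 46, after which the C result overflows.

```javascript
// JavaScript mirror of the C fib() above, for comparison only.
// "fibReference" is a hypothetical name, not something Emscripten generates.
function fibReference(n) {
  if (n <= 0) {
    return 0;
  }
  let a = 0;
  let b = 1;
  for (let i = 1; i < n; i++) {
    const t = a + b;
    a = b;
    b = t;
  }
  return b;
}

console.log(fibReference(12)); // 144
```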
emscripten.h is a header file provided by Emscripten. We only need it so we have access to the EMSCRIPTEN_KEEPALIVE macro, but it provides much more functionality. This macro tells the compiler to not remove a function even if it appears unused. If we omitted that macro, the compiler would optimize the function away since, after all, nobody is using it.
Let's save all that in a file called fib.c. To turn it into a .wasm file we need to turn to Emscripten's compiler command emcc:
$ emcc -O3 -s WASM=1 -s EXTRA_EXPORTED_RUNTIME_METHODS='["cwrap"]' fib.c
Let's dissect this command. emcc is Emscripten's compiler. fib.c is our C file. So far, so good. -s WASM=1 tells Emscripten to give us a Wasm file instead of an asm.js file. -s EXTRA_EXPORTED_RUNTIME_METHODS='["cwrap"]' tells the compiler to leave the cwrap() function available in the JavaScript file (more on this function later). -O3 tells the compiler to optimize aggressively. You can choose lower numbers to decrease build time, but that will also make the resulting bundles bigger as the compiler might not remove unused code.
After running the command, you should end up with a JavaScript file called a.out.js and a WebAssembly file called a.out.wasm. The Wasm file (or "module") contains our compiled C code and should be fairly small. The JavaScript file takes care of loading and initializing our Wasm module and providing a nicer API. If needed, it will also take care of setting up the stack, the heap, and other functionality usually expected to be provided by the operating system when writing C code. As such, the JavaScript file is a bit bigger, weighing in at 19KB (~5KB gzip'd).
Running something simple
The easiest way to load and run your module is to use the generated JavaScript file. Once you load that file, you will have a Module global at your disposal. Use cwrap to create a JavaScript native function that takes care of converting parameters to something C-friendly and invoking the wrapped function. cwrap takes the function name, return type and argument types as arguments, in that order:
<script src="a.out.js"></script>
<script>
  Module.onRuntimeInitialized = _ => {
    const fib = Module.cwrap('fib', 'number', ['number']);
    console.log(fib(12));
  };
</script>
If you run this code, you should see "144" in the console, which is the 12th Fibonacci number.
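If you prefer promises over assigning a callback, the onRuntimeInitialized hook can be wrapped in a small helper. This is a minimal sketch: runtimeReady and fakeModule are names I made up, and the helper assumes it is called before the runtime has finished initializing (otherwise the callback would never fire).

```javascript
// Hypothetical helper: resolves once Emscripten's runtime reports it is ready.
// Assumption: this runs before the runtime has finished initializing.
function runtimeReady(module) {
  return new Promise((resolve) => {
    module.onRuntimeInitialized = () => resolve(module);
  });
}

// Stand-in for the generated Module global so the sketch runs anywhere.
const fakeModule = {};
const ready = runtimeReady(fakeModule).then((m) => {
  console.log('runtime ready:', m === fakeModule);
  return m;
});

// In a real page, Emscripten's generated JavaScript makes this call for us.
fakeModule.onRuntimeInitialized();
```

In a page that has loaded a.out.js, the same helper would be used as runtimeReady(Module).then(...) or awaited inside an async function.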
The holy grail: Compiling a C library
Up until now, the C code we have written was written with Wasm in mind. A core use-case for WebAssembly, however, is to take the existing ecosystem of C libraries and allow developers to use them on the web. These libraries often rely on C's standard library, an operating system, a file system and other things. Emscripten provides most of these features, although there are some limitations.
Let's go back to my original goal: compiling an encoder for WebP to Wasm. The source for the WebP codec is written in C and is available on GitHub, along with some extensive API documentation. That's a pretty good starting point.
$ git clone https://github.com/webmproject/libwebp
To start off simple, let's try to expose WebPGetEncoderVersion() from encode.h to JavaScript by writing a C file called webp.c:
#include "emscripten.h"
#include "src/webp/encode.h"

EMSCRIPTEN_KEEPALIVE
int version() {
  return WebPGetEncoderVersion();
}
This is a good simple program to test if we can get the source code of libwebp to compile, as we don't require any parameters or complex data structures to invoke this function.
To compile this program, we need to tell the compiler where it can find libwebp's header files using the -I flag and also pass it all the C files of libwebp that it needs. I'm going to be honest: I just gave it all the C files I could find and relied on the compiler to strip out everything that was unnecessary. It seemed to work brilliantly!
$ emcc -O3 -s WASM=1 -s EXTRA_EXPORTED_RUNTIME_METHODS='["cwrap"]' \
-I libwebp \
webp.c \
libwebp/src/{dec,dsp,demux,enc,mux,utils}/*.c
Now we only need some HTML and JavaScript to load our shiny new module:
<script src="/a.out.js"></script>
<script>
  Module.onRuntimeInitialized = async (_) => {
    const api = {
      version: Module.cwrap('version', 'number', []),
    };
    console.log(api.version());
  };
</script>
And we will see the correct version number in the console output.
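The number that version() returns looks odd at first, because libwebp packs the version into a single integer using 8 bits each for major, minor and revision (its documentation gives v2.5.7 as 0x020507). A small sketch for turning it into a readable string; formatWebPVersion is a name I made up for illustration:

```javascript
// Hypothetical helper: unpack libwebp's 0xMMmmrr version integer.
function formatWebPVersion(packed) {
  const major = (packed >> 16) & 0xff;
  const minor = (packed >> 8) & 0xff;
  const revision = packed & 0xff;
  return `${major}.${minor}.${revision}`;
}

console.log(formatWebPVersion(0x020507)); // "2.5.7"
```

With the module loaded, this would be called as formatWebPVersion(api.version()).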
Get an image from JavaScript into Wasm
Getting the encoder's version number is great and all, but encoding an actual image would be more impressive, right? Let's do that, then.
The first question we have to answer is: How do we get the image into Wasm land?
Looking at the encoding API of libwebp, it expects an array of bytes in RGB, RGBA, BGR or BGRA. Luckily, the Canvas API has getImageData(), which gives us a Uint8ClampedArray containing the image data in RGBA:
async function loadImage(src) {
  // Load image
  const imgBlob = await fetch(src).then((resp) => resp.blob());
  const img = await createImageBitmap(imgBlob);
  // Make canvas same size as image
  const canvas = document.createElement('canvas');
  canvas.width = img.width;
  canvas.height = img.height;
  // Draw image onto canvas
  const ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0);
  return ctx.getImageData(0, 0, img.width, img.height);
}
Now it's "only" a matter of copying the data from JavaScript land into Wasm land. For that, we need to expose two additional functions. One that allocates memory for the image inside Wasm land and one that frees it up again:
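// Needed for uint8_t and malloc()/free(); these belong with the other
// includes at the top of webp.c.
#include <stdint.h>
#include <stdlib.h>

EMSCRIPTEN_KEEPALIVE
uint8_t* create_buffer(int width, int height) {
  return malloc(width * height * 4 * sizeof(uint8_t));
}

EMSCRIPTEN_KEEPALIVE
void destroy_buffer(uint8_t* p) {
  free(p);
}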
create_buffer allocates a buffer for the RGBA image, hence 4 bytes per pixel. The pointer returned by malloc() is the address of the first memory cell of that buffer. When the pointer is returned to JavaScript land, it is treated as just a number. After exposing the function to JavaScript using cwrap, we can use that number to find the start of our buffer and copy the image data.
const api = {
  version: Module.cwrap('version', 'number', []),
  create_buffer: Module.cwrap('create_buffer', 'number', ['number', 'number']),
  destroy_buffer: Module.cwrap('destroy_buffer', '', ['number']),
};
const image = await loadImage('/image.jpg');
const p = api.create_buffer(image.width, image.height);
Module.HEAP8.set(image.data, p);
// ... call encoder ...
api.destroy_buffer(p);
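To see why a pointer can be used as a plain array offset, here is a sketch that runs without Emscripten. It uses a bare WebAssembly.Memory as a stand-in for the module's heap, and a made-up address instead of one returned by create_buffer(). It also shows that copying through a signed view such as HEAP8 preserves the bytes, even though values above 127 read back as negative numbers through that view.

```javascript
// Stand-in for the Wasm heap: one 64 KiB page of linear memory.
const memory = new WebAssembly.Memory({ initial: 1 });
const heapU8 = new Uint8Array(memory.buffer); // unsigned view of the bytes
const heap8 = new Int8Array(memory.buffer);   // signed view of the same bytes

// A fake 2x1 RGBA image, shaped like the `data` field of an ImageData object.
const pixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 128, 255, 255]);

// Hypothetical address; in the real code it comes from create_buffer().
const p = 1024;

// Same call shape as Module.HEAP8.set(image.data, p).
heap8.set(pixels, p);

console.log(heapU8[p], heap8[p]); // 255 -1 (same byte, two interpretations)
```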
Grand Finale: Encode the image
The image is now available in Wasm land. It is time to call the WebP encoder to do its job! Looking at the WebP documentation, WebPEncodeRGBA seems like a perfect fit. The function takes a pointer to the input image and its dimensions, as well as a quality option between 0 and 100. It also allocates an output buffer for us, which we need to free using WebPFree() once we are done with the WebP image.
The result of the encoding operation is an output buffer and its length. Because functions in C can't have arrays as return types (unless we allocate memory dynamically), I resorted to a static global array. I know, not clean C (in fact, it relies on the fact that Wasm pointers are 32 bits wide), but to keep things simple I think this is a fair shortcut.
int result[2];

EMSCRIPTEN_KEEPALIVE
void encode(uint8_t* img_in, int width, int height, float quality) {
  uint8_t* img_out;
  size_t size;
  size = WebPEncodeRGBA(img_in, width, height, width * 4, quality, &img_out);
  result[0] = (int)img_out;
  result[1] = size;
}

EMSCRIPTEN_KEEPALIVE
void free_result(uint8_t* result) {
  WebPFree(result);
}

EMSCRIPTEN_KEEPALIVE
int get_result_pointer() {
  return result[0];
}

EMSCRIPTEN_KEEPALIVE
int get_result_size() {
  return result[1];
}
Now with all of that in place, we can call the encoding function, grab the pointer and image size, put it in a JavaScript-land buffer of our own, and release all the Wasm-land buffers we have allocated in the process.
api.encode(p, image.width, image.height, 100);
const resultPointer = api.get_result_pointer();
const resultSize = api.get_result_size();
const resultView = new Uint8Array(Module.HEAP8.buffer, resultPointer, resultSize);
const result = new Uint8Array(resultView);
api.free_result(resultPointer);
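The two Uint8Array constructors in that snippet do different things: the first creates a view onto Wasm memory, the second copies the bytes into a buffer that JavaScript owns. That copy is what makes it safe to free the Wasm-side buffer afterwards. The following sketch illustrates the difference with a plain WebAssembly.Memory and made-up pointer and size values standing in for what get_result_pointer() and get_result_size() would return:

```javascript
// Stand-in for the Wasm heap.
const memory = new WebAssembly.Memory({ initial: 1 });
const heap = new Uint8Array(memory.buffer);

// Hypothetical values; the real ones come from the exported getters.
const resultPointer = 256;
const resultSize = 4;
heap.set([82, 73, 70, 70], resultPointer); // the bytes of "RIFF"

// A view: shares storage with the Wasm memory.
const resultView = new Uint8Array(memory.buffer, resultPointer, resultSize);
// A copy: gets its own ArrayBuffer.
const result = new Uint8Array(resultView);

// Simulate the memory being reused after free_result().
heap.fill(0, resultPointer, resultPointer + resultSize);

console.log(Array.from(resultView)); // [0, 0, 0, 0]
console.log(Array.from(result));     // [82, 73, 70, 70]
```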
Depending on the size of your image, you might run into an error where Wasm can't grow the memory enough to accommodate both the input and the output image:
Luckily, the solution to this problem is in the error message! We just need to add -s ALLOW_MEMORY_GROWTH=1 to our compilation command.
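One thing worth knowing once memory is allowed to grow: growing a WebAssembly.Memory detaches its old ArrayBuffer, so typed array views created before the growth stop being usable. As far as I understand, Emscripten refreshes the Module.HEAP* views for you, which is a good reason to read Module.HEAP8 right when you need it instead of caching it in a variable. The sketch below shows the underlying behavior with a bare WebAssembly.Memory; the maximum of 4 pages is an arbitrary choice for the example.

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });

// A view created before the memory grows.
const before = new Uint8Array(memory.buffer);
before[0] = 42;

memory.grow(1); // add one 64 KiB page

// The old buffer is now detached; a fresh view is needed.
const after = new Uint8Array(memory.buffer);

console.log(before.length); // 0
console.log(after.length);  // 131072
console.log(after[0]);      // 42 (contents are preserved)
```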
And there you have it! We compiled a WebP encoder and transcoded a JPEG image to WebP. To prove that it worked, we can turn our result buffer into a blob and use it on an <img> element:
const blob = new Blob([result], { type: 'image/webp' });
const blobURL = URL.createObjectURL(blob);
const img = document.createElement('img');
img.src = blobURL;
document.body.appendChild(img);
Behold, the glory of a new WebP image!
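If you would rather verify the result without looking at it, you can check the file signature: a WebP file is a RIFF container, so it starts with the ASCII bytes "RIFF", followed by a 4-byte size, followed by "WEBP". A minimal sketch, where looksLikeWebP is a helper name of my own and the input is assumed to be a Uint8Array like the result buffer above:

```javascript
// Hypothetical helper: checks the RIFF/WEBP signature of a Uint8Array.
function looksLikeWebP(bytes) {
  if (bytes.length < 12) {
    return false;
  }
  // Read four bytes at the given offset as an ASCII tag.
  const tag = (offset) =>
    String.fromCharCode(...bytes.subarray(offset, offset + 4));
  return tag(0) === 'RIFF' && tag(8) === 'WEBP';
}

// "RIFF", a placeholder size field, then "WEBP".
const header = new Uint8Array([
  0x52, 0x49, 0x46, 0x46, 0, 0, 0, 0, 0x57, 0x45, 0x42, 0x50,
]);
console.log(looksLikeWebP(header)); // true
```

This only inspects the first 12 bytes, so it confirms the container type, not that the image decodes correctly.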
Conclusion
It's not a walk in the park to get a C library to work in the browser, but once you understand the overall process and how the data flow works, it becomes easier and the results can be mind-blowing.
WebAssembly opens many new possibilities on the web for processing, number crunching and gaming. Keep in mind that Wasm is not a silver bullet that should be applied to everything, but when you hit one of those bottlenecks, Wasm can be an incredibly helpful tool.
Bonus content: Running something simple the hard way
If you want to try and avoid the generated JavaScript file, you might be able to. Let's go back to the Fibonacci example. To load and run it ourselves, we can do the following:
<!DOCTYPE html>
<script>
  (async function () {
    const imports = {
      env: {
        memory: new WebAssembly.Memory({ initial: 1 }),
        STACKTOP: 0,
      },
    };
    const { instance } = await WebAssembly.instantiateStreaming(
      fetch('/a.out.wasm'),
      imports,
    );
    console.log(instance.exports._fib(12));
  })();
</script>
WebAssembly modules that have been created by Emscripten have no memory to work with unless you provide them with memory. The way you provide a Wasm module with anything is by using the imports object, the second parameter of the instantiateStreaming function. The Wasm module can access everything inside the imports object, but nothing else outside of it. By convention, modules compiled by Emscripten expect a couple of things from the loading JavaScript environment:
- Firstly, there is env.memory. The Wasm module is unaware of the outside world, so to speak, so it needs to get some memory to work with. Enter WebAssembly.Memory. It represents an (optionally growable) piece of linear memory. The sizing parameters are "in units of WebAssembly pages", meaning the code above allocates 1 page of memory, with each page having a size of 64 KiB. Without providing a maximum option, the memory is theoretically unbounded in growth (Chrome currently has a hard limit of 2GB). Most WebAssembly modules shouldn't need to set a maximum.
- env.STACKTOP defines where the stack is supposed to start growing. The stack is needed to make function calls and to allocate memory for local variables. Since we don't do any dynamic memory management shenanigans in our little Fibonacci program, we can just use the entire memory as a stack, hence STACKTOP = 0.
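To make the page arithmetic concrete, here is a small sketch that checks the size of a one-page memory and works out how many pages a raw RGBA image would need. pagesFor is a helper I made up, and the 1920x1080 image is just an example size.

```javascript
const PAGE_SIZE = 64 * 1024; // one WebAssembly page is 64 KiB

// Hypothetical helper: number of pages needed to hold byteCount bytes.
function pagesFor(byteCount) {
  return Math.ceil(byteCount / PAGE_SIZE);
}

const memory = new WebAssembly.Memory({ initial: 1 });
console.log(memory.buffer.byteLength); // 65536

// A 1920x1080 RGBA image takes 4 bytes per pixel.
console.log(pagesFor(1920 * 1080 * 4)); // 127
```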