WebAssembly (wasm) is often framed as either a performance primitive or a way to run your existing C++ codebase on the web. With squoosh.app, we wanted to show that there is at least a third perspective for wasm: making use of the huge ecosystems of other programming languages. With Emscripten, you can use C/C++ code, Rust has wasm support built in, and the Go team is working on it, too. I'm sure many other languages will follow.
In these scenarios, wasm is not the centerpiece of your app, but rather a puzzle piece: yet another module. Your app already has JavaScript, CSS, image assets, a web-centric build system and maybe even a framework like React. How do you integrate WebAssembly into this setup? In this article we are going to work this out with C/C++ and Emscripten as an example.
Docker
I have found Docker to be invaluable when working with Emscripten. C/C++ libraries are often written to work with the operating system they are built on. It is incredibly helpful to have a consistent environment. With Docker you get a virtualized Linux system that is already set up to work with Emscripten and has all the tools and dependencies installed. If something is missing, you can just install it without having to worry about how it affects your own machine or your other projects. If something goes wrong, throw the container away and start over. If it works once, you can be sure that it will continue to work and produce identical results.
The Docker Registry has an Emscripten image by trzeci that I have been using extensively.
Integration with npm
In the majority of cases, the entry point to a web project is npm's
package.json. By convention, most projects can be built with npm install &&
npm run build.
In general, the build artifacts produced by Emscripten (a .js and a .wasm
file) should be treated as just another JavaScript module and just another
asset. The JavaScript file can be handled by a bundler like webpack or rollup,
and the wasm file should be treated like any other bigger binary asset, like
images.
As such, the Emscripten build artifacts need to be built before your "normal" build process kicks in:
{
    "name": "my-worldchanging-project",
    "scripts": {
        "build:emscripten": "docker run --rm -v $(pwd):/src trzeci/emscripten ./build.sh",
        "build:app": "<the old build command>",
        "build": "npm run build:emscripten && npm run build:app",
        // ...
    },
    // ...
}
The new build:emscripten task could invoke Emscripten directly, but as
mentioned before, I recommend using Docker to make sure the build environment is
consistent.
docker run ... trzeci/emscripten ./build.sh tells Docker to spin up a new
container using the trzeci/emscripten image and run the ./build.sh command.
build.sh is a shell script that you are going to write next! --rm tells
Docker to delete the container after it's done running. This way, you don't build
up a collection of stopped containers over time. -v $(pwd):/src means that
you want Docker to "mirror" the current directory ($(pwd)) to /src inside
the container. Any changes you make to files in the /src directory inside the
container will be mirrored to your actual project. These mirrored directories
are called "bind mounts".
Let's take a look at build.sh:
#!/bin/bash
set -e
export OPTIMIZE="-Os"
export LDFLAGS="${OPTIMIZE}"
export CFLAGS="${OPTIMIZE}"
export CXXFLAGS="${OPTIMIZE}"
echo "============================================="
echo "Compiling wasm bindings"
echo "============================================="
(
    # Compile C/C++ code
    emcc \
    ${OPTIMIZE} \
    --bind \
    -s STRICT=1 \
    -s ALLOW_MEMORY_GROWTH=1 \
    -s MALLOC=emmalloc \
    -s MODULARIZE=1 \
    -s EXPORT_ES6=1 \
    -o ./my-module.js \
    src/my-module.cpp
    # Create output folder
    mkdir -p dist
    # Move artifacts
    mv my-module.{js,wasm} dist
)
echo "============================================="
echo "Compiling wasm bindings done"
echo "============================================="
There's a lot to dissect here!
set -e puts the shell into "fail fast" mode. If any commands in the script
return an error, the entire script gets aborted immediately. This can be
incredibly helpful as the last output of the script will always be a success
message or the error that caused the build to fail.
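To see the effect of set -e in isolation, here is a minimal sketch. It runs the same two commands in a child shell twice, once with set -e and once without. run_steps is a hypothetical helper name and not part of build.sh.

```shell
# run_steps is a hypothetical helper for this sketch, not part of build.sh.
# It runs `false` followed by an echo in a child shell.
# $1 is a prefix command: "set -e" to enable fail-fast mode, ":" to do nothing.
run_steps() {
    sh -c "$1; false; echo 'second step ran'" || true
}

with_fail_fast="$(run_steps 'set -e')"
without_fail_fast="$(run_steps ':')"

echo "with set -e:    [${with_fail_fast}]"
echo "without set -e: [${without_fail_fast}]"
```

With set -e the child shell stops at the failing command, so the first pair of brackets stays empty. Without it, the echo still runs even though the command before it failed.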
With the export statements, you define the values of a couple of environment
variables. They allow you to pass additional command-line parameters to the C
compiler (CFLAGS), the C++ compiler (CXXFLAGS), and the linker (LDFLAGS).
They all receive the optimizer settings via OPTIMIZE to make sure that
everything gets optimized the same way. There are a couple of possible values
for the OPTIMIZE variable:
- -O0: Don't do any optimization. No dead code is eliminated, and Emscripten does not minify the JavaScript code it emits, either. Good for debugging.
- -O3: Optimize aggressively for performance.
- -Os: Optimize aggressively for performance, with size as a secondary criterion.
- -Oz: Optimize aggressively for size, sacrificing performance if necessary.
For the web, I mostly recommend -Os.
The emcc command has a myriad of options of its own. Note that emcc is
supposed to be a "drop-in replacement for compilers like GCC or clang". So all
flags that you might know from GCC will most likely be implemented by emcc as
well. The -s flag is special in that it allows us to configure Emscripten
specifically. All available options can be found in Emscripten's settings.js,
but that file can be quite overwhelming. Here's a list of the Emscripten flags
that I think are most important for web developers:
- --bind enables embind.
- -s STRICT=1 drops support for all deprecated build options. This ensures that your code builds in a forward-compatible manner.
- -s ALLOW_MEMORY_GROWTH=1 allows memory to be automatically grown if necessary. At the time of writing, Emscripten will allocate 16 MB of memory initially. As your code allocates chunks of memory, this option decides if these operations will make the entire wasm module fail when memory is exhausted, or if the glue code is allowed to expand the total memory to accommodate the allocation.
- -s MALLOC=... chooses which malloc() implementation to use. emmalloc is a small and fast malloc() implementation specifically for Emscripten. The alternative is dlmalloc, a fully-fledged malloc() implementation. You only need to switch to dlmalloc if you are allocating a lot of small objects frequently or if you want to use threading.
- -s EXPORT_ES6=1 will turn the JavaScript code into an ES6 module with a default export that works with any bundler. Also requires -s MODULARIZE=1 to be set.
The following flags are not always necessary or are only helpful for debugging purposes:
- -s FILESYSTEM=0 is a flag that relates to Emscripten and its ability to emulate a filesystem for you when your C/C++ code uses filesystem operations. It does some analysis on the code it compiles to decide whether to include the filesystem emulation in the glue code or not. Sometimes, however, this analysis can get it wrong and you pay a rather hefty 70kB in additional glue code for a filesystem emulation that you might not need. With -s FILESYSTEM=0 you can force Emscripten to not include this code.
- -g4 will make Emscripten include debugging information in the .wasm and also emit a source map file for the wasm module. You can read more on debugging with Emscripten in their debugging section.
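One way to keep the debugging flags out of production builds is to pick the flags based on a build mode. The following is only a sketch: emcc_mode_flags, BUILD_MODE and the mode names debug and release are assumptions made for illustration, and the script prints the emcc command instead of running it.

```shell
# emcc_mode_flags is a hypothetical helper: it prints the optimization and
# debug flags for a build mode. "debug" and "release" are names chosen for
# this sketch, not Emscripten terms.
emcc_mode_flags() {
    case "$1" in
        debug)   echo "-O0 -g4" ;;
        release) echo "-Os" ;;
        *)       echo "unknown build mode: $1" >&2; return 1 ;;
    esac
}

# BUILD_MODE is an assumed variable name; it defaults to a release build.
MODE_FLAGS="$(emcc_mode_flags "${BUILD_MODE:-release}")"

# Print the command that would be run, rather than invoking emcc here.
echo "emcc ${MODE_FLAGS} --bind -s MODULARIZE=1 -s EXPORT_ES6=1 -o ./my-module.js src/my-module.cpp"
```

In a real build.sh, the printed command would be replaced by the actual emcc call, with ${MODE_FLAGS} taking the place of ${OPTIMIZE}.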
And there you go! To test this setup, let's whip up a tiny my-module.cpp:
    #include <cstdio>
    #include <emscripten/bind.h>
    using namespace emscripten;
    int say_hello() {
      printf("Hello from your wasm module\n");
      return 0;
    }
    EMSCRIPTEN_BINDINGS(my_module) {
      function("sayHello", &say_hello);
    }
And an index.html:
    <!doctype html>
    <title>Emscripten + npm example</title>
    Open the console to see the output from the wasm module.
    <script type="module">
    import wasmModule from "./my-module.js";
    const instance = wasmModule({
      onRuntimeInitialized() {
        instance.sayHello();
      }
    });
    </script>
(Here is a gist containing all files.)
To build everything, run
$ npm install
$ npm run build
$ npm run serve
Navigating to localhost:8080 should show the message "Hello from your wasm module" in the DevTools console.
Adding C/C++ code as a dependency
If you want to build a C/C++ library for your web app, you need its code to be
part of your project. You can add the code to your project's repository manually
or you can use npm to manage these kinds of dependencies as well. Let's say I
want to use libvpx in my webapp. libvpx is a C library to encode video with
VP8, the codec used in .webm files.
However, libvpx is not on npm and doesn't have a package.json, so I can't
install it using npm directly.
To get out of this conundrum, there is napa. napa allows you to install any git
repository URL as a dependency into your node_modules folder.
Install napa as a dependency:
$ npm install --save napa
and make sure to run napa as an install script:
{
    // ...
    "scripts": {
        "install": "napa",
        // ...
    },
    "napa": {
        "libvpx": "git+https://github.com/webmproject/libvpx"
    }
    // ...
}
When you run npm install, napa takes care of cloning the libvpx GitHub
repository into your node_modules under the name libvpx.
You can now extend your build script to build libvpx. libvpx uses configure
and make to be built. Luckily, Emscripten can help ensure that configure and
make use Emscripten's compiler. For this purpose there are the wrapper
commands emconfigure and emmake:
# ... above is unchanged ...
echo "============================================="
echo "Compiling libvpx"
echo "============================================="
(
    rm -rf build-vpx || true
    mkdir build-vpx
    cd build-vpx
    emconfigure ../node_modules/libvpx/configure \
    --target=generic-gnu
    emmake make
)
echo "============================================="
echo "Compiling libvpx done"
echo "============================================="
echo "============================================="
echo "Compiling wasm bindings"
echo "============================================="
# ... below is unchanged ...
A C/C++ library is split into two parts: the headers (traditionally .h or
.hpp files) that define the data structures, classes, constants etc. that a
library exposes and the actual library (traditionally .so or .a files). To
use the VPX_CODEC_ABI_VERSION constant of the library in your code, you have
to include the library's header files using a #include statement:
#include "vpxenc.h"
#include <emscripten/bind.h>
int say_hello() {
    printf("Hello from your wasm module with libvpx %d\n", VPX_CODEC_ABI_VERSION);
    return 0;
}
The problem is that the compiler doesn't know where to look for vpxenc.h.
This is what the -I flag is for. It tells the compiler which directories to
check for header files. Additionally, you also need to give the compiler the
actual library file:
# ... above is unchanged ...
echo "============================================="
echo "Compiling wasm bindings"
echo "============================================="
(
    # Compile C/C++ code
    emcc \
    ${OPTIMIZE} \
    --bind \
    -s STRICT=1 \
    -s ALLOW_MEMORY_GROWTH=1 \
    -s ASSERTIONS=0 \
    -s MALLOC=emmalloc \
    -s MODULARIZE=1 \
    -s EXPORT_ES6=1 \
    -o ./my-module.js \
    -I ./node_modules/libvpx \
    src/my-module.cpp \
    build-vpx/libvpx.a
# ... below is unchanged ...
If you run npm run build now, you will see that the process builds a new .js
and a new .wasm file and that the demo page will indeed output the constant.
You will also notice that the build process takes a long time. The reason for
long build times can vary. In the case of libvpx, it takes a long time because
it compiles an encoder and a decoder for both VP8 and VP9 every time you run
your build command, even though the source files haven't changed. Even a small
change to your my-module.cpp will take a long time to build. It would be very
beneficial to keep the build artifacts of libvpx around once they have been
built the first time.
One way to achieve this is using environment variables.
# ... above is unchanged ...
eval $@
echo "============================================="
echo "Compiling libvpx"
echo "============================================="
test -n "$SKIP_LIBVPX" || (
    rm -rf build-vpx || true
    mkdir build-vpx
    cd build-vpx
    emconfigure ../node_modules/libvpx/configure \
    --target=generic-gnu
    emmake make
)
echo "============================================="
echo "Compiling libvpx done"
echo "============================================="
# ... below is unchanged ...
(Here's a gist containing all the files.)
The eval command allows us to set environment variables by passing parameters
to the build script. The test command will skip building libvpx if
$SKIP_LIBVPX is set (to any value).
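The interplay of eval and test can be tried out without Emscripten. The sketch below uses a hypothetical function, plan_build, as a stand-in for build.sh: it applies NAME=value arguments and then prints which steps would run instead of running them.

```shell
# plan_build is a hypothetical stand-in for build.sh. The parentheses make the
# body run in a subshell, so variables set by eval do not leak to the caller.
plan_build() (
    unset SKIP_LIBVPX
    # Turn arguments such as SKIP_LIBVPX=1 into shell variables.
    # Quoted here so the arguments are not glob-expanded before eval sees them.
    eval "$@"
    # The right-hand side only runs when SKIP_LIBVPX is empty or unset.
    test -n "$SKIP_LIBVPX" || echo "building libvpx"
    echo "building wasm bindings"
)

plan_build
plan_build SKIP_LIBVPX=1
```

The first call prints both steps, the second one only the bindings step. Keep in mind that eval executes whatever it is given, so this pattern is only appropriate for arguments you control yourself.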
Now you can compile your module but skip rebuilding libvpx:
$ npm run build:emscripten -- SKIP_LIBVPX=1
Customizing the build environment
Sometimes libraries depend on additional tools to build. If these dependencies
are missing in the build environment provided by the Docker image, you need to
add them yourself. As an example, let's say you also want to build the
documentation of libvpx using doxygen. Doxygen is not
available inside your Docker container, but you can install it using apt.
If you were to do that in your build.sh, you would re-download and re-install
doxygen every time you want to build your library. Not only would that be
wasteful, but it would also stop you from working on your project while offline.
Here it makes sense to build your own Docker image. Docker images are built by
writing a Dockerfile that describes the build steps. Dockerfiles are quite
powerful and have a lot of commands, but most of the time you can get away
with just using FROM, RUN and ADD. In this case:
FROM trzeci/emscripten
RUN apt-get update && \
    apt-get install -qqy doxygen
With FROM, you can declare which Docker image you want to use as a starting
point. I chose trzeci/emscripten as a basis — the image you have been using
all along. With RUN, you instruct Docker to run shell commands inside the
container. Whatever changes these commands make to the container are now part of
the Docker image. To make sure that your Docker image has been built and is
available before you run build.sh, you have to adjust your package.json a
bit:
{
    // ...
    "scripts": {
        "build:dockerimage": "docker image inspect -f '.' mydockerimage || docker build -t mydockerimage .",
        "build:emscripten": "docker run --rm -v $(pwd):/src mydockerimage ./build.sh",
        "build": "npm run build:dockerimage && npm run build:emscripten && npm run build:app",
        // ...
    },
    // ...
}
(Here's a gist containing all the files.)
This will build your Docker image, but only if it has not been built yet. Then
everything runs as before, but now the build environment has the doxygen
command available, which will cause the documentation of libvpx to be built as
well.
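The image check can also live in a small shell function, which may be easier to read than a one-line npm script. This is a sketch under the assumption that a Dockerfile sits in the current directory. ensure_image is a hypothetical name, while the two docker commands are the ones used in the build:dockerimage script.

```shell
# ensure_image is a hypothetical helper name; the docker commands mirror the
# build:dockerimage script. It assumes a Dockerfile in the current directory.
ensure_image() {
    # `docker image inspect` exits non-zero when the image does not exist,
    # so the build on the right-hand side only runs in that case.
    docker image inspect -f '.' "$1" >/dev/null 2>&1 || docker build -t "$1" .
}
```

It would be called as ensure_image mydockerimage before docker run. Note that neither this function nor the npm script rebuilds the image when the Dockerfile changes. For that, you have to remove the image or run docker build yourself.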
Conclusion
It is not surprising that C/C++ code and npm are not a natural fit, but you can make it work quite comfortably with some additional tooling and the isolation that Docker provides. This setup will not work for every project, but it's a decent starting point that you can adjust for your needs. If you have improvements, please share.
Appendix: Making use of Docker image layers
An alternative solution is to encapsulate more of these problems with Docker and Docker's smart approach to caching. Docker executes Dockerfiles step-by-step and assigns the result of each step an image of its own. These intermediate images are often called "layers". If a command in a Dockerfile hasn't changed, Docker won't actually re-run that step when you are rebuilding the image. Instead it reuses the layer from the last time the image was built.
Previously, you had to go through some effort to not rebuild libvpx every time
you build your app. Instead you can move the building instructions for libvpx
from your build.sh into the Dockerfile to make use of Docker's caching
mechanism:
FROM trzeci/emscripten
RUN apt-get update && \
    apt-get install -qqy doxygen git && \
    mkdir -p /opt/libvpx/build && \
    git clone https://github.com/webmproject/libvpx /opt/libvpx/src
RUN cd /opt/libvpx/build && \
    emconfigure ../src/configure --target=generic-gnu && \
    emmake make
(Here's a gist containing all the files.)
Note that you need to manually install git and clone libvpx as you don't have
bind mounts when running docker build. As a side-effect, there is no need for
napa anymore.
