Binaryen is a compiler and toolchain
infrastructure library for WebAssembly, written in C++. It aims to make
compiling to WebAssembly intuitive, fast, and effective. In this post, using the
example of a synthetic toy language called ExampleScript, learn how to write
WebAssembly modules in JavaScript using the Binaryen.js API. You'll cover the
basics of module creation, function addition to the module, and exporting
functions from the module. This will give you knowledge about the overall
mechanics of compiling actual programming languages to WebAssembly. Further,
you'll learn how to optimize Wasm modules both with Binaryen.js and on the
command line with wasm-opt
.
Background on Binaryen
Binaryen has an intuitive C API in a single header, and can also be used from JavaScript. It accepts input in WebAssembly form, but also accepts a general control flow graph for compilers that prefer that.
An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. Binaryen's internal IR uses compact data structures and is designed for completely parallel code generation and optimization, using all available CPU cores. Binaryen's IR compiles down to WebAssembly due to being a subset of WebAssembly.
Binaryen's optimizer has many passes that can improve code size and speed. These optimizations aim to make Binaryen powerful enough to be used as a compiler backend by itself. It includes WebAssembly-specific optimizations (that general-purpose compilers might not do), which you can think of as Wasm minification.
AssemblyScript as an example user of Binaryen
Binaryen is used by a number of projects, for example, AssemblyScript, which uses Binaryen to compile from a TypeScript-like language directly to WebAssembly. Try the example in the AssemblyScript playground.
AssemblyScript input:
export function add(a: i32, b: i32): i32 {
return a + b;
}
Corresponding WebAssembly code in textual form generated by Binaryen:
(module
(type $0 (func (param i32 i32) (result i32)))
(memory $0 0)
(export "add" (func $module/add))
(export "memory" (memory $0))
(func $module/add (param $0 i32) (param $1 i32) (result i32)
local.get $0
local.get $1
i32.add
)
)
The Binaryen toolchain
The Binaryen toolchain offers a number of useful tools for both JavaScript
developers and command line users. A subset of these tools is listed in the
following; the
full list of contained tools
is available on the project's README
file.
binaryen.js
: A standalone JavaScript library that exposes Binaryen methods for creating and optimizing Wasm modules. For builds, see binaryen.js on npm (or download it directly from GitHub or unpkg).wasm-opt
: Command line tool that loads WebAssembly and runs Binaryen IR passes on it.wasm-as
andwasm-dis
: Command line tools that assemble and disassemble WebAssembly.wasm-ctor-eval
: Command line tool that can execute functions (or parts of functions) at compile time.wasm-metadce
: Command line tool to remove parts of Wasm files in a flexible way that depends on how the module is used.wasm-merge
: Command line tool that merges multiple Wasm files into a single file, connecting corresponding imports to exports as it does so. Like a bundler for JavaScript, but for Wasm.
Compiling to WebAssembly
Compiling one language to another commonly involves several steps, the most important ones are listed in the following list:
- Lexical analysis: Break the source code into tokens.
- Syntax analysis: Create an abstract syntax tree.
- Semantic analysis: Check for errors and enforce language rules.
- Intermediate code generation: Create a more abstract representation.
- Code generation: Translate to the target language.
- Target-specific code optimization: Optimize for the target.
In the Unix world, frequently used tools for compiling are
lex
and
yacc
:
lex
(Lexical Analyzer Generator):lex
is a tool that generates lexical analyzers, also known as lexers or scanners. It takes a set of regular expressions and corresponding actions as input, and generates code for a lexical analyzer that recognizes patterns in the input source code.yacc
(Yet Another Compiler Compiler):yacc
is a tool that generates parsers for syntax analysis. It takes a formal grammar description of a programming language as input and generates code for a parser. Parsers typically produce abstract syntax trees (ASTs) that represent the hierarchical structure of the source code.
A worked example
Given the scope of this post, it's impossible to cover a complete programming language, so for the sake of simplicity, consider a very limited and useless synthetic programming language called ExampleScript that works by expressing generic operations through concrete examples.
- To write an
add()
function, you code up an example of any addition, say2 + 3
. - To write a
multiply()
function, you write, for instance,6 * 12
.
As per the pre-warning, completely useless, but simple enough for its lexical
analyzer to be a single regular expression: /\d+\s*[\+\-\*\/]\s*\d+\s*/
.
Next, there needs to be a parser. Actually, a very simplified version of
an abstract syntax tree can be created by using a regular expression with
named capturing groups:
/(?<first_operand>\d+)\s*(?<operator>[\+\-\*\/])\s*(?<second_operand>\d+)/
.
ExampleScript commands are one per line, so the parser can process the code line-wise by splitting on newline characters. This is enough to check the first three steps from the bullet list before, namely lexical analysis, syntax analysis, and semantic analysis. The code for these steps is in the following listing.
export default class Parser {
parse(input) {
input = input.split(/\n/);
if (!input.every((line) => /\d+\s*[\+\-\*\/]\s*\d+\s*/gm.test(line))) {
throw new Error('Parse error');
}
return input.map((line) => {
const { groups } =
/(?<first_operand>\d+)\s*(?<operator>[\+\-\*\/])\s*(?<second_operand>\d+)/gm.exec(
line,
);
return {
firstOperand: Number(groups.first_operand),
operator: groups.operator,
secondOperand: Number(groups.second_operand),
};
});
}
}
Intermediate code generation
Now that ExampleScript programs can be represented as an abstract syntax tree (albeit a quite simplified one), the next step is to create an abstract intermediate representation. The first step is to create a new module in Binaryen:
const module = new binaryen.Module();
Each line of the abstract syntax tree contains a triple consisting of
firstOperand
, operator
, and secondOperand
. For each of the four possible
operators in ExampleScript, that is, +
, -
, *
, /
, a new
function needs to be added to the module
with Binaryen's Module#addFunction()
method. The parameters of the
Module#addFunction()
methods are as follows:
name
: astring
, represents the name of the function.functionType
: aSignature
, represents the function's signature.varTypes
: aType[]
, indicates additional locals, in the given order.body
: anExpression
, the contents of the function.
There's some more details to unwind and break down and the
Binaryen documentation
can help you navigate the space, but eventually, for ExampleScript's +
operator, you end up at the Module#i32.add()
method as one of several
available
integer operations.
Addition requires two operands, the first and the second summand. For the
function to be actually callable, it needs to be
exported
with Module#addFunctionExport()
.
module.addFunction(
'add', // name: string
binaryen.createType([binaryen.i32, binaryen.i32]), // params: Type
binaryen.i32, // results: Type
[binaryen.i32], // vars: Type[]
// body: ExpressionRef
module.block(null, [
module.local.set(
2,
module.i32.add(
module.local.get(0, binaryen.i32),
module.local.get(1, binaryen.i32),
),
),
module.return(module.local.get(2, binaryen.i32)),
]),
);
module.addFunctionExport('add', 'add');
After processing the abstract syntax tree, the module contains four methods,
three working with integer numbers, namely add()
based on Module#i32.add()
,
subtract()
based on Module#i32.sub()
, multiply()
based on
Module#i32.mul()
, and the outlier divide()
based on Module#f64.div()
because ExampleScript works with floating point results as well.
for (const line of parsed) {
const { firstOperand, operator, secondOperand } = line;
if (operator === '+') {
module.addFunction(
'add', // name: string
binaryen.createType([binaryen.i32, binaryen.i32]), // params: Type
binaryen.i32, // results: Type
[binaryen.i32], // vars: Type[]
// body: ExpressionRef
module.block(null, [
module.local.set(
2,
module.i32.add(
module.local.get(0, binaryen.i32),
module.local.get(1, binaryen.i32)
)
),
module.return(module.local.get(2, binaryen.i32)),
])
);
module.addFunctionExport('add', 'add');
} else if (operator === '-') {
module.subtractFunction(
// Skipped for brevity.
)
} else if (operator === '*') {
// Skipped for brevity.
}
// And so on for all other operators, namely `-`, `*`, and `/`.
If you deal with actual code bases, sometimes there will be dead code that never gets called. To artificially introduce dead code (that will be optimized and eliminated in a later step) in the running example of ExampleScript's compilation to Wasm, adding a non-exported function does the job.
// This function is added, but not exported,
// so it's effectively dead code.
module.addFunction(
'deadcode', // name: string
binaryen.createType([binaryen.i32, binaryen.i32]), // params: Type
binaryen.i32, // results: Type
[binaryen.i32], // vars: Type[]
// body: ExpressionRef
module.block(null, [
module.local.set(
2,
module.i32.div_u(
module.local.get(0, binaryen.i32),
module.local.get(1, binaryen.i32),
),
),
module.return(module.local.get(2, binaryen.i32)),
]),
);
The compiler is almost ready now. It's not strictly necessary, but definitely
good practice to
validate the module
with the Module#validate()
method.
if (!module.validate()) {
throw new Error('Validation error');
}
Obtaining the resulting Wasm code
To
obtain the resulting Wasm code,
two methods exist in Binaryen for getting the
textual representation
as a .wat
file in S-expression
as a human-readable format, and the
binary representation
as a .wasm
file that can directly run in the browser. The binary code can be
run directly in the browser. To see that it worked, logging the exports can
help.
const textData = module.emitText();
console.log(textData);
const wasmData = module.emitBinary();
const compiled = new WebAssembly.Module(wasmData);
const instance = new WebAssembly.Instance(compiled, {});
console.log('Wasm exports:\n', instance.exports);
The complete textual representation for an ExampleScript program with all four
operations is listed in the following. Note how the dead code is still there,
but isn't exposed as per the screenshot of the
WebAssembly.Module.exports()
.
(module
(type $0 (func (param i32 i32) (result i32)))
(type $1 (func (param f64 f64) (result f64)))
(export "add" (func $add))
(export "subtract" (func $subtract))
(export "multiply" (func $multiply))
(export "divide" (func $divide))
(func $add (param $0 i32) (param $1 i32) (result i32)
(local $2 i32)
(local.set $2
(i32.add
(local.get $0)
(local.get $1)
)
)
(return
(local.get $2)
)
)
(func $subtract (param $0 i32) (param $1 i32) (result i32)
(local $2 i32)
(local.set $2
(i32.sub
(local.get $0)
(local.get $1)
)
)
(return
(local.get $2)
)
)
(func $multiply (param $0 i32) (param $1 i32) (result i32)
(local $2 i32)
(local.set $2
(i32.mul
(local.get $0)
(local.get $1)
)
)
(return
(local.get $2)
)
)
(func $divide (param $0 f64) (param $1 f64) (result f64)
(local $2 f64)
(local.set $2
(f64.div
(local.get $0)
(local.get $1)
)
)
(return
(local.get $2)
)
)
(func $deadcode (param $0 i32) (param $1 i32) (result i32)
(local $2 i32)
(local.set $2
(i32.div_u
(local.get $0)
(local.get $1)
)
)
(return
(local.get $2)
)
)
)
Optimizing WebAssembly
Binaryen offers two ways to optimize Wasm code. One in Binaryen.js itself, and one for the command line. The former applies the standard set of optimization rules by default and lets you set the optimize and the shrink level, and the latter by default uses no rules, but instead allows for full customization, which means that with enough experimentation, you can tailor the settings for optimal results based on your code.
Optimizing with Binaryen.js
The most straightforward way to optimize a Wasm module with Binaryen is to
directly call the Module#optimize()
method of Binaryen.js, and optionally
setting the
optimize and the shrink level.
// Assume the `wast` variable contains a Wasm program.
const module = binaryen.parseText(wast);
binaryen.setOptimizeLevel(2);
binaryen.setShrinkLevel(1);
// This corresponds to the `-Os` setting.
module.optimize();
Doing so removes the dead code that was artificially introduced before, so the
textual representation of the Wasm version of the ExampleScript toy example no
longer contains it. Note also how the local.set/get
pairs are removed by the
optimization steps
SimplifyLocals
(miscellaneous locals-related optimizations) and the
Vacuum
(removes obviously unneeded code), and the return
is removed by
RemoveUnusedBrs
(removes breaks from locations that are not needed).
(module
(type $0 (func (param i32 i32) (result i32)))
(type $1 (func (param f64 f64) (result f64)))
(export "add" (func $add))
(export "subtract" (func $subtract))
(export "multiply" (func $multiply))
(export "divide" (func $divide))
(func $add (; has Stack IR ;) (param $0 i32) (param $1 i32) (result i32)
(i32.add
(local.get $0)
(local.get $1)
)
)
(func $subtract (; has Stack IR ;) (param $0 i32) (param $1 i32) (result i32)
(i32.sub
(local.get $0)
(local.get $1)
)
)
(func $multiply (; has Stack IR ;) (param $0 i32) (param $1 i32) (result i32)
(i32.mul
(local.get $0)
(local.get $1)
)
)
(func $divide (; has Stack IR ;) (param $0 f64) (param $1 f64) (result f64)
(f64.div
(local.get $0)
(local.get $1)
)
)
)
There are many
optimization passes,
and Module#optimize()
uses the particular optimize and shrink levels' default
sets. For full customization, you need to use the command line tool wasm-opt
.
Optimizing with the wasm-opt command line tool
For full customization of the to-be-used passes, Binaryen includes the
wasm-opt
command line tool. To get a
complete list of the possible optimization options,
check the tool's help message. The wasm-opt
tool is probably the most popular
of the tools, and is used by several compiler toolchains to optimize Wasm code,
including Emscripten,
J2CL,
Kotlin/Wasm,
dart2wasm,
wasm-pack, and others.
wasm-opt --help
To give you a feel for the passes, here's an excerpt of some of the ones that are comprehensible without expert knowledge:
- CodeFolding: Avoids duplicate code by merging it (for example, if two
if
arms have some shared instructions at their end). - DeadArgumentElimination: Link time optimization pass to remove arguments to a function if it is always called with the same constants.
- MinifyImportsAndExports: Minifies them to
"a"
,"b"
. - DeadCodeElimination: Remove dead code.
There's an
optimization cookbook
available with several tips for identifying which of the various flags are more
important and worth trying first. For example, sometimes running wasm-opt
repeatedly over and over again shrinks the input further. In such cases, running
with the
--converge
flag
keeps iterating until no further optimization happens and a fixed point is
reached.
Demo
To see the concepts introduced in this post in action, play with the embedded demo providing it any ExampleScript input you can think of. Also be sure to view the source code of the demo.
Conclusions
Binaryen provides a powerful toolkit for compiling languages to WebAssembly and optimizing the resulting code. Its JavaScript library and command-line tools offer flexibility and ease of use. This post demonstrated the core principles of Wasm compilation, highlighting Binaryen's effectiveness and potential for maximum optimization. While many of the options for customizing Binaryen's optimizations require deep knowledge about the internals of Wasm, usually the default settings work great already. With that, happy compiling and optimizing with Binaryen!
Acknowledgements
This post was reviewed by Alon Zakai, Thomas Lively and Rachel Andrew.