Compiling C to Wasm

September 4, 2020 | Written by Nathan

Continuing from the previous post, Betting on Wasm, I’d like to walk you through writing a very simple C program and compiling it to Wasm. I’ll start by briefly describing which languages can currently compile to Wasm. Then I’ll introduce the WASI SDK, and finally I’ll demonstrate compiling a C program and executing it in a browser.

Which languages compile to Wasm?

The list of languages that compile to Wasm is growing all the time. Check out awesome-wasm-langs for a fairly exhaustive list of languages and the status of their support for targeting Wasm.

So far, we have looked at Rust, AssemblyScript, Go, Swift, C, and C++.

Rust’s community overlaps with the WebAssembly community more than any other language’s does, and Rust probably has the most mature tooling. I recommend the Minimal Rust & WebAssembly example for a very bare-bones example, and wasm-pack for more sophisticated programs. The Rust + Wasm community seems to focus on writing Rust and running it in the browser: for example, wasm-pack literally creates and distributes npm packages. However, for Shareup we want to write in one language and then use the produced .wasm binary in a browser, in an iOS project, on the server, and more. A JavaScript library generated alongside the .wasm file won’t help us much outside the browser.

AssemblyScript seems great. It is similar to TypeScript, but it is not a full implementation, and some very important differences can be confusing. AssemblyScript includes a very small, optional JavaScript runtime called the loader. We’ve experimented and written some code both with and without the runtime, and we’ve seen extremely small binaries, sometimes smaller than those produced from our C code.

Swift can also be compiled to Wasm now, which is exciting for us because we already write Swift for our iOS app. However, the binaries we have produced have all been roughly 4MB or larger because of the included runtime. We hope the compiler gets smarter and includes only what it definitely needs.

Go can also compile to Wasm for the browser. It includes a JavaScript runtime to make passing complex values back and forth easier. We haven’t played with Go → Wasm ourselves, but it’s nice to know so many languages are on board, and it’s an option for us anytime we might need it.

For our work on Shareup, we write our shared code in C and then compile it to Wasm. Clang has turned out to be the easiest and most reliable way for us to produce a self-contained Wasm binary that is easy to share across many platforms.

How to compile C to Wasm

I recommend using Clang.

The best blog post I found when learning about compiling C to Wasm in the simplest way is Compiling C to WebAssembly without Emscripten. Reading this and doing similar experiments gave me the confidence to push forward and make more complicated .wasm binaries using Clang and LLVM.

WASI SDK

I recommend using the WASI SDK, which includes pre-compiled versions of Clang and LLVM. You can download the latest release for your platform and unzip it into your project. Make sure the WASI SDK’s Clang is in your PATH before running any of the commands below. If you’re not sure what that means, you can check out these setup instructions.

All the code in this blog post can be found in this repository: shareup/wasm-blog-posts.

A simple program

A simple program to test with is one that adds two numbers together. Create an adder.c containing:

__attribute__((export_name("add")))
int add(int a, int b) {
  return a + b;
}

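The export_name attribute sets the name the function is exported under, which is the name used to call it from outside the Wasm binary. Without it (or a linker option such as --export), the linker would not export the function by default.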

Also, it’s important that we only use int types and don’t use any standard library types or functions, because we won’t be linking against a C standard library. We will soon learn about compiling more sophisticated programs, but not yet.
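If you want to export more than one function, each one needs its own export_name attribute. Here is a minimal sketch of a hypothetical adder-extended.c. The file name, the EXPORT macro, and sub_clamped are our own illustrations and are not from the repo. The macro applies the attribute only when Clang defines __wasm__, so the same file also builds with a regular native compiler, which can be handy for quick unit tests. The static helper has no export name, so we wouldn’t expect it to appear among the exports.

```c
/*
 * adder-extended.c (hypothetical sketch).
 * EXPORT is our own helper macro: when compiling for Wasm (Clang defines
 * __wasm__), it expands to the export_name attribute. In a native build it
 * expands to nothing, so the same file compiles with a regular compiler.
 */
#if defined(__wasm__)
#define EXPORT(name) __attribute__((export_name(name)))
#else
#define EXPORT(name)
#endif

/* Internal helper: static and without an export name, so the host can't call it. */
static int clamp_to_zero(int x) {
  return x < 0 ? 0 : x;
}

/* Exported as "add": int parameters and result become i32 in Wasm. */
EXPORT("add")
int add(int a, int b) {
  return a + b;
}

/* Exported as "sub_clamped": a - b, but never below zero. */
EXPORT("sub_clamped")
int sub_clamped(int a, int b) {
  return clamp_to_zero(a - b);
}
```

It compiles with the same clang command as adder.c. After running wasm2wat on the result, we’d expect to see exports for memory, add, and sub_clamped.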

We can compile our program:

$ clang \
  --target=wasm32 \
  -nostdlib \
  -Wl,--no-entry \
  -o adder.wasm \
  adder.c

# clang is the program we are running
# the target is wasm32
# we are not using the C standard library
# tell the linker we don't have an "entry function" or a main
# output a file named adder.wasm
# the source is in adder.c

or, all on one line:

$ clang --target=wasm32 -nostdlib -Wl,--no-entry -o adder.wasm adder.c

Clang will produce an adder.wasm file. The binary Wasm format is not easy to read, so we can use the WebAssembly Binary Toolkit (wabt) to convert the Wasm binary to the more readable WebAssembly Text Format (which usually has a .wat file extension).

(Make sure you have wabt installed and wasm2wat is in your PATH. If you’re not sure what that means, you can check out these setup instructions.)

$ wasm2wat adder.wasm -o adder.wat

Reading the adder.wat file is much easier. You can see my version of adder.wat in the repo. While the file may be a little confusing, there are a few things we can understand:

There is a func named $add which seems to take two arguments ((param i32 i32)) and return a number ((result i32))
There are two exports: memory and add (which seems to reference the $add function)
There are a lot of local.gets and local.sets 😳
There is an i32.add, which is the Wasm instruction for adding two 32-bit integers, so that’s good I guess

Optimizing the output

That seems like a lot of work just to add two numbers together, so let’s ask Clang to optimize the Wasm output and hopefully make it a bit shorter.

$ clang \
  --target=wasm32 \
  -O3 \
  -flto \
  -nostdlib \
  -Wl,--no-entry \
  -Wl,--lto-O3 \
  -o adder-optimized.wasm \
  adder.c
  
# clang
# target is wasm32
# O for optimize, level 3, which is a lot
# link-time optimization: pass LLVM bitcode through to the linker so it can optimize too
# no standard library
# linker, no entry function
# linker, also optimize at level 3
# output adder-optimized.wasm
# the source is in adder.c
  
$ wasm2wat adder-optimized.wasm -o adder-optimized.wat

Wow, the resulting .wat is now much easier to understand. I’ll paste my adder-optimized.wat below:

(module
  (type (;0;) (func (param i32 i32) (result i32)))
  (func (;0;) (type 0) (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add)
  (memory (;0;) 2)
  (export "memory" (memory 0))
  (export "add" (func 0)))

First, local.get 0 and local.get 1 push the two arguments onto the stack. Then i32.add pops them and pushes their sum, which becomes the function’s return value. Pretty neat.

Exports

You’ve seen that our add function is “exported,” which means we can call it from outside the Wasm binary, from what is called the “host machine.” A Wasm binary can have many exports. The binaries Clang produces here also export their memory as "memory" by default. Exporting the memory means the host machine has full read and write access to the internal memory of the Wasm VM.
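On wasm32, C pointers are just i32 offsets into that exported memory. So one way for the host to hand data to our code is to write bytes at an address the C side gives it. Here is a minimal sketch. It assumes a fixed-size static buffer, because we have no allocator with -nostdlib. The names buffer_ptr, sum_buffer, and BUFFER_SIZE are hypothetical.

```c
/* Hypothetical EXPORT helper: adds export_name only when targeting Wasm. */
#if defined(__wasm__)
#define EXPORT(name) __attribute__((export_name(name)))
#else
#define EXPORT(name)
#endif

#define BUFFER_SIZE 1024

/* Lives in linear memory, which the host can see through the "memory" export. */
static unsigned char buffer[BUFFER_SIZE];

/* Returns the buffer's address. On wasm32 this is an i32 offset into the
   exported memory, which the host can use as an index into memory.buffer. */
EXPORT("buffer_ptr")
unsigned char *buffer_ptr(void) {
  return buffer;
}

/* Sums the first len bytes the host wrote into the buffer. */
EXPORT("sum_buffer")
unsigned int sum_buffer(unsigned int len) {
  unsigned int total = 0;
  if (len > BUFFER_SIZE) {
    len = BUFFER_SIZE; /* never read past the end of the buffer */
  }
  for (unsigned int i = 0; i < len; i++) {
    total += buffer[i];
  }
  return total;
}
```

From JavaScript, this would look roughly like `new Uint8Array(instance.exports.memory.buffer, instance.exports.buffer_ptr(), 3).set([1, 2, 3])` followed by `instance.exports.sum_buffer(3)`, which should return 6.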

Wasm’s stack instructions support only four types of data: 32-bit and 64-bit integers, and 32-bit and 64-bit floats. These are usually written as i32, i64, f32, and f64. In other words, exported functions can only accept numbers as arguments and return numbers. This may seem very limiting, and it is, but there are ways to exchange arrays of bytes, which I’ll show in the next post.
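To make that mapping concrete, here is a sketch of how C’s scalar types line up with the four Wasm types when Clang targets wasm32: int becomes i32, long long becomes i64, float becomes f32, and double becomes f64. On wasm32, long and pointers are also 32 bits, so they become i32 too. The function names and the EXPORT macro are hypothetical. Note that JavaScript hosts that support it pass i64 values as BigInt rather than Number.

```c
/* Hypothetical EXPORT helper: adds export_name only when targeting Wasm. */
#if defined(__wasm__)
#define EXPORT(name) __attribute__((export_name(name)))
#else
#define EXPORT(name)
#endif

/* int -> (param i32 i32) (result i32) */
EXPORT("add_i32")
int add_i32(int a, int b) { return a + b; }

/* long long -> i64 (on wasm32, plain long is only 32 bits) */
EXPORT("add_i64")
long long add_i64(long long a, long long b) { return a + b; }

/* float -> f32 */
EXPORT("add_f32")
float add_f32(float a, float b) { return a + b; }

/* double -> f64 */
EXPORT("add_f64")
double add_f64(double a, double b) { return a + b; }
```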

Executing Wasm in a browser

We can execute adder-optimized.wasm in any modern browser. The add function will be “exported,” and we can call it from JavaScript. Below is an example HTML file with inline JavaScript. Name it test.html:

<!doctype html>
<html>
<head><title>Add!</title></head>
<body>
<script type="module">
  // Only download and compile the binary once
  const cachedModule = fetch('./adder-optimized.wasm')
    .then((response) => {
      if (!response.ok) throw new Error(`Failed to fetch: ${response.status}`)
      return response.arrayBuffer()
    })
    .then((bytes) => WebAssembly.compile(bytes))

  async function add(a, b) {
    const module = await cachedModule
    // Create a fresh instance (think: a fresh VM) for each call
    const instance = await WebAssembly.instantiate(module)
    const result = instance.exports.add(a, b)
    console.log(result)
    return result
  }

  // a couple examples:
  add(25, 5)
  add(2, 3)
  
  // now make globally available for the console
  window.add = add
</script>
</body>
</html>

Serve the page above, navigate to it, and open the console. You can then call the new add function, which prints the answer it gets back from the Wasm VM to the console. In the browser, fetching and compiling a Wasm binary is asynchronous, so any loading or usage must be async too.

If you need help serving this file and loading it in your browser, check out these instructions in the project repo.

If you look at the JavaScript code carefully, you will see that I load and compile the Wasm file once, and then create a new Wasm “instance” (think of it as a VM) each time I need to add some numbers. You may think creating a new instance each time is wasteful, and for this very simple case you’d be right.

However, creating a fresh instance for each run has real benefits. No information is leaked or shared between runs, and work can be parallelized across multiple Workers. The memory and stack state is also reset each time, so every execution starts from the same state. For more complicated programs, these properties become very helpful and prevent accidental state sharing and similar problems.
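To see why a fresh instance gives each run a clean slate, consider C code with global state. In this hypothetical sketch, the static counter lives in the instance’s linear memory. Repeated calls on the same instance keep incrementing it. A newly created instance gets its own memory, so it starts from 0 again.

```c
/* Hypothetical EXPORT helper: adds export_name only when targeting Wasm. */
#if defined(__wasm__)
#define EXPORT(name) __attribute__((export_name(name)))
#else
#define EXPORT(name)
#endif

/* Global state: stored in the instance's linear memory. Every new instance
   gets its own memory, so the counter starts at 0 again. */
static int counter = 0;

/* Returns 1, 2, 3, ... on successive calls within the same instance. */
EXPORT("next_id")
int next_id(void) {
  counter += 1;
  return counter;
}
```

Calling next_id twice on one instance should return 1 and then 2. Calling it once on each of two separate instances should return 1 both times.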

Executing Wasm outside the browser

The browser isn’t the only place Wasm binaries can be executed. One of our favorite alternative execution environments is Deno. Deno has full WebAssembly support and is very quick to boot and run. We currently write all our integration tests in TypeScript and run them with Deno to make sure our C program always meets our expectations from the host machine’s perspective.

We can load and execute the Wasm binary in Deno like this (name the file test.ts):

const bytes = await Deno.readFile('./adder-optimized.wasm')
const module = await WebAssembly.compile(bytes)

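async function add (a: number, b: number): Promise<number> {
  // A fresh instance for each call, just like in the browser example
  const instance = await WebAssembly.instantiate(module)
  // Export values are loosely typed, so cast "add" to its actual signature
  const wasmAdd = instance.exports.add as (a: number, b: number) => number
  return wasmAdd(a, b)
}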

console.debug(await add(25, 5))
console.debug(await add(2, 3))

You can run this TypeScript file with Deno like this:

deno run --allow-read test.ts

--allow-read is necessary so Deno can read the adder-optimized.wasm file from disk. Deno by default disallows all network and disk access.

We’ve just executed the same Wasm binary in two different environments! And it worked! We could continue to integrate our shared code into a native app or a serverless (or server-full) solution as well.

More to come…

I hope this post has given you a glimpse of how Wasm lets us write code once and run it everywhere. In a future post I’ll show a more complex example. It will require communicating more than simple numbers, allocating memory, and integrating into even more platforms.