Accessing memory inside WebAssembly modules using Swift

This is part of a series we’re writing on WebAssembly on iOS. Be sure to check out all the articles in this series:

  1. Using Wasm on iOS
  2. Loading Wasm modules in Swift
  3. Calling Wasm functions using Swift
  4. Accessing memory inside Wasm modules using Swift

After adding the ability to call Wasm functions to WasmInterpreter, our next task was to allow the library’s consumer to read and write the memory of a WebAssembly module. This required some research to learn about how memory works in Wasm modules and how Wasm3 (and, by extension, our Swift wrapper CWasm3) interacts with it.

Memory in WebAssembly

Memory in WebAssembly is represented as a contiguous, mutable array of bytes. A module declares its memory’s initial size and, optionally, a maximum size, both measured in WebAssembly pages. As a result, the size of a module’s memory block is always a multiple of the WebAssembly page size, which is 64 kibibytes (KiB). The following WebAssembly module, written in the WebAssembly text format, declares a memory with an initial size of one WebAssembly page (64 KiB). A second number, as in (memory 1 2), would set the maximum size.

(module
  (memory 1))

Remember, although this is a valid WebAssembly module, it can’t be interpreted directly. First, you have to translate the text format into the WebAssembly binary format. The easiest way to do so is to use a command-line program called wat2wasm.
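
To make the page arithmetic concrete, here is a small Swift sketch. The helper name pagesNeeded(forByteCount:) is ours for illustration and is not part of WasmInterpreter or Wasm3. Only the 64 KiB page size comes from WebAssembly itself.

```swift
// Size of one WebAssembly page in bytes (64 KiB).
let wasmPageSize = 64 * 1024

/// Returns the number of whole pages needed to hold `byteCount` bytes.
/// Hypothetical helper for illustration only.
func pagesNeeded(forByteCount byteCount: Int) -> Int {
  precondition(byteCount >= 0, "byteCount must not be negative")
  // Round up to the next whole page.
  return (byteCount + wasmPageSize - 1) / wasmPageSize
}

print(1 * wasmPageSize)                  // prints "65536", the size of (memory 1)
print(pagesNeeded(forByteCount: 1))      // prints "1"
print(pagesNeeded(forByteCount: 65_536)) // prints "1"
print(pagesNeeded(forByteCount: 65_537)) // prints "2"
```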

Accessing WebAssembly memory using CWasm3

CWasm3 exposes a function that returns a pointer to the beginning of a module’s memory buffer and the total size of that buffer. As is the case for most C functions exposed to Swift, the method signature can be a bit difficult to parse, but here’s a simplified version of it:

func m3_GetMemory(
  _: IM3Runtime, 
  _: UnsafeMutablePointer<UInt32>, 
  _: UInt32
) -> UnsafeMutablePointer<UInt8>?

The function takes an instance of the Wasm3 runtime, an out-parameter that will hold the total size of the module’s memory buffer after the function returns, and a third argument, a memory index that is currently unused and should always be 0. Upon successful execution, the function returns a pointer to the beginning of the Wasm module’s memory buffer. These two pieces of information, the memory pointer and the buffer’s total size, were all we needed to read from and write to the Wasm module’s memory buffer. However, this API wasn’t very consumer-friendly. We needed to do better.

Creating a native Swift wrapper around the memory buffer

WebAssembly’s memory block is roughly analogous to the idea of heap memory. So, when modeling it in code, we decided to create a simple data structure called Heap that could be used to access the memory buffer.

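struct Heap {
  let pointer: UnsafeMutablePointer<UInt8>
  let size: Int
  
  func isValid(byteOffset: Int, length: Int) -> Bool {
    // Reject negative values, and compare against `size - length` so the
    // check can't overflow the way `byteOffset + length` could.
    byteOffset >= 0 && length >= 0 && byteOffset <= size - length
  }
}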

The Heap struct is very simple. It contains a pointer to the beginning of the module’s memory block and its total size. We also added a helper function for verifying a given memory range lives within the memory buffer.
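
To see how the range check behaves at the edges of the buffer, here is a standalone sketch. It repeats a Heap-like struct so it runs on its own, with the check written defensively against negative values, and it uses a plain allocated buffer as a stand-in for a module’s memory. The checkRanges() function is hypothetical and exists only for this demonstration.

```swift
struct Heap {
  let pointer: UnsafeMutablePointer<UInt8>
  let size: Int

  func isValid(byteOffset: Int, length: Int) -> Bool {
    byteOffset >= 0 && length >= 0 && byteOffset <= size - length
  }
}

// Hypothetical demo: a one-page buffer stands in for Wasm memory.
func checkRanges() -> [Bool] {
  let pageSize = 64 * 1024
  let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: pageSize)
  defer { buffer.deallocate() }

  let heap = Heap(pointer: buffer, size: pageSize)
  return [
    heap.isValid(byteOffset: 0, length: pageSize),     // the whole buffer
    heap.isValid(byteOffset: pageSize - 4, length: 4), // the last four bytes
    heap.isValid(byteOffset: pageSize - 4, length: 5), // one byte too long
    heap.isValid(byteOffset: -1, length: 1),           // negative offset
  ]
}

print(checkRanges()) // prints "[true, true, false, false]"
```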

We initialized Heap using m3_GetMemory():

func heap() throws -> Heap {
  let totalBytes = UnsafeMutablePointer<UInt32>.allocate(capacity: 1)
  
  // Don't forget to deallocate the `UnsafeMutablePointer`
  defer { totalBytes.deallocate() }

  guard let bytesPointer = m3_GetMemory(_runtime, totalBytes, 0)
  else { throw WasmInterpreterError.invalidMemoryAccess }

  return Heap(pointer: bytesPointer, size: Int(totalBytes.pointee))
}

It’s possible for m3_GetMemory() to return NULL if the runtime doesn’t exist. We handled that possibility by throwing an error.
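
As an aside, Swift can also fill a C-style out-parameter from an ordinary local variable passed with &, which avoids the manual allocate/deallocate pair. The sketch below shows the pattern with standInGetMemory, a hypothetical function we made up to mimic the shape of the simplified signature above (minus the runtime argument). It is not the Wasm3 API.

```swift
// Hypothetical stand-in for a C function with an out-parameter.
let standInBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: 65_536)

func standInGetMemory(
  _ sizeOut: UnsafeMutablePointer<UInt32>,
  _ memoryIndex: UInt32
) -> UnsafeMutablePointer<UInt8>? {
  sizeOut.pointee = 65_536 // report the buffer's size through the out-parameter
  return standInBuffer
}

// Passing `&totalBytes` lets Swift create the temporary pointer for us,
// so there is nothing to allocate or deallocate by hand.
var totalBytes: UInt32 = 0
let memory = standInGetMemory(&totalBytes, 0)

print(totalBytes)    // prints "65536"
print(memory != nil) // prints "true"
```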

By using the Heap struct returned by this function, we were able to access or modify a Wasm module’s memory buffer. However, we didn’t want to expose this API to consumers of WasmInterpreter because it required the use of unsafe pointers, which went against our goal of exposing a clean, safe, Swift-native API.

Ideally, we wanted to be able to write code like this:

try module.writeToHeap(string: "Hello", byteOffset: 0)
print(try module.stringFromHeap(byteOffset: 0, length: 5)) // prints "Hello"

Converting raw pointers to Data and back again

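The first step towards creating a clean API was to copy the bytes behind Heap’s raw pointer into Foundation’s Data type using Data.init(bytes:count:). Because this initializer copies the bytes, the resulting Data stays valid even if the module’s memory changes later. Before trying to initialize Data, we verified the requested memory range was valid.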

func dataFromHeap(byteOffset: Int, length: Int) throws -> Data {
  let heap = try self.heap()

  // Ensure the provided memory range is valid
  guard heap.isValid(byteOffset: byteOffset, length: length)
  else { throw WasmInterpreterError.invalidMemoryAccess }

  // Advance the pointer to the correct position in the buffer
  return Data(bytes: heap.pointer.advanced(by: byteOffset), count: length)
}

Now that we were able to fetch data from the memory buffer, our next step was writing data to it. This was a bit more complicated because we needed to first convert the Data provided by the caller of this function into UnsafeRawBufferPointer and then write it to the correct location in the Wasm module’s memory buffer.

func writeToHeap(data: Data, byteOffset: Int) throws {
  let heap = try self.heap()

  // Ensure the data can fit inside of the memory buffer
  guard heap.isValid(byteOffset: byteOffset, length: data.count)
  else { throw WasmInterpreterError.invalidMemoryAccess }

  // Nothing to copy. An empty Data can have a nil base address, which
  // would otherwise be reported as a binding failure below.
  guard !data.isEmpty else { return }

  try data.withUnsafeBytes { (rawPointer: UnsafeRawBufferPointer) -> Void in
    guard let pointer = rawPointer.bindMemory(to: UInt8.self).baseAddress
    else { throw WasmInterpreterError.couldNotBindMemory }
    
    heap.pointer
      .advanced(by: byteOffset) // Advance to the correct position in the buffer
      .initialize(from: pointer, count: data.count) // Copy the bytes
  }
}

After adding these two functions, we were able to write code like this:

try module.writeToHeap(data: Data("Hello".utf8), byteOffset: 0)
let data = try module.dataFromHeap(byteOffset: 0, length: 5)
print(String(data: data, encoding: .utf8)!) // prints "Hello"

This code was much better than manipulating raw pointers, but it still wasn’t fluent enough for us.
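
If you’d like to experiment with this read/write pattern without a Wasm runtime, the following sketch reproduces it against a plain allocated buffer. InMemoryModule and HeapError are hypothetical names for this example only, and the write uses Data.copyBytes(to:count:) instead of binding memory by hand. It is a simplified stand-in, not the WasmInterpreter implementation.

```swift
import Foundation

enum HeapError: Error { case invalidMemoryAccess }

// Hypothetical stand-in for a module: it owns a zeroed buffer instead of Wasm memory.
final class InMemoryModule {
  private let pointer: UnsafeMutablePointer<UInt8>
  private let size: Int

  init(size: Int) {
    self.size = size
    pointer = .allocate(capacity: size)
    pointer.initialize(repeating: 0, count: size)
  }

  deinit { pointer.deallocate() }

  private func isValid(byteOffset: Int, length: Int) -> Bool {
    byteOffset >= 0 && length >= 0 && byteOffset <= size - length
  }

  func dataFromHeap(byteOffset: Int, length: Int) throws -> Data {
    guard isValid(byteOffset: byteOffset, length: length)
    else { throw HeapError.invalidMemoryAccess }
    // Data copies the bytes, so the result is independent of the buffer.
    return Data(bytes: pointer.advanced(by: byteOffset), count: length)
  }

  func writeToHeap(data: Data, byteOffset: Int) throws {
    guard isValid(byteOffset: byteOffset, length: data.count)
    else { throw HeapError.invalidMemoryAccess }
    data.copyBytes(to: pointer.advanced(by: byteOffset), count: data.count)
  }
}

let module = InMemoryModule(size: 64 * 1024)
try module.writeToHeap(data: Data("Hello".utf8), byteOffset: 8)
let data = try module.dataFromHeap(byteOffset: 8, length: 5)
print(String(decoding: data, as: UTF8.self)) // prints "Hello"
```

Reading or writing past the end of the buffer throws HeapError.invalidMemoryAccess instead of touching memory outside the allocation.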

Adding more convenience functions

In our app, Shareup, we typically read and write UTF-8 strings or WebAssembly’s basic data types. For the latter, we defined WasmTypeProtocol, which we discussed in our last post. Given that, we wanted to add convenience functions for each of these use cases.

The string functions were straightforward because we were able to combine dataFromHeap()/writeToHeap() with Swift String’s built-in ability to convert to and from Data.

func stringFromHeap(byteOffset: Int, length: Int) throws -> String {
  let data = try dataFromHeap(byteOffset: byteOffset, length: length)
  
  // Throw an error if the data isn't a valid UTF-8 string
  guard let string = String(data: data, encoding: .utf8)
  else { throw WasmInterpreterError.invalidUTF8String }
  
  return string
}

func writeToHeap(string: String, byteOffset: Int) throws {
  try writeToHeap(data: Data(string.utf8), byteOffset: byteOffset)
}
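
One detail worth keeping in mind with the string functions is that byteOffset and length are measured in bytes, not characters. The short sketch below, which uses only Foundation and the standard library, shows how the two counts diverge for non-ASCII text and why a length that cuts a character in half fails to decode.

```swift
import Foundation

let greeting = "h\u{E9}llo" // "héllo", with a precomposed é

print(greeting.count)      // prints "5" (characters)
print(greeting.utf8.count) // prints "6" (bytes, which is what `length` must be)

// Cutting a multi-byte character in half leaves invalid UTF-8,
// so decoding returns nil.
let truncated = Data(greeting.utf8.prefix(2)) // "h" plus the first byte of "é"
print(String(data: truncated, encoding: .utf8) == nil) // prints "true"
```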

Oddly, manipulating WebAssembly’s primitive values was more complicated than working with text values. The reason is that, at its essence, UTF-8 text is just a series of single-byte integer values, which can be directly read from or written to a memory buffer. However, WebAssembly’s primitive values are multi-byte values (e.g., a 32-bit integer occupies 4 bytes), which means a number of bytes needs to be read together and interpreted as the correct primitive type. Swift provides a mechanism for doing this using UnsafeRawPointer.bindMemory(to:capacity:).

func valuesFromHeap<T: WasmTypeProtocol>(byteOffset: Int, length: Int) throws -> [T] {
  let heap = try self.heap()

  // `length` is the number of values, so the byte range we need to
  // validate is `length` times the size of each value
  guard length >= 0,
        heap.isValid(byteOffset: byteOffset, length: length * MemoryLayout<T>.stride)
  else { throw WasmInterpreterError.invalidMemoryAccess }

  // Interpret the buffer as the desired primitive type. This assumes
  // `byteOffset` is correctly aligned for `T`.
  let ptr = UnsafeRawPointer(heap.pointer)
    .advanced(by: byteOffset)
    .bindMemory(to: T.self, capacity: length)

  // Copy the primitive values from the buffer into an array and return them
  return (0..<length).map { ptr[$0] }
}
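
Binding memory to a multi-byte type expects the address to be suitably aligned for that type. When the byte offset might not be aligned, one alternative is to copy the raw bytes into an array of the target type. The sketch below shows that idea against a plain [UInt8] rather than a module’s memory. copyValues(from:byteOffset:count:) is a hypothetical helper, constrained to FixedWidthInteger (rather than WasmTypeProtocol) so it runs on its own.

```swift
/// Copies `count` little-endian integers out of `bytes`, starting at `byteOffset`.
/// Hypothetical helper; returns nil if the requested range doesn't fit.
func copyValues<T: FixedWidthInteger>(
  from bytes: [UInt8],
  byteOffset: Int,
  count: Int
) -> [T]? {
  let (byteCount, overflow) = count.multipliedReportingOverflow(by: MemoryLayout<T>.stride)
  guard !overflow, count >= 0, byteOffset >= 0, byteOffset <= bytes.count - byteCount
  else { return nil }

  var result = [T](repeating: 0, count: count)
  result.withUnsafeMutableBytes { destination in
    bytes.withUnsafeBytes { source in
      // A byte-wise copy has no alignment requirement on the source.
      let range = byteOffset..<(byteOffset + byteCount)
      destination.copyMemory(from: UnsafeRawBufferPointer(rebasing: source[range]))
    }
  }

  // Interpret each copied value as little-endian, as WebAssembly stores it.
  return result.map { T(littleEndian: $0) }
}

// Two Int32 values stored at an odd (unaligned) offset.
let sample: [UInt8] = [0xFF, 1, 0, 0, 0, 2, 0, 0, 0]
let decoded: [Int32]? = copyValues(from: sample, byteOffset: 1, count: 2)
print(decoded ?? []) // prints "[1, 2]"
```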

We were able to use our writeToHeap(data:byteOffset:) method to write primitive values to the module’s memory buffer because Swift allows mutable values and collections of values to be implicitly bridged to a pointer to their bytes using &.

func writeToHeap<T: WasmTypeProtocol>(values: Array<T>, byteOffset: Int) throws {
  
  // Make a mutable reference to values to enable implicit bridging
  var values = values
  
  try writeToHeap(
    // Create data from the mutable values array, making sure to multiply
    // the number of values by the size of each value.
    data: Data(bytes: &values, count: values.count * MemoryLayout<T>.size),
    byteOffset: byteOffset
  )
}

After adding these convenience functions, we could write simple, ergonomic code like this:

try module.writeToHeap(
  values: [0, 1, 2, 3].map(Int32.init),
  byteOffset: 0
)
let values: [Int32] = try module.valuesFromHeap(byteOffset: 0, length: 4)
print(values) // prints "[0, 1, 2, 3]"

Endianness

Whenever you read or write raw bytes, you should always pay attention to the endianness of your platform. We didn’t specifically address endianness in this post because 1) it would have made this post even longer than it already is and 2) our use case didn’t require it. All of Apple’s platforms are little-endian, and WebAssembly assumes little-endian byte ordering. However, if we wanted to make this code cross-platform, we would need to convert the bytes we read/write to the correct byte ordering.
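
For readers who do need portable code, here is a minimal sketch of what that conversion could look like using the standard library’s littleEndian property. Both helper functions are hypothetical and are not part of WasmInterpreter. They produce and consume little-endian bytes regardless of the host’s byte order.

```swift
/// Serializes integers as little-endian bytes on any host.
/// Hypothetical helper for illustration only.
func littleEndianBytes<T: FixedWidthInteger>(of values: [T]) -> [UInt8] {
  var bytes: [UInt8] = []
  for value in values {
    // `littleEndian` swaps bytes only when the host is big-endian.
    withUnsafeBytes(of: value.littleEndian) { bytes.append(contentsOf: $0) }
  }
  return bytes
}

/// Rebuilds a UInt32 from four little-endian bytes using shifts,
/// which doesn't depend on the host's byte order.
func uint32(fromLittleEndian bytes: [UInt8]) -> UInt32? {
  guard bytes.count == 4 else { return nil }
  var value: UInt32 = 0
  for (index, byte) in bytes.enumerated() {
    value |= UInt32(byte) << (8 * UInt32(index))
  }
  return value
}

let encoded = littleEndianBytes(of: [Int32(0x0102_0304)])
print(encoded) // prints "[4, 3, 2, 1]"
print(uint32(fromLittleEndian: encoded) == 0x0102_0304) // prints "true"
```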

Next…

In the next article, I’ll write about how to import native functions into a WebAssembly module. As always, if you don’t want to wait, you can look at all of the code now. Clone WasmInterpreter and start playing around. A good place to start is MemoryModule, which uses a lot of the memory-accessing functions we wrote today. Be sure to let me know if you have any thoughts or use WasmInterpreter to build something cool.