
Use 128 bit fat pointers for continuation objects #186

Merged: 18 commits merged into wasmfx:main from the 128-fatpointers branch on Jun 12, 2024

Conversation

@frank-emrich (Author)

This PR changes the representation introduced in #182, where continuation objects were turned into tagged pointers containing a pointer to a VMContRef as well as a 16-bit sequence counter used to perform linearity checks.

In this PR, the representation is changed from 64-bit tagged pointers to 128-bit fat pointers, where 64 bits are used for the pointer and 64 bits for the sequence counter.

Some implementation details:

  • The design introduced in Continuation objects as fat pointers #182, where we use disassemble_contobj and assemble_contobj to go from (effectively) Option<VMContObj> to Option<VMContRef> and back, is preserved (see the sketch after this list).
  • The feature unsafe_disable_continuation_linearity_check is preserved: If it is enabled, we do not use fat (or tagged) pointers at all, and all revision checks are disabled.
  • Overflow checks for the revision counter are no longer necessary and have been removed.
  • In Wasm, we now use the SIMD type I8X16 for any value of type (ref $continuation) or (ref null $continuation). See the comment on vm_contobj_type in shared.rs for why we cannot use I128 or I64X2 instead.
  • Some translate_* functions in the FuncEnvironment trait now need to take a FunctionBuilder parameter, instead of FuncCursor, which slightly increases the footprint of this PR.
  • The implementation of table.fill for continuation tables was missing. I've added this and in the process extended cont_table.wast to be generally more exhaustive.
  • For those libcalls that take a parameter that is a variant type including VMContObj, I've introduced dedicated versions for the VMContObj case, namely table_fill_cont_obj and table_grow_cont_obj in libcalls.rs. These manually split the VMContObj into two parts.
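For illustration, here is a minimal plain-Rust sketch of the fat-pointer representation and the two conversion helpers. The field names, layout, and null-pointer encoding of `None` are assumptions for this sketch; the actual helpers in this PR operate on Cranelift IR values during translation, not on Rust values at runtime.

```rust
// Illustrative sketch only: names and layout are assumptions, and the
// real disassemble_contobj/assemble_contobj work on Cranelift IR values.
#[derive(Clone, Copy)]
pub struct VMContObj {
    pub revision: u64,           // 64-bit sequence counter for linearity checks
    pub contref: *mut VMContRef, // 64-bit pointer to the actual continuation
}

pub struct VMContRef {
    // continuation state (stacks, payloads, ...) lives here
}

/// Split an optional fat pointer into its two halves; a null `contref`
/// encodes `None`.
pub fn disassemble_contobj(obj: Option<VMContObj>) -> (u64, *mut VMContRef) {
    match obj {
        Some(c) => (c.revision, c.contref),
        None => (0, core::ptr::null_mut()),
    }
}

/// Reassemble the two halves into an optional fat pointer.
pub fn assemble_contobj(revision: u64, contref: *mut VMContRef) -> Option<VMContObj> {
    if contref.is_null() {
        None
    } else {
        Some(VMContObj { revision, contref })
    }
}
```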

@frank-emrich (Author)

Some benchmarking results:
First, I compare the fat pointer implementation against the existing tagged pointer one. Enabling linearity checks with the tagged pointer implementation actually makes all benchmarks except c10m fail, because they overflow the 16-bit counter. Thus, I've had to slightly tweak their parameters.
In the list below, each line shows the value of X/Y, where X is the runtime of that particular benchmark with tagged pointers, and Y is the runtime with fat pointers. As usual, the difference between, say, c10m_wasmfx and c10m_wasmfx_fiber is that the latter uses the fiber interface, while the former uses handwritten wat files.

Suite: c10m
c10m_wasmfx: 1.0125704809561387
c10m_wasmfx_fiber: 0.9931000528537908

Suite: sieve (cut number of primes in half)
sieve_wasmfx: 0.9637743103971731
sieve_wasmfx_fiber: 0.9910300798077857

Suite: skynet (5 instead of 6 levels)
skynet_wasmfx: 0.9970199355953799
skynet_wasmfx_fiber: 0.9912801597259853

Suite: state
This suite only runs when counting up to 8000, at which point the runtime is 10 ms.

I now compare the performance impact of enabling vs. disabling the linearity check when using this PR (i.e., whether or not the unsafe_disable_continuation_linearity_check feature is enabled). Again, the values shown are X/Y, where X is the runtime without linearity checks and Y is the runtime with linearity checks.

Suite: c10m
c10m_wasmfx: 0.9162058249858285
c10m_wasmfx_fiber: 0.9677704802233246

Suite: sieve
sieve_wasmfx: 0.9758646600083649
sieve_wasmfx_fiber: 0.9808578875186281

Suite: skynet
skynet_wasmfx: 0.9675361140008778
skynet_wasmfx_fiber: 0.9859123548277564

Suite: state
state_wasmfx: 0.9729201800828162
state_wasmfx_fiber: 0.983206464991699

@frank-emrich frank-emrich requested a review from dhil May 27, 2024 16:43
@dhil (Member) left a comment:

I am confused about the removal of the cont_twice.wast test.

Comment on lines 680 to 696 (main.yml)
# Crude check for whether
# `unsafe_disable_continuation_linearity_check` makes the test
# `cont_twice` fail.
- run: |
(cargo test --features=unsafe_disable_continuation_linearity_check --test wast -- --exact Cranelift/tests/misc_testsuite/typed-continuations/cont_twice.wast && test $? -eq 101) || test $? -eq 101

@dhil (Member):

Why are you deleting this test? It should still fail.

@frank-emrich (Author):

See the comment on tests/wast.rs below.

crates/cranelift/src/wasmfx/optimized.rs (outdated; resolved)
// continuation reference and the revision count.
// If `unsafe_disable_continuation_linearity_check` is enabled, the revision value is arbitrary.
// To denote the continuation being `None`, `init_contref` may be 0.
table_grow_cont_obj(vmctx: vmctx, table: i32, delta: i32, init_contref: pointer, init_revision: i64) -> i32;
@dhil (Member):

I guess it would be nice to extend the libcall API interface to support 128-bit wide values.

@frank-emrich (Author):

There isn't a particularly nice way of doing that. We would effectively have to do the splitting into two i64 values at the libcall translation layer, and thus the implementation of the libcall in libcalls.rs would still receive two parameters. This only gets uglier when you then incorporate the switching between safe and unsafe mode. Given that we only have two libcalls actually taking these kinds of values, I'd rather avoid all of that.
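For illustration, a hedged sketch of the Rust side of such a libcall, reusing the VMContObj/VMContRef sketch from the PR description, omitting the vmctx parameter, and stubbing out the body; the real implementation in libcalls.rs differs.

```rust
// Sketch only: mirrors the split-parameter declaration shown above.
// `init_contref` is the pointer half (null encodes `None`), and
// `init_revision` is the revision half of the disassembled VMContObj.
fn table_grow_cont_obj(
    table: u32,
    delta: u32,
    init_contref: *mut VMContRef,
    init_revision: u64,
) -> i32 {
    // Reassemble the optional fat pointer from its two halves.
    let init_value: Option<VMContObj> = assemble_contobj(init_revision, init_contref);
    // ... grow table `table` by `delta` entries initialized to `init_value` ...
    let _ = (table, delta, init_value);
    -1 // stub; a real table.grow returns the old size, or -1 on failure
}
```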

@dhil (Member):

I am not suggesting doing it now, but I think it will be simpler than you expect. We should be able to map a hypothetical i128 to Rust's u128 just as i32 maps to u32, etc.
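As a purely hypothetical sketch of that mapping (no i128 libcall type exists yet, and packing the revision into the high half is an arbitrary choice for this sketch):

```rust
// Hypothetical only: assumes a libcall DSL type `i128` mapped to Rust's
// u128, with the revision in the high 64 bits and the pointer in the
// low 64 bits. Reuses the sketched types/helpers from above.
fn table_grow_cont_obj_u128(table: u32, delta: u32, init_value: u128) -> i32 {
    let init_revision = (init_value >> 64) as u64;
    let init_contref = init_value as u64 as usize as *mut VMContRef;
    let _ = (table, delta, assemble_contobj(init_revision, init_contref));
    -1 // stub
}
```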

crates/types/src/lib.rs (outdated; resolved)
tests/wast.rs (outdated)
Comment on lines 194 to 200
// This test specifically checks that we catch a continuation being
// resumed twice, which we cannot detect in this mode.
if test.ends_with("cont_twice.wast") {
    return true;
}
@dhil (Member):

Surely this is only true when unsafe_disable_continuation_linearity_check is toggled?

@frank-emrich (Author):

Yes, that check should read test.ends_with("cont_twice.wast") && cfg!(feature = "unsafe_disable_continuation_linearity_check").

My intention is to make sure that the test suite passes normally with this feature enabled, which means disabling this particular test in its presence. Given that, I had to remove the check regarding cont_twice.wast from main.yml. Or are you particularly interested in ensuring that the test does indeed fail if unsafe_disable_continuation_linearity_check is enabled? That's reasonable (the test case will simply not trap with the expected message in that case), but beyond that I would consider the behaviour of that program to be undefined.
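Concretely, the corrected skip logic in tests/wast.rs would look roughly like this (a sketch; the surrounding skip machinery and the helper name are assumed):

```rust
// Sketch of the corrected check: skip cont_twice.wast only when the
// unsafe feature is enabled, since in that mode the double resume is
// not detected and the expected trap never fires.
fn skip_cont_twice(test: &std::path::Path) -> bool {
    test.ends_with("cont_twice.wast")
        && cfg!(feature = "unsafe_disable_continuation_linearity_check")
}
```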

@dhil (Member):

I want to make sure the test fails when unsafe_disable_continuation_linearity_check is toggled.

Comment on lines -463 to -467
let overflow =
    builder
        .ins()
        .icmp_imm(IntCC::UnsignedLessThan, revision_plus1, 1 << 16);
builder.ins().trapz(overflow, ir::TrapCode::IntegerOverflow); // TODO(dhil): Consider introducing a designated trap code.
@dhil (Member):

I'd like to preserve these traps in debug mode. I think that can prove useful.

@frank-emrich (Author):

And what should they check?

@dhil (Member):

They should check whether the revision counter has wrapped around.
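For instance, something along these lines could work: a sketch assuming the same context as the removed snippet above (builder and revision_plus1 in scope), with reusing IntegerOverflow as a placeholder choice of trap code.

```rust
// Debug-only sketch: with a 64-bit counter, wrap-around means the
// incremented revision became 0 again.
if cfg!(debug_assertions) {
    let no_wrap = builder
        .ins()
        .icmp_imm(IntCC::NotEqual, revision_plus1, 0);
    builder.ins().trapz(no_wrap, ir::TrapCode::IntegerOverflow);
}
```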

@frank-emrich (Author)

I noticed that there is an issue when continuation tables are allocated in a TablePool. I'll update the PR once I have time to fix it.

@dhil (Member) commented May 29, 2024:

> I noticed that there is an issue when continuation tables are allocated in a TablePool. I'll update the PR once I have time to fix it.

What's the problem/error?

@frank-emrich (Author)

The TablePool manages a single mmapped memory region from which it allocates all tables. To this end, it calculates the required overall size of this region as max_number_of_allowed_tables * max_allowed_element_count_per_table * size_of_each_table_entry. Thus, the memory for the table with index i in the pool starts at i * max_allowed_element_count_per_table * size_of_each_table_entry.

However, all of this is based on the (hardcoded) assumption that all table entries across all table types are pointer-sized (i.e., size_of_each_table_entry is sizeof(*mut u8)). But as of this PR, this is not the case anymore.

I will address this as follows:

  1. Change the calculation of the overall size of the mmapped region to max_number_of_allowed_tables * max_allowed_element_count_per_table * max_size_of_each_table_entry, where max_size_of_each_table_entry is now sizeof(VMContObj) == 16. This effectively doubles the amount of address space occupied by the table pool. The calculation of the start address of each table is changed accordingly.
  2. Change the logic for allocating and deallocating tables from the pool so that we take the element size of that particular table type into account when committing and decommitting memory (see the sketch after the summary below).

In summary, these changes mean that while the table pool occupies more virtual address space, the amount of actually committed pages for non-continuation tables does not change.
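A minimal sketch of the revised arithmetic, using illustrative names rather than the actual TablePool fields:

```rust
// Sketch only: illustrative constants and names.
const MAX_SIZE_OF_EACH_TABLE_ENTRY: usize = 16; // sizeof(VMContObj)

/// Virtual address space reserved for the whole pool (point 1 above).
fn pool_region_size(max_tables: usize, max_elems_per_table: usize) -> usize {
    max_tables * max_elems_per_table * MAX_SIZE_OF_EACH_TABLE_ENTRY
}

/// Start offset of the table with index `i` within the mmapped region.
fn table_offset(i: usize, max_elems_per_table: usize) -> usize {
    i * max_elems_per_table * MAX_SIZE_OF_EACH_TABLE_ENTRY
}

/// Bytes actually committed for one table (point 2 above): this uses the
/// *actual* element size of the table's type (8 for pointer-sized
/// entries, 16 for continuation tables), not the maximum.
fn committed_bytes(element_count: usize, actual_entry_size: usize) -> usize {
    element_count * actual_entry_size
}
```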

There are some other solutions, which seem less preferable:

  1. Simply refuse to allocate continuation tables that have more than max_allowed_element_count_per_table / 2 entries. That seems dodgy.
  2. Have two separate mmapped regions, one for tables with pointer-sized entries and one for tables that contain fat pointers. Besides complicating the implementation of TablePool, this has a drawback that defeats the whole purpose of the separation: the current design of the TablePool assumes that you allocate (but don't commit) all the required memory upfront, so the size of the mmapped region for small tables plus the size of the region for large tables would together be larger than the single unified region proposed above.
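For concreteness: with N = max_number_of_allowed_tables and E = max_allowed_element_count_per_table, the unified region proposed above reserves N * E * 16 bytes of address space, whereas the split design would have to reserve N * E * 8 + N * E * 16 = N * E * 24 bytes, because each region must be sized for the worst case of all N tables being of its kind.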

@frank-emrich (Author)

I have now implemented this fix independently in #192, meaning that the current PR needs to land after #192.

dhil pushed a commit that referenced this pull request Jun 12, 2024
This PR provides a prerequisite for #186, by implementing a solution for
a problem originally described
[here](#186 (comment)).

To reiterate, the problem is as follows:
For "static" tables (= tables managed my a `TablePool`), the `TablePool`
manages a single mmapped memory region from which it allocates all
tables. To this end, it calculates the required overall size of this
region as `max_number_of_allowed_tables` *
`max_allowed_element_count_per_table` * `size_of_each_table_entry`.
Thus, the memory for the table with index `i` in the pool starts at `i` *
`max_allowed_element_count_per_table` * `size_of_each_table_entry`.

However, all of this is based on the (hardcoded) assumption that all
table entries across all table types are pointer-sized (i.e.,
`size_of_each_table_entry` is `sizeof(*mut u8)`). But once #186 lands,
this is not the case any more.

This PR addresses this as follows:

1. Change the calculation of the overall size of the mmapped region to
`max_number_of_allowed_tables` * `max_allowed_element_count_per_table` *
`max_size_of_each_table_entry`, where `max_size_of_each_table_entry`
will be `sizeof(VMContObj)` == 16 once #186 lands. This effectively
doubles the amount of address space occupied by the table pool. The
calculation of the start address of each table is changed accordingly.
2. Change the logic for allocating and deallocating tables from the pool
so that we take the element size for that particular table type into
account when committing and decommitting memory.

Note that the logic implemented in this PR is independent of the
underlying element sizes. This means that this PR itself does not change
the space occupied by the tables, as `max_size_of_each_table_entry` is
currently still the size of a pointer. The necessary changes happen
implicitly once #186 lands, which changes the size of `ContTableElem`,
which in turn changes the constant `MAX_TABLE_ELEM_SIZE`.

In summary, these changes mean that in the future the table pool
occupies more virtual address space, but the amount of actually
committed pages for non-continuation tables does not change.
@frank-emrich (Author)

This should be good to go now.

@dhil (Member) left a comment:

LGTM.

@@ -692,7 +692,7 @@ jobs:
# `unsafe_disable_continuation_linearity_check` makes the test
# `cont_twice` fail.
- run: |
-        (cargo test --features=unsafe_disable_continuation_linearity_check --test wast -- --exact Cranelift/tests/misc_testsuite/typed-continuations/cont_twice.wast && test $? -eq 101) || test $? -eq 101
+        (cargo run --features=unsafe_disable_continuation_linearity_check -- wast -W=exceptions,function-references,typed-continuations tests/misc_testsuite/typed-continuations/cont_twice.wast && test $? -eq 101) || test $? -eq 101
@dhil (Member):

Why cargo run and not cargo test? I thought the cargo test artifact would already have been built (or maybe the run artifact has been built as well?).

@frank-emrich (Author) commented Jun 12, 2024:

This particular test is now #[ignore]-d in the presence of unsafe_disable_continuation_linearity_check. Thus, you need to manually feed it into wasmtime wast to run it.

Edit: Sorry, it's not actually ignored using #[ignore], but manually skipped by the logic in tests/wast.rs. But the result is the same.

@frank-emrich (Author):

Alternatively, the following works:

cargo test --features=unsafe_disable_continuation_linearity_check --test wast -- --include-ignored --exact Cranelift/tests/misc_testsuite/typed-continuations/cont_twice.wast

@frank-emrich (Author):

In terms of avoiding additional building, it shouldn't make a difference anyway. As far as I can tell, this is the only place where we actually build with unsafe_disable_continuation_linearity_check, meaning that it will cause a separate rebuild of most stuff anyway.

@dhil (Member):

OK, I am fine with either. I was just curious. Thanks!

@frank-emrich merged commit 78b813d into wasmfx:main on Jun 12, 2024 (36 checks passed).
@frank-emrich deleted the 128-fatpointers branch on June 13, 2024 at 10:47.