Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a side-tracing + hardware tracing bug. #1475

Merged
merged 2 commits into from
Nov 29, 2024
Merged

Conversation

vext01
Copy link
Contributor

@vext01 vext01 commented Nov 25, 2024

When we are constructing a side trace we will have first deopted back to the conditional branch that caused the guard to fail and side-tracing to start. Because the conditional branch is re-executed, this means that when we are doing hardware tracing, we will see a spurious block at the beginning of the trace that isn't actually part of the trace. We therefore skip it.

Fixes crashes like this, where we get a local_map miss for local variables relating to the conditional branch:

thread '' panicked at ykrt/src/compile/jitc_yk/trace_builder.rs:404:69: no entry found for key

@vext01
Copy link
Contributor Author

vext01 commented Nov 25, 2024

There is still one other non-deterministic bug lurking, but it's quite rare, and I hope this is enough to clear the PR backlog.

Once all the PRs are in I can try to repro the other bug. If I'm lucky, something fixes it already ! 🫰

@ltratt ltratt added this pull request to the merge queue Nov 25, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 25, 2024
@ltratt
Copy link
Contributor

ltratt commented Nov 25, 2024

Please force push a fix.

@vext01
Copy link
Contributor Author

vext01 commented Nov 25, 2024

Force pushed the fix. It's not pretty.

@ltratt ltratt added this pull request to the merge queue Nov 25, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 25, 2024
@vext01
Copy link
Contributor Author

vext01 commented Nov 26, 2024

Turns out we didn't understand the problem entirely in our last attempt. 9c3613d is (we believe) a correct fix.

I've cherry picked the commit from #1473 into this branch, because I'm seeing that problem crop up more and more and it's unlikely that my our changes will merge without it.

I've done a fairly large benchmark run, run unit tests and lua tests and haven't had any issues. Fingers crossed.

If you are OK with this, then I'd like to squash, reorder the commits and reword the commit message before merge.

@vext01
Copy link
Contributor Author

vext01 commented Nov 26, 2024

@ptersilie you may want to double-check the comments I've revised since you last saw this.

// identified by `prev_bid`. The way that we deopt back to (and re-execute) the conditional
// that caused the deopt, combined with the way that hardware tracing works, means that we
// can see duplicated `prev_bid` entries at the start of the trace. When we side-trace, we
// don't want those duplicates in the trace, hence we strip them.
Copy link
Contributor

@ptersilie ptersilie Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

The variable `prev_bid` contains the block of the guard that initiated
side-tracing (for normal traces this is set to `None`). When hardware tracing,
we capture this block again as part of the side-trace. However, since we've
already processed this block in the parent trace, we must not process it again
in the side-trace.

Typically, the mapper would strip this block for us, but for codegen related
reasons, e.g. a switch statement codegenning to many machine blocks, it's
possible for multiple duplicates of this same block to show up here, which we
all need to skip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7a19aba

// that some conditionals (e.g. switches) span multiple machine blocks, so this may
// remove only part of the conditional. Unfortunately we don't have enough information
// at this level to kill all of the conditional, so there's an additional purge loop in
// the trace builder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

If we are side-tracing then this attempts to remove the block containing the
failed guard, which is captured by the hardware tracer, but which we have
already executed in the parent trace. Note though, that some conditionals (e.g.
switches) can span multiple machine blocks, which are not all removed here.
Since we don't have enough information at this level to remove all of them,
there's a workaround in the trace builder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ad61c8e

@ltratt
Copy link
Contributor

ltratt commented Nov 26, 2024

I'm fine with this: as soon as Lukas is happy with the updated comments, please squash.

@ptersilie
Copy link
Contributor

I am. Let's go!

ltratt and others added 2 commits November 26, 2024 14:35
Currently our register allocator assumes that SSA variables can't alias.
This is true for the parts under our control -- but isn't true of the
inputs to a trace! For example both `%0` and `%1` might come from `RAX`.
This caused a panic in the register allocator.

I'd noted the problem in ykjit#1449 but
thought it happened rarely. Edd noticed that it happens in other, more
common situations (including as a result of common subexpression
elimination), including in a yklua benchmark, and created a test for it.

This commit is a "quick fix" (it's not pretty, or efficient): when we
spot aliasing, we spill aliased variables onto the stack, implicitly
dealiasing them. In other words, we avoid having to think about aliasing
in the main bulk of the register allocator at the expense of having to
spill (and potentially unspill) values unnecessarily.

Co-authored-by: Edd Barrett <[email protected]>
When we are constructing a side trace we will have first deopted back to
the conditional branch that initiated side-tracing. Because the
conditional branch is re-executed, this means that when we are doing
hardware tracing, we will likely see a spurious blocks at the beginning
of the trace that are not desirable in the side-trace. We therefore skip
them.

Fixes crashes like this, where we get a local_map miss for local
variables relating to the conditional branch:

thread '<unnamed>' panicked at ykrt/src/compile/jitc_yk/trace_builder.rs:404:69:
no entry found for key

Co-authored-by: Lukas Diekmann <[email protected]>
@vext01
Copy link
Contributor Author

vext01 commented Nov 26, 2024

squashed and rebased.

@ltratt ltratt added this pull request to the merge queue Nov 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2024
@vext01
Copy link
Contributor Author

vext01 commented Nov 26, 2024

Can we retry this? That's a different bug.

@ltratt ltratt added this pull request to the merge queue Nov 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2024
@ltratt ltratt added this pull request to the merge queue Nov 29, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 29, 2024
@ltratt ltratt added this pull request to the merge queue Nov 29, 2024
Merged via the queue into ykjit:master with commit a25b17c Nov 29, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants