-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a side-tracing + hardware tracing bug. #1475
Conversation
There is still one other non-deterministic bug lurking, but it's quite rare, and I hope this is enough to clear the PR backlog. Once all the PRs are in I can try to repro the other bug. If I'm lucky, something fixes it already ! 🫰 |
Please force push a fix. |
1acdeab
to
00b41f4
Compare
Force pushed the fix. It's not pretty. |
Turns out we didn't understand the problem entirely in our last attempt. 9c3613d is (we believe) a correct fix. I've cherry picked the commit from #1473 into this branch, because I'm seeing that problem crop up more and more and it's unlikely that my our changes will merge without it. I've done a fairly large benchmark run, run unit tests and lua tests and haven't had any issues. Fingers crossed. If you are OK with this, then I'd like to squash, reorder the commits and reword the commit message before merge. |
@ptersilie you may want to double-check the comments I've revised since you last saw this. |
// identified by `prev_bid`. The way that we deopt back to (and re-execute) the conditional | ||
// that caused the deopt, combined with the way that hardware tracing works, means that we | ||
// can see duplicated `prev_bid` entries at the start of the trace. When we side-trace, we | ||
// don't want those duplicates in the trace, hence we strip them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
The variable `prev_bid` contains the block of the guard that initiated
side-tracing (for normal traces this is set to `None`). When hardware tracing,
we capture this block again as part of the side-trace. However, since we've
already processed this block in the parent trace, we must not process it again
in the side-trace.
Typically, the mapper would strip this block for us, but for codegen related
reasons, e.g. a switch statement codegenning to many machine blocks, it's
possible for multiple duplicates of this same block to show up here, which we
all need to skip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 7a19aba
ykrt/src/trace/hwt/mapper.rs
Outdated
// that some conditionals (e.g. switches) span multiple machine blocks, so this may | ||
// remove only part of the conditional. Unfortunately we don't have enough information | ||
// at this level to kill all of the conditional, so there's an additional purge loop in | ||
// the trace builder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
If we are side-tracing then this attempts to remove the block containing the
failed guard, which is captured by the hardware tracer, but which we have
already executed in the parent trace. Note though, that some conditionals (e.g.
switches) can span multiple machine blocks, which are not all removed here.
Since we don't have enough information at this level to remove all of them,
there's a workaround in the trace builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in ad61c8e
I'm fine with this: as soon as Lukas is happy with the updated comments, please squash. |
I am. Let's go! |
Currently our register allocator assumes that SSA variables can't alias. This is true for the parts under our control -- but isn't true of the inputs to a trace! For example both `%0` and `%1` might come from `RAX`. This caused a panic in the register allocator. I'd noted the problem in ykjit#1449 but thought it happened rarely. Edd noticed that it happens in other, more common situations (including as a result of common subexpression elimination), including in a yklua benchmark, and created a test for it. This commit is a "quick fix" (it's not pretty, or efficient): when we spot aliasing, we spill aliased variables onto the stack, implicitly dealiasing them. In other words, we avoid having to think about aliasing in the main bulk of the register allocator at the expense of having to spill (and potentially unspill) values unnecessarily. Co-authored-by: Edd Barrett <[email protected]>
When we are constructing a side trace we will have first deopted back to the conditional branch that initiated side-tracing. Because the conditional branch is re-executed, this means that when we are doing hardware tracing, we will likely see a spurious blocks at the beginning of the trace that are not desirable in the side-trace. We therefore skip them. Fixes crashes like this, where we get a local_map miss for local variables relating to the conditional branch: thread '<unnamed>' panicked at ykrt/src/compile/jitc_yk/trace_builder.rs:404:69: no entry found for key Co-authored-by: Lukas Diekmann <[email protected]>
ad61c8e
to
844d090
Compare
squashed and rebased. |
Can we retry this? That's a different bug. |
When we are constructing a side trace we will have first deopted back to the conditional branch that caused the guard to fail and side-tracing to start. Because the conditional branch is re-executed, this means that when we are doing hardware tracing, we will see a spurious block at the beginning of the trace that isn't actually part of the trace. We therefore skip it.
Fixes crashes like this, where we get a local_map miss for local variables relating to the conditional branch:
thread '' panicked at ykrt/src/compile/jitc_yk/trace_builder.rs:404:69: no entry found for key