Fix SIGCONT handling on threads blocked in syscalls #3874

KJTsanaktsidis · 2024-11-09T04:00:17Z

If a process receives a SIGSTOP, we emulate the group-stop by:

* Leaving the thread which happened to receive the SIGSTOP signal ptrace-stopped
* Refusing to schedule any other thread until the group-stop is over

The whole group-stop is therefore emulated by rr and not actually enforced by the kernel.

When a SIGCONT is received, we need to end the group-stop. However, we can't actually _know_ that a ptrace-stopped thread has received a signal until we try to resume it. To work around this, the scheduler checks the `SigPnd` and `ShdPnd` fields of /proc/tid/status to detect when a thread that's in a group-stop has a pending SIGCONT, and so needs to be PTRACE_CONT'd so we can actually `wait` on it and receive that SIGCONT.
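As a rough illustration of the pending-signal check described above, here is a minimal sketch that parses the `SigPnd` and `ShdPnd` hex masks out of the text of a status file. It is not rr's actual implementation: the helper names `parse_signal_mask` and `is_signal_pending_in_status` are hypothetical, and the sketch assumes the usual Linux convention that signal number N is bit N-1 of the mask.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not rr's API): find a field such as "ShdPnd" in the
// text of a /proc/<tid>/status file and parse its hex mask.
// Returns 0 if the field is not present.
uint64_t parse_signal_mask(const std::string& status_text,
                           const std::string& field) {
  std::istringstream in(status_text);
  std::string line;
  const std::string prefix = field + ":";
  while (std::getline(in, line)) {
    if (line.compare(0, prefix.size(), prefix) == 0) {
      // stoull skips the tab that separates the field name from the mask.
      return std::stoull(line.substr(prefix.size()), nullptr, 16);
    }
  }
  return 0;
}

// Hypothetical helper: true if `sig` is pending either for this thread
// (SigPnd) or for the whole process (ShdPnd).
// Assumption: signal N corresponds to bit N-1 of the mask.
bool is_signal_pending_in_status(const std::string& status_text, int sig) {
  assert(sig >= 1 && sig <= 64);
  const uint64_t pending = parse_signal_mask(status_text, "SigPnd") |
                           parse_signal_mask(status_text, "ShdPnd");
  return ((pending >> (sig - 1)) & 1) != 0;
}
```

Because `ShdPnd` describes process-wide pending signals, a check like this reports a process-directed SIGCONT as pending when run against any thread of the process.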

However, a problem arises in the following case:

* A process has at least two threads,
* One thread "A" receives a SIGSTOP,
* And the other thread "B" is in a blocking system call,
* And then a process-directed SIGCONT is sent to the process,
* And the scheduler checks if "B" is runnable before checking if "A" is runnable.

In this case, the issue is that the process-directed SIGCONT will set the bit in `ShdPnd` for _both_ threads, so `t->is_signal_pending(SIGCONT)` will be true for both thread A and thread B. The scheduler then tries to PTRACE_CONT thread B, but B is not actually in a ptrace-stop, so it all goes pear-shaped (specifically, you get an assertion failure in `t->resume_execution()`).

The fix is to skip this `SigPnd`/`ShdPnd` check entirely for threads that are not actually in a ptrace-stop. They don't need this kind of special handling, precisely because they're not ptrace-stopped: when we go to `try_wait` on them later on, we'll notice that they received a signal, and the handling in `RecordTask::signal_delivered` will actually run `emulate_SIGCONT` then.
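The effect of the guard can be sketched with a toy model of the scenario. This is not rr's scheduler code: `ToyTask`, its fields, and the functions below are all hypothetical names invented for illustration, and the "scheduler" is reduced to picking the first thread it would PTRACE_CONT.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a traced thread. Field names are hypothetical, not rr's.
struct ToyTask {
  bool in_ptrace_stop;   // true only for the thread left stopped by SIGSTOP
  bool sigcont_pending;  // what the SigPnd/ShdPnd check reports for SIGCONT
};

// Unguarded check: any thread with a pending SIGCONT is selected, even one
// that is blocked in a syscall rather than ptrace-stopped.
bool wants_cont_unguarded(const ToyTask& t) {
  return t.sigcont_pending;
}

// Guarded check: the pending-signal test only applies to threads that
// really are in a ptrace-stop.
bool wants_cont_guarded(const ToyTask& t) {
  return t.in_ptrace_stop && t.sigcont_pending;
}

// Returns the index of the first thread the toy scheduler would try to
// PTRACE_CONT, or -1 if there is none.
int first_task_to_cont(const std::vector<ToyTask>& tasks, bool guarded) {
  for (std::size_t i = 0; i < tasks.size(); ++i) {
    const bool pick = guarded ? wants_cont_guarded(tasks[i])
                              : wants_cont_unguarded(tasks[i]);
    if (pick) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With thread B (blocked in a syscall, not ptrace-stopped) listed before thread A (ptrace-stopped), and SIGCONT pending for both, the unguarded version selects B, which corresponds to the assertion failure described above, while the guarded version skips B and selects A.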

Fixes #3871


rocallahan · 2024-11-09T04:15:49Z

Thanks! This is very tricky stuff.

KJTsanaktsidis mentioned this pull request Nov 9, 2024

"Assertion `is_stopped_' failed to hold." with SIGSTOP/SIGCONT in multithreaded program #3871

Closed

KJTsanaktsidis force-pushed the ktsanaktsidis/fix_sigcont_threaded branch from 0bc299a to 6862f7c Compare November 9, 2024 04:04

rocallahan merged commit b1e461a into rr-debugger:master Nov 9, 2024
5 checks passed
