Skip to content

Commit

Permalink
Fix race condition with destroying tls range. (#398)
Browse files Browse the repository at this point in the history
Running the thread creation loop from hell (you know the one), a
double-free occurs (a double explicit free, not a collection free). This
triggers the assert at
https://github.com/snazzy-d/sdc/blob/d693c6d140be865e304c289502bdb8b0c0c3e0b5/sdlib/d/gc/tcache.d#L716

The stack trace I had was:

```gdb
(gdb) bt
#0  0x00005555555c38d4 in __sd_assert_fail ()
#1  0x00005555555edbfa in d.gc.tcache.ThreadCache.getPageDescriptor(ref d.gc.tcache.ThreadCache, void*) ()
#2  0x00005555555d92e1 in d.gc.tcache.ThreadCache.free(ref d.gc.tcache.ThreadCache, void*) ()
#3  0x00005555555f1a7f in __sd_gc_pre_suspend_hook ()
#4  0x00005555555eb335 in d.gc.signal.suspendThreadImpl(scope d.gc.tstate.ThreadState*) ()
#5  0x00005555555f12f2 in __sd_gc_push_registers ()
#6  0x00005555555eb4fd in d.gc.signal.suspendThreadDelayed(scope d.gc.tstate.ThreadState*) ()
#7  0x00005555555ef7f2 in d.gc.tstate.ThreadState.exitBusyStateSlow(ref d.gc.tstate.ThreadState, ulong) ()
#8  0x00005555555ee017 in d.gc.tstate.ThreadState.exitBusyState(ref d.gc.tstate.ThreadState) ()
#9  0x00005555555d989b in d.gc.tcache.ThreadCache.flushCache(ref d.gc.tcache.ThreadCache) ()
#10 0x00005555555ed9bf in d.gc.tcache.ThreadCache.destroyThread(ref d.gc.tcache.ThreadCache) ()
#11 0x00005555555eec51 in _D1d2gc6thread13destroyThreadFMZv ()
#12 0x00005555555c46d9 in d.rt.trampoline.runThread(scope d.rt.trampoline.ThreadRunner*) ()
#13 0x00007ffff7dfeac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#14 0x00007ffff7e90850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```

What happens is inside `destroyThread`, the `tlsSegments` array is
freed. However, it is not set to `null`. Then it is interrupted by a
signal telling it to stop for a scan. At this point, it is in a busy
state flushing cache (but I don't think it really matters too much, as
long as the signal is received after this free).

Once it exits the busy state, it runs the suspend hook, we find out it
is a druntime thread (this is with the integration from druntime), and
the hook tries to free the same pointer. However, that pointer points to
an unallocated block, and we have an assert.

Entering the busy state early, and setting the pointer to null, means we
can avoid the race condition.

Note that I can't add a test directly for this, this *requires* druntime
to return true for the `thread_preSuspend` hook.
  • Loading branch information
schveiguy authored and deadalnix committed Oct 18, 2024
1 parent d693c6d commit 3cbfa8e
Show file tree
Hide file tree
Showing 2 changed files with 12 additions and 5 deletions.
12 changes: 11 additions & 1 deletion sdlib/d/gc/tcache.d
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,10 @@ public:
}

void destroyThread() {
free(tlsSegments.ptr);
state.enterBusyState();
scope(exit) state.exitBusyState();

clearTLSSegments();
flushCache();
}

Expand Down Expand Up @@ -627,6 +630,13 @@ private:
tlsSegments[index] = makeRange(range);
}

void clearTLSSegments() {
if (tlsSegments.ptr !is null) {
free(tlsSegments.ptr);
tlsSegments = [];
}
}

private:
/**
* Appendable's mechanics:
Expand Down
5 changes: 1 addition & 4 deletions sdlib/dmd/thread.d
Original file line number Diff line number Diff line change
Expand Up @@ -48,10 +48,7 @@ void __sd_gc_pre_suspend_hook(void* stackTop) {
* pushed on it.
*/
import d.gc.tcache;
if (threadCache.tlsSegments.ptr !is null) {
threadCache.free(threadCache.tlsSegments.ptr);
threadCache.tlsSegments = [];
}
threadCache.clearTLSSegments();
}
}

Expand Down

0 comments on commit 3cbfa8e

Please sign in to comment.