MEDIUM: debug: on panic, make the target thread automatically allocate its buf

One main problem with panic dumps is that they're filling the dumping
thread's trash, and that the global thread_dump_buffer is too small to
catch enough of them.

Here we're proceeding differently. When dumping threads for a panic, we're
passing the magic value 0x2 as the buffer, and it will instruct the target
thread to allocate its own buffer using get_trash_chunk() (which is signal
safe), so that each thread dumps into its own buffer. Then the thread will
wait for the buffer to be consumed, and will assign its own thread_dump_buffer
to it. This way we can simply dump all threads' buffers from gdb like this:

  (gdb) set $t=0
        while ($t < global.nbthread)
          printf "%s\n", ha_thread_ctx[$t].thread_dump_buffer.area
          set $t=$t+1
        end
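
The handshake described above can be illustrated outside of HAProxy with a small, single-threaded model. This is only a sketch under stated assumptions: `struct dump_buf`, `struct thread_slot`, `DUMP_ALLOC`, `request_panic_dump()` and `handle_dump_request()` are hypothetical names that do not exist in HAProxy, the per-thread `own` buffer stands in for a chunk returned by get_trash_chunk(), and the function returns a flag where the real handler would simply never return.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these names exist in HAProxy. */
struct dump_buf {
	char   area[128];
	size_t data;
};

struct thread_slot {
	_Atomic(struct dump_buf *) dump; /* plays the role of thread_dump_buffer */
	struct dump_buf own;             /* plays the role of a per-thread trash chunk */
};

#define DUMP_ALLOC ((struct dump_buf *)0x2UL) /* "allocate your own buffer" */

/* Dumping side: ask the target to dump into a buffer of its own. */
static void request_panic_dump(struct thread_slot *t)
{
	atomic_store(&t->dump, DUMP_ALLOC);
}

/* Target side: what a signal handler could do on reception. Returns 1 when
 * the caller is expected to never return (panic case), otherwise 0.
 */
static int handle_dump_request(struct thread_slot *t, const char *state)
{
	struct dump_buf *buf = atomic_load(&t->dump);
	int no_return = 0;
	int len;

	/* nothing requested, or dump already marked done: spurious signal */
	if (!buf || ((uintptr_t)buf & 0x1UL))
		return 0;

	if (buf == DUMP_ALLOC) {
		/* panic: switch to our own buffer and publish its address */
		no_return = 1;
		buf = &t->own;
		buf->data = 0;
		atomic_store(&t->dump, buf);
	}

	/* keep one byte for the trailing NUL so that area prints as a string */
	assert(buf->data < sizeof(buf->area));
	len = snprintf(buf->area + buf->data, sizeof(buf->area) - buf->data, "%s", state);
	if (len > 0) {
		size_t room = sizeof(buf->area) - buf->data - 1;

		buf->data += ((size_t)len < room) ? (size_t)len : room;
	}
	return no_return;
}
```

Once every slot has been handled this way, each one points at a buffer owned by its thread, which is what lets a debugger walk the slots and print each `area`, in the same spirit as the gdb loop above.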

For now we make it wait forever since it's only called on panic and we
want to make sure the thread doesn't leave and continue to use that trash
buffer or do other nasty stuff. That way the dumping thread will make all
of them die.

This would be useful to backport to the most recent branches to help
troubleshooting. It backports well to 2.9, except for some trivial
context in tinfo-t.h for an updated comment. 2.8 and older would also
require TAINTED_PANIC. The following previous patches are required:

   MINOR: debug: make mark_tainted() return the previous value
   MINOR: chunk: drop the global thread_dump_buffer
   MINOR: debug: split ha_thread_dump() in two parts
   MINOR: debug: slightly change the thread_dump_pointer signification
   MINOR: debug: make ha_thread_dump_done() take the pointer to be used
   MINOR: debug: replace ha_thread_dump() with its two components

(cherry picked from commit 278b961)
[wt: ctx updt in tinfo-t for comment]
Signed-off-by: Willy Tarreau <[email protected]>
wtarreau committed Oct 19, 2024
1 parent 1b176e7 commit b8adef0
Showing 2 changed files with 35 additions and 11 deletions.
2 changes: 1 addition & 1 deletion include/haproxy/tinfo-t.h
@@ -178,7 +178,7 @@ struct thread_ctx {

 	unsigned long long out_bytes; /* total #of bytes emitted */
 	unsigned long long spliced_out_bytes; /* total #of bytes emitted though a kernel pipe */
-	struct buffer *thread_dump_buffer; /* NULL out of dump, valid during a dump, 0x01 once done */
+	struct buffer *thread_dump_buffer; /* NULL out of dump, 0x02=to alloc, valid during a dump, |0x01 once done */
 	// around 64 bytes here for shared variables

 	ALWAYS_ALIGN(128);
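
The states that the updated comment packs into thread_dump_buffer can be told apart with a small decoder. This is an illustrative sketch only: `struct dump_buf`, `enum dump_state` and `dump_state_of()` are hypothetical names that do not exist in HAProxy, and the checks simply mirror the comment (NULL, 0x02, valid pointer, low bit set).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for HAProxy's struct buffer. */
struct dump_buf {
	char   area[128];
	size_t data;
};

/* Hypothetical names for the states listed in the tinfo-t.h comment. */
enum dump_state {
	DUMP_IDLE,       /* NULL: no dump in progress */
	DUMP_MUST_ALLOC, /* 0x2: the target must allocate its own buffer */
	DUMP_ACTIVE,     /* valid pointer: dump in progress */
	DUMP_FINISHED,   /* low bit set: the target is done */
};

static enum dump_state dump_state_of(const struct dump_buf *p)
{
	uintptr_t v = (uintptr_t)p;

	if (v == 0)
		return DUMP_IDLE;
	if (v & 0x1UL)
		return DUMP_FINISHED;
	if (v == 0x2UL)
		return DUMP_MUST_ALLOC;

	/* A real buffer is at least 4-byte aligned here, so it can collide
	 * neither with the 0x1 flag nor with the 0x2 magic value.
	 */
	assert((v & 0x3UL) == 0);
	return DUMP_ACTIVE;
}
```

The encoding relies on pointer alignment: since the two low bits of a real buffer address are always zero, the values 0x1 and 0x2 can never be mistaken for a buffer.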
44 changes: 34 additions & 10 deletions src/debug.c
@@ -598,18 +598,22 @@ void ha_panic()
 		return;
 	}

-	buf = get_trash_chunk();
-
-	chunk_reset(&trash);
-	chunk_appendf(&trash, "Thread %u is about to kill the process.\n", tid + 1);
+	chunk_printf(&trash, "Thread %u is about to kill the process.\n", tid + 1);
+	DISGUISE(write(2, trash.area, trash.data));

 	for (thr = 0; thr < global.nbthread; thr++) {
-		if (!ha_thread_dump_fill(&trash, thr))
+		if (thr == tid)
+			buf = get_trash_chunk();
+		else
+			buf = (void *)0x2UL; // let the target thread allocate it
+
+		buf = ha_thread_dump_fill(buf, thr);
+		if (!buf)
 			continue;
-		DISGUISE(write(2, trash.area, trash.data));
-		ha_thread_dump_done(NULL, thr);
-		b_force_xfer(buf, &trash, b_room(buf));
-		chunk_reset(&trash);
+
+		DISGUISE(write(2, buf->area, buf->data));
+		/* restore the thread's dump pointer for easier post-mortem analysis */
+		ha_thread_dump_done(buf, thr);
 	}

 #ifdef USE_LUA
@@ -2020,19 +2024,33 @@ static void debug_release_memstats(struct appctx *appctx)

 /* handles DEBUGSIG to dump the state of the thread it's working on. This is
  * appended at the end of thread_dump_buffer which must be protected against
- * reentrance from different threads (a thread-local buffer works fine).
+ * reentrance from different threads (a thread-local buffer works fine). If
+ * the buffer pointer is equal to 0x2, then it's a panic. The thread allocates
+ * the buffer from its own trash chunks so that the contents remain visible in
+ * the core, and it never returns.
  */
 void debug_handler(int sig, siginfo_t *si, void *arg)
 {
 	struct buffer *buf = HA_ATOMIC_LOAD(&th_ctx->thread_dump_buffer);
 	int harmless = is_thread_harmless();
+	int no_return = 0;

 	/* first, let's check it's really for us and that we didn't just get
 	 * a spurious DEBUGSIG.
 	 */
 	if (!buf || (ulong)buf & 0x1UL)
 		return;

+	/* Special value 0x2 is used during panics and requires that the thread
+	 * allocates its own dump buffer among its own trash buffers. The goal
+	 * is that all threads keep a copy of their own dump.
+	 */
+	if ((ulong)buf == 0x2UL) {
+		no_return = 1;
+		buf = get_trash_chunk();
+		HA_ATOMIC_STORE(&th_ctx->thread_dump_buffer, buf);
+	}
+
 	/* now dump the current state into the designated buffer, and indicate
 	 * we come from a sig handler.
 	 */
@@ -2044,6 +2062,12 @@ void debug_handler(int sig, siginfo_t *si, void *arg)
 	if (!harmless &&
 	    !(_HA_ATOMIC_LOAD(&th_ctx->flags) & TH_FL_SLEEPING))
 		_HA_ATOMIC_OR(&th_ctx->flags, TH_FL_STUCK);
+
+	/* in case of panic, no return is planned so that we don't destroy
+	 * the buffer's contents and we make sure not to trigger in loops.
+	 */
+	while (no_return)
+		wait(NULL);
 }

 static int init_debug_per_thread()