broker: provision dead brokers for flub replacement
Problem: there is no way to replace a node that goes down
in a Flux instance.

Call overlay_flub_provision () when a rank goes offline
so that the flub allocator can allocate its rank to a replacement.
Unprovision ranks when they come back online.
garlick committed Aug 15, 2024
1 parent 1a59ee7 commit 0ba34b1
Showing 1 changed file with 18 additions and 0 deletions.
18 changes: 18 additions & 0 deletions src/broker/state_machine.c
@@ -925,6 +925,24 @@ static void broker_online_cb (flux_future_t *f, void *arg)
        }
        idset_destroy (loss);
    }
    /* A broker that drops out of s->quorum.online is provisioned
     * for replacement via flub, and unprovisioned if it returns.
     */
    if (previous_online) {
        unsigned int id;
        id = idset_first (previous_online);
        while (id != IDSET_INVALID_ID) { // online -> offline
            if (!idset_test (s->quorum.online, id))
                (void)overlay_flub_provision (s->ctx->overlay, id, id, true);
            id = idset_next (previous_online, id);
        }
        id = idset_first (s->quorum.online);
        while (id != IDSET_INVALID_ID) { // offline -> online
            if (!idset_test (previous_online, id))
                (void)overlay_flub_provision (s->ctx->overlay, id, id, false);
            id = idset_next (s->quorum.online, id);
        }
    }

    idset_destroy (previous_online);
    flux_future_reset (f);
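
The hunk above compares two membership snapshots and acts only on ranks whose state changed. A minimal standalone sketch of that pattern follows. It is not flux-core code: `rankset_t`, `struct provision_table`, `provision_range ()`, and `update_provisioning ()` are hypothetical stand-ins, a 64-bit mask replaces `struct idset`, and the `(lo, hi, available)` reading of the `overlay_flub_provision ()` arguments is an assumption inferred from the call sites in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct idset: bit N set means rank N is online. */
typedef uint64_t rankset_t;

#define MAX_RANKS 64

/* Hypothetical stand-in for the flub allocator's per-rank state. */
struct provision_table {
    bool available[MAX_RANKS];  /* true: rank may be given to a replacement */
    int calls;                  /* number of successful provisioning calls */
};

/* Stand-in for overlay_flub_provision ().  The real function is not shown
 * in this commit, so this version only records the request (assumption).
 */
static int provision_range (struct provision_table *t,
                            unsigned int lo,
                            unsigned int hi,
                            bool available)
{
    if (lo > hi || hi >= MAX_RANKS)
        return -1;
    for (unsigned int id = lo; id <= hi; id++)
        t->available[id] = available;
    t->calls++;
    return 0;
}

/* Compare two membership snapshots and update the table for ranks that
 * changed state: online -> offline is provisioned, offline -> online is
 * unprovisioned.  Ranks whose state did not change are left untouched.
 */
static void update_provisioning (struct provision_table *t,
                                 rankset_t previous,
                                 rankset_t current)
{
    assert (t != NULL);

    rankset_t lost = previous & ~current;      /* online -> offline */
    rankset_t returned = current & ~previous;  /* offline -> online */

    for (unsigned int id = 0; id < MAX_RANKS; id++) {
        rankset_t bit = (rankset_t)1 << id;
        if (lost & bit)
            (void)provision_range (t, id, id, true);
        else if (returned & bit)
            (void)provision_range (t, id, id, false);
    }
}
```

One difference from the hunk: the sketch has no equivalent of a NULL `previous_online`, so a caller that wants to skip the first update, as the `if (previous_online)` guard does, has to do so itself.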