broker: enable brokers to be added to running instances #5184
base: master
Commits on Aug 15, 2024
broker: ensure CURVE certificate has a name
Problem: internally generated CURVE certs are not named, so overlay_cert_name() can return NULL, but a name is required when authorizing a cert. This API inconsistency results in extra code and confusion when implementing a new boot method. Use the rank as the name for internally generated certs.
Commit 7ee4e69
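The naming rule in this commit means that a name lookup on an internally generated cert never comes back NULL. A minimal sketch of that rule, not the broker's code: `struct cert`, `cert_name_from_rank()` and `cert_name()` are hypothetical stand-ins (`cert_name()` only mimics the NULL-when-unnamed behavior described for overlay_cert_name()), and real CURVE key handling is not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a CURVE certificate: only the name is modeled. */
struct cert {
    char name[32]; /* empty string means "unnamed" */
};

/* Hypothetical: name an internally generated cert after the broker rank.
 * Returns 0 on success, -1 if the formatted name does not fit. */
static int cert_name_from_rank (struct cert *cert, unsigned int rank)
{
    int n;

    assert (cert != NULL);
    n = snprintf (cert->name, sizeof (cert->name), "%u", rank);
    if (n < 0 || (size_t)n >= sizeof (cert->name))
        return -1;
    return 0;
}

/* Hypothetical lookup: return the cert name, or NULL if it was never set. */
static const char *cert_name (const struct cert *cert)
{
    return cert->name[0] != '\0' ? cert->name : NULL;
}
```

With every generated cert named, code that authorizes a cert by name no longer needs a separate path for the unnamed case.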
broker: allow instance size > PMI bootstrap size
Problem: there is no way to bootstrap a Flux instance using PMI with some ranks initially missing. Allow the 'size' broker attribute to be set on the command line. If it is set to a value greater than the PMI size, perform the PMI exchange as usual with the PMI size, but configure the overlay topology with the additional ranks. Since 'hostlist' is an immutable attribute that the bootstrap implementation is expected to set, set it to include placeholders of the form "extra[0-N]" for the ranks that haven't connected yet, so that the logs show something other than "(null)".
Commit 2e9f9f1
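A minimal sketch of how the 'hostlist' value could be composed when the configured size exceeds the PMI size. The following are assumptions, not taken from the broker source: the helper name `hostlist_with_extras()`, that PMI-provided hosts come first, that placeholder extraK stands for rank pmi_size + K, and that a single extra rank is written as "extra0" rather than "extra[0-0]".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a 'hostlist' value for an instance whose
 * configured size exceeds the PMI size. Hosts known from PMI come first,
 * followed by "extra" placeholders for ranks pmi_size..size-1.
 * Returns 0 on success, -1 on invalid sizes or if buf is too small. */
static int hostlist_with_extras (const char *pmi_hosts, int pmi_size, int size,
                                 char *buf, size_t len)
{
    int extra = size - pmi_size; /* ranks not covered by PMI */
    int n;

    assert (pmi_hosts != NULL && buf != NULL);
    if (pmi_size < 1 || extra < 0)
        return -1;
    if (extra == 0)
        n = snprintf (buf, len, "%s", pmi_hosts);
    else if (extra == 1)
        n = snprintf (buf, len, "%s,extra0", pmi_hosts);
    else
        n = snprintf (buf, len, "%s,extra[0-%d]", pmi_hosts, extra - 1);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

For example, with PMI hosts `node[0-1]` (PMI size 2) and size=5, the sketch produces `node[0-1],extra[0-2]`.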
broker: refactor bootstrap block
Problem: the code block that selects which boot method to use is not very clear. Simplify the code block so that the default path is clear and adding a boot method won't increase complexity.
Commit b6328d9
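One way to keep the default path obvious is to test each non-default method as an independent early check and let PMI be the fall-through. This is a hypothetical sketch: `enum boot_method`, `select_boot_method()`, and the order of the checks are assumptions, and the FLUB branch anticipates the broker.boot-server attribute introduced by the flub bootstrap commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enumeration of boot methods. */
enum boot_method { BOOT_PMI, BOOT_CONFIG, BOOT_FLUB };

/* Hypothetical selector: each alternative is a separate early return,
 * and PMI is the default when nothing else applies. */
static enum boot_method select_boot_method (const char *boot_server,
                                            bool have_bootstrap_config)
{
    if (boot_server != NULL)   /* broker.boot-server=<uri> was given */
        return BOOT_FLUB;
    if (have_bootstrap_config) /* assumed: a bootstrap config is present */
        return BOOT_CONFIG;
    return BOOT_PMI;           /* default path */
}
```

With this shape, adding a boot method means adding one branch rather than another level of nesting.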
broker: add flub bootstrap method
Problem: there is no way to add brokers to an instance that has extra slots available. Add support for FLUB, the FLUx Bootstrap protocol, used when the broker is started with broker.boot-server=<uri>. The bootstrap protocol consists of two RPCs:
1) overlay.flub-getinfo, which requests the allocation of an available rank from rank 0 of the instance being extended, and also retrieves the instance size and some broker attributes.
2) overlay.flub-kex, which exchanges public keys with the new rank's TBON parent and obtains the parent's TBON URI.
Assumptions:
- all ranks have the same topology configuration
Limitations (for now):
- hostnames will be logged as extra[0-N]
- a broker rank cannot be re-allocated to a new broker
- a broker cannot replace one that failed in a regular instance
- dummy resources for the max size of the instance must be configured
Commit 35ac794
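The two-RPC sequence can be modeled in a single process to make the order of operations concrete. In this sketch each RPC is a plain function call against shared in-memory state; `struct instance`, `flub_getinfo()`, `flub_kex()`, the key strings, and the tcp://nodeN:8050 URIs are all hypothetical. The parent calculation assumes a kary:<fanout> TBON topology, where the parent of rank r is (r - 1) / fanout, in line with the stated assumption that all ranks share the same topology configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_RANKS 64
#define KEYLEN 48
#define URILEN 64

/* Hypothetical in-memory model of the instance being extended. */
struct instance {
    int size;                       /* configured size, may exceed PMI size */
    int fanout;                     /* assumed kary:<fanout> TBON topology */
    bool available[MAX_RANKS];      /* ranks that rank 0 may hand out */
    char pubkey[MAX_RANKS][KEYLEN]; /* each rank's public key (placeholder) */
    char uri[MAX_RANKS][URILEN];    /* each rank's TBON URI (hypothetical) */
};

struct getinfo_response { int rank; int size; }; /* attributes omitted */
struct kex_response { int parent; char pubkey[KEYLEN]; char uri[URILEN]; };

static void instance_init (struct instance *inst, int size, int fanout)
{
    assert (size > 0 && size <= MAX_RANKS && fanout > 0);
    memset (inst, 0, sizeof (*inst));
    inst->size = size;
    inst->fanout = fanout;
    for (int r = 0; r < size; r++) {
        snprintf (inst->pubkey[r], KEYLEN, "pubkey-%d", r);
        snprintf (inst->uri[r], URILEN, "tcp://node%d:8050", r);
    }
}

/* Step 1, modeled on overlay.flub-getinfo (served by rank 0):
 * allocate the lowest available rank and report the instance size. */
static int flub_getinfo (struct instance *inst, struct getinfo_response *resp)
{
    for (int r = 1; r < inst->size; r++) {
        if (inst->available[r]) {
            inst->available[r] = false;
            resp->rank = r;
            resp->size = inst->size;
            return 0;
        }
    }
    return -1; /* no free slot */
}

/* Step 2, modeled on overlay.flub-kex (served by the TBON parent):
 * in this model the parent "authorizes" the child by recording its key,
 * then returns its own public key and URI. */
static int flub_kex (struct instance *inst, int rank, const char *child_pubkey,
                     struct kex_response *resp)
{
    if (rank <= 0 || rank >= inst->size)
        return -1;
    int parent = (rank - 1) / inst->fanout; /* kary parent of rank */
    snprintf (inst->pubkey[rank], KEYLEN, "%s", child_pubkey);
    resp->parent = parent;
    snprintf (resp->pubkey, KEYLEN, "%s", inst->pubkey[parent]);
    snprintf (resp->uri, URILEN, "%s", inst->uri[parent]);
    return 0;
}
```

No real CURVE keys or network messages are involved; the sketch only shows that a rank is allocated from rank 0 first, and keys are then exchanged with that rank's parent.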
broker: add flub RPC methods to overlay
Problem: the flub bootstrap method requires broker services. Add the following services (instance owner only):
- overlay.flub-getinfo (rank 0 only): allocate an unused rank from rank 0 and also return the size and miscellaneous broker attributes to be set in the new broker.
- overlay.flub-kex (peer rank): exchange public keys with the TBON parent and obtain its zeromq URI.
Add overlay_flub_provision(), which is called by boot_pmi.c when extra ranks are configured, making those ranks available for allocation.
Commit c03df33
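Provisioning can be thought of as marking ranks in a pool that flub-getinfo draws from. This sketch is hypothetical: `struct rank_pool`, `rank_pool_provision()` and `rank_pool_alloc()` are illustrative names and do not reflect the signature of overlay_flub_provision(). It assumes a lowest-rank-first allocation policy, caps the pool at 64 ranks for brevity, and never provisions rank 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rank pool: bit r of 'avail' is set when rank r may be
 * handed out to a new broker. Limited to 64 ranks for brevity. */
struct rank_pool {
    unsigned int size;
    uint64_t avail;
};

static void rank_pool_init (struct rank_pool *p, unsigned int size)
{
    assert (size > 0 && size <= 64);
    p->size = size;
    p->avail = 0;
}

/* Make ranks [first, last] available, e.g. the extra ranks
 * pmi_size..size-1 after a PMI bootstrap with a larger configured size.
 * Rank 0 is assumed never to be provisioned. */
static int rank_pool_provision (struct rank_pool *p, unsigned int first,
                                unsigned int last)
{
    if (first == 0 || first > last || last >= p->size)
        return -1;
    for (unsigned int r = first; r <= last; r++)
        p->avail |= UINT64_C (1) << r;
    return 0;
}

/* Allocate the lowest available rank, or return -1 if none is left. */
static int rank_pool_alloc (struct rank_pool *p)
{
    for (unsigned int r = 0; r < p->size; r++) {
        uint64_t bit = UINT64_C (1) << r;
        if (p->avail & bit) {
            p->avail &= ~bit;
            return (int)r;
        }
    }
    return -1;
}
```

For example, an instance started with PMI size 4 and size=6 would provision ranks 4 and 5, which this allocator then hands out in that order.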
testsuite: add coverage for instance size override
Problem: there is no test coverage for broker bootstrap with a PMI size less than the actual size. Add some tests.
Commit 9ccf30c
testsuite: cover flub bootstrap
Problem: there is no test coverage for adding brokers to a flux instance. Add some tests.
Commit 1a59ee7
broker: provision dead brokers for flub replacement
Problem: there is no way to replace a node in a Flux instance that goes down. Call overlay_flub_provision () when a rank goes offline so that the flub allocator can allocate its rank to a replacement. Unprovision ranks when they come back online.
Commit 0ba34b1
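The provision/unprovision rule in this commit amounts to tracking rank availability from peer state changes. A minimal hypothetical sketch, assuming a lowest-rank-first allocator, a 64-rank cap, and that rank 0 is never offered for replacement: `struct rank_pool`, `rank_state_changed()` and `rank_pool_alloc()` are illustrative names, not the broker's API.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RANKS 64

/* Hypothetical pool of ranks that may be handed to a replacement broker. */
struct rank_pool {
    int size;
    bool avail[MAX_RANKS];
};

static void rank_pool_init (struct rank_pool *p, int size)
{
    assert (size > 0 && size <= MAX_RANKS);
    p->size = size;
    for (int r = 0; r < MAX_RANKS; r++)
        p->avail[r] = false;
}

/* Hypothetical peer state-change hook: a rank that goes offline becomes
 * available for a replacement; a rank that comes back online is withdrawn
 * so it cannot be handed out. Rank 0 is assumed never to be replaced. */
static void rank_state_changed (struct rank_pool *p, int rank, bool online)
{
    if (rank <= 0 || rank >= p->size)
        return;
    p->avail[rank] = !online;
}

/* Allocate the lowest available rank, or return -1 if none is left. */
static int rank_pool_alloc (struct rank_pool *p)
{
    for (int r = 1; r < p->size; r++) {
        if (p->avail[r]) {
            p->avail[r] = false;
            return r;
        }
    }
    return -1;
}
```

Withdrawing a rank when it comes back online keeps the allocator from assigning a rank that is already occupied by a running broker.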