support partially allocated jobs across scheduler reload #6445

Open
wants to merge 6 commits into
base: master

Commits on Nov 19, 2024

  1. job-manager: support partial-ok in hello request

    Problem: RFC 27 allows the scheduler to send a partial-ok flag
    in the hello request and then receive partially allocated jobs
    in hello responses, but job-manager does not support this.
    
    If the hello request includes this flag, pass it on to housekeeping.
    For each partially released housekeeping job, include the 'free'
    idset in the response per RFC 27.
    garlick committed Nov 19, 2024
    ad86f6a
  2. libschedutil: add SCHEDUTIL_HELLO_PARTIAL_OK flag

    Problem: libschedutil provides no way for the scheduler to
    indicate that the partial-ok flag should be set in the hello
    request.
    
    Add the SCHEDUTIL_HELLO_PARTIAL_OK flag which is passed to
    schedutil_create().
    garlick committed Nov 19, 2024
    c7fc95d
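The hello exchange described in the two commits above can be sketched roughly as follows. This is an illustration only, not flux-core code: the function names are hypothetical, the `partial-ok` key spelling is taken from the flag name in the commit message, and the `id` field is an assumed placeholder for whatever else RFC 27 puts in a hello response.

```python
def hello_request(partial_ok):
    """Scheduler side: build the hello request payload.

    The flag is only included when the scheduler can accept
    partially allocated jobs (key spelling is an assumption).
    """
    return {"partial-ok": True} if partial_ok else {}


def hello_response(jobid, free=None):
    """job-manager side: build one hello response for a job.

    'free' is an idset string naming the ranks already released by
    housekeeping; it is only present for partially released jobs.
    The 'id' field name is an assumed placeholder.
    """
    payload = {"id": jobid}
    if free:
        payload["free"] = free
    return payload


# A fully allocated job carries no 'free' key; a partially
# released one does.
full = hello_response(1)
partial = hello_response(2, free="0-1")
```

A scheduler that does not send the flag would never be handed the `partial` form, which is what the SCHEDUTIL_HELLO_PARTIAL_OK flag lets libschedutil users opt into.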

Commits on Nov 20, 2024

  1. libschedutil: support hello 'free' key

    Problem: when processing hello responses, all schedulers now need
    to compute R minus the 'free' idset for partially released jobs.
    
    As a convenience, change the libschedutil hello callback to subtract
    the free idset from the R it fetched from the KVS.
    
    Note that the scheduling key, if present, remains the full object
    which is opaque to flux-core.
    garlick committed Nov 20, 2024
    8d1a441
  2. sched-simple: support partial hello responses

    Problem: sched-simple does not support partial hello responses.
    
    Set the SCHEDUTIL_HELLO_PARTIAL_OK flag.
    Add a 'test-hello-nopartial' module option to get the old behavior.
    
    Set test-hello-nopartial in the current test of partial housekeeping
    release.
    garlick committed Nov 20, 2024
    8e7e06b
  3. testsuite: cover hello with partial allocation

    Problem: there is no coverage of reloading the scheduler with
    partially released jobs in housekeeping.
    
    Add a test.
    garlick committed Nov 20, 2024
    409670a
  4. sched-simple: improve error log message

    Problem: when the hello protocol cannot process a job, it logs
    the name of the wrong rlist function.
    
    Make the log message a little more high level.
    garlick committed Nov 20, 2024
    24436e3
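The "R minus free" step that the libschedutil hello callback now performs can be illustrated with a small idset subtraction. This is a simplified sketch: a real R is a structured object, whereas here the allocation is reduced to a plain rank idset string such as `0-3,7`, and the helper names (`parse_idset`, `encode_idset`, `remaining_ranks`) are hypothetical, not libschedutil or librlist functions.

```python
def parse_idset(s):
    """Parse a simple idset string like '0-3,7' into a set of ints."""
    ids = set()
    for part in s.split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return ids


def encode_idset(ids):
    """Encode a set of ints back into a compact idset string."""
    out = []
    run_start = prev = None
    for i in sorted(ids):
        if prev is not None and i == prev + 1:
            prev = i  # extend the current run
            continue
        if prev is not None:
            out.append(str(run_start) if run_start == prev
                       else f"{run_start}-{prev}")
        run_start = prev = i
    if prev is not None:
        out.append(str(run_start) if run_start == prev
                   else f"{run_start}-{prev}")
    return ",".join(out)


def remaining_ranks(allocated, free):
    """Return the ranks still allocated after subtracting 'free'."""
    return encode_idset(parse_idset(allocated) - parse_idset(free))


still_held = remaining_ranks("0-3", "1-2")
```

In this simplified model, a job allocated ranks `0-3` whose hello response carries `free` of `1-2` would be presented to the scheduler as still holding ranks 0 and 3.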