Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job-ingest: improve cleanup and stats output #6438

Merged
merged 3 commits into from
Nov 14, 2024

Conversation

garlick
Copy link
Member

@garlick garlick commented Nov 14, 2024

This adds a count of "trash" processes for each work crew to the flux module stats job-ingest output, so that there is some indication that the module is tracking subprocesses whose input has been closed, but that have not yet exited.

While in there I noticed that the cleanup code that sends SIGKILL to running workers doesn't signal workers that are lingering in the trash list. Add code to signal those too and a test.

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Problem: when validator/frobnicator processes do not exit after
their stdin is closed, 'flux module stats job-ingest' does not
list them.

Provide a count of "trash" processes.

Fixes flux-framework#6432
Problem: when job-ingest shuts down, it SIGKILLs any running workers,
but ignores workers whose stdin has been closed but might be stuck.

Send SIGKILL to lingering workers that have been placed in the
slot's trash list.

Note: this is not strictly necessary because the broker SIGKILLs
any running subprocesses when it shuts down, but it seems proper for
the creator of the mess to clean it up.
Problem: there is no test coverage for a validator that refuses
to be shut down.

Add a test.
@mergify mergify bot merged commit ae1f516 into flux-framework:master Nov 14, 2024
33 of 34 checks passed
Copy link

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 82.14286% with 5 lines in your changes missing coverage. Please review.

Project coverage is 83.63%. Comparing base (1de307c) to head (95da25e).
Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
src/modules/job-ingest/worker.c 80.00% 5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6438   +/-   ##
=======================================
  Coverage   83.63%   83.63%           
=======================================
  Files         524      524           
  Lines       87672    87693   +21     
=======================================
+ Hits        73324    73346   +22     
+ Misses      14348    14347    -1     
Files with missing lines Coverage Δ
src/modules/job-ingest/workcrew.c 82.08% <100.00%> (+1.02%) ⬆️
src/modules/job-ingest/worker.c 78.29% <80.00%> (+1.44%) ⬆️

... and 14 files with indirect coverage changes

@garlick garlick deleted the issue#6432 branch November 14, 2024 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants