-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shell: add extra info to doom exceptions #6453
base: master
Are you sure you want to change the base?
Conversation
ce9d74d
to
7f51a84
Compare
We might be able to shorten it by just prefixing the existing exception note with Now that I'm thinking about it, I wonder if |
Oh that's a good idea, that just slipped my mind. |
Problem: Exceptions from the doom plugin do not specify what job failed. It can be confusing to users which job failed amongst multiple in a batch job, or if the broker from a batch job itself failed. Add the job executable name to the doom exception outputs. Fixes flux-framework#6357
7f51a84
to
c00ed88
Compare
re-pushed with tweaks per comments above
so now it looks like this (example from t2614-job-shell-doom)
|
Again, it would be shorter to just put the jobid as a prefix:
Since there's no mistaking that's the jobid. However, that's pretty minor and probably not worth quibbling about. Edit: moved jobid after job.exception in example output. |
9449739
to
69e5ceb
Compare
ok, re-pushed with the output message tweak. Also had to tweak two tests that had to be updated for the expected output. Edit: output is now
|
static char buf[PATH_MAX+1] = {0}; | ||
json_t *s = json_array_get (doom->shell->info->jobspec->command, 0); | ||
const char *path = json_string_value (s); | ||
char *rv; | ||
snprintf (buf, PATH_MAX, "%s", path); | ||
rv = basename (buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be simpler to just do something like
const char *p = strrchr (s, "/");
const char *base = p ? p + 1 : s;
And avoid the copy? Maybe we should have our own implementation of basename()
that doesn't modify its argument (I think some places in the codebase may already assume GNU basename(3)
, which doesn't modify its arg, but since we support alpine and someday BSD, that's probably not safe).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that would be better. When I read the manpage for basename(3) I was like "huh, it needs to modify the buffer?" Lets go with your simpler approach.
Ah, bummer. The change I suggested breaks some flux-sched checks which are matching Sorry if that was a bad suggestion. |
those tests are easily fixable (as is one I missed in |
Problem: In a batch job, it may be difficult to discern which job had an exception. When flux job attach outputs an exception, also output the jobid of the job. Update tests in t2608-job-shell-log.t and t2304-sched-simple-alloc-check.t for change in expected output.
69e5ceb
to
31800a8
Compare
Up to you if you don't mind submitting a flux-sched PR |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6453 +/- ##
==========================================
+ Coverage 83.61% 83.63% +0.02%
==========================================
Files 523 523
Lines 87547 87555 +8
==========================================
+ Hits 73199 73228 +29
+ Misses 14348 14327 -21
|
Problem: Exceptions from the doom plugin do not specify what job failed. It can be confusing to users which job failed amongst multiple in a batch job, or if the broker from a batch job itself failed.
Add the job executable name and jobid to the doom exception outputs.
Fixes #6357
we could obviously debate the message format, I went back and forth and just ended up at this. Tweak suggestions welcome.