add example for job info #229

vsoch · 2023-04-26T04:25:42Z

Problem: We do not have good examples for replicating flux job info in Python
Solution: Add an interactive demo

chu11 · 2023-04-26T05:10:12Z

I'm curious what the request and/or need was for this. flux job info is a "plumbing" command and we don't really advertise it, i.e. we decided not to put flux job info in the manpage for flux-job(1).

If a user has requested they really want to/need to get the jobspec or something for their job, and they specifically want to do it in python, perhaps we need to discuss how best to do that in python, i.e. if new function needs to be added, etc.

vsoch · 2023-04-26T05:12:13Z

@xorJane asked me directly on Mattermost for this example, and given that Python interactions might want this complete metadata, it’s worth adding. If we wind up changing the UI we can just update the example.

TLDR: a real request for this exact example to demonstrate it’s needed and should be provided.

Also, I’ve mentioned before (and asked multiple times) for getting job info on the command line and in Python and disagree about it being a “plumbing” command. Getting job info back is hugely useful in many contexts.

vsoch · 2023-04-26T05:18:29Z

See flux-framework/flux-core#4761 as just one of the times, and a clearly laid out user interaction.

chu11 · 2023-04-26T05:39:03Z

See flux-framework/flux-core#4761 as just one of the times, and a clearly laid out user interaction.

The data provided by that feature is different than what is provided by flux job info, that data is more correlated to what is provided by flux jobs, and we added a function into the same area of the Python API that flux jobs uses.

Also, I’ve mentioned before (and asked multiple times) for getting job info on the command line and in Python and disagree about it being a “plumbing” command.

Maybe we have different definitions of "plumbing". As you note, these are the keys available from flux job info:

    J
    R
    eventlog
    jobspec
    guest.exec.eventlog
    guest.input
    guest.output

The eventlogs cannot be understood without reading the RFCs, and the stdin/stdout cannot be grokked without understanding our stdin/stdout protocol and encoding. That's sort of why we don't advertise it: a normal user would not have any understanding of what is being returned.
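
To make the "reading the RFCs" point concrete, here is a minimal sketch of splitting an eventlog into entries. It assumes the RFC 18 layout of one JSON object per line with `timestamp`, `name`, and an optional `context`. The sample log is hypothetical and was not captured from a real job; `parse_eventlog` is an illustrative helper, not part of the Flux Python API.

```python
import json


def parse_eventlog(raw):
    """Parse an eventlog assumed to hold one JSON object per line."""
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # "context" is optional, so default to an empty dict
        events.append((entry["timestamp"], entry["name"], entry.get("context", {})))
    return events


# Hypothetical sample shaped like a job eventlog (assumption, not real output)
SAMPLE = "\n".join(
    [
        json.dumps({"timestamp": 1682500844.0, "name": "submit", "context": {"userid": 1000}}),
        json.dumps({"timestamp": 1682500845.5, "name": "start"}),
        json.dumps({"timestamp": 1682500850.0, "name": "finish", "context": {"status": 0}}),
    ]
)

for timestamp, name, context in parse_eventlog(SAMPLE):
    print(f"{timestamp:.1f} {name} {context}")
```

Parsing is the easy part. Knowing what each event name and context key means still requires the RFCs, which is the concern raised above.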

@xorJane can you provide more detail on what was the request and need? Perhaps there is a specific piece of data that was desired, and it just isn't provided via job-list/flux jobs at the moment? Update: Or maybe they were just interested in getting the jobspec?

Edit:

Getting job info back is hugely useful in many contexts.

I'm not disagreeing with "job information" in general, you are correct. I'm speaking specifically of the information provided by flux job info. You show the eventlog output from guest.output in your PR. Does a user really want to parse and decode that information to get stdout? Granted we have issue flux-framework/flux-core#4854 open and don't have a good API for it yet, but would calling flux job attach from within Python make more sense?

grondo · 2023-04-26T14:01:45Z

FWIW, I think it is fine if users use flux job info or the job-info module interface to grab the jobspec or R for the job. For example, you can't view the original job environment, the full command line arguments, or the exact resource spec without it. Sometimes it is fine to use a plumbing command or API if necessary. 🤷

That being said we do have an open issue on providing a more user appropriate interface to fetch job output, and we probably need a nice Python API for getting the original jobspec, since that requires fetching the signed J from job-info and decoding it.

Note that the jobspec you are fetching directly from the KVS is the version with its environment (and possibly other keys) redacted and modified by the instance for its use. Also, only the instance owner can fetch directly from the KVS like this, so this will only work for the instance owner (i.e. a normal user couldn't use these examples in a multi-user instance).

flux job info --original jobspec takes care of fetching J and "unwrapping" it for the user, and in this way it is probably less of a plumbing command, and for now is the suggested method to get the original jobspec.
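
As a rough illustration of what "unwrapping" involves, the sketch below assumes J is a dot-separated HEADER.PAYLOAD.SIGNATURE string whose payload is base64-encoded JSON. That layout is my reading of the signing format and is not stated in this thread, so treat it as an assumption. The sample J is synthetic, `unwrap_payload` is a hypothetical helper, and nothing here verifies the signature.

```python
import base64
import json


def unwrap_payload(signed):
    """Decode the payload of an assumed HEADER.PAYLOAD.SIGNATURE string.

    No signature verification is performed; this is for inspection only.
    """
    parts = signed.split(".")
    if len(parts) != 3:
        raise ValueError("expected HEADER.PAYLOAD.SIGNATURE")
    return json.loads(base64.b64decode(parts[1]))


# Build a synthetic J for demonstration (all values are made up)
jobspec = {"attributes": {"system": {"cwd": "/tmp/demo", "environment": {"HOME": "/home/demo"}}}}
synthetic_j = ".".join(
    [
        base64.b64encode(b"hypothetical-header").decode("ascii"),
        base64.b64encode(json.dumps(jobspec).encode("utf-8")).decode("ascii"),
        "none",
    ]
)

original = unwrap_payload(synthetic_j)
print(original["attributes"]["system"]["cwd"])
```

In practice the command above is the safer route, since it handles the details for the user.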

chu11 · 2023-04-26T14:23:36Z

FWIW, I think it is fine if users use flux job info or the job-info module interface to grab the jobspec or R for the job. For example, you can't view the original job environment, the full command line arguments, or the exact resource spec without it. Sometimes it is fine to use a plumbing command or API if necessary. 🤷

I recall that we decided not to advertise it in the flux-job(1) manpage. I'm not sure if that was generically b/c it was "plumbing" or because we thought flux job info might confuse users into thinking it serves the same purpose as flux jobs. Possibly both.

Should we start to advertise it? Or perhaps flux job info --original jobspec only? Or should we slightly tweak it? Perhaps just outputting something to the effect:

    general:
    jobspec (consider using w/ --original)
    R
    
    advanced - only if you know what you're doing:
    J
    eventlog
    guest.exec.eventlog
    guest.input
    guest.output

would make it better.

🤷

Edit: oh here's an idea, only list the advanced ones with a --verbose or similar. If we only advertise retrieving jobspec/R, it begins to look more user facing.

chu11 · 2023-04-26T14:33:21Z

re-reading this PR's contents, I think maybe what confused me is that it's really 3 different examples lumped into one, all under the heading of "job info", which I think maybe isn't the best way to organize this. I think what might be better is:

one example that covers getting general job information about a job, could list jobs for a user, get job information via job id, the basic "job list" kinda stuff. This is what 90% of users using python w/ flux would care about.
one example would cover reading stdout from a job. I think reading from guest.output is way too complex to document on the website and would be better documented as a workaround in an issue for the time being. But if there is strong opinion to publish an example on the website, that can be changed when flux-core#4854 ("libflux/python: convenience function to read stdio") is completed.
a third example would be reading the jobspec, which goes into the kvs api space, etc. and some of the caveats that @grondo mentions above can also be covered. If/when we come up with a better API for this, we can update that example.

vsoch · 2023-04-26T18:02:17Z

FWIW, I think it is fine if users use flux job info or the job-info module interface to grab the jobspec or R for the job,

@grondo thank you for hearing me.

one example that covers getting general job information about a job, could list jobs for a user, get job information via job id, the basic "job list" kinda stuff. This is what 90% of users using python w/ flux would care about.

I again respectfully disagree. As a user I don't place things into these same categories based on design (that a developer would be biased to see). When I want "job information" or go looking for a tutorial to show me how to do that, I want the whole gamut of things, from the original jobspec, to the output contents, to the core info like status / return code. I don't want to have to know there are three different tutorials because (in the mind of the developer) "but they are different!" I am a layperson, I submit a job, and I want to know everything about it.

xorJane · 2023-04-26T18:44:21Z

Hi all! Just chiming in late with the context for my request.

Post TOSS 4 updates on LC, users have asked about the disappearance of the checkjob command on systems running Slurm, which is a handy Moab utility for seeing all sorts of job stats. Since we're not supporting Moab anymore, Jeff Long & I put together a tool that nicely formats some of the output you can get with squeue and sacct. The result looks something like

    janeh@pascal83:~$ slurm_jobinfo.py -j 16436_9
    JobID        :  <JobID>
    JobName      :  Symmetric
    State        :  PENDING
    User         :  <user>
    Group        :  <user's group>
    Account      :  <bank>
    Partition    :  pbatch
    QOS          :  normal
    Timelimit    :  1-00:00:00
    Submit       :  Tue 04/25 08:49:06
    Eligible     :  Tue 04/25 08:49:07
    Start        :  Unknown
    End          :  Unknown
    Elapsed      :  00:00:00
    Priority     :  124135
    NNodes       :  4
    NCPUS        :  144
    MinCPUNode   :
    NodeList     :  None assigned
    WorkDir      :  /p/lustre1/<user directory>
    SubmitLine   :  sbatch --array=0-12 Submit_4_nodes.sh
    Dependency   :  (null)
    EstStart     :  N/A
    Reason       :  (AssocMaxJobsLimit)

A user requested a similar utility for Flux (as well as more tutorials on how to get job stats with native Flux commands), and I was trying to use the Python APIs to get there. What I have so far:

    janeh@tioga10:~$ flux_jobinfo.py -j fnwSSLATbTM
    Getting info for Flux JobID: fnwSSLATbTM
    JobID        :  fnwSSLATbTM
    JobName      :  flux
    State        :  RUN
    User         :  <user>
    Partition    :  pdebug
    Timelimit    :  4:00:00
    Submit       :  Wed 04/26/2023 09:20:44
    Elapsed      :  2:14:53
    Start        :  Wed 04/26/2023 09:20:44
    End          :  Wed 04/26/2023 13:20:44
    NNodes       :  1
    NCores       :  64
    NTasks       :  1
    NodeList     :  tioga26
    Dependencies  :  []

Thanks so much for helping me out @vsoch!
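
For readers who want the same aligned layout, the following is a small sketch of the formatting step only. The `format_report` helper and the sample values are hypothetical; it does not query Flux, and how the fields are fetched is left to whatever API call supplies them.

```python
def format_report(fields, width=13):
    """Render (label, value) pairs as 'Label        :  value' lines."""
    return "\n".join(f"{label:<{width}}:  {value}" for label, value in fields)


# Hypothetical values copied from the sample output above
report = format_report(
    [
        ("JobID", "fnwSSLATbTM"),
        ("State", "RUN"),
        ("NNodes", 1),
        ("NodeList", "tioga26"),
    ]
)
print(report)
```

A label longer than `width` (such as "Dependencies" with the default of 13 minus one) still prints, it just pushes the colon one column to the right, so picking the width from the longest label keeps everything aligned.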

chu11 · 2023-04-26T18:57:54Z

I again respectfully disagree. As a user I don't place things into these same categories based on design (that a developer would be biased to see). When I want "job information" or go looking for a tutorial to show me how to do that, I want the whole gamut of things, from the original jobspec, to the output contents, to the core info like status / return code. I don't want to have to know there are three different tutorials because (in the mind of the developer) "but they are different!". I am a layperson, I submit a job, and I want to know everything about it.

I disagree with this a bit.

Part of this is information overload. The average user mostly wants status / return code / stuff from flux jobs. Showing them a whole bunch of more advanced information can be confusing.
It's not just because of "developer division", it's about what a person is looking for. If a person asks, "I'd like to look at stdout for my job", I don't think the first place they will look is for something under the header "job info". They would look under something related to "stdio" or "stdout".

FWIW I have been a user before (https://github.com/LLNL/magpie). So my opinions aren't coming solely from developer land. It's coming from that experience as well as experience aiding users from that project. I suspect you might think I am trying to "baby" users, and that is sort of what I lean towards based on my experiences.

That said, my experiences can be very biased to the users I was helping. So I don't know what collective opinion would be here on this topic.

vsoch · 2023-04-26T19:18:33Z

Part of this is information overload.

But it's not. The tutorial itself isn't that long, and it's neatly packaged under a single label that makes sense. When a person finds what they need, they are good. Information overload would be giving them three separate tutorials that seem to be for similar things, and requiring them to read through all of them to put together a single, cohesive picture. The larger issue right now is that they don't currently find information on how to do this. This is why I've had to come into the Flux Slack umpteen times and ask "How do I get output? How do I get a return code?" I couldn't find it. It's a signal when @xorJane comes to me and asks how to do something: it tells me that the docs don't do a good enough job of giving her that information. It means it cannot be found where someone went looking for it. It might be obvious to a core developer, but it's not obvious to a developer user.

"I'd like to look at stdout for my job", I don't think the first place they will look is for something under the header "job info". They would look under something related to "stdio" or "stdout".

Having worked on many client tools for many years, I respectfully again disagree. In fact, the user doesn't even know what they are looking for, so "job info" would click in their head as "information about my job" - yes! Many users don't even think in terms of stdin and stdout, those are more advanced concepts (maybe for power users, which maybe the lab is biased to have, but not most centers).

FWIW I have been a user before (https://github.com/LLNL/magpie). So my opinions aren't coming solely from developer land. It's coming from that experience as well as experience aiding users from that project. I suspect you might think I am trying to "baby" users, and that is sort of what I lean towards based on my experiences.

I don't think you are trying to baby users, I'm just not sure you are putting yourself in all of the different shoes you might! I've worked in different contexts sitting within labs and also providing support for users at Duke, Stanford, and (not much here) but several fairly large open source communities. My bias / perspective comes from both being a user, and learning over time how to put myself in their mental map and then best derive a piece of documentation or similar to make it understandable.

And to be clear, I would be totally in support of refactor / change of the interactions themselves, but until that happens, this is currently how someone would do this, and I think we should provide it as a simple tutorial for those that come looking for it. It can be updated later if needed. It doesn't make sense to me to split it up, or continue to hide information just because of personal opinions about labeling it plumbing or not.

My 0.02.

chu11 · 2023-04-26T20:41:24Z

I guess we'll just agree to disagree. As a side question, what part of the script @xorJane needed to get the jobspec? From the above, it looks like everything is from job list.

Additional comments.

As I said above, flux job info was intentionally left out of flux-job(1) in the past, so if this is becoming user facing, that needs to be corrected. So we should make that a TODO.
I would recommend you decode the guest.output into actual stdout in your example. I'm not sure showing the raw eventlog output as "standard output" is very useful, and if someone does want to get stdout from a job, they'll need to know how to decode it.
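
A minimal sketch of that decoding step is below. It assumes the RFC 24 layout, in which `data` events carry a `stream` name, a `data` string, and an optional `encoding` of `base64`. The sample eventlog is synthetic and `decode_output` is an illustrative helper rather than an existing Flux function.

```python
import base64
import json


def decode_output(raw_eventlog, stream="stdout"):
    """Concatenate the payloads of 'data' events for one stream."""
    chunks = []
    for line in raw_eventlog.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("name") != "data":
            continue  # skip header and other event types
        context = event.get("context", {})
        if context.get("stream") != stream:
            continue
        data = context.get("data")
        if data is None:
            continue  # e.g. an entry that only marks end of stream
        if context.get("encoding") == "base64":
            chunks.append(base64.b64decode(data))
        else:
            chunks.append(data.encode("utf-8"))
    return b"".join(chunks).decode("utf-8", errors="replace")


def _event(name, context):
    """Build one synthetic eventlog line (timestamp is a placeholder)."""
    return json.dumps({"timestamp": 0.0, "name": name, "context": context})


# Synthetic guest.output content (assumption, not captured from a job)
SAMPLE = "\n".join(
    [
        _event("header", {"version": 1, "encoding": {"stdout": "UTF-8", "stderr": "UTF-8"}}),
        _event("data", {"stream": "stdout", "rank": "0", "data": "hello\n"}),
        _event("data", {"stream": "stderr", "rank": "0", "data": "a warning\n"}),
        _event(
            "data",
            {
                "stream": "stdout",
                "rank": "0",
                "data": base64.b64encode(b"world\n").decode("ascii"),
                "encoding": "base64",
            },
        ),
        _event("data", {"stream": "stdout", "rank": "0", "eof": True}),
    ]
)

print(decode_output(SAMPLE), end="")
```

This ignores ranks and any redirect or log events, so it is a starting point rather than a replacement for flux job attach.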

grondo · 2023-04-26T20:42:59Z

@xorJane, would you mind if I copied your comment into a new flux-core issue (or perhaps a discussion)? I think many of the items you are looking for are already available from the JobInfo objects returned from JobList, but it might be nice if we kept our suggestions in a separate issue so it isn't mixed up with the discussion about documentation.

Then as we find holes in the API where data is missing, we can open up separate issues if necessary (I do think we should offer a high-level Python API call to get the original or redacted jobspec for example)

grondo · 2023-04-26T20:45:52Z

what part of the script @xorJane needed to get the jobspec? From the above, it looks like everything is from job list.

Well there's WorkDir (cwd)...

chu11 · 2023-04-26T20:54:23Z

Well there's WorkDir (cwd)...

Ahhh I was only looking at the flux output, and I now see that it's an in-progress work.

xorJane · 2023-04-26T21:11:45Z

@grondo I don't mind at all! Also, does Flux offer job start prediction, at least soon before the job actually starts? I noticed that the reported start time is often 12/31/1969 before a job starts, but I'm wondering if that reporting changes closer to runtime.

grondo · 2023-04-26T21:43:44Z

Also, does Flux offer job start prediction

With Fluxion (flux-sched) there is an optional t_estimate scheduler annotation (i.e. sched.t_estimate), however, I've only seen it working on the highest priority pending job. There's an issue open on that here: flux-framework/flux-sched#1015

noticed that the reported start time is often 12/31/1969 before a job starts

Yes, the t_* fields are initialized to zero, so a job that hasn't started yet will have a start time of 0 seconds since epoch or 12/31/1969... Same for other time fields, e.g. t_cleanup etc.
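
A short sketch of why the date shows up as 12/31/1969 and one way a reporting script could guard against it. The `format_time` helper is hypothetical, and the fixed UTC-8 offset is only an assumption standing in for a US Pacific local time.

```python
from datetime import datetime, timedelta, timezone


def format_time(t, tz=timezone.utc):
    """Format an epoch timestamp, treating 0 or None as not yet set."""
    if not t:
        return "Unknown"
    return datetime.fromtimestamp(t, tz).strftime("%a %m/%d/%Y %H:%M:%S")


# Epoch 0 rendered west of UTC falls on the last day of 1969
pacific_standard = timezone(timedelta(hours=-8))
print(datetime.fromtimestamp(0, pacific_standard).strftime("%m/%d/%Y"))  # 12/31/1969

# The guard reports an unset field instead of a misleading date
print(format_time(0))  # Unknown
```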

vsoch · 2023-04-26T21:44:56Z

I would like to point out that we have two people (myself and Jane) saying "we noticed the absence of a small bit of documentation we want/need" (and we've put in the work to figure it out and provide for others) and the first response is "you don't really need/want it."

Feels a bit... off. I certainly hope this is not how a new contributor would be received here.

chu11 · 2023-04-26T21:59:14Z

I would like to point out that we have two people (myself and Jane) saying "we noticed the absence of a small bit of documentation we want/need" (and we've put in the work to figure it out and provide for others) and the first response is "you don't really need/want it."

Apologies if that is how it came off. The issue was not the job list stuff, that was absolutely important and necessary.

The subtlety was there was a conscious decision to not document flux job info in the past. Maybe that was a bad decision by flux-core people, but it was a decision that was made. So I was trying to understand what need there was, and subsequently if there was a better way, or a new way should be created.

vsoch · 2023-04-26T22:12:28Z

The subtlety was there was a conscious decision to not document flux job info in the past. Maybe that was a bad decision by flux-core people, but it was a decision that was made. So I was trying to understand what need there was, and subsequently if there was a better way, or a new way should be created.

I think that's totally valid! I think my high level observation (and suggestion for the future) might be a slight tweak to how this is communicated. I'll also say there are no hard feelings - I've learned in my open source experience it's important to have very thick skin on these issue threads (cue memory of me biking home early in my OSS experience completely sobbing because of a conversation, lol). If it helps, what I try to do when there is a new contributor (or someone that I am more formal with because I don't know them super well yet) would be something of the following pattern:

Thank them for the contribution + additional commentary to restate their goals so they see I understand
Clarification of the use case / problem they were having (this also tells them that I'm hearing them properly)
Mention of discussion to tweak the work and best address the problem
And then technical discussion / details and follow up

So (as a quick example) for the PR here (and there are many ways to skin a cat) but one approach might be like:

Hey <name> - thanks for adding this tutorial! So that I best understand, you were trying to get information about a job using the Flux Python API - and that includes output, job info/metadata, and the jobspec? In the past we had declared this "job info" group to be "plumbing" or "for developer users only" but it sounds like based on your need, we might need to adjust this view. Let's figure out the best way to add these examples, and we can work together to adjust the code here appropriately. Does that sound OK? <ping other project devs> should we rethink how the underlying API is working here, and then if so, should we release this example in the meantime (and update later) since it's the current way to go about it?

That's just one example, but in the above I've thanked the contributor, asked for clarity about their problem, and then explained my view / opinion (hopefully without making them feel like they have to go on the defense: "yes I really want/need this!"). The issue itself (or PR in this case) should be sufficient for that. I'll also emphasize just saying "let's work together on this" to set the original tone. And then after all that (when the contributor feels heard, and involved) I bring in the other devs to start the more technical discussion.

Again, totally no hard feelings - I can't tell you how many times I've messed up with interactions in issues - it's really hard. A lot of times I'll also have negative experiences, and see patterns, or maybe wake up the next morning and realize something isn't sitting right. A lot of it is really subtle, so that's why it's hard. It's important we can talk about it, definitely between the core team here, so when a new contributor does show up, we don't scare them away! 😆

vsoch · 2023-05-03T05:34:42Z

Just for reference I'm coming here now to reference this tutorial to remember how to fully interact with jobs :)

vsoch · 2023-06-09T23:42:15Z

Another time I'm visiting this PR to copy paste this code for another script that I need to get job info for!

grondo · 2023-06-09T23:53:44Z

Looks like this has some conflicts. Also the PR branch is pushed to this repo instead of a fork, and as we saw that seems to break mergify. I guess you'll have to create a new PR or we can merge this one manually. However, I'd suggest creating all PRs from a personal fork in the future.

vsoch · 2023-06-10T00:20:43Z

All set!

grondo mentioned this pull request Apr 26, 2023

How to access a wide range of job information for a user tool flux-framework/flux-core#5119

Closed

chu11 mentioned this pull request Apr 26, 2023

job-list: access user protected data from job-info flux-framework/flux-core#5120

Open

chu11 mentioned this pull request Apr 26, 2023

flux-job(1): document flux job info flux-framework/flux-core#5121

Closed

vsoch mentioned this pull request May 18, 2023

job usage: archive long term job records in the accounting db flux-framework/flux-accounting#353

Closed

vsoch added 3 commits June 9, 2023 18:10

add example for job info

9600e08

Problem: We do not have good examples for replicating flux job info in Python Solution: Add an interactive demo Signed-off-by: vsoch <[email protected]>

bug with rst rendering in flux job tutorials

e71be2e

Signed-off-by: vsoch <[email protected]>

add auto examples to build

da924eb

Signed-off-by: vsoch <[email protected]>

vsoch force-pushed the add/flux-python-info-example branch from 92e212b to da924eb Compare June 10, 2023 00:14
