add example for job info #229
base: master
I'm curious what the request and/or need was for this. If a user has requested they really want to/need to get the jobspec or something for their job, and they specifically want to do it in python, perhaps we need to discuss how best to do that in python, i.e. if new function needs to be added, etc. |
@xorJane asked me directly on Mattermost for this example, and given that Python interactions might want this complete metadata, it’s worth adding. If we wind up TLDR: a real request for this exact example demonstrates that it’s needed and should be provided. Also, I’ve mentioned before (and asked multiple times) about getting job info on the command line and in Python, and I disagree about it being a “plumbing” command. Getting job info back is hugely useful in many contexts.
See flux-framework/flux-core#4761 as just one of the times, and a clearly laid out user interaction. |
The data provided by that feature is different than what is provided by
Maybe we have different definitions of "plumbing". As you note in the key output from
The eventlogs cannot be understood without reading the RFCs, and the stdin/stdout cannot be grokked without understanding our stdin/stdout protocol and encoding. That's sort of why we don't advertise it: a normal user would not have any understanding of what is being returned. @xorJane can you provide more detail on what the request and need were? Perhaps there is a specific piece of data that was desired, and it just isn't provided via Edit:
I'm not disagreeing with "job information" in general, you are correct. I'm speaking specifically of the information provided by |
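To make the eventlog point above concrete, here is a minimal sketch of reading one. It assumes the eventlog is newline-delimited JSON where each entry has a `timestamp`, a `name`, and an optional `context`, and that the `finish` event carries a wait status in `context.status`. The sample log, the `parse_eventlog` and `summarize` helpers, and the output keys are all hypothetical and only illustrate the idea; they are not part of any Flux API.

```python
import json

# Hypothetical sample eventlog; values are made up for illustration.
SAMPLE_EVENTLOG = """\
{"timestamp": 1000.0, "name": "submit", "context": {"userid": 1000, "urgency": 16, "flags": 0}}
{"timestamp": 1002.5, "name": "start"}
{"timestamp": 1010.0, "name": "finish", "context": {"status": 256}}
"""


def parse_eventlog(text):
    """Parse newline-delimited JSON events into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(events):
    """Pull a few user-facing fields out of a list of events."""
    by_name = {event["name"]: event for event in events}
    summary = {}
    for name in ("submit", "start", "finish"):
        if name in by_name:
            summary["t_" + name] = by_name[name]["timestamp"]
    if "t_start" in summary and "t_finish" in summary:
        summary["runtime"] = summary["t_finish"] - summary["t_start"]
    finish = by_name.get("finish")
    if finish is not None:
        status = finish.get("context", {}).get("status")
        summary["wait_status"] = status
        # Assumption: status is a POSIX wait status; if the low 7 bits
        # are zero the process exited normally and bits 8-15 hold the code.
        if status is not None and (status & 0x7F) == 0:
            summary["exit_code"] = (status >> 8) & 0xFF
    return summary


summary = summarize(parse_eventlog(SAMPLE_EVENTLOG))
print(summary)
```

The sketch shows why the raw log is hard to read without background: even the return code has to be unpacked from a wait status before it means anything to a user.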
FWIW, I think it is fine if users use That being said we do have an open issue on providing a more user appropriate interface to fetch job output, and we probably need a nice Python API for getting the original jobspec, since that requires fetching the signed J from job-info and decoding it. Note that the jobspec you are fetching directly from the KVS is the version with its environment (and possibly other keys) redacted and modified by the instance for its use. Also, only the instance owner can fetch directly from the KVS like this, so this will only work for the instance owner (i.e. a normal user couldn't use these examples in a multi-user instance)
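As a rough illustration of what "decoding the signed J" could involve, the sketch below assumes J is a dot-separated string of the form `HEADER.PAYLOAD.SIGNATURE` with a base64-encoded JSON payload. That layout is an assumption here and should be checked against the relevant Flux RFC. The `decode_signed_payload` helper and the stand-in J string are hypothetical, and the sketch does not verify the signature, so it is not a substitute for a proper Python API call.

```python
import base64
import json


def decode_signed_payload(signed):
    """Return the JSON payload of a HEADER.PAYLOAD.SIGNATURE string.

    Assumption: the payload segment is base64-encoded JSON.
    The signature is NOT verified here.
    """
    parts = signed.split(".")
    if len(parts) != 3:
        raise ValueError("expected HEADER.PAYLOAD.SIGNATURE")
    payload = base64.b64decode(parts[1])
    return json.loads(payload)


# Hypothetical stand-in for a signed jobspec, built locally for the demo.
jobspec = {"version": 1, "tasks": [{"command": ["hostname"]}]}
fake_j = ".".join(
    [
        base64.b64encode(b"placeholder-header").decode(),
        base64.b64encode(json.dumps(jobspec).encode()).decode(),
        "none",
    ]
)

decoded = decode_signed_payload(fake_j)
print(decoded["tasks"][0]["command"])
```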
I recall that we decided not to advertise it in the Should we start to advertise it? Or perhaps
would make it better. 🤷 Edit: oh here's an idea, only list the advanced ones with a |
re-reading this PR's contents, I think maybe what confused me is that it's really 3 different examples lumped into one, all under the heading of "job info", which I think maybe isn't the best way to organize this. I think what might be better is:
@grondo thank you for hearing me.
I again respectfully disagree. As a user I don't place things into these same categories based on design (that a developer would be biased to see). When I want "job information" or go looking for a tutorial to show me how to do that, I want the whole gamut of things, from the original jobspec, to the output contents, to the core info like status / return code. I don't want to have to know there are three different tutorials because (in the mind of the developer) "but they are different!" I am a layperson, I submit a job, and I want to know everything about it. |
Hi all! Just chiming in late with the context for my request. Post TOSS 4 updates on LC, users have asked about the disappearance of the
A user requested a similar utility for Flux (as well as more tutorials on how to get job stats with native Flux commands), and I was trying to use the Python APIs to get there. What I have so far:
Thanks so much for helping me out @vsoch! |
I disagree with this a bit.
FWIW I have been a user before (https://github.com/LLNL/magpie). So my opinions aren't coming solely from developer land. It's coming from that experience as well as experience aiding users from that project. I suspect you might think I am trying to "baby" users, and that is sort of what I lean towards based on my experiences. That said, my experiences can be very biased to the users I was helping. So I don't know what collective opinion would be here on this topic. |
But it's not. The tutorial itself isn't that long, and it's neatly packaged under a single label that makes sense. When a person finds what they need, they are good. Information overload would be giving them three separate tutorials that seem to be for similar things, and requiring them to read through all of them to put together a single, cohesive picture. The larger issue right now is that they can't currently find information on how to do this. This is why I've had to come into the Flux Slack umpteen times and ask "How do I get output? How do I get a return code?" I couldn't find it. When @xorJane comes to me and asks how to do something, that's a signal: it tells me that the docs don't do a good enough job of giving her that information. It means it cannot be found where someone went looking for it. It might be obvious to a core developer, but it's not obvious to a developer user.
Having worked on many client tools for many years, I respectfully again disagree. In fact, the user doesn't even know what they are looking for, so "job info" would click in their head as "information about my job" - yes! Many users don't even think in terms of stdin and stdout, those are more advanced concepts (maybe for power users, which maybe the lab is biased to have, but not most centers).
I don't think you are trying to baby users, I'm just not sure you are putting yourself in all of the different shoes you might! I've worked in different contexts sitting within labs and also providing support for users at Duke, Stanford, and (not much here) but several fairly large open source communities. My bias / perspective comes from both being a user, and learning over time how to put myself in their mental map and then best derive a piece of documentation or similar to make it understandable. And to be clear, I would be totally in support of refactor / change of the interactions themselves, but until that happens, this is currently how someone would do this, and I think we should provide it as a simple tutorial for those that come looking for it. It can be updated later if needed. It doesn't make sense to me to split it up, or continue to hide information just because of personal opinions about labeling it plumbing or not. My 0.02. |
I guess we'll just agree to disagree. As a side question, which part of @xorJane's script needed the jobspec? From the above, it looks like everything is from Additional comments.
@xorJane, would you mind if I copied your comment into a new flux-core issue (or perhaps a discussion)? I think many of the items you are looking for are already available from the Then as we find holes in the API where data is missing, we can open up separate issues if necessary (I do think we should offer a high-level Python API call to get the original or redacted jobspec, for example).
Well there's WorkDir (cwd)... |
Ahhh I was only looking at the flux output, and I now see that it's an in-progress work. |
@grondo I don't mind at all! Also, does Flux offer job start prediction, at least soon before the job actually starts? I noticed that the reported start time is often 12/31/1969 before a job starts, but I'm wondering if that reporting changes closer to runtime. |
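One plausible reading of the 12/31/1969 date is that the start timestamp is still zero (the Unix epoch) and is being rendered in a US time zone; the thread does not confirm this, so treat it as an assumption. The sketch below shows the effect and a guard against it. The `format_start` helper and the `t_run` parameter name are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_start(t_run, tz=timezone.utc):
    """Format a start timestamp, treating 0 or None as 'not started'."""
    if not t_run:
        return "not started"
    moment = (EPOCH + timedelta(seconds=t_run)).astimezone(tz)
    return moment.strftime("%m/%d/%Y %H:%M:%S")


# A fixed UTC-8 offset stands in for US Pacific standard time.
pacific_standard = timezone(timedelta(hours=-8))

# Formatting a zero timestamp without a guard lands on the previous day.
unguarded = EPOCH.astimezone(pacific_standard).strftime("%m/%d/%Y")
print(unguarded)  # 12/31/1969

print(format_start(0, pacific_standard))  # not started
print(format_start(86400))  # 01/02/1970 00:00:00
```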
With Fluxion (flux-sched) there is an optional
Yes, the |
I would like to point out that we have two people (myself and Jane) saying "we noticed the absence of a small bit of documentation we want/need" (and we've put in the work to figure it out and provide it for others) and the first response is "you don't really need/want it." Feels a bit... off. I certainly hope this is not how a new contributor would be received here.
Apologies if that is how it came off. The issue was not the The subtlety was there was a conscious decision to not document |
I think that's totally valid! I think my high level observation (and suggestion for the future) might be a slight tweak to how this is communicated. I'll also say there are no hard feelings - I've learned in my open source experience it's important to have very thick skin on these issue threads (cue memory of me biking home early in my OSS experience completely sobbing because of a conversation, lol). If it helps, what I try to do when there is a new contributor (or someone that I am more formal with because I don't know them super well yet) would be something of the following pattern:
So (as a quick example) for the PR here (and there are many ways to skin a cat) but one approach might be like:
That's just one example, but in the above I've thanked the contributor, asked for clarity about their problem, and then explained my view / opinion (hopefully without making them feel like they have to go on the defensive: "yes I really want/need this!"). The issue itself (or PR in this case) should be sufficient for that. I'll also emphasize just saying "let's work together on this" to set the original tone. And then after all that (when the contributor feels heard, and involved) I bring in the other devs to start the more technical discussion. Again, totally no hard feelings - I can't tell you how many times I've messed up with interactions in issues - it's really hard. A lot of times I'll also have negative experiences, and see patterns, or maybe wake up the next morning and realize something isn't sitting right. A lot of it is really subtle, so that's why it's hard. It's important we can talk about it, definitely between the core team here, so when a new contributor does show up, we don't scare them away! 😆
Just for reference I'm coming here now to reference this tutorial to remember how to fully interact with jobs :) |
Another time I'm visiting this PR to copy paste this code for another script that I need to get job info for! |
Looks like this has some conflicts. Also the PR branch is pushed to this repo instead of a fork, and as we saw that seems to break mergify. I guess you'll have to create a new PR or we can merge this one manually. However, I'd suggest creating all PRs from a personal fork in the future. |
Problem: We do not have good examples for replicating flux job info in Python
Solution: Add an interactive demo
Signed-off-by: vsoch <[email protected]>
92e212b to da924eb
All set! |
Problem: We do not have good examples for replicating flux job info in Python
Solution: Add an interactive demo
Ping @xorJane !