This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

How to access a wide range of job information for a user tool #5119

Closed
grondo opened this issue Apr 26, 2023 · 5 comments

Comments

@grondo
Contributor

grondo commented Apr 26, 2023

Hi all! Just chiming in late with the context for my request.

Post TOSS 4 updates on LC, users have asked about the disappearance of the checkjob command on systems running Slurm, which is a handy Moab utility for seeing all sorts of job stats. Since we're not supporting Moab anymore, Jeff Long & I put together a tool that nicely formats some of the output you can get with squeue and sacct. The result looks something like

janeh@pascal83:~$ slurm_jobinfo.py -j 16436_9
JobID        :  <JobID>
JobName      :  Symmetric
State        :  PENDING
User         :  <user>
Group        :  <user's group>
Account      :  <bank>
Partition    :  pbatch
QOS          :  normal
Timelimit    :  1-00:00:00
Submit       :  Tue 04/25 08:49:06
Eligible     :  Tue 04/25 08:49:07
Start        :  Unknown
End          :  Unknown
Elapsed      :  00:00:00
Priority     :  124135
NNodes       :  4
NCPUS        :  144
MinCPUNode   :
NodeList     :  None assigned
WorkDir      :  /p/lustre1/<user directory>
SubmitLine   :  sbatch --array=0-12 Submit_4_nodes.sh
Dependency   :  (null)
EstStart     :  N/A
Reason       :  (AssocMaxJobsLimit)

A user requested a similar utility for Flux (as well as more tutorials on how to get job stats with native Flux commands), and I was trying to use the Python APIs to get there. What I have so far:

janeh@tioga10:~$ flux_jobinfo.py -j fnwSSLATbTM
Getting info for Flux JobID: fnwSSLATbTM
JobID        :  fnwSSLATbTM
JobName      :  flux
State        :  RUN
User         :  <user>
Partition    :  pdebug
Timelimit    :  4:00:00
Submit       :  Wed 04/26/2023 09:20:44
Elapsed      :  2:14:53
Start        :  Wed 04/26/2023 09:20:44
End          :  Wed 04/26/2023 13:20:44
NNodes       :  1
NCores       :  64
NTasks       :  1
NodeList     :  tioga26
Dependencies  :  []

Thanks so much for helping me out @vsoch!

Originally posted by @xorJane in flux-framework/flux-docs#229 (comment)

@chu11
Member

chu11 commented Apr 26, 2023

Link to a brainstorm I just did: #5120

@grondo
Contributor Author

grondo commented Apr 26, 2023

Some initial comments:

  • Group - The primary group is not captured when submitting a Flux job, so this item does not apply
  • Account - I think in Flux we call this the "bank" and it is not yet available from the job listing service (job-list: support retrieving job bank? #4697)
  • QOS - Flux does not have an equivalent to QoS at this point
  • Eligible - Unsure what this is, but I'm confident there is no Flux equivalent
  • Priority - the priority is available directly from the priority field of the JobInfo object for a job
  • MinCPUNode - no Flux equivalent
  • WorkDir - not available in the JobInfo object (will have to fetch jobspec)
  • SubmitLine - since jobs can be submitted in so many different ways in Flux, there is no equivalent to this
  • EstStart - in theory this should be available in job.sched.t_estimate, however there's an open issue on why this annotation is not always present (Only next job in queue has a sched.t_estimate flux-sched#1015)
  • Reason - not really an equivalent in Flux, however see the contextual_info member of JobInfo, which returns different strings depending on the state the job is in (i.e. waiting for priority, waiting for dependencies, etc.)

Here I assume you are getting the bulk of the information from flux.job.JobList or, more simply for a single job, from the flux.job.job_list_id function.

E.g. here's a simple script to get the job priority:

import sys
import flux
from flux.job import JobID, job_list_id

job = job_list_id(flux.Flux(), JobID(sys.argv[1])).get_jobinfo()

print(f"priority = {job.priority}")

The get_jobinfo() function returns a flux.job.JobInfo object, which has many of the properties you are after above.
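To tie this back to the checkjob-style layout at the top of the thread, here is a rough sketch of formatting such an object into "Label : value" lines. It is only an illustration: format_report and FIELDS are made-up names, and apart from priority and contextual_info (mentioned above) the attribute names in FIELDS are assumptions about what JobInfo exposes, so check them against your flux-core version. The demo uses a SimpleNamespace stand-in so it runs without a Flux instance.

```python
from types import SimpleNamespace

# (label, attribute) pairs for the report.
# ASSUMPTION: apart from "priority" and "contextual_info", these attribute
# names are guesses at what flux.job.JobInfo provides; verify before use.
FIELDS = [
    ("JobID", "id"),
    ("JobName", "name"),
    ("User", "username"),
    ("Partition", "queue"),
    ("Priority", "priority"),
    ("NNodes", "nnodes"),
    ("NCores", "ncores"),
    ("NTasks", "ntasks"),
    ("NodeList", "nodelist"),
    ("Reason", "contextual_info"),
]


def format_report(job, fields=FIELDS, missing="N/A"):
    """Format attributes of a JobInfo-like object as aligned report lines.

    Attributes that are absent, None, or empty are shown as `missing`.
    """
    lines = []
    for label, attr in fields:
        value = getattr(job, attr, None)
        if value is None or value == "":
            value = missing
        lines.append(f"{label:<13}:  {value}")
    return "\n".join(lines)


# Stand-in object with made-up values, used here instead of a real JobInfo.
demo = SimpleNamespace(id="fnwSSLATbTM", name="flux", priority=16)
print(format_report(demo))
```

With a real instance, the object returned by get_jobinfo() in the script above could be passed to format_report() in place of demo.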

@chu11
Member

chu11 commented Apr 26, 2023

WorkDir - not available in the JobInfo object (will have to fetch jobspec)

Is this something that could be made available in JobInfo? It certainly feels borderline between OK and not OK.

@grondo
Contributor Author

grondo commented Apr 26, 2023

Here's an example of how to currently fetch the jobspec and pull information like the WorkDir out of it:

import sys
import flux
import json
from flux.job import JobID

h = flux.Flux()
payload = {"id": JobID(sys.argv[1]), "keys": ["jobspec"], "flags": 0}
jobspec = json.loads(h.rpc("job-info.lookup", payload).get()["jobspec"])

cwd = jobspec["attributes"]["system"]["cwd"]
print(f"WorkDir: {cwd}")

Note that this is redacted jobspec, so the environment has been removed, and the instance may have added or modified fields (like adding a default queue or duration).

I don't think the bank is actually added yet, so that is not available if the user did not set one on the command line.

@grondo
Contributor Author

grondo commented Apr 26, 2023

Since only the job user can fetch jobspec, you'd want to catch OSError from the RPC and just suppress the workdir when the errno is EPERM.
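Something along these lines might work as a starting point. It is a sketch rather than tested tooling: get_workdir, lookup_jobspec, and main are hypothetical helper names, and the RPC topic and payload are simply copied from the jobspec example earlier in this thread. The flux import is kept inside main() so the helpers can be exercised without a Flux instance.

```python
import errno
import json
import sys


def lookup_jobspec(handle, jobid):
    """Fetch and decode the redacted jobspec via the RPC shown earlier."""
    payload = {"id": jobid, "keys": ["jobspec"], "flags": 0}
    return json.loads(handle.rpc("job-info.lookup", payload).get()["jobspec"])


def get_workdir(handle, jobid, lookup=lookup_jobspec):
    """Return the job's cwd, or None when the jobspec may not be read.

    `lookup` is a hypothetical hook so the fetch can be swapped out.
    """
    try:
        jobspec = lookup(handle, jobid)
    except OSError as exc:
        if exc.errno == errno.EPERM:
            return None  # someone else's job: suppress WorkDir
        raise  # any other errno is a real failure, so let it propagate
    # Tolerate a jobspec that has no cwd set.
    return jobspec.get("attributes", {}).get("system", {}).get("cwd")


def main(argv):
    # Imported here so the helpers above do not require flux to be installed.
    import flux
    from flux.job import JobID

    cwd = get_workdir(flux.Flux(), JobID(argv[1]))
    if cwd is not None:
        print(f"WorkDir      :  {cwd}")
```

Calling main(sys.argv) at the end of the file turns this into a small script that prints the WorkDir line only for jobs you own.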

Is this something that could be made available in JobInfo? It certainly feels borderline between OK and not OK.

Good question. Slurm seems to show it for other user jobs so it is likely ok 🤷

@flux-framework flux-framework locked and limited conversation to collaborators Apr 28, 2023
@grondo grondo converted this issue into discussion #5130 Apr 28, 2023
chu11 added a commit to chu11/flux-core that referenced this issue May 6, 2023
Problem: Several users have requested getting a job's current working
directory from flux jobs/job-list.

Solution: Add retrieval of job current working directory via the "cwd"
attribute.

Fixes flux-framework#5119
chu11 added a commit to chu11/flux-core that referenced this issue May 8, 2023
Problem: Several users have requested getting a job's current working
directory from flux jobs/job-list.

Solution: Add retrieval of job current working directory via the "cwd"
attribute.

Fixes flux-framework#5119
