Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: job-list: store inactive job data in job database #4336

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

chu11
Copy link
Member

@chu11 chu11 commented May 19, 2022

Per discussion in #4273, some notes on what this does:

  • add an sqlite database that stores all inactive data to it.

  • db table is:

const char *sql_create_table = "CREATE TABLE if not exists jobs("
                               "  id CHAR(16) PRIMARY KEY,"
                               "  t_inactive REAL,"
                               "  jobdata JSON,"
                               "  eventlog TEXT,"
                               "  jobspec JSON,"
                               "  R JSON"

Where jobdata is a json blob containing all job data (the all job-list attribute from #4324, but added a states_mask which is used internally)

The main reason we have a t_inactive column is b/c this is backwards compatible to sqlite versions that don't have json support.

  • database archiving is only enabled if a dbpath is specified in a job-list toml config or if statedir is configured. So in other words, if no db is configured, it behaves just like before.

Ideas:

  • should the jobdata have a version embedded? if things like job states bitmask or things like that change?

@cmoussa1
Copy link
Member

I'm sorry if this isn't related to this PR but I figured I would double check. Is the plan to also adjust the job-archive DB to match the schema you have proposed in the PR description? I only ask because I currently use this query to fetch data from the job-archive DB:

SELECT userid,id,t_submit,t_run,t_inactive,ranks,R,jobspec FROM jobs

If so, that's totally fine - I will just need to tweak the query slightly to get the data I need.

@chu11
Copy link
Member Author

chu11 commented May 19, 2022

I'm sorry if this isn't related to this PR but I figured I would double check. Is the plan to also adjust the job-archive DB to match the schema you have proposed in the PR description?

The plan is that we would leave the current job-archive the way it is until flux-accounting can adjust to the new database, everything gets deployed, etc. then we can eventually retire job-archive.

@cmoussa1
Copy link
Member

Sounds good! Thanks for clarifying.

@chu11 chu11 closed this May 19, 2022
@chu11 chu11 deleted the issue4273_job_db branch May 19, 2022 23:33
@chu11
Copy link
Member Author

chu11 commented May 19, 2022

oh crap, i accidentally did git push origin :issue4273_job_db, thus deleting remotely deleting the branch :P (i guess I missed that middle click for the left hand side). re-opening

@codecov
Copy link

codecov bot commented Aug 10, 2022

Codecov Report

Merging #4336 (ef200e1) into master (8134e0c) will increase coverage by 0.15%.
The diff coverage is 70.53%.

❗ Current head ef200e1 differs from pull request most recent head b46a315. Consider uploading reports for the commit b46a315 to get more accurate results

@@            Coverage Diff             @@
##           master    #4336      +/-   ##
==========================================
+ Coverage   83.37%   83.53%   +0.15%     
==========================================
  Files         401      392       -9     
  Lines       67517    65936    -1581     
==========================================
- Hits        56291    55077    -1214     
+ Misses      11226    10859     -367     
Impacted Files Coverage Δ
src/modules/job-list/stats.c 82.14% <ø> (+7.14%) ⬆️
src/modules/job-list/util.c 0.00% <0.00%> (ø)
src/modules/job-list/job_util.c 87.07% <54.54%> (-5.79%) ⬇️
src/modules/job-list/job_db.c 65.16% <65.16%> (ø)
src/modules/job-list/list.c 71.55% <73.14%> (-0.08%) ⬇️
src/modules/job-list/job_state.c 72.70% <82.92%> (-0.83%) ⬇️
src/modules/job-list/job_data.c 92.10% <92.10%> (ø)
src/modules/job-list/job-list.c 80.72% <100.00%> (+0.97%) ⬆️
src/common/libsubprocess/local.c 78.72% <0.00%> (-5.67%) ⬇️
src/modules/job-info/allow.c 71.42% <0.00%> (-5.24%) ⬇️
... and 117 more

@chu11 chu11 force-pushed the issue4273_job_db branch 2 times, most recently from 19ca457 to 80ce9ba Compare September 1, 2022 19:19
@chu11
Copy link
Member Author

chu11 commented Sep 1, 2022

just re-based and and re-pushed fixing up merge conflicts due to recent changes

@chu11
Copy link
Member Author

chu11 commented Sep 3, 2022

rebased - fixing conflicts from #4542

@chu11
Copy link
Member Author

chu11 commented Sep 16, 2022

repushed, rebased on PR #4575

@chu11
Copy link
Member Author

chu11 commented Sep 20, 2022

rebased & re-pushed given #4575 going in

@chu11 chu11 force-pushed the issue4273_job_db branch 2 times, most recently from 250ae40 to 0eee2e4 Compare October 7, 2022 18:15
@chu11
Copy link
Member Author

chu11 commented Apr 14, 2023

Recently the topic of storing inactive and possibly purged jobs in a database and then being able to retrieve them later on via flux-jobs was brought up. Some of that work was done in this PR. This PR was initially submitted nearly a year ago, so I think it was just forgotten. So hopefully this message restarts conversation on it :-)

Admittedly needs to be re-looked at given possible new needs / requirements / changes in flux-core / etc.

See additional conversation given #4914

@chu11 chu11 force-pushed the issue4273_job_db branch 2 times, most recently from 712dd10 to 95ea6e8 Compare April 18, 2023 22:54
@chu11 chu11 changed the title job-list: store inactive job data in job database WIP: job-list: store inactive job data in job database Apr 28, 2023
@chu11
Copy link
Member Author

chu11 commented Apr 28, 2023

making this WIP, just wanted to point out a few points mostly from #4914

  • an updated version of this PR this should go in after the constraint support and some additional features are added from idea: augment job-list "filter" with RFC 31 constraint support #4914, would hate to have a sqlite database table that we need to re-do

  • there is concern of creating a database table that is ultimately one we don't like and have to re-do several times. I do see two longer out solutions:

    • use a nosql database, but one that supports sql queries than can be converted from jsonlogic
    • use sqlite with json support, but dumping all data into one json blob in a field. By doing this we sort of get a "sql table doesn't matter" effect (i.e. if there is only 1 column, we don't care about making a bad table), although the devil is in the details until prototyping is done
      • bad part, json support in sqlite isn't supported in most distro's versions of sqlite right now
  • several of the database points I mention above are why I decided adding "constraints" support to job-list would be a better "medium term" solution than going with a everything in job-list in a database approach. right choice? we will see.

  • Sooo I'm going to keep this PR open for the time being, make it WIP, could be something we still want to merge medium term if we just deem the pros outweigh the cons.

Problem: There was an errant double semicolon.

Solution: Remove it!
Problem: Several helper functions take variable arguments, which are
unnecessary.

Remove the variable args to these helper functions.
Problem: In the new future the array_to_results_bitmask() will be
needed in multiple files.

Move it from match.c to match_util.c.
Problem: In the near future we will need access to the job's
eventlog when a job goes inactive.

Solution: Rebuild the job eventlog from the events journal and
store it internally in struct job.
@chu11 chu11 force-pushed the issue4273_job_db branch 4 times, most recently from ca593ec to d5a5d7f Compare June 12, 2024 20:47
@chu11 chu11 force-pushed the issue4273_job_db branch 2 times, most recently from bd2def4 to 9f98830 Compare June 12, 2024 21:19
Problem: We would like to store inactive jobs on disk for retrieval
at a later time.

Solution: Add an sqlite job db to the job-list module.  If the
job-list module is configured with a database path or if the flux
broker is configured with a statedir, store all inactive job data
to the db.
Problem: In the near future we would like to query jobs from the
job db.  In order to do so efficiently, we need to convert the
job list constraint into equivalent SQL that can be used in a
WHERE statement.

Support a new helper library "constraint_sql" that supports converting
a constraint into conditions that can be passed in via an SQL
WHERE.

Add unit tests as well.
Problem: Now that inactive jobs are stored to an sqlite database,
it is possible to retrieve additional job information that is no
longer stored in memory.

Solution: When possible, read from sqlite to get additional job
information about inactive jobs.
Problem: The job-list module now stores / recovers inactive job
data from its internal database.  Tests that assume job-list
only stores jobs in memory do not account for this, leading to
errors.

Solution: Update tests to read the job-list module stats instead
of using `flux jobs`.  This gathers internal data to the job-list
module, which allows purge tests to pass.
Problem: There are no tests to cover the new job-list DB.

Add coverage in new t/t2263-job-list-db.t.
@chu11
Copy link
Member Author

chu11 commented Jun 12, 2024

This job-list w/ a database has been re-worked:

  • the DB now has a bunch of data stored in various columns, this is so ...

  • constraints are now mapped into SQL queries using these new columns (e.g. WHERE ((userid = 42) AND (queue = 'pbatch')))

    • per discussion in WIP/experimental: add job-sql module #5847 we could have the SQL query use JSON and dig into the jobspec, but that requires us to use a localized copy of sqlite since the JSON support isn't in many distributed versions of sqlite. This could be an alternate implementation path to take. Either way we need a constraint2sql() function, so I just prototyped a non-JSON one to start.
  • I should add some stress testing of sorts where the db holds like thousands of jobs from different users, to see if some performance impact might exist. Although if we assume users won't increase the max number of returned jobs to (lets say) INT_MAX, maybe this isn't a huge deal.

    • We could also limit things in other ways, like guest users cannot access user != themselves in the older archived jobs (only in memory ones, i.e. jobs that are pending) ... although now that I think about it, perhaps a user should not be able to see != any inactive job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants