This was originally a fork of a similar project, but it has been altered to the point of being completely different and became it's own thing. The purpose of this work is to provide a containerised means of accessing existing slurm accounting data for (possibly offline) analytics work without having to use sacct on a production system with all the possible risks and limitations that may involve.
The scenario:
- You have been asked to do a forensic analysis of activity data--no problem.
- The activity data is shipped as a MySQL database dump--no problem.
- The database and all the data was created and populated by slurm--uh oh.
For mere mortals like myself, the last point is a bit of an issue since the table structures that slurm has developed works well with sacct, but not as well for us bald apes.
This project solves this problem by providing a containerized instance of the slurm database daemon that you, dear reader, may run sacct queries on.
Note that convenience is favoured over security and it is assumed that all steps are being run locally or in a similarly isolated system. DO NOT expose any of the containers to the world at large if you care about your data at all. You have been warned.
We assume that a MySQL dump file has been provided. Connecting this container to an existing MySQL database as a non-root user is beyond the scope of this work.
Use the official MySQL docker image to make use of the database dump first:
docker compose -f docker-compose-db-setup.yml build && \
docker compose -f docker-compose-db-setup.yml up -d
This will create a database:
- database:
slurm_data
- user:
root
- password:
H3ll012Three
Secure? Not even close! But that is OK becasue this is only for experimental purposes, right?
Verify that the slurm_data
database exists and is accessible by root
. This
step is especially important if you have changed any of the above details:
$ docker run -it \
> --network docker-slurm_default \
> mysql:9.0 \
> mysql -hslurm_db -Dslurm_data -uroot -pH3ll012Three
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 9.0.0 MySQL Community Server - GPL
Copyright (c) 2000, 2024, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> \q
Bye
Fantastic! We now populate slurm_data
with our dump data:
docker exec -i slurm_db sh -c 'exec mysql -uroot -pH3ll012Three -Dslurm_data' < /path/to/the/database/dump
Depending on the size of the data and the speed of your hardware, this step may take anywhere from a few minutes to several days or more.
Once the process completes, be sure to to shut down the container:
docker compose -f docker-compose-db-setup.yml down
IF in the previous step you decided you didn't like to be root, used a
different database name, created a different password or anything along these
lines then you will first need to update the ./build/slurm.conf
and
./buid/slurmdbd.conf
files (at a minimum) in order to account for the
changes.
Build and start the containers:
docker compose build && \
docker compose up -d
The slurm container(s) use supervisord
in an ad-hoc manner to initialize
the services, so it can take up to a minute or so for the services to settle
in and function correctly.
Once this happens, you can run sacct
queries on
the slurmdbd
container. For example:
$ docker exec slurmdbd-root-build \
> sacct -X -P -a -M all \
> -S 2023-04-01T00:00:00 \
> -E 2023-04-01T01:00:00 \
> -o UID,JobID,NCPUS,NNodes,Submit,End,State,ConsumedEnergy,CPUTime \
> > demo_response.psv
It is worth noting that the first time the slurmdbd-root-build
container is
run, it may quietly do a large, slow optimization process on the database tables.
The only indication of this happening is via the logs at /var/log/slurmdbd.log
(on the container). During this process sacct
queries will fail with a rather
uninformative Connection refused
error. If this happens, simply wait and keep
an eye on the logs--the process can take minutes to hours or more depending on
hardware and the size of the database.
Once sacct
queries can be run, the above example allows you to use your
favorite analytics tools on the resulting CSV data. For example:
import polars as pl
pl.read_csv(
"demo_response.psv",
separator="|",
infer_schema_length=10_000,
)
# output:
# shape: (5_974, 9)
# ┌───────┬─────────┬───────┬────────┬───┬─────────────────────┬───────────┬────────────────┬──────────────┐
# │ UID ┆ JobID ┆ NCPUS ┆ NNodes ┆ … ┆ End ┆ State ┆ ConsumedEnergy ┆ CPUTime │
# │ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ str ┆ str ┆ str │
# ╞═══════╪═════════╪═══════╪════════╪═══╪═════════════════════╪═══════════╪════════════════╪══════════════╡
# │ 23676 ┆ 999836 ┆ 672 ┆ 3 ┆ … ┆ 2023-04-01T06:02:59 ┆ COMPLETED ┆ 14.50M ┆ 57-11:05:36 │
# │ 24744 ┆ 1001275 ┆ 512 ┆ 2 ┆ … ┆ 2023-04-02T02:16:03 ┆ COMPLETED ┆ 104.67M ┆ 474-12:01:04 │
# │ 24744 ┆ 1001276 ┆ 512 ┆ 2 ┆ … ┆ 2023-04-02T02:34:15 ┆ COMPLETED ┆ 103.95M ┆ 480-00:25:36 │
# │ 24744 ┆ 1001278 ┆ 512 ┆ 2 ┆ … ┆ 2023-04-02T02:22:07 ┆ COMPLETED ┆ 102.13M ┆ 475-03:31:12 │
# │ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
# │ 22592 ┆ 3992208 ┆ 18 ┆ 1 ┆ … ┆ 2023-04-01T01:04:38 ┆ COMPLETED ┆ 0 ┆ 01:29:06 │
# │ 22592 ┆ 3992209 ┆ 18 ┆ 1 ┆ … ┆ 2023-04-01T01:04:16 ┆ COMPLETED ┆ 0 ┆ 01:22:30 │
# │ 22592 ┆ 3992210 ┆ 18 ┆ 1 ┆ … ┆ 2023-04-01T01:04:12 ┆ COMPLETED ┆ 0 ┆ 01:21:18 │
# │ 22592 ┆ 3992211 ┆ 18 ┆ 1 ┆ … ┆ 2023-04-01T01:03:56 ┆ COMPLETED ┆ 0 ┆ 01:07:30 │
# └───────┴─────────┴───────┴────────┴───┴─────────────────────┴───────────┴────────────────┴──────────────┘
Your results will, of course, be different.