Commit

Added tools
jdh4 committed Nov 4, 2024
1 parent 8607ee6 commit 2d55aaf
Showing 22 changed files with 535 additions and 385 deletions.
56 changes: 55 additions & 1 deletion docs/configuration.md
@@ -1,5 +1,59 @@
# Configuration File

Use `config.py` as the starting point for your configuration file.
Use `config.py` in the [Jobstats GitHub repository](https://github.com/PrincetonUniversity/jobstats) as the starting point for your configuration file.

Here is an explanation of each setting:

```python
# prometheus server address and port
PROM_SERVER = "http://vigilant2:8480"
```

The number of seconds between measurements by the exporters on the compute nodes:

```python
# number of seconds between measurements
SAMPLING_PERIOD = 30
```

One can use the Python `blessed` package to produce bold and colorized text, which
helps draw the user's attention to specific lines of the report. This part
of the configuration sets the various thresholds:

```python
# threshold values for red versus black notes
GPU_UTIL_RED = 15 # percentage
GPU_UTIL_BLACK = 25 # percentage
CPU_UTIL_RED = 65 # percentage
CPU_UTIL_BLACK = 80 # percentage
TIME_EFFICIENCY_RED = 40 # percentage
TIME_EFFICIENCY_BLACK = 70 # percentage
MIN_MEMORY_USAGE = 70 # percentage
MIN_RUNTIME_SECONDS = 10 * SAMPLING_PERIOD # seconds
```
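
How these thresholds drive the formatting is up to the report code; below is a minimal sketch of one plausible use with `blessed` (an illustration only, not the actual Jobstats implementation):

```python
# Illustrative sketch only: colorize a report line based on the CPU thresholds.
# The actual Jobstats formatting logic may differ.
from blessed import Terminal

CPU_UTIL_RED = 65    # percentage
CPU_UTIL_BLACK = 80  # percentage

def format_cpu_util(cpu_util: float) -> str:
    """Return a possibly bold/red string for the CPU utilization line."""
    term = Terminal()
    text = f"CPU utilization: {cpu_util:.1f}%"
    if cpu_util < CPU_UTIL_RED:
        return term.bold_red(text)  # very low utilization: red and bold
    if cpu_util < CPU_UTIL_BLACK:
        return term.bold(text)      # borderline utilization: bold only
    return text                     # acceptable utilization: plain text

print(format_cpu_util(42.0))
```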

These settings translate cluster names, cap the displayed job name length, and set per-cluster defaults for memory and cores:

```python
# translate cluster names in Slurm DB to informal names
CLUSTER_TRANS = {"tiger":"tiger2"}
#CLUSTER_TRANS = {} # if no translations then use an empty dictionary
CLUSTER_TRANS_INV = dict(zip(CLUSTER_TRANS.values(), CLUSTER_TRANS.keys()))
# maximum number of characters to display in jobname
MAX_JOBNAME_LEN = 64
# default CPU memory per core in bytes for each cluster
# if unsure then use memory per node divided by cores per node
DEFAULT_MEM_PER_CORE = {"adroit":3355443200,
"della":4194304000,
"stellar":7864320000,
"tiger":4294967296,
"traverse":7812500000}
# number of CPU-cores per node for each cluster
# this will eventually be replaced with explicit values for each node
CORES_PER_NODE = {"adroit":32,
"della":28,
"stellar":96,
"tiger":40,
"traverse":32}
```
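
For example, under the stated rule of dividing the memory per node by the cores per node, a hypothetical cluster whose nodes have 192 GiB of memory and 48 cores would get the following entry (the cluster is made up; the arithmetic is the point):

```python
# Hypothetical example: deriving DEFAULT_MEM_PER_CORE for a new cluster
# whose nodes have 192 GiB of memory and 48 CPU-cores.
mem_per_node_bytes = 192 * 1024**3           # 192 GiB in bytes
cores_per_node = 48
print(mem_per_node_bytes // cores_per_node)  # 4294967296 (4 GiB per core)

# CLUSTER_TRANS_INV above is simply the inverse of the CLUSTER_TRANS mapping:
CLUSTER_TRANS = {"tiger": "tiger2"}
CLUSTER_TRANS_INV = dict(zip(CLUSTER_TRANS.values(), CLUSTER_TRANS.keys()))
print(CLUSTER_TRANS_INV)                     # {'tiger2': 'tiger'}
```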
4 changes: 2 additions & 2 deletions docs/developers.md → docs/contributions.md
@@ -1,6 +1,6 @@
# Developers
# Contributions

Contributions are welcome. To work with the code, build a Conda environment:
Contributions to the Jobstats platform and its tools are welcome. To work with the code, build a Conda environment:

```
$ conda create --name jobstats-dev requests blessed pytest-mock mkdocs-material -c conda-forge
8 changes: 0 additions & 8 deletions docs/external_tools.md

This file was deleted.

33 changes: 31 additions & 2 deletions docs/index.md
@@ -1,6 +1,6 @@
# What is Jobstats?

Jobstats is a free and open-source job monitoring platform designed for CPU and GPU clusters that use the Slurm workload manager. It was released in 2023 under the GNU GPL v2 license.
Jobstats is a free and open-source job monitoring platform designed for CPU and GPU clusters that use the Slurm workload manager. It was released in 2023 under the GNU GPL v2 license. Visit the [Jobstats GitHub repository](https://github.com/PrincetonUniversity/jobstats).

## What are the main benefits of Jobstats over other platforms?

@@ -17,7 +17,11 @@ The main advantages of Jobstats are:

## How does Jobstats work?

Jobstats is composed of data exporters, Prometheus database, Grafana visualization interface, and the Slurm database. Measurements made on the compute nodes are stored in the time-series Prometheus database. Job efficiency reports are generated from this data and Slurm.
Job and node statistics are exposed by four different Prometheus exporters (Node, cgroups, NVIDIA, GPFS):

![Schematic diagram](jobstats_schematics.png)

The exporters make data available to the Prometheus database. Users interact with the Prometheus and Slurm data via the web interface (i.e., Grafana) and external tools (e.g., `gpudash`).
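
As an illustration of this data flow (not the exact code behind the `jobstats` command), the Prometheus HTTP API can be queried directly with the Python `requests` package; the server address below is taken from the configuration example and the job ID is a placeholder:

```python
# Minimal sketch: pull one cgroup metric for a job from Prometheus.
# The jobid is a placeholder.
import requests

PROM_SERVER = "http://vigilant2:8480"  # from config.py
query = 'cgroup_memory_used_bytes{jobid="247463", step="", task=""}'

resp = requests.get(f"{PROM_SERVER}/api/v1/query", params={"query": query})
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```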

## Which institutions are using Jobstats?

@@ -97,6 +101,31 @@ $ jobstats 39798795
https://mydella.princeton.edu/pun/sys/jobstats (VPN required off-campus)
```

### What data does Jobstats make available?

Job-level metrics:

- CPU Utilization
- CPU Memory Utilization
- GPU Utilization
- GPU Memory Utilization
- GPU Power Usage
- GPU Temperature

Node-level metrics:

- CPU Percentage Utilization
- Total Memory Utilization
- Mean Frequency Over All CPUs
- NFS Statistics
- Local Disk R/W
- GPFS Bandwidth Statistics
- Local Disk IOPS
- GPFS Operations per Second Statistics
- InfiniBand Throughput
- InfiniBand Packet Rate
- InfiniBand Errors

## Other Job Monitoring Platforms

Consider these alternatives to Jobstats:
19 changes: 0 additions & 19 deletions docs/installation.md

This file was deleted.

174 changes: 0 additions & 174 deletions docs/jobstats.md

This file was deleted.

70 changes: 70 additions & 0 deletions docs/setup/cgroups.md
@@ -0,0 +1,70 @@
# cgroups

Slurm has to be configured to track job accounting data via the cgroup plugin. This requires the following line in slurm.conf:

```
JobAcctGatherType=jobacct_gather/cgroup
```

The above is in addition to the other usual cgroup-related plugins/settings:

```
ProctrackType=proctrack/cgroup
TaskPlugin=affinity,cgroup
```

Slurm will then create two top-level cgroup directories for each job, one for CPU utilization and one for CPU memory [17]. Within each directory there will be subdirectories: `step_extern`, `step_batch`, `step_0`, `step_1`, and so on. Within these directories one finds `task_0`, `task_1`, and so on. These cgroups are scraped by a cgroup exporter [14]. The table at the bottom of this page lists all of the collected fields.

The cgroup exporter used here is based on Ref. [3] with additional parsing of the jobid, steps, tasks and UID number. This produces an output that resembles (e.g., for system seconds):

```
cgroup_cpu_system_seconds{jobid="247463", step="batch", task="0"}
160.92
```

Note that the UID of the owning user is stored as a gauge in `cgroup_uid`:

```
cgroup_uid{jobid="247463"} 334987
```

This is because accounting is job-oriented and having the UID of the user as a label would needlessly increase the cardinality of the data in Prometheus. All other fields carry `jobid`, `step`, and `task` labels.

The totals for a job have an empty step and task, for example:

```
cgroup_cpu_user_seconds{jobid="247463", step="", task=""}
202435.71
```

This is due to the organization of the cgroup hierarchy. Consider the directory:

```
/sys/fs/cgroup/cpu,cpuacct/slurm/uid_334987
```

Within this directory, one finds the following subdirectories:

```
job_247463/cpuacct.usage_user
job_247463/step_extern/cpuacct.usage_user
job_247463/step_extern/task_0/cpuacct.usage_user
```

This is the data most often retrieved and parsed for overall job efficiency, which is why by default the cgroup_exporter does not parse step or task data. To collect all of it, add the `--collect.fullslurm` option. We run the cgroup_exporter with these options: `/usr/sbin/cgroup_exporter --config.paths /slurm --collect.fullslurm`. The `--config.paths /slurm` option has to match the path used by Slurm under the top cgroup directory. This is usually a path that is something like `/sys/fs/cgroup/memory/slurm`.

| Name | Description | Type |
| ---- | ----------- | ---- |
| cgroup_cpu_system_seconds | Cumulative CPU system seconds for jobid | gauge |
| cgroup_cpu_total_seconds | Cumulative CPU total seconds for jobid | gauge |
| cgroup_cpu_user_seconds | Cumulative CPU user seconds for jobid | gauge |
| cgroup_cpus | Number of CPUs in the jobid | gauge |
| cgroup_memory_cache_bytes | Memory cache used in bytes | gauge |
| cgroup_memory_fail_count | Memory fail count | gauge |
| cgroup_memory_rss_bytes | Memory RSS used in bytes | gauge |
| cgroup_memory_total_bytes | Memory total given to jobid in bytes | gauge |
| cgroup_memory_used_bytes | Memory used in bytes | gauge |
| cgroup_memsw_fail_count | Swap fail count | gauge |
| cgroup_memsw_total_bytes | Swap total given to jobid in bytes | gauge |
| cgroup_memsw_used_bytes | Swap used in bytes | gauge |
| cgroup_uid | UID number of user running this job | gauge |
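
As an illustrative calculation (not the exact formula used by Jobstats), the overall CPU efficiency of a job can be estimated from these fields by comparing the consumed CPU seconds against the elapsed time multiplied by the number of allocated CPU-cores:

```python
# Illustrative sketch: estimate CPU efficiency from the cgroup fields above.
# The sample numbers are made up.
def cpu_efficiency(cpu_total_seconds: float, elapsed_seconds: float, cpus: int) -> float:
    """Percentage of the allocated CPU time that was actually consumed."""
    return 100 * cpu_total_seconds / (elapsed_seconds * cpus)

# e.g., cgroup_cpu_total_seconds = 202436 for a 12-hour job on 8 CPU-cores
print(f"{cpu_efficiency(202436, elapsed_seconds=12 * 3600, cpus=8):.1f}%")  # ~58.6%
```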
