Commit

Added tools
jdh4 committed Nov 4, 2024
1 parent 8607ee6 commit 2d55aaf
Showing 22 changed files with 535 additions and 385 deletions.
56 changes: 55 additions & 1 deletion docs/configuration.md
@@ -1,5 +1,59 @@
# Configuration File

Use `config.py` as the starting point for your configuration file.
Use `config.py` in the [Jobstats GitHub repository](https://github.com/PrincetonUniversity/jobstats) as the starting point for your configuration file.

Here is an explanation of each setting:

```python
# prometheus server address and port
PROM_SERVER = "http://vigilant2:8480"
```

The number of seconds between measurements by the exporters on the compute nodes:

```python
# number of seconds between measurements
SAMPLING_PERIOD = 30
```

One can use the Python `blessed` package to produce bold and colorized text, which
helps draw the user's attention to specific lines of the report. This part
of the configuration sets the various thresholds:

```python
# threshold values for red versus black notes
GPU_UTIL_RED = 15 # percentage
GPU_UTIL_BLACK = 25 # percentage
CPU_UTIL_RED = 65 # percentage
CPU_UTIL_BLACK = 80 # percentage
TIME_EFFICIENCY_RED = 40 # percentage
TIME_EFFICIENCY_BLACK = 70 # percentage
MIN_MEMORY_USAGE = 70 # percentage
MIN_RUNTIME_SECONDS = 10 * SAMPLING_PERIOD # seconds
```
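
How these thresholds drive the formatting is up to the report code; below is a minimal sketch of one plausible use with `blessed` (an illustration only, not the actual Jobstats implementation):

```python
# Illustrative sketch only: colorize a report line based on the CPU thresholds.
# The actual Jobstats formatting logic may differ.
from blessed import Terminal

CPU_UTIL_RED = 65    # percentage
CPU_UTIL_BLACK = 80  # percentage

def format_cpu_util(cpu_util: float) -> str:
    """Return a possibly bold/red string for the CPU utilization line."""
    term = Terminal()
    text = f"CPU utilization: {cpu_util:.1f}%"
    if cpu_util < CPU_UTIL_RED:
        return term.bold_red(text)  # very low utilization: red and bold
    if cpu_util < CPU_UTIL_BLACK:
        return term.bold(text)      # borderline utilization: bold only
    return text                     # acceptable utilization: plain text

print(format_cpu_util(42.0))
```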

These settings translate cluster names, cap the displayed job name length, and set per-cluster defaults for memory and cores:

```python
# translate cluster names in Slurm DB to informal names
CLUSTER_TRANS = {"tiger":"tiger2"}
#CLUSTER_TRANS = {} # if no translations then use an empty dictionary
CLUSTER_TRANS_INV = dict(zip(CLUSTER_TRANS.values(), CLUSTER_TRANS.keys()))
# maximum number of characters to display in jobname
MAX_JOBNAME_LEN = 64
# default CPU memory per core in bytes for each cluster
# if unsure then use memory per node divided by cores per node
DEFAULT_MEM_PER_CORE = {"adroit":3355443200,
"della":4194304000,
"stellar":7864320000,
"tiger":4294967296,
"traverse":7812500000}
# number of CPU-cores per node for each cluster
# this will eventually be replaced with explicit values for each node
CORES_PER_NODE = {"adroit":32,
"della":28,
"stellar":96,
"tiger":40,
"traverse":32}
```
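
For example, under the stated rule of dividing the memory per node by the cores per node, a hypothetical cluster whose nodes have 192 GiB of memory and 48 cores would get the following entry (the cluster is made up; the arithmetic is the point):

```python
# Hypothetical example: deriving DEFAULT_MEM_PER_CORE for a new cluster
# whose nodes have 192 GiB of memory and 48 CPU-cores.
mem_per_node_bytes = 192 * 1024**3           # 192 GiB in bytes
cores_per_node = 48
print(mem_per_node_bytes // cores_per_node)  # 4294967296 (4 GiB per core)

# CLUSTER_TRANS_INV above is simply the inverse of the CLUSTER_TRANS mapping:
CLUSTER_TRANS = {"tiger": "tiger2"}
CLUSTER_TRANS_INV = dict(zip(CLUSTER_TRANS.values(), CLUSTER_TRANS.keys()))
print(CLUSTER_TRANS_INV)                     # {'tiger2': 'tiger'}
```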
4 changes: 2 additions & 2 deletions docs/developers.md → docs/contributions.md
@@ -1,6 +1,6 @@
# Developers
# Contributions

Contributions are welcome. To work with the code, build a Conda environment:
Contributions to the Jobstats platform and its tools are welcome. To work with the code, build a Conda environment:

```
$ conda create --name jobstats-dev requests blessed pytest-mock mkdocs-material -c conda-forge
8 changes: 0 additions & 8 deletions docs/external_tools.md

This file was deleted.

33 changes: 31 additions & 2 deletions docs/index.md
@@ -1,6 +1,6 @@
# What is Jobstats?

Jobstats is a free and open-source job monitoring platform designed for CPU and GPU clusters that use the Slurm workload manager. It was released in 2023 under the GNU GPL v2 license.
Jobstats is a free and open-source job monitoring platform designed for CPU and GPU clusters that use the Slurm workload manager. It was released in 2023 under the GNU GPL v2 license. Visit the [Jobstats GitHub repository](https://github.com/PrincetonUniversity/jobstats).

## What are the main benefits of Jobstats over other platforms?

@@ -17,7 +17,11 @@ The main advantages of Jobstats are:

## How does Jobstats work?

Jobstats is composed of data exporters, Prometheus database, Grafana visualization interface, and the Slurm database. Measurements made on the compute nodes are stored in the time-series Prometheus database. Job efficiency reports are generated from this data and Slurm.
Job and node statistics are exposed by four different Prometheus exporters (Node, cgroups, NVIDIA, GPFS):

![Schematic diagram](jobstats_schematics.png)

The exporters make data available to the Prometheus database. Users interact with the Prometheus and Slurm data via the web interface (i.e., Grafana) and external tools (e.g., `gpudash`).
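
As an illustration of this data flow (not the exact code behind the `jobstats` command), the Prometheus HTTP API can be queried directly with the Python `requests` package; the server address below is taken from the configuration example and the job ID is a placeholder:

```python
# Minimal sketch: pull one cgroup metric for a job from Prometheus.
# The jobid is a placeholder.
import requests

PROM_SERVER = "http://vigilant2:8480"  # from config.py
query = 'cgroup_memory_used_bytes{jobid="247463", step="", task=""}'

resp = requests.get(f"{PROM_SERVER}/api/v1/query", params={"query": query})
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```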

## Which institutions are using Jobstats?

@@ -97,6 +101,31 @@ $ jobstats 39798795
https://mydella.princeton.edu/pun/sys/jobstats (VPN required off-campus)
```

### What data does Jobstats make available?

Job-level metrics:

- CPU Utilization
- CPU Memory Utilization
- GPU Utilization
- GPU Memory Utilization
- GPU Power Usage
- GPU Temperature

Node-level metrics:

- CPU Percentage Utilization
- Total Memory Utilization
- Mean Frequency Over All CPUs
- NFS Statistics
- Local Disk R/W
- GPFS Bandwidth Statistics
- Local Disk IOPS
- GPFS Operations per Second Statistics
- InfiniBand Throughput
- InfiniBand Packet Rate
- InfiniBand Errors

## Other Job Monitoring Platforms

Consider these alternatives to Jobstats:
19 changes: 0 additions & 19 deletions docs/installation.md

This file was deleted.

174 changes: 0 additions & 174 deletions docs/jobstats.md

This file was deleted.

70 changes: 70 additions & 0 deletions docs/setup/cgroups.md
@@ -0,0 +1,70 @@
# cgroups

Slurm has to be configured to track job accounting data via the cgroup plugin. This requires the following line in slurm.conf:

```
JobAcctGatherType=jobacct_gather/cgroup
```

The above is in addition to the other usual cgroup-related plugins/settings:

```
ProctrackType=proctrack/cgroup
TaskPlugin=affinity,cgroup
```

Slurm will then create two top-level cgroup directories for each job, one for CPU utilization and one for CPU memory [17]. Within each directory there will be subdirectories: `step_extern`, `step_batch`, `step_0`, `step_1`, and so on. Within these directories one finds `task_0`, `task_1`, and so on. These cgroups are scraped by a cgroup exporter [14]. The table at the bottom of this page lists all of the collected fields.

The cgroup exporter used here is based on Ref. [3] with additional parsing of the jobid, steps, tasks and UID number. This produces an output that resembles (e.g., for system seconds):

```
cgroup_cpu_system_seconds{jobid="247463", step="batch", task="0"}
160.92
```

Note that the UID of the owning user is stored as a gauge in `cgroup_uid`:

```
cgroup_uid{jobid="247463"} 334987
```

This is because accounting is job-oriented and having the UID of the user as a label would needlessly increase the cardinality of the data in Prometheus. All other fields carry `jobid`, `step`, and `task` labels.

The totals for a job have an empty step and task, for example:

```
cgroup_cpu_user_seconds{jobid="247463", step="", task=""}
202435.71
```

This is due to the organization of the cgroup hierarchy. Consider the directory:

```
/sys/fs/cgroup/cpu,cpuacct/slurm/uid_334987
```

Within this directory, one finds the following subdirectories:

```
job_247463/cpuacct.usage_user
job_247463/step_extern/cpuacct.usage_user
job_247463/step_extern/task_0/cpuacct.usage_user
```

This is the data most often retrieved and parsed for overall job efficiency, which is why by default the cgroup_exporter does not parse step or task data. To collect all of it, add the `--collect.fullslurm` option. We run the cgroup_exporter with these options: `/usr/sbin/cgroup_exporter --config.paths /slurm --collect.fullslurm`. The `--config.paths /slurm` option has to match the path used by Slurm under the top cgroup directory. This is usually a path that is something like `/sys/fs/cgroup/memory/slurm`.

| Name | Description | Type |
| ---- | ----------- | ---- |
| cgroup_cpu_system_seconds | Cumulative CPU system seconds for jobid | gauge |
| cgroup_cpu_total_seconds | Cumulative CPU total seconds for jobid | gauge |
| cgroup_cpu_user_seconds | Cumulative CPU user seconds for jobid | gauge |
| cgroup_cpus | Number of CPUs in the jobid | gauge |
| cgroup_memory_cache_bytes | Memory cache used in bytes | gauge |
| cgroup_memory_fail_count | Memory fail count | gauge |
| cgroup_memory_rss_bytes | Memory RSS used in bytes | gauge |
| cgroup_memory_total_bytes | Memory total given to jobid in bytes | gauge |
| cgroup_memory_used_bytes | Memory used in bytes | gauge |
| cgroup_memsw_fail_count | Swap fail count | gauge |
| cgroup_memsw_total_bytes | Swap total given to jobid in bytes | gauge |
| cgroup_memsw_used_bytes | Swap used in bytes | gauge |
| cgroup_uid | UID number of user running this job | gauge |
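
As an illustrative calculation (not the exact formula used by Jobstats), the overall CPU efficiency of a job can be estimated from these fields by comparing the consumed CPU seconds against the elapsed time multiplied by the number of allocated CPU-cores:

```python
# Illustrative sketch: estimate CPU efficiency from the cgroup fields above.
# The sample numbers are made up.
def cpu_efficiency(cpu_total_seconds: float, elapsed_seconds: float, cpus: int) -> float:
    """Percentage of the allocated CPU time that was actually consumed."""
    return 100 * cpu_total_seconds / (elapsed_seconds * cpus)

# e.g., cgroup_cpu_total_seconds = 202436 for a 12-hour job on 8 CPU-cores
print(f"{cpu_efficiency(202436, elapsed_seconds=12 * 3600, cpus=8):.1f}%")  # ~58.6%
```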
