This step is also fairly straightforward. At the time of writing, the NVIDIA Container Toolkit does not support cgroups v2, so NixOS has to be configured to use cgroups v1 instead.
As previously mentioned, the Docker configuration lives in a separate file, shown below.
{ config, lib, pkgs, ... }:
{
  virtualisation.docker.enable = true;
  virtualisation.docker.enableOnBoot = true;
  # Nvidia Docker
  virtualisation.docker.enableNvidia = true;
  # libnvidia-container does not support cgroups v2 (prior to 1.8.0)
  # https://github.com/NVIDIA/nvidia-docker/issues/1447
  systemd.enableUnifiedCgroupHierarchy = false;
}
Include the Docker config in the main configuration:
{ config, lib, pkgs, ... }:
{
  imports = [
    ./hardware-configuration.nix
    .
    .
    # virtualisation
    ./docker.nix
    .
    .
  ];
  .
  .
  .
Rebuild the NixOS configuration and run the tests described below.
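Before moving on to the tests, it can be worth confirming that the host really fell back to cgroups v1. The following is a minimal sketch, assuming a standard setup where `sudo nixos-rebuild switch` applies the configuration. As far as I can tell the cgroup option works by adding a kernel parameter, so a reboot is probably needed before the check reports v1. The `cgroup_version` helper is a made-up name; it relies on `stat -fc %T /sys/fs/cgroup` reporting `cgroup2fs` under the unified (v2) hierarchy and `tmpfs` under the legacy (v1) layout.

```shell
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup
# to the cgroup version it indicates.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Apply the configuration first (needs root, so left as a comment here):
#   sudo nixos-rebuild switch

# Report the cgroup version of the running system, if the path exists.
if [ -d /sys/fs/cgroup ]; then
  cgroup_version "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
fi
```

If this prints `v2` after a reboot, the `systemd.enableUnifiedCgroupHierarchy = false;` line has most likely not taken effect yet.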
Having noted the CUDA version running on your NixOS host, pull an image from Docker Hub with a matching CUDA version.
If there is no exact match, try an image with a slightly older minor version, but keep the same major version. For example, if you have CUDA 12.3 on your NixOS host, an image with CUDA 12.1 should be compatible. Always match the CUDA major version, and pick the newest minor version that does not exceed the host's.
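The selection rule can be sketched as a small shell function. This is only an illustration of the rule as stated here, not an official compatibility check: `pick_cuda_version` is a made-up name, and the candidate list in the demo is illustrative, not a real listing of Docker Hub tags.

```shell
# Hypothetical helper: among the candidates, print the version that shares
# the host's major version and has the highest minor version that does not
# exceed the host's minor version.
# Usage: pick_cuda_version HOST_VERSION CANDIDATE...
pick_cuda_version() {
  host_major=${1%%.*}
  host_rest=${1#*.}
  host_minor=${host_rest%%.*}
  shift
  best=""
  best_minor=-1
  for candidate in "$@"; do
    major=${candidate%%.*}
    rest=${candidate#*.}
    minor=${rest%%.*}
    if [ "$major" -eq "$host_major" ] && [ "$minor" -le "$host_minor" ] \
       && [ "$minor" -gt "$best_minor" ]; then
      best=$candidate
      best_minor=$minor
    fi
  done
  if [ -n "$best" ]; then
    echo "$best"
  else
    return 1
  fi
}

# Host reports CUDA 12.3; the candidates below are made up for the demo.
pick_cuda_version 12.3 11.8.0 12.1.0 12.0.1 12.4.1
```

With these inputs the function settles on 12.1.0: 11.8.0 has the wrong major version, 12.4.1 is newer than the host, and 12.0.1 is older than 12.1.0.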
Note that inside the container you will not see all the PIDs that are visible on the host unless you pass the --pid=host flag.
> docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
Fri Jan 19 00:47:29 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:01:00.0  On |                  N/A |
|  0%   50C    P8              40W / 370W |    480MiB / 24576MiB |      8%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
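The `CUDA Version` field in the header above is the value to match against image tags. If you want to pull it out in a script, a small sketch follows. `cuda_version_from_smi` is a made-up helper name, and the sample line is taken from the output above instead of a live `nvidia-smi` call.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the
# first "CUDA Version" value found in it.
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Demo on the header line from the output above; on a real host you would run
#   nvidia-smi | cuda_version_from_smi
printf '%s\n' '| NVIDIA-SMI 545.29.02 Driver Version: 545.29.02 CUDA Version: 12.3 |' \
  | cuda_version_from_smi
```

Running the same pipeline on the host and inside the container should give the same value, since the container uses the host's driver.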
Before continuing with this stress test, please read the documentation at http://wili.cc/blog/gpu-burn.html.
git clone https://github.com/wilicc/gpu-burn.git
cd gpu-burn
CUDA_IMAGE=12.1.0 make image
# Run the default stress test as packaged in the image
docker run --rm --gpus all gpu-burn
# For custom parameters, get a shell in the container and run the test as you please
# Please read the documentation before you run long, demanding stress tests
docker run --rm --gpus all -it gpu-burn /bin/bash
./gpu-burn --help