
Starting GPU container with rootless runsc: operation not permitted #11076

Open
sfc-gh-lshi opened this issue Oct 22, 2024 · 5 comments
Labels: area: gpu (Issue related to sandboxed GPU access), type: bug (Something isn't working)

Comments

@sfc-gh-lshi

Description

In #11069 I obtained a working OCI runtime spec for using sudo runsc to access the GPU directly. However, the same configuration fails to start in rootless mode:

$ unshare -Ur runsc --rootless --nvproxy --strace --debug --debug-log=/home/$USER/runsc.log --network=host --host-uds=all run "container"
running container: creating container: cannot create gofer process: nvproxy setup: nvidia-container-cli configure failed, err: exit status 1
stdout:
stderr: nvidia-container-cli: initialization error: privilege change failed: operation not permitted

This error seems to originate from libnvidia-container either here or here.
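When debugging this kind of EPERM, it can help to confirm which capabilities the gofer process actually holds at the time nvidia-container-cli is invoked. A minimal sketch (the helper name is mine, not runsc's) that extracts the effective capability mask from the text of /proc/&lt;pid&gt;/status:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCapEff returns the effective capability bitmask from the contents of
// a /proc/<pid>/status file (the "CapEff:" line), or 0 if the line is absent
// or unparseable.
func parseCapEff(status string) uint64 {
	for _, line := range strings.Split(status, "\n") {
		if rest, ok := strings.CutPrefix(line, "CapEff:"); ok {
			v, err := strconv.ParseUint(strings.TrimSpace(rest), 16, 64)
			if err == nil {
				return v
			}
		}
	}
	return 0
}

func main() {
	// Sample excerpt from a fully privileged process inside a user namespace.
	status := "Name:\trunsc-gofer\nCapEff:\t000001ffffffffff\n"
	fmt.Printf("%#x\n", parseCapEff(status)) // prints 0x1ffffffffff
}
```

In practice you would read /proc/&lt;gofer-pid&gt;/status; a full mask here, combined with the EPERM above, points at the privilege *change* (not a missing capability) as the culprit.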

Changes made

nvidia-container-cli is invoked through the gofer, so I started adding capabilities there to see whether that was the issue. Unfortunately, that didn't help, but here is what I did:

  1. Get the corresponding version of the repo: wget https://github.com/google/gvisor/archive/refs/tags/release-20240807.0.zip && unzip release-20240807.0.zip.
  2. Edit runsc/container/container.go to provide gofer with all capabilities and make them inheritable, so that nvidia-container-cli configure is called with the same capabilities.
// Add `import "github.com/syndtr/gocapability/capability"`
// In nvproxySetupAfterGoferUserns
...
return func() error {
  defer ourEnd.Close()

  newCaps, err := capability.NewPid2(0)
  if err != nil {
  	return err
  }

  allCapTypes := []capability.CapType{
    capability.BOUNDS,
    capability.EFFECTIVE,
    capability.PERMITTED,
    capability.INHERITABLE,
    capability.AMBIENT,
  }
  for _, c := range allCapTypes {
    if !newCaps.Empty(c) {
    	panic("unloaded capabilities must be empty")
    }
    set := make([]capability.Cap, 0)
    for i := 0; i <= 40; i++ {
    	set = append(set, capability.Cap(i))
    }
    newCaps.Set(c, set...)
  }
  
  if err := newCaps.Apply(capability.CAPS | capability.BOUNDS | capability.AMBS); err != nil {
    return err
  }
  log.Infof("Capabilities applied: %+v", newCaps)
  
  argv := []string{
    cliPath,
    "--load-kmods",
    "configure",
  ...
  3. Update the Bazel build.
    • In runsc/container/BUILD, add @com_github_syndtr_gocapability//capability:go_default_library to deps for container.
  4. Build the custom runsc: mkdir -p bin && make copy TARGETS=runsc DESTINATION=bin/.
  5. Use the custom runsc with the reproduction steps below:
$ unshare -Ur ../gvisor-release-20240807.0/bin/runsc --rootless --nvproxy --strace --debug --debug-log=/home/$USER/runsc.log --network=host --host-uds=all run "container"
running container: creating container: cannot create gofer process: nvproxy setup: nvidia-container-cli configure failed, err: exit status 1
stdout:
stderr: nvidia-container-cli: initialization error: privilege change failed: operation not permitted

/home/$USER/runsc.log will show that all capabilities are set and inheritable:

D1022 23:50:02.981772   83665 container.go:1365] Starting gofer: /proc/self/exe [runsc-gofer --host-uds=all --network=host --strace=true --rootless=true --nvproxy=true --root=/run/user/1001/runsc --debug=true --debug-log=/home/lshi/runsc.log --debug-log-fd=3 gofer --bundle /home/lshi/tmp --gofer-mount-confs=lisafs:none --spec-fd=4 --mounts-fd=5 --io-fds=6 --dev-io-fd=7 --sync-nvproxy-fd=8]
I1022 23:50:02.983338   83665 container.go:1369] Gofer started, PID: 83681
I1022 23:50:02.983419   83665 container.go:2033] Capabilities applied: { effective="full" permitted="full" inheritable="full" bounding="full" }
D1022 23:50:02.983435   83665 container.go:2046] Executing ["/usr/bin/nvidia-container-cli" "--load-kmods" "configure" "--ldconfig=@/sbin/ldconfig.real" "--no-cgroups" "--utility" "--compute" "--pid=83681" "--device=all" "/home/lshi/tmp/rootfs"]
D1022 23:50:02.985635   83665 container.go:791] Destroy container, cid: container
D1022 23:50:02.985667   83665 container.go:1102] Killing gofer for container, cid: container, PID: 83681
W1022 23:50:02.986570   83665 util.go:64] FATAL ERROR: running container: creating container: cannot create gofer process: nvproxy setup: nvidia-container-cli configure failed, err: exit status 1
stdout:
stderr: nvidia-container-cli: initialization error: privilege change failed: operation not permitted

W1022 23:50:02.986660   83665 main.go:231] Failure to execute command, err: 1

Steps to reproduce

  1. Follow the steps in #11069 (Starting container directly with runsc: GPU access blocked by operating system) to create a GCP VM with a GPU and install gVisor.
  2. Follow #11069 (comment) to create a folder with a rootfs, and add this config.json.
{
  "ociVersion": "1.0.0",
  "process": {
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": [
      "nvidia-smi",
      "-L"
    ],
    "env": [
      "PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "HOSTNAME=1da1c41ed033",
      "NVARCH=x86_64",
      "NVIDIA_REQUIRE_CUDA=cuda\u003e=11.6 brand=tesla,driver\u003e=470,driver\u003c471 brand=unknown,driver\u003e=470,driver\u003c471 brand=nvidia,driver\u003e=470,driver\u003c471 brand=nvidiartx,driver\u003e=470,driver\u003c471 brand=geforce,driver\u003e=470,driver\u003c471 brand=geforcertx,driver\u003e=470,driver\u003c471 brand=quadro,driver\u003e=470,driver\u003c471 brand=quadrortx,driver\u003e=470,driver\u003c471 brand=titan,driver\u003e=470,driver\u003c471 brand=titanrtx,driver\u003e=470,driver\u003c471",
      "NV_CUDA_CUDART_VERSION=11.6.55-1",
      "NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6",
      "CUDA_VERSION=11.6.2",
      "LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64",
      "NVIDIA_VISIBLE_DEVICES=all",
      "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
      "NVIDIA_VISIBLE_DEVICES=all"
    ],
    "cwd": "/"
  },
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": [
        "nosuid",
        "strictatime",
        "mode=755",
        "size=65536k"
      ]
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ]
    }
  ],
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": [
          "nvidia-container-runtime-hook",
          "prestart"
        ],
        "env": [
          "LANG=C.UTF-8",
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"
        ]
      }
    ]
  },
  "linux": {
    "namespaces": [
      {
        "type": "mount"
      },
      {
        "type": "network"
      },
      {
        "type": "uts"
      },
      {
        "type": "pid"
      },
      {
        "type": "ipc"
      },
      {
        "type": "cgroup"
      }
    ]
  }
}
  3. Try to start the container.
$ unshare -Ur runsc --rootless --nvproxy --strace --debug --debug-log=/home/$USER/runsc.log --network=host --host-uds=all run "container"
running container: creating container: cannot create gofer process: nvproxy setup: nvidia-container-cli configure failed, err: exit status 1
stdout:
stderr: nvidia-container-cli: initialization error: privilege change failed: operation not permitted

runsc version

runsc version release-20240807.0
spec: 1.1.0-rc.1

docker version (if using docker)

No response

uname

Linux lshi-gvisor-gpu 6.5.0-1025-gcp #27~22.04.1-Ubuntu SMP Tue Jul 16 23:03:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

D1022 23:27:09.028129 36265 container.go:544] Run container, cid: container, rootDir: "/run/user/1001/runsc"
D1022 23:27:09.028148 36265 container.go:200] Create container, cid: container, rootDir: "/run/user/1001/runsc"
D1022 23:27:09.028238 36265 container.go:262] Creating new sandbox for container, cid: container
D1022 23:27:09.028285 36265 cgroup.go:428] New cgroup for pid: self, *cgroup.cgroupV2: &{Mountpoint:/sys/fs/cgroup Path:/container Controllers:[cpuset cpu io memory hugetlb pids rdma misc] Own:[]}
D1022 23:27:09.028326 36265 cgroup_v2.go:132] Installing cgroup path "/sys/fs/cgroup/container"
D1022 23:27:09.028345 36265 cgroup_v2.go:177] Deleting cgroup "/sys/fs/cgroup/container"
W1022 23:27:09.028369 36265 container.go:1767] Skipping cgroup configuration in rootless mode: open /sys/fs/cgroup/cgroup.subtree_control: permission denied
D1022 23:27:09.028428 36265 container.go:1919] Executing ["/sbin/modprobe" "nvidia"]
D1022 23:27:09.030535 36265 container.go:1919] Executing ["/sbin/modprobe" "nvidia-uvm"]
D1022 23:27:09.032889 36265 donation.go:32] Donating FD 3: "/home/lshi/runsc.log"
D1022 23:27:09.032907 36265 donation.go:32] Donating FD 4: "/home/lshi/tmp/config.json"
D1022 23:27:09.032918 36265 donation.go:32] Donating FD 5: "|1"
D1022 23:27:09.032923 36265 donation.go:32] Donating FD 6: "gofer IO FD"
D1022 23:27:09.032927 36265 donation.go:32] Donating FD 7: "gofer dev IO FD"
D1022 23:27:09.032931 36265 donation.go:32] Donating FD 8: "nvproxy sync gofer FD"
D1022 23:27:09.032935 36265 container.go:1364] Starting gofer: /proc/self/exe [runsc-gofer --nvproxy=true --root=/run/user/1001/runsc --debug=true --debug-log=/home/lshi/runsc.log --host-uds=all --network=host --strace=true --rootless=true --debug-log-fd=3 gofer --bundle /home/lshi/tmp --gofer-mount-confs=lisafs:none --spec-fd=4 --mounts-fd=5 --io-fds=6 --dev-io-fd=7 --sync-nvproxy-fd=8]
I1022 23:27:09.034687 36265 container.go:1368] Gofer started, PID: 36273
D1022 23:27:09.034714 36265 container.go:2017] Executing ["/usr/bin/nvidia-container-cli" "--load-kmods" "configure" "--ldconfig=@/sbin/ldconfig.real" "--no-cgroups" "--utility" "--compute" "--pid=36273" "--device=all" "/home/lshi/tmp/rootfs"]
D1022 23:27:09.037032 36265 container.go:790] Destroy container, cid: container
D1022 23:27:09.037081 36265 container.go:1101] Killing gofer for container, cid: container, PID: 36273
W1022 23:27:09.038009 36265 util.go:64] FATAL ERROR: running container: creating container: cannot create gofer process: nvproxy setup: nvidia-container-cli configure failed, err: exit status 1
stdout:
stderr: nvidia-container-cli: initialization error: privilege change failed: operation not permitted

W1022 23:27:09.038116 36265 main.go:231] Failure to execute command, err: 1

@sfc-gh-lshi sfc-gh-lshi added the type: bug Something isn't working label Oct 22, 2024
@ayushr2
Collaborator

ayushr2 commented Oct 29, 2024

Can you get it to work without gVisor in rootless mode (with runc)? Are you setting up the nvidia-container-runtime with rootless mode: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#rootless-mode?

runsc is trying to emulate the nvidia-container-runtime by injecting the prestart hook and invoking nvidia-container-cli. We need to look at this more, but maybe nvidia-container-runtime is passing some special flags to the CLI or something.

@ayushr2 ayushr2 added the area: gpu Issue related to sandboxed GPU access label Oct 29, 2024
@sfc-gh-lshi
Author

sfc-gh-lshi commented Oct 30, 2024

Can you get it to work without gVisor in rootless mode (with runc)?

Not yet. Using the same rootfs and config.json that works with gVisor + sudo, here's what I did:

  1. Set no-cgroups = true under [nvidia-container-cli] in /etc/nvidia-container-runtime/config.toml.

    • I think this corresponds to sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place, though nvidia-ctk config doesn't seem to be an available command anymore.
  2. Created etc under rootfs, which is needed for /sbin/ldconfig.real to succeed: mkdir rootfs/etc.

  3. Using runc with --rootless,

    $ unshare -Ur runc --rootless=true --log /home/$USER/runc.log --debug run "runc-test"
    runc run failed: unable to start container process: error during container init: error mounting "sysfs" to rootfs at "/sys": mount sysfs:/sys (via /proc/self/fd/6), flags: 0x9: operation not permitted
    
    $ cat ~/runc.log
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec[39855]: => nsexec container setup"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: ~> nsexec stage-0"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: spawn stage-1"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: -> stage-1 synchronisation loop"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-1[39857]: ~> nsexec stage-1"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-1[39857]: unshare remaining namespaces (except cgroupns)"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-1[39857]: spawn stage-2"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-1[39857]: request stage-0 to forward stage-2 pid (39858)"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: stage-1 requested pid to be forwarded"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: forward stage-1 (39857) and stage-2 (39858) pids to runc"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-1[39857]: signal completion to stage-0"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-1[39857]: <~ nsexec stage-1"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-2[1]: ~> nsexec stage-2"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: stage-1 complete"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: <- stage-1 synchronisation loop"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: -> stage-2 synchronisation loop"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: signalling stage-2 to run"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-2[1]: signal completion to stage-0"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-2[1]: <= nsexec container setup"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-2[1]: booting up go runtime ..."
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: stage-2 complete"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: <- stage-2 synchronisation loop"
    time="2024-10-30T20:35:22Z" level=debug msg="nsexec-0[39855]: <~ nsexec stage-0"
    time="2024-10-30T20:35:22Z" level=debug msg="child process in init()"
    time="2024-10-30T20:35:22Z" level=error msg="runc run failed: unable to start container process: error during container init: error mounting \"sysfs\" to rootfs at \"/sys\": mount sysfs:/sys (via /proc/self/fd/6), flags: 0x9: operation not permitted" func="main.fatalWithCode()" file="utils.go:61"
    
  4. Using runc with sudo,

    $ sudo runc --log /home/$USER/runc.log --debug run "runc-test"
    Failed to initialize NVML: Unknown Error
    
    $ cat ~/runc.log
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec[40009]: => nsexec container setup"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: ~> nsexec stage-0"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: spawn stage-1"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: -> stage-1 synchronisation loop"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-1[40012]: ~> nsexec stage-1"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-1[40012]: unshare remaining namespaces (except cgroupns)"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-1[40012]: spawn stage-2"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-1[40012]: request stage-0 to forward stage-2 pid (40013)"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: stage-1 requested pid to be forwarded"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: forward stage-1 (40012) and stage-2 (40013) pids to runc"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-2[1]: ~> nsexec stage-2"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-1[40012]: signal completion to stage-0"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-1[40012]: <~ nsexec stage-1"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: stage-1 complete"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: <- stage-1 synchronisation loop"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: -> stage-2 synchronisation loop"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: signalling stage-2 to run"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-2[1]: signal completion to stage-0"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-2[1]: <= nsexec container setup"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-2[1]: booting up go runtime ..."
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: stage-2 complete"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: <- stage-2 synchronisation loop"
    time="2024-10-30T20:36:03Z" level=debug msg="nsexec-0[40009]: <~ nsexec stage-0"
    time="2024-10-30T20:36:03Z" level=debug msg="child process in init()"
    time="2024-10-30T20:36:04Z" level=debug msg="init: closing the pipe to signal completion"
    time="2024-10-30T20:36:04Z" level=debug msg="sending signal to process urgent I/O condition" func="main.(*signalHandler).forward()" file="signals.go:102"
    time="2024-10-30T20:36:04Z" level=debug msg="process exited" func="main.(*signalHandler).forward()" file="signals.go:92" pid=40013 status=255
    

runsc is trying to emulate the nvidia-container-runtime by injecting the prestart hook and invoking nvidia-container-cli.

Yeah, it's nvidia-container-cli configure that is failing. It's trying to adjust or drop privileges, and that is being disallowed. Here's a comment I found in their code:

 /*
  * Prevent the kernel from adjusting capabilities on UID change.
  * This is necessary if we want to keep our ambient capabilities.
  */

@sfc-gh-lshi
Author

sfc-gh-lshi commented Nov 12, 2024

Ok, I think I've sorted it out in NVIDIA/libnvidia-container#288. tl;dr, the nvidia-container-cli configure arguments need to change when using --rootless:

argv := []string{
    cliPath,
    "--load-kmods",
    "--user=root:root", // Additional flag
    "configure",
    fmt.Sprintf("--ldconfig=%s", ldconfigPath), // Edited to remove '@'
    "--no-cgroups",
    "--utility",
    "--compute",
    fmt.Sprintf("--pid=%d", goferCmd.Process.Pid),
    fmt.Sprintf("--device=%s", devices),
    spec.Root.Path,
}

Then unshare -mUr runsc --nvproxy --strace --debug --debug-log=/tmp/rootless-logs/runsc.log --network=host --host-uds=all --rootless --ignore-cgroups run "container" works!

Should gVisor pass the updated flags when --rootless is provided so it works out-of-the-box?
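If gVisor did key off the flag, the change could be as small as building the argv conditionally. A sketch, assuming a rootless boolean derived from conf (everything except the flags shown above is illustrative naming, not runsc's actual code):

```go
package main

import "fmt"

// nvproxyCLIArgs builds the nvidia-container-cli command line, switching the
// --user and --ldconfig handling for rootless mode as described above.
func nvproxyCLIArgs(cliPath, ldconfigPath string, rootless bool, goferPid int, devices, rootPath string) []string {
	argv := []string{cliPath, "--load-kmods"}
	if rootless {
		// Stay as the in-namespace root instead of switching to a host UID.
		argv = append(argv, "--user=root:root")
	}
	ldconfig := ldconfigPath
	if !rootless {
		// The '@' prefix makes libnvidia-container run ldconfig from the host.
		ldconfig = "@" + ldconfigPath
	}
	argv = append(argv,
		"configure",
		fmt.Sprintf("--ldconfig=%s", ldconfig),
		"--no-cgroups",
		"--utility",
		"--compute",
		fmt.Sprintf("--pid=%d", goferPid),
		fmt.Sprintf("--device=%s", devices),
		rootPath,
	)
	return argv
}

func main() {
	fmt.Println(nvproxyCLIArgs("/usr/bin/nvidia-container-cli",
		"/sbin/ldconfig.real", true, 83681, "all", "/home/user/tmp/rootfs"))
}
```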

@ayushr2
Collaborator

ayushr2 commented Nov 13, 2024

Thanks @sfc-gh-lshi for doing the investigation!

Is the --rootless flag needed here? You are already running runsc in a new userns. So MaybeRunAsRoot() will be a no-op:

gvisor/runsc/cmd/run.go

Lines 82 to 91 in 94aa652

if conf.Rootless {
	if conf.Network == config.NetworkSandbox {
		return util.Errorf("sandbox network isn't supported with --rootless, use --network=none or --network=host")
	}
	if err := specutils.MaybeRunAsRoot(); err != nil {
		return util.Errorf("Error executing inside namespace: %v", err)
	}
	// Execution will continue here if no more capabilities are needed...
}

There are 2 ways in which Rootless containers can be run:

  1. You can configure the userns before runsc is invoked. You are doing that with unshare -mUr here. This can alternatively be done by --rootless flag which calls MaybeRunAsRoot() -> unshares the userns -> runs as root -> re-execs runsc.
  2. Run runsc with a non-root user.

(2) is handled in different ways in runsc. You can see checks like this:

rootlessEUID := unix.Geteuid() != 0
// Setup any uid/gid mappings, and create or join the configured user
// namespace so the gofer's view of the filesystem aligns with the
// users in the sandbox.
if !rootlessEUID {

I assume the changes you have mentioned will only work in (1). It is possible to achieve (1) without the --rootless flag, so conditionally changing nvidia-container-cli based on --rootless might not be complete.

@sfc-gh-lshi
Author

Is the --rootless flag needed here? You are already running runsc in a new userns.

unshare without --rootless does not work unless the arguments are changed. In fact, both the --user and --ldconfig changes are required for unshare to work without --rootless.

# No changes
$ unshare -mUr runsc --nvproxy --strace --debug --debug-log=/tmp/rootless-logs/runsc.log --network=host --host-uds=all --ignore-cgroups run "container"
running container: creating container: cannot create gofer process: nvproxy setup: nvidia-container-cli configure failed, err: exit status 1
stdout:
stderr: nvidia-container-cli: initialization error: privilege change failed: operation not permitted

# Only `--user` argument addition.
$ unshare -mUr runsc --nvproxy --strace --debug --debug-log=/tmp/rootless-logs/runsc.log --network=host --host-uds=all --ignore-cgroups run "container"
running container: creating container: cannot create gofer process: nvproxy setup: nvidia-container-cli configure failed, err: exit status 1
stdout:
stderr: nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1

There are 2 ways in which Rootless containers can be run:

  1. You can configure the userns before runsc is invoked. You are doing that with unshare -mUr here. This can alternatively be done by --rootless flag which calls MaybeRunAsRoot() -> unshares the userns -> runs as root -> re-execs runsc.
  2. Run runsc with a non-root user.

(2) is handled in different ways in runsc. You can see checks like this:

I agree that --rootless is a no-op if you unshare first, and that you can instead pass --rootless if you don't. However, in all cases the argument changes to nvidia-container-cli configure are required. Since nvproxySetupAfterGoferUserns already takes conf *config.Config, keying off --rootless is probably the easiest way to let the user explicitly request the modified behavior. The only caveat is that you'd have to document that --rootless is needed for GPU access even when you unshare, but at least GPU access will always work when --rootless is passed, which is what you would expect.

There are other ways to approach this: you could introduce an additional --unprivileged-gpu flag, or perhaps detect whether runsc is running in a child user namespace. The bottom line is that since this can work, it should work without the user having to modify gVisor code; the specific way of presenting the option isn't what I'm advocating for.
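For the detection option, one possible heuristic (illustrative only, not something runsc does today) is to check whether /proc/self/uid_map describes the initial user namespace, i.e. the full identity mapping:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// isInitialUserns reports whether the given uid_map contents describe the
// initial user namespace, whose single entry is "0 0 4294967295". Any other
// mapping (e.g. "0 1001 1" from `unshare -r`) indicates a child namespace.
func isInitialUserns(uidMap string) bool {
	fields := strings.Fields(uidMap)
	return len(fields) == 3 && fields[0] == "0" && fields[1] == "0" && fields[2] == "4294967295"
}

func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		fmt.Println("cannot read uid_map:", err)
		return
	}
	fmt.Println("initial userns:", isInitialUserns(string(data)))
}
```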
