Add privileged mode for SST #105

changzhi1990 · 2023-08-16T08:09:33Z

The NRI pod needs to access the /dev/isst_interface device in the privileged mode.

changzhi1990 · 2023-08-16T08:10:04Z

This pr addresses #101

changzhi1990 · 2023-09-11T01:31:35Z

@marquiz Do you have any comments for this pr?

marquiz · 2023-09-11T08:17:22Z

deployment/helm/resource-management-policies/topology-aware/templates/daemonset.yaml

@@ -66,6 +70,10 @@ spec:
            mountPath: /var/lib/nri-resource-policy
          - name: hostsysfs
            mountPath: /host/sys
+          {{ if .Values.privilegedMode }}
+          - name: hostdev
+            mountPath: /host/dev


I think mounting /dev should be a separate setting.

marquiz · 2023-09-11T08:27:19Z

deployment/helm/resource-management-policies/topology-aware/templates/daemonset.yaml

@@ -54,9 +54,13 @@ spec:
          image: {{ .Values.image.name }}:{{ .Values.image.tag | default .Chart.AppVersion }}
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          securityContext:
+            {{ if .Values.privilegedMode }}
+            privileged: true


I'd also suggest to restrict the capabilities: to the minimum that is actually required.

I'm not sure what would be the nicest way to tie all the pieces (privileged mode, capabilities, /dev mount) together. Would there be three separate parameters or a signle devAccess or smth 🤨 Thoughts?

changzhi1990 · 2023-09-13T01:16:20Z

Hi, @marquiz . I tried to use capabilities to grant privileges to access the /dev/isst_interface device. But I met a problem when using capabilities like:

          securityContext:
            allowPrivilegeEscalation: true
            capabilities:
              add:
                - AUDIT_CONTROL
                - AUDIT_READ
                - AUDIT_WRITE
                - BLOCK_SUSPEND
                - CHOWN
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
                - FOWNER
                - FSETID
                - IPC_LOCK
                - IPC_OWNER
                - KILL
                - LEASE
                - LINUX_IMMUTABLE
                - MAC_ADMIN
                - MAC_OVERRIDE
                - MKNOD
                - NET_ADMIN
                - NET_BIND_SERVICE
                - NET_BROADCAST
                - NET_RAW
                - SETGID
                - SETFCAP
                - SETPCAP
                - SETUID
                - SYS_ADMIN
                - SYS_BOOT
                - SYS_CHROOT
                - SYS_MODULE
                - SYS_NICE
                - SYS_PACCT
                - SYS_PTRACE
                - SYS_RAWIO
                - SYS_RESOURCE
                - SYS_TIME
                - SYS_TTY_CONFIG
                - SYSLOG

In this securitycontext, I didn't know which capabilities I should add so I added all the capabilities to the nri pod. But the nri pod still reported an error:

W0912 08:17:25.346297       1 system.go:297] failed to get SST info for package 1: failed to read SST PP info: Mbox command failed with failed to open is
st device "/host/dev/isst_interface": open /host/dev/isst_interface: operation not permitted

Do you have any ideas on that?

changzhi1990 · 2023-09-13T08:24:32Z

If I don't add extra capabilities to the nri pod. The grep command output is:

sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ grep Cap /proc/1340183/status
CapInh: 0000000000000000
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000

sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ capsh --decode=00000000a80425fb
WARNING: libcap needs an update (cap=40 should have a name).
0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap

And, if I add the extra capabilities to the nri pod. The grep command output is:

sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ grep Cap /proc/1334365/status
CapInh: 0000000000000000                                                                                                                                                        CapPrm: 00000037ffffffff
CapEff: 00000037ffffffff
CapBnd: 00000037ffffffff
CapAmb: 0000000000000000

sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ capsh --decode=00000037ffffffff
WARNING: libcap needs an update (cap=40 should have a name).
0x00000037ffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_
net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_block_suspe
nd,cap_audit_read

So, it seems that all the capabilities have already added to the nri pod. But the nri pod still can't access the /dev/isst_interface device.

marquiz · 2023-09-13T09:28:16Z

So, it seems that all the capabilities have already added to the nri pod. But the nri pod still can't access the /dev/isst_interface device.

Thanks @changzhi1990 for debugging this. Fair enough, seems like we need the privileged mode, then.

I'd still like to understand why that fails. Maybe its the ambient capabilities somehow being required (although I don't quite understand why)...

marquiz · 2023-09-13T13:50:53Z

So, it seems that all the capabilities have already added to the nri pod. But the nri pod still can't access the /dev/isst_interface device.

Thanks @changzhi1990 for debugging this. Fair enough, seems like we need the privileged mode, then.

I'd still like to understand why that fails. Maybe its the ambient capabilities somehow being required (although I don't quite understand why)...

Responding to myself: I think it's because runc uses cgroups to control access to device nodes. To make this work without privileged /dev/isst_interface would need to be added as a device to the container, not just mount :/

changzhi1990 · 2023-09-13T14:10:54Z

So, it seems that all the capabilities have already added to the nri pod. But the nri pod still can't access the /dev/isst_interface device.

Thanks @changzhi1990 for debugging this. Fair enough, seems like we need the privileged mode, then.
I'd still like to understand why that fails. Maybe its the ambient capabilities somehow being required (although I don't quite understand why)...

Responding to myself: I think it's because runc uses cgroups to control access to device nodes. To make this work without privileged /dev/isst_interface would need to be added as a device to the container, not just mount :/

Thanks for your response. I will try this.

changzhi1990 · 2023-09-14T01:31:24Z

I have compared the capabilities of two scenarios.

Use "privileged: true"
Add all capabilities

In the scenario1,

sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ ps -ef|grep nri
root       29920   29869 15 18:24 ?        00:00:02 /bin/nri-resource-policy-topology-aware --host-root /host --fallback-config /etc/nri-resource-policy/nri-resource-policy.cfg --pid-file /tmp/nri-resource-policy.pid -metrics-interval 5s
sdp        30432    3890  0 18:24 pts/0    00:00:00 grep --color=auto nri
sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ grep Cap /proc/29920/status
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ grep Cap /proc/29869/status
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ capsh --decode=000001ffffffffff
WARNING: libcap needs an update (cap=40 should have a name).
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39,40

In the scenario2,

sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ ps -ef|grep nri
root       25523   25395 11 18:22 ?        00:00:03 /bin/nri-resource-policy-topology-aware --host-root /host --fallback-config /etc/nri-resource-policy/nri-resource-policy.cfg --pid-file /tmp/nri-resource-policy.pid -metrics-interval 5s
sdp        27017    3890  0 18:22 pts/0    00:00:00 grep --color=auto nri
sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ grep Cap /proc/25523/status
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ grep Cap /proc/25395/status
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ capsh --decode=000003ffffffffff
WARNING: libcap needs an update (cap=40 should have a name).
0x000003ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39,40,41
sdp@b49691a74bec:~/zhi/own/nri-plugins/deployment/helm/resource-management-policies/topology-aware$ capsh --decode=000001ffffffffff
WARNING: libcap needs an update (cap=40 should have a name).
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39,40

The only difference between these two scenarios is that there is an extra "41" capabilities in scenario2.

changzhi1990 · 2023-09-14T02:53:21Z

Maybe.... we need an sst device plugin, like dsb device plugin, gpu device plugin, etc?

marquiz · 2023-09-14T06:51:03Z

Thanks for your response. I will try this.

@changzhi1990, ah sorry, I was a bit "equivocal" on this. I was merely speaking to myself because there is nothing else than privileged mode in the current setup that can make it work.

Maybe.... we need an sst device plugin, like dsb device plugin, gpu device plugin, etc?

Yes, this would be a way to do it. Another possible solution would be running another NRI plugin before this one that would insert the isst_interface device in the resource-policy container but let's not go there yet...

changzhi1990 · 2023-09-14T06:57:29Z

Thanks for your response. I will try this.

@changzhi1990, ah sorry, I was a bit "equivocal" on this. I was merely speaking to myself because there is nothing else than privileged mode in the current setup that can make it work.

Maybe.... we need an sst device plugin, like dsb device plugin, gpu device plugin, etc?

Yes, this would be a way to do it. Another possible solution would be running another NRI plugin before this one that would insert the isst_interface device in the resource-policy container but let's not go there yet...

Alright, Is worth to create an sst plugin? I mean maybe there are a lot of works to do

marquiz · 2023-09-14T07:33:00Z

Alright, Is worth to create an sst plugin? I mean maybe there are a lot of works to do

Talking about another NRI plugin @klihub had an idea about this (it seems that there already exists a sample plugin for this purpose, i.e. injecting a device) so that could be the solution, indeed. Let @klihub provide the details here

klihub · 2023-09-14T08:53:57Z

@changzhi1990 You should be able to get read-only access to /dev/sst_interface using NRI itself and an(other) NRI plugin and without requiring any cpabilities or using a privileged container. We have a sample NRI plugin which can inject devices and the necessary cgroup access rules for them based on annotations.

For instance, I used that plugin and this following pod spec to test if I can do 'SST discovery':

apiVersion: v1
kind: Pod
metadata:
  name: sst-test
  annotations:
    devices.nri.io/container.c0: |+
      - path: /dev/isst_interface
        type: c
        major: 10
        minor: 123
        file_mode: 0o600
spec:
  containers:
  - name: c0
    image: quay.io/marquiz/goresctrl:test
    command:
      - sh
      - -c
      - sleep 3600
    resources:
      requests:
        cpu: 250m
        memory: 200M
      limits:
        cpu: 500m
        memory: 200M
    securityContext:
      privileged: false
      capabilities:
        drop:
          - all
    imagePullPolicy: IfNotPresent

Just run the device-injector plugin (for instance ./device-injector --idx 10 -name device-injector manually for testing) before starting the pod. Then you should be able to do this:

klitkey1@emr-1:~/xfer$ kubectl apply -f sst-test.yaml
klitkey1@emr-1:~/xfer$ kubectl exec -ti sst-test -c c0 -- /bin/bash --login
root@sst-test:/go/builder# /go/bin/sst-ctl info
...
  PPCurrentLevel: 0
  PPLocked: true
  PPMaxLevel: 4
  PPSupported: true
  PPVersion: 3
  TFEnabled: false
  TFSupported: true
...

This sample plugin was not really meant for production. It's merely a sample plugin which demonstrate some of NRI's capabilities. Anyway, if you'd like to use this plugin or you roll your own, you can enable it permanently on your cluster nodes by symlinking it into /opt/nri/plugins:

klitkey1@emr-1:~/xfer$ sudo mkdir -p /opt/nri/plugins
sudo ln -s $(pwd)/device-injector /opt/nri/plugins/10-device-injector
klitkey1@emr-1:~/xfer$ sudo systemctl restart containerd # or crio

After this you should be able to annotate your pods for device injection for testing...

In a production environment you might want to restrict somehow (for instance by namespaces) which pods can be annotated with injected devices. Also, you might want to deploy that plugin itself as a DaemonSet instead of having to install it separately on each of your worker nodes. If there is enough interest, we can consider polishing that plugin, adding any necessary mechanisms for restricting access to annotation-based device injection, etc. and creating images and other deployment artifacts for it within the nri-plugins repo.

changzhi1990 · 2023-09-18T00:32:30Z

@changzhi1990 You should be able to get read-only access to /dev/sst_interface using NRI itself and an(other) NRI plugin and without requiring any cpabilities or using a privileged container. We have a sample NRI plugin which can inject devices and the necessary cgroup access rules for them based on annotations.

For instance, I used that plugin and this following pod spec to test if I can do 'SST discovery':
apiVersion: v1
kind: Pod
metadata:
  name: sst-test
  annotations:
    devices.nri.io/container.c0: |+
      - path: /dev/isst_interface
        type: c
        major: 10
        minor: 123
        file_mode: 0o600
spec:
  containers:
  - name: c0
    image: quay.io/marquiz/goresctrl:test
    command:
      - sh
      - -c
      - sleep 3600
    resources:
      requests:
        cpu: 250m
        memory: 200M
      limits:
        cpu: 500m
        memory: 200M
    securityContext:
      privileged: false
      capabilities:
        drop:
          - all
    imagePullPolicy: IfNotPresent
Just run the device-injector plugin (for instance ./device-injector --idx 10 -name device-injector manually for testing) before starting the pod. Then you should be able to do this:
klitkey1@emr-1:~/xfer$ kubectl apply -f sst-test.yaml
klitkey1@emr-1:~/xfer$ kubectl exec -ti sst-test -c c0 -- /bin/bash --login
root@sst-test:/go/builder# /go/bin/sst-ctl info
...
  PPCurrentLevel: 0
  PPLocked: true
  PPMaxLevel: 4
  PPSupported: true
  PPVersion: 3
  TFEnabled: false
  TFSupported: true
...
This sample plugin was not really meant for production. It's merely a sample plugin which demonstrate some of NRI's capabilities. Anyway, if you'd like to use this plugin or you roll your own, you can enable it permanently on your cluster nodes by symlinking it into /opt/nri/plugins:
klitkey1@emr-1:~/xfer$ sudo mkdir -p /opt/nri/plugins
sudo ln -s $(pwd)/device-injector /opt/nri/plugins/10-device-injector
klitkey1@emr-1:~/xfer$ sudo systemctl restart containerd # or crio
After this you should be able to annotate your pods for device injection for testing...

In a production environment you might want to restrict somehow (for instance by namespaces) which pods can be annotated with injected devices. Also, you might want to deploy that plugin itself as a DaemonSet instead of having to install it separately on each of your worker nodes. If there is enough interest, we can consider polishing that plugin, adding any necessary mechanisms for restricting access to annotation-based device injection, etc. and creating images and other deployment artifacts for it within the nri-plugins repo.

Hi, Thanks for your detailed reply. According to your message, maybe we have two options for this issue.

Start a device-injector plugin before the NRI pod starts.
Polishing the plugin and adding some necessary mechanisms.

klihub · 2024-08-26T11:04:46Z

Closing this as ancient stale.

Add privileged mode for SST

7d61349

The NRI pod needs to access the /dev/isst_interface device in the privileged mode.

Merge branch 'main' into add_privileged_mode

6809515

marquiz reviewed Sep 11, 2023

View reviewed changes

klihub closed this Aug 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add privileged mode for SST #105

Add privileged mode for SST #105

changzhi1990 commented Aug 16, 2023

changzhi1990 commented Aug 16, 2023

changzhi1990 commented Sep 11, 2023

marquiz Sep 11, 2023

marquiz Sep 11, 2023

changzhi1990 commented Sep 13, 2023 •

edited

Loading

changzhi1990 commented Sep 13, 2023 •

edited

Loading

marquiz commented Sep 13, 2023

marquiz commented Sep 13, 2023

changzhi1990 commented Sep 13, 2023

changzhi1990 commented Sep 14, 2023

changzhi1990 commented Sep 14, 2023

marquiz commented Sep 14, 2023

changzhi1990 commented Sep 14, 2023

marquiz commented Sep 14, 2023

klihub commented Sep 14, 2023 •

edited

Loading

changzhi1990 commented Sep 18, 2023

klihub commented Aug 26, 2024

Add privileged mode for SST #105

Add privileged mode for SST #105

Conversation

changzhi1990 commented Aug 16, 2023

changzhi1990 commented Aug 16, 2023

changzhi1990 commented Sep 11, 2023

marquiz Sep 11, 2023

Choose a reason for hiding this comment

marquiz Sep 11, 2023

Choose a reason for hiding this comment

changzhi1990 commented Sep 13, 2023 • edited Loading

changzhi1990 commented Sep 13, 2023 • edited Loading

marquiz commented Sep 13, 2023

marquiz commented Sep 13, 2023

changzhi1990 commented Sep 13, 2023

changzhi1990 commented Sep 14, 2023

changzhi1990 commented Sep 14, 2023

marquiz commented Sep 14, 2023

changzhi1990 commented Sep 14, 2023

marquiz commented Sep 14, 2023

klihub commented Sep 14, 2023 • edited Loading

changzhi1990 commented Sep 18, 2023

klihub commented Aug 26, 2024

changzhi1990 commented Sep 13, 2023 •

edited

Loading

changzhi1990 commented Sep 13, 2023 •

edited

Loading

klihub commented Sep 14, 2023 •

edited

Loading