
VPA updater errors with messages ~"fail to get pod controller: pod=kube-scheduler-XYZ err=Unhandled targetRef v1 / Node / XYZ, last error node is not a valid owner" #7378

Open
apilny-akamai opened this issue Oct 10, 2024 · 21 comments
Labels
area/vertical-pod-autoscaler kind/bug triage/needs-information

Comments

@apilny-akamai

apilny-akamai commented Oct 10, 2024

Which component are you using?: vertical-pod-autoscaler

What version of the component are you using?: 1.1.2


What k8s version are you using (kubectl version)?: kubectl 1.25

What did you expect to happen?: The VPA updater should not log errors such as "fail to get pod controller: pod=kube-scheduler-XYZ err=Unhandled targetRef v1 / Node / XYZ, last error node is not a valid owner"

What happened instead?: The vpa-updater log contains

E1010 12:38:44.476232 1 api.go:153] fail to get pod controller: pod=kube-apiserver-x-master-1 err=Unhandled targetRef v1 / Node / x-master-1, last error node is not a valid owner
E1010 12:38:44.477788 1 api.go:153] fail to get pod controller: pod=kube-controller-manager-master-1 err=Unhandled targetRef v1 / Node / x-master-1, last error node is not a valid owner
E1010 12:38:44.547767 1 api.go:153] fail to get pod controller: pod=etcd-x-master-1 err=Unhandled targetRef v1 / Node / x-master-1, last error node is not a valid owner
E1010 12:38:44.554646 1 api.go:153] fail to get pod controller: pod=kube-scheduler-x-master-1 err=Unhandled targetRef v1 / Node / x-master-1, last error node is not a valid owner

How to reproduce it (as minimally and precisely as possible):
Update the VPA from 0.4 to 1.1.2 and observe the vpa-updater log.

Anything else we need to know?: I've also tried updating to 1.2.1 and the error appears in the log again. It did not happen with VPA 0.4. I can see this error message in an already-fixed issue about a panic/SIGSEGV problem, but nowhere else.

kube-controller-manager Pod spec (generated by kubeadm, with only a small patch to the IPs):

spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cloud-provider=external
    - --cluster-cidr=10.1.0.0/16
    - --cluster-name=kubernetes
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.254.0.0/16
    - --use-service-account-credentials=true
    image: registry.k8s.io/kube-controller-manager:v1.25.16
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-controller-manager
    resources:
      requests:
        cpu: 200m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/ca-certificates
      name: etc-ca-certificates
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      name: flexvolume-dir
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /etc/kubernetes/controller-manager.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /usr/local/share/ca-certificates
      name: usr-local-share-ca-certificates
      readOnly: true
    - mountPath: /usr/share/ca-certificates
      name: usr-share-ca-certificates
      readOnly: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/ca-certificates
      type: DirectoryOrCreate
    name: etc-ca-certificates
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
      type: DirectoryOrCreate
    name: flexvolume-dir
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /etc/kubernetes/controller-manager.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /usr/local/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-local-share-ca-certificates
  - hostPath:
      path: /usr/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-share-ca-certificates
@apilny-akamai apilny-akamai added the kind/bug label Oct 10, 2024
@apilny-akamai apilny-akamai changed the title VPA updater errors with "fail to get pod controller: pod=kube-scheduler-XYZ err=Unhandled targetRef v1 / Node / XYZ, last error node is not a valid owner" VPA updater errors with messages ~"fail to get pod controller: pod=kube-scheduler-XYZ err=Unhandled targetRef v1 / Node / XYZ, last error node is not a valid owner" Oct 10, 2024
@adrianmoisey
Member

/area vertical-pod-autoscaler

@adrianmoisey
Member

Would it be possible to see the spec of the Pod that this is failing on?
Which variant of Kubernetes are you running this on?

@adrianmoisey
Member

/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information label Oct 10, 2024
@apilny-akamai
Author

We use standard kubeadm, K8s rev v1.25.16. I've updated the description with an example Pod spec.

@adrianmoisey
Member

Hi. It seems like you added the VPA spec. I'm looking for the spec of the Pod kube-controller-manager-master-1

@apilny-akamai
Author

Hi. It seems like you added the VPA spec. I'm looking for the spec of the Pod kube-controller-manager-master-1

Thank you and sorry, fixed in description.

@adrianmoisey
Member

Sorry, I need the metadata too.
I need to see the owner of this Pod, since that is what the VPA seems to be erroring about.

@apilny-akamai
Author

apilny-akamai commented Oct 15, 2024

No problem, here is the metadata:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system

@adrianmoisey
Member

The problem here is that this Pod doesn't have an ownerReferences field.
For example:

$ kubectl get pod local-metrics-server-7d8c48bbd8-v5sp5 -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-09-26T10:07:15Z"
  generateName: local-metrics-server-7d8c48bbd8-
  labels:
    app.kubernetes.io/instance: local-metrics-server
    app.kubernetes.io/name: metrics-server
    pod-template-hash: 7d8c48bbd8
  name: local-metrics-server-7d8c48bbd8-v5sp5
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: local-metrics-server-7d8c48bbd8
    uid: 4381b7b3-4206-4ece-aab4-f91b3beceb71
  resourceVersion: "570"
  uid: 0281b5a4-d7dc-4b4a-b59e-f561f3207b31

The VPA requires a Pod to have an owner.
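(For illustration only, and not the VPA's actual code path: a minimal Go sketch of how a controlling owner can be looked up with the apimachinery helper metav1.GetControllerOf. A Pod with no ownerReferences entry, like the manifest pasted above, returns nil here.)

package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
    pod := &corev1.Pod{} // imagine this Pod was fetched from the API server
    if owner := metav1.GetControllerOf(pod); owner != nil {
        // e.g. "controlled by apps/v1 ReplicaSet local-metrics-server-7d8c48bbd8"
        fmt.Printf("controlled by %s %s %s\n", owner.APIVersion, owner.Kind, owner.Name)
    } else {
        // an empty Pod, or one with no controller ownerReference, lands here
        fmt.Println("no controlling owner found for this Pod")
    }
}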

@adrianmoisey
Member

/close

@k8s-ci-robot
Contributor

@adrianmoisey: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@adrianmoisey
Member

/assign

@Michkov

Michkov commented Nov 18, 2024

We are getting this error with static pods:

  - apiVersion: v1
    controller: true
    kind: Node
    name: test-master-1
    uid: ff9885c0-8c3d-4c59-998e-f8aa7213e65f

It's handled in the code here -

if wellKnownController(groupKind.Kind) == node {
    // Some pods specify nodes as their owners. This causes performance problems
    // in big clusters when VPA tries to get all nodes. We know nodes aren't
    // valid controllers so we can skip trying to fetch them.
    return nil, fmt.Errorf("node is not a valid owner")
}

Based on the comment, the node owner is skipped on purpose. In that case the condition could be reported as an info message at a higher log verbosity, or ignored completely; reporting it as an error is confusing.
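(A hypothetical sketch of the suggestion above, not an actual VPA patch: the node-owner case could be exposed as a sentinel error so the caller logs it at a verbose level instead of reporting a failure. The names errNodeIsNotValidOwner and reportOwnerLookup are invented for illustration.)

package main

import (
    "errors"

    "k8s.io/klog/v2"
)

// errNodeIsNotValidOwner is a hypothetical sentinel for the skipped node-owner case.
var errNodeIsNotValidOwner = errors.New("node is not a valid owner")

// reportOwnerLookup logs owner-lookup failures, treating the node-owner case
// as an expected condition rather than an error.
func reportOwnerLookup(pod string, err error) {
    if errors.Is(err, errNodeIsNotValidOwner) {
        // Static pods are "owned" by their Node; skipping them is expected.
        klog.V(4).InfoS("Skipping pod whose owner is a Node", "pod", pod)
        return
    }
    klog.ErrorS(err, "Failed to get pod controller", "pod", pod)
}

func main() {
    reportOwnerLookup("kube-system/kube-scheduler-kind-control-plane", errNodeIsNotValidOwner)
}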

@adrianmoisey
Member

We are getting this error with static pods:

  - apiVersion: v1
    controller: true
    kind: Node
    name: test-master-1
    uid: ff9885c0-8c3d-4c59-998e-f8aa7213e65f

It's handled in the code here -

if wellKnownController(groupKind.Kind) == node {
    // Some pods specify nodes as their owners. This causes performance problems
    // in big clusters when VPA tries to get all nodes. We know nodes aren't
    // valid controllers so we can skip trying to fetch them.
    return nil, fmt.Errorf("node is not a valid owner")
}

Based on the comment, the node owner is skipped on purpose. In that case the condition could be reported as an info message at a higher log verbosity, or ignored completely; reporting it as an error is confusing.

Correct me if I'm wrong, but the error message is only produced when a VPA object exists that targets Pods that are owned by the Node?
If that's the case, I think the error message is valid, since it's saying that there's a problem.

@adrianmoisey
Member

Also, would it be possible for someone to create steps to reproduce this using kind?

@Michkov

Michkov commented Nov 18, 2024

This error is produced when any VPA object exists, even one not pointing to the static pods.

I was unable to reproduce it with kind, but it is easy to reproduce with kubeadm. Example of how to install: https://blog.radwell.codes/2022/07/single-node-kubernetes-cluster-via-kubeadm-on-ubuntu-22-04/ (that guide's kubeadm installation uses old, no-longer-existing repos; use https://v1-30.docs.kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl instead).

@adrianmoisey
Member

/reopen

@k8s-ci-robot
Contributor

@adrianmoisey: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot reopened this Nov 18, 2024
@Michkov

Michkov commented Nov 18, 2024

With kubeadm I can see that ownerReference pointing to the Node, but the error is not there. I'm trying to find a reproducer.

@adrianmoisey
Member

I can reproduce it in kind.

  1. Start kind cluster
  2. Apply VPA example hamster.yaml
  3. Delete kube-scheduler-kind-control-plane pod in kube-system namespace

I get the following error in the admission-controller logs:

E1118 13:45:09.044165       1 api.go:153] fail to get pod controller: pod=kube-system/kube-scheduler-kind-control-plane err=Unhandled targetRef v1 / Node / kind-control-plane, last error node is not a valid owner

@adrianmoisey
Member

I agree that this shouldn't be bubbled up as an error.
