Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StatefulSet update doesn't wait for pods to initialize #3329

Open
daniel-wtfoxtrot opened this issue Nov 25, 2024 · 2 comments
Open

StatefulSet update doesn't wait for pods to initialize #3329

daniel-wtfoxtrot opened this issue Nov 25, 2024 · 2 comments
Labels
kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team

Comments

@daniel-wtfoxtrot
Copy link

What happened?

I experience strange behaviour in our Deployments, when updating StatefulSets. The initial pulumi up behaves as expected, everything is deployed and running fine. However when I run a subsequent pulumi up to change the StatefulSets Pod's Image, the change of the StatefulSet is immediately considered successful, without waiting for any of the new updated Pods to start.

@ updating....
 ~  kubernetes:apps/v1:StatefulSet foo-store updating (0s) [diff: ~spec]; [1/2] Waiting for StatefulSet to create Pods (0/0 Pods ready)
 ~  kubernetes:apps/v1:StatefulSet foo-store updating (0s) [diff: ~spec]; StatefulSet initialization complete
 ~  kubernetes:apps/v1:StatefulSet foo-store updating (0s) [diff: ~spec]; 
 ~  kubernetes:apps/v1:StatefulSet foo-store updated (0.85s) [diff: ~spec]; 

At this point new Pods are still pending or haven't even been started, but the StatefulSet update is already considered successful.

Example

  1. Deploy this StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: foo-store
  namespace: stateful-set-issue
  labels:
    app: foo-store
spec:
  replicas: 2
  selector:
    matchLabels:
      app: foo-store
  template:
    metadata:
      labels:
        app: foo-store
    spec:
      containers:
        - name: foo-store-web
          image: registry.k8s.io/e2e-test-images/agnhost:2.40
          resources:
            limits:
              memory: 2Gi
            requests:
              memory: 1Gi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            timeoutSeconds: 3
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 20
          imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      automountServiceAccountToken: false
      imagePullSecrets:
        - name: artifactory
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: foo-store
                topologyKey: kubernetes.io/hostname
      schedulerName: default-scheduler
  serviceName: foo-store
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0
  revisionHistoryLimit: 0
  minReadySeconds: 30
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  1. Change pod image and run pulumi up

Output of pulumi about

CLI
Version 3.110.0
Go Version go1.22.1
Go Compiler gc
Plugins
NAME VERSION
kubernetes 4.13.1
nodejs unknown
postgresql 3.11.1
Host
OS debian
Version 12.5
Arch x86_64
This project is written in nodejs: executable='/usr/local/bin/node' version='v20.11.0'
Current Stack: mr-fix-9004-ops-deployment-sollte-rot-sein-wenn-nicht-alle-p
Found no resources associated with mr-fix-9004-ops-deployment-sollte-rot-sein-wenn-nicht-alle-p
Found no pending operations associated with mr-fix-9004-ops-deployment-sollte-rot-sein-wenn-nicht-alle-p
Backend
Name runner-bar-project-foo-concurrent-0
URL s3://wtf-app-foo-nonprod
User root
Organizations
Token type personal
Dependencies:
NAME VERSION
@gitbeaker/node 35.8.1
@pulumi/kubernetes 4.13.1
@pulumi/postgresql 3.11.1
@pulumi/pulumi 3.121.0
ajv 8.12.0
ajv-formats 2.1.1
git-semver-tags 7.0.1
node-fetch 2.7.0
semver 7.6.2
yaml 2.4.5
zod 3.23.8
@typescript-eslint/eslint-plugin 7.0.1
@typescript-eslint/parser 7.0.1
casual 1.6.2
eslint 8.56.0
eslint-plugin-jest 28.9.0
eslint-plugin-prettier 5.1.3
fetch-mock 9.11.0
jest 29.7.0
p-limit 5.0.0
p-locate 6.0.0
prettier 3.2.5
typescript 5.3.3
Pulumi locates its logs in /tmp by default

Additional context

This behavior started after switching to pulumi-kubernetes v4. So I thought it had something to do with Server-Side apply. As a result I tried to disable SSA, but the behaviour didn't change.

I also tried adding a startupProbe and tried different updateStrategy/minReadySeconds configurations, but nothing helped so far.

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@daniel-wtfoxtrot daniel-wtfoxtrot added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Nov 25, 2024
@blampe
Copy link
Contributor

blampe commented Nov 25, 2024

@pulumi/kubernetes 4.13.1

@daniel-wtfoxtrot can you please try the latest v4.18.3 release to see if the problem still exists there?

If that doesn't fix it, please try adding a pulumi.com/waitFor annotation (docs), probably something like pulumi.com/waitFor: "jsonpath={.status.readyReplicas}=1" (or however many replicas you want to wait for).

The default await logic for StatefulSet is somewhat old and might be buggy. The waitFor logic is newer and more generic, and if there is a bug it should be able to work around it.

@blampe blampe added awaiting-feedback Blocked on input from the author and removed needs-triage Needs attention from the triage team labels Nov 25, 2024
@daniel-wtfoxtrot
Copy link
Author

Thanks for your reply!I tried both and it unfortunately didn't help. I think we'll try to migrate to k8s Deployments instead as a workaround.

@pulumi-bot pulumi-bot added needs-triage Needs attention from the triage team and removed awaiting-feedback Blocked on input from the author labels Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team
Projects
None yet
Development

No branches or pull requests

3 participants