Deleted node is not able to attach to existing PVC once is created again #334

Open
connde opened this issue Jul 10, 2020 · 20 comments

Comments

@connde

connde commented Jul 10, 2020

What did you do? (required. The issue will be closed when not provided.)

Deleted a random node in my Rancher cluster to see how the Percona XtraDB Cluster behaved.

What did you expect to happen?

Node to be recreated and Percona node attached to existing PVC

Configuration (MUST fill this out):

  • system logs:

Please provide the following logs:


kubectl cluster-info dump > kubernetes-dump.log

This will output everything from your cluster. Please use a private gist via
https://gist.github.com/ to share this dump with us
Not able to create a gist; the site is generating an error, but I'm happy to send the dump by email if needed.

  • manifests, such as pvc, deployments, etc.. you used to reproduce:
    Deployed Percona using OperatorHub.io
    cr.zip

Please provide the total set of manifests that are needed to reproduce the
issue. Just providing the pvc is not helpful. If you cannot provide it due
to privacy concerns, please try creating a reproducible case.

  • CSI Version: https://github.com/digitalocean/csi-digitalocean/tree/master/deploy/kubernetes/releases/csi-digitalocean-latest

  • Kubernetes Version: 1.18.3

  • Cloud provider/framework version, if applicable (such as Rancher):
    RancherOS 2.4.5 -> DigitalOcean -> 3 nodes
    Not using DOKS.

    Normal Scheduled 76s default-scheduler Successfully assigned my-percona-xtradb-cluster-operator/cluster-01-pxc-2 to worker-pool2
    Normal SuccessfulAttachVolume 76s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992"
    Warning FailedMount 28s (x7 over 60s) kubelet, worker-pool2 MountVolume.MountDevice failed for volume "pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992" : rpc error: code = Internal desc = formatting disk failed: exit status 1 cmd: 'mkfs.ext4 -F /dev/disk/by-id/scsi-0DO_Volume_pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992' output: "mke2fs 1.45.5 (07-Jan-2020)\nThe file /dev/disk/by-id/scsi-0DO_Volume_pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992 does not exist and no size was specified.\n"

Hi, to reproduce: create the Percona operator, then a CR with 3 nodes. After the cluster is running, delete a node manually and wait for it to be recreated. The volume will not bind correctly.

If I manually attach the volume in the DO dashboard and terminate the pod, the new pod gets created correctly.
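The FailedMount event above reports that the by-id device path does not exist on the node. A minimal sketch for checking this directly on the affected node follows. It assumes shell access to the node and that the device name follows the `scsi-0DO_Volume_<volume name>` pattern seen in the event. The function names are hypothetical.

```shell
#!/bin/sh
# Build the by-id device path for a DigitalOcean volume, following the
# naming seen in the FailedMount event (scsi-0DO_Volume_<volume name>).
do_volume_device_path() {
  printf '/dev/disk/by-id/scsi-0DO_Volume_%s\n' "$1"
}

# Report whether the device for the given volume is visible on this node.
# Returns non-zero when the device path is missing.
check_volume_device() {
  dev="$(do_volume_device_path "$1")"
  if [ -e "$dev" ]; then
    echo "present: $dev"
  else
    echo "missing: $dev"
    return 1
  fi
}
```

For the volume in this report, the call would be `check_volume_device pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992`. A "missing" result while the attach step reports success would match the symptom described here.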

Any help is appreciated.

@timoreimann
Contributor

I just tested this on DOKS by directly deleting the droplet hosting a PVC-using pod (managed by a StatefulSet). After the node removal was detected (by our cloud-controller-manager component), the Node object was removed and the workload transferred to a different node, along with the PVC.

To clarify: did you delete the droplet or just the Node object in the cluster?

@connde
Author

connde commented Jul 10, 2020

Hi @timoreimann, I'm NOT using DOKS. I'm using RancherOS and deploying the nodes to droplets.

I deleted the node from the Rancher UI. It was deleted and recreated correctly, as expected, but the PVC did not get attached.

@timoreimann
Contributor

@connde thanks. Understood you're not on DOKS -- the behavior should be identical though: as soon as the control plane detects that a node is gone, the workload should be moved elsewhere, including volumes.

To troubleshoot this further, we'll need the logs from your Controller and Node services. Could you share those?
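A rough sketch of how those logs could be gathered with kubectl. The resource names are assumptions based on the upstream csi-digitalocean manifests (a `csi-do-controller` StatefulSet and `csi-do-node` pods in `kube-system`, each with a `csi-do-plugin` container) and may differ in a given install, so every one of them can be overridden through an environment variable. The function name is hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: these defaults mirror the upstream csi-digitalocean manifests.
# Verify them first with: kubectl -n kube-system get statefulset,daemonset
CSI_NAMESPACE="${CSI_NAMESPACE:-kube-system}"
CSI_CONTROLLER="${CSI_CONTROLLER:-statefulset/csi-do-controller}"
CSI_NODE_SELECTOR="${CSI_NODE_SELECTOR:-app=csi-do-node}"
CSI_CONTAINER="${CSI_CONTAINER:-csi-do-plugin}"

# Write the controller and node plugin logs into the given directory.
collect_csi_logs() {
  out="${1:-csi-logs}"
  mkdir -p "$out"
  # Controller service: logs of the plugin container.
  kubectl -n "$CSI_NAMESPACE" logs "$CSI_CONTROLLER" \
    -c "$CSI_CONTAINER" > "$out/controller.log"
  # Node service: one pod per node, so select by label and prefix each
  # line with its pod name; --tail=-1 lifts the default line limit that
  # applies when a selector is used.
  kubectl -n "$CSI_NAMESPACE" logs -l "$CSI_NODE_SELECTOR" \
    -c "$CSI_CONTAINER" --prefix --tail=-1 > "$out/node.log"
}
```

Calling `collect_csi_logs ./csi-logs` would leave `controller.log` and `node.log` in that directory, ready to attach to the issue.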

@connde
Author

connde commented Jul 10, 2020

@timoreimann I have a cluster dump. Is that enough?

kubernetes-dump.zip

@timoreimann
Contributor

That's perfect, thank you @connde. I'll need a bit of time to work through it, will report back once I'm done.

@connde
Author

connde commented Jul 10, 2020 via email

It's ok, no problem if it takes some time. I don't expect that the node will restart, I was just testing what would happen if a node failed.

@dlebee

dlebee commented Oct 8, 2021

The issue also occurs when updating the cluster. It's really frustrating that it takes a very long time for Kubernetes to notice that a volume is no longer attached.

Has anything changed on this issue?

@timoreimann
Contributor

@dlebee do you experience the issue when you upgrade using DOKS or a self-hosted Kubernetes?

@dlebee

dlebee commented Oct 8, 2021

DOKS. I have multiple clusters and it always occurs: the PVC stays attached to an old node, and on each Kubernetes upgrade I have to manually detach the volume and wait quite some time, which takes the systems down.

@timoreimann
Contributor

timoreimann commented Oct 8, 2021

@dlebee that's definitely not expected. What kind of workload do you use to reference the PVCs? Is it StatefulSets?

Regular Deployments bear the risk of getting to a situation where two replicas are trying to come up, which cannot work when volumes are associated. Just double-checking this isn't the case for you here.

@dlebee

dlebee commented Oct 8, 2021

Yes, they are StatefulSets (such as MongoDB and MariaDB) created by Helm charts.

@timoreimann
Contributor

@dlebee got it. Is it a particular set of DOKS/Kubernetes versions where you saw this happening, or across the board? How old was the oldest version?

@dlebee

dlebee commented Oct 8, 2021

Not really. I've had this issue while migrating: via the web UI you can only go up one version at a time, and it happened every single time I upgraded a version.

I can give you the details of the cluster. Maybe upgrading Kubernetes does not update the CSI driver?

davidlebee@Davids-MacBook-Pro ~ % kubectl get CSIDriver -o yaml
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
  kind: CSIDriver
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"storage.k8s.io/v1beta1","kind":"CSIDriver","metadata":{"annotations":{},"name":"dobs.csi.digitalocean.com"},"spec":{"attachRequired":true,"podInfoOnMount":true}}
    creationTimestamp: "2021-02-19T16:26:43Z"
    name: dobs.csi.digitalocean.com
    resourceVersion: "316"
    uid: 9754dccf-b1df-4986-ad6c-a63c228261f8
  spec:
    attachRequired: true
    fsGroupPolicy: ReadWriteOnceWithFSType
    podInfoOnMount: true
    volumeLifecycleModes:
    - Persistent
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@timoreimann
Contributor

@dlebee the CSI components should definitely get upgraded as well. It might be more of an issue of how the upgrade proceeds in your case.

I'll run some extra tests. Appreciate any additional details you may be able to provide (either here, via mail, or on the Kubernetes Slack).

@dlebee

dlebee commented Oct 10, 2021

I have to update a cluster soon. It is currently running 16.16.6-do-2. I'll let you know how it goes.

If you have any questions, I'll be following the thread.

@dlebee

dlebee commented Oct 11, 2021

I upgraded a cluster today and did not have the same issue. Is the CSI driver updated automatically on upgrades, or is it a manual operation that needs to be done if the cluster is older?

@timoreimann
Contributor

@dlebee all components are always upgraded automatically, including the CSI driver. You don't have to upgrade or install any of the managed components yourself.

What's worth pointing out is that older CSI driver and Kubernetes versions still contained certain bugs that got addressed in more recent versions. Chances are you are now past the point where those affect you.

@dlebee

dlebee commented Oct 11, 2021

I’ll upgrade that cluster specifically and let you know as soon as I can whether the issue is still present.

@dlebee

dlebee commented Oct 11, 2021

@timoreimann So the older cluster had pods stuck in Terminating for a long time, and they were not moving.

I think the reason is that this cluster has fewer resources, so it takes longer. I got impatient and terminated the pods with --force, which probably did not notify the CSI driver that the PVC is no longer bound.

Is there a way to make k8s release a PVC faster when a pod is killed by force?

Also, another PVC that I did not force close was not reattached during the upgrade. It attached only after I released the volume manually on the website.

Thank you,
David.

@timoreimann
Contributor

@dlebee if I had to guess, I wouldn't think that it's a resource problem: bringing pods down should happen fairly quickly. How long was "long time" for you?

If anything, --force should speed up the detachment process: the CSI driver cannot detach volumes if pods using it are still up (including the terminating state). By removing the pod, the CSI-/volume-related controllers should notice that the volume user has gone away and move forward with detaching.

What would be ideal to have if this happens again is all the events that occurred (kubectl -n <involved namespace> get events), the current node state (kubectl get nodes -o yaml), the involved PVCs / PVs (kubectl -n <involved namespace> get pvc -o yaml / kubectl get pv -o yaml), and the current volume attachments (kubectl get volumeattachment -o yaml); all at the time the pods are stuck.
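The commands listed above can be wrapped into a small helper so the whole snapshot is taken at once while the pods are still stuck. This is only a sketch: the function name and output file names are hypothetical, while the kubectl invocations are the ones given in the comment.

```shell
#!/bin/sh
# Snapshot the cluster state relevant to a stuck volume detach/attach.
# Usage: collect_stuck_pod_state <involved namespace> [output directory]
collect_stuck_pod_state() {
  ns="${1:?usage: collect_stuck_pod_state <namespace> [output dir]}"
  out="${2:-stuck-pod-state}"
  mkdir -p "$out"
  # Events in the involved namespace.
  kubectl -n "$ns" get events > "$out/events.txt"
  # Current node state.
  kubectl get nodes -o yaml > "$out/nodes.yaml"
  # Involved PVCs (namespaced) and PVs (cluster-scoped).
  kubectl -n "$ns" get pvc -o yaml > "$out/pvc.yaml"
  kubectl get pv -o yaml > "$out/pv.yaml"
  # Current volume attachments.
  kubectl get volumeattachment -o yaml > "$out/volumeattachments.yaml"
}
```

For example, `collect_stuck_pod_state my-namespace ./state` would write five files into `./state`. Running it before forcing any pod deletion should preserve the state the maintainers asked for.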
