
[Feature] Identify and apply changes on ray-cluster #2534

Open · 1 of 2 tasks
AvihaiSam opened this issue Nov 12, 2024 · 8 comments
Labels
enhancement · triage

Comments

@AvihaiSam

Search before asking

  • I searched the issues and found no similar feature request.

Description

When running helm upgrade on a ray-cluster release, the operator does not identify and apply the changes; it does nothing.

Workaround: helm uninstall followed by helm install.
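For illustration, a minimal sketch of the workaround, assuming a release named ray-cluster installed from the kuberay/ray-cluster chart (names are illustrative):

# recreate the release instead of upgrading it in place
helm uninstall ray-cluster
helm install ray-cluster kuberay/ray-cluster --set image.tag=<new-tag>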

Use case

Change the Helm values, then run helm upgrade and expect the running cluster to pick up the changes.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@AvihaiSam AvihaiSam added the enhancement and triage labels on Nov 12, 2024
@win5923
Contributor

win5923 commented Nov 12, 2024

Do you have a reproduction script?

@AvihaiSam
Author

helm install --set image.tag=X ray-cluster ...
helm upgrade --set image.tag=Y ....

head pod doesn't get replaced...
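A fuller version of the repro with illustrative values filled in (the chart name, image tags, and Helm repo URL below are assumptions, not copied from the truncated commands above):

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install ray-cluster kuberay/ray-cluster --set image.tag=2.9.0
# bump the image tag in place
helm upgrade ray-cluster kuberay/ray-cluster --set image.tag=2.10.0
# expected: the head Pod is replaced with the new image; actual: Pods keep running the old tag
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'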

@win5923
Contributor

win5923 commented Nov 16, 2024

When I tried to perform helm upgrade to modify the image tag, I noticed the following log:

{"level":"info","ts":"2024-11-16T02:14:55.421Z","logger":"controllers.RayCluster","msg":"reconcileHeadService","RayCluster":{"name":"raycluster-kuberay","namespace":"default"},"reconcileID":"a6147489-026b-411c-bc99-c4cd92ad2d0a","1 head service found":"raycluster-kuberay-head-svc"}
{"level":"info","ts":"2024-11-16T02:14:55.422Z","logger":"controllers.RayCluster","msg":"reconcilePods","RayCluster":{"name":"raycluster-kuberay","namespace":"default"},"reconcileID":"a6147489-026b-411c-bc99-c4cd92ad2d0a","Found 1 head Pod":"raycluster-kuberay-head-knskh","Pod status":"Running","Pod status reason":"","Pod restart policy":"Always","Ray container terminated status":"nil"}
{"level":"info","ts":"2024-11-16T02:14:55.422Z","logger":"controllers.RayCluster","msg":"reconcilePods","RayCluster":{"name":"raycluster-kuberay","namespace":"default"},"reconcileID":"a6147489-026b-411c-bc99-c4cd92ad2d0a","head Pod":"raycluster-kuberay-head-knskh","shouldDelete":false,"reason":"KubeRay does not need to delete the head Pod raycluster-kuberay-head-knskh. The Pod status is Running, and the Ray container terminated status is nil."}
{"level":"info","ts":"2024-11-16T02:14:55.422Z","logger":"controllers.RayCluster","msg":"reconcilePods","RayCluster":{"name":"raycluster-kuberay","namespace":"default"},"reconcileID":"a6147489-026b-411c-bc99-c4cd92ad2d0a","desired workerReplicas (always adhering to minReplicas/maxReplica)":1,"worker group":"workergroup","maxReplicas":3,"minReplicas":1,"replicas":1}
{"level":"info","ts":"2024-11-16T02:14:55.422Z","logger":"controllers.RayCluster","msg":"reconcilePods","RayCluster":{"name":"raycluster-kuberay","namespace":"default"},"reconcileID":"a6147489-026b-411c-bc99-c4cd92ad2d0a","worker Pod":"raycluster-kuberay-workergroup-worker-4xsf9","shouldDelete":false,"reason":"KubeRay does not need to delete the worker Pod raycluster-kuberay-workergroup-worker-4xsf9. The Pod status is Running, and the Ray container terminated status is nil."}

This is why the old RayCluster Pods are not deleted. @kevin85421 Do we need to implement this functionality?

@andrewsykim
Collaborator

It is expected behavior that RayCluster does not automatically recreate Pods on changes to the API. However, it is supported to delete the old Pods manually and have KubeRay automatically recreate them with the new spec. Having RayCluster automatically recreate Pods would be a new feature request we can consider.
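For example, a minimal sketch of that manual step, assuming the ray.io/cluster label that KubeRay applies to the Pods it manages and the cluster name from the logs above:

# delete the cluster's Pods; the reconcile loop recreates them from the updated spec
kubectl delete pods -l ray.io/cluster=raycluster-kuberay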

@andrewsykim
Collaborator

We are introducing a spec.upgradeStrategy API to RayService; I think we can adapt something similar for RayCluster?
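To make the idea concrete, a purely hypothetical sketch of what that could look like on RayCluster (spec.upgradeStrategy does not exist on RayCluster today, and the field names only loosely mirror the RayService proposal):

# written to a file for illustration only; the API server would reject the unknown field today
cat > raycluster-upgrade-sketch.yaml <<'EOF'
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
spec:
  upgradeStrategy:
    type: Recreate   # hypothetical: delete Pods and let reconcile recreate them
  headGroupSpec:
    ...
EOF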

@AvihaiSam
Author

> We are introducing a spec.upgradeStrategy API to RayService; I think we can adapt something similar for RayCluster?

That sounds like a good plan.
I'd expect the operator to replace the cluster on every change...

@andrewsykim
Collaborator

@kevin85421 @MortalHappiness what do we think about having the new upgrade strategy API in RayCluster too?

@MortalHappiness
Member

MortalHappiness commented Nov 21, 2024

I think strategy == Recreate can be implemented easily, but RollingUpgrade may be a bit trickier to implement.

Another solution is to introduce a feature gate or a boolean toggle in the RayCluster spec, indicating to the KubeRay operator whether the underlying pods should be recreated whenever the spec changes.
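Sketching both variants, with entirely hypothetical names for illustration (neither exists in KubeRay today):

# (a) operator-level feature gate:
#     kuberay-operator --feature-gates=RecreatePodsOnSpecChange=true
# (b) per-cluster boolean toggle in the RayCluster spec:
cat > raycluster-toggle-sketch.yaml <<'EOF'
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
spec:
  recreatePodsOnSpecChange: true   # hypothetical: recreate Pods whenever the spec changes
  headGroupSpec:
    ...
EOF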
