Cronjob Name too long #425

Open
didlawowo opened this issue Oct 20, 2024 · 4 comments

Comments

@didlawowo

Hello,
it would be nice to be able to set the cronjob name, or to provide a fullnameOverride, to prevent this error from the Helm chart when persistence is enabled 👍

forbidden,CronJob.batch "in-cluster-kube-image-keeper-registry-garbage-collection" is invalid: metadata.name: Invalid value: "in-cluster-kube-image-keeper-registry-garbage-collection": must be no more than 52 characters
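For reference, Kubernetes caps CronJob names at 52 characters (the controller appends an 11-character suffix to the Jobs it creates, and the result must still fit in 63 characters), and "in-cluster-kube-image-keeper-registry-garbage-collection" is 56 characters. A purely hypothetical sketch of what the requested override could look like, assuming the usual enix Helm repo and a kuik-system namespace; the chart does not expose fullnameOverride today, which is the point of this issue:

# Hypothetical: with a fullnameOverride, the CronJob would be named
# "<override>-registry-garbage-collection", well under the 52-character limit.
helm upgrade --install in-cluster enix/kube-image-keeper \
  --namespace kuik-system \
  --set fullnameOverride=kuik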

@Nicolasgouze
Contributor

Hi @didlawowo, we'll fix it shortly. In the meantime, using a shorter release name will unblock you.
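For anyone else hitting this before the fix lands, a minimal sketch of that interim workaround (release name, repository alias and namespace are assumptions, adjust to your setup):

# A short release name keeps "<release>-kube-image-keeper-registry-garbage-collection"
# within the 52-character CronJob name limit.
helm upgrade --install kuik enix/kube-image-keeper --namespace kuik-system --create-namespace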

@didlawowo
Author

Yes, I did that.
Also, I have disabled kuik for now, because I often get an "image not found" error on the localhost:<kuik port> registry, which then crashes the deployment.
I'll be happy to show that.

@Nicolasgouze
Contributor

Hi @didlawowo, yes, it would be great to get the logs associated with that "image not found" error on the local kuik registry!

@didlawowo
Author

didlawowo commented Nov 16, 2024

Hi @didlawowo, yes, it would be great to get the logs associated with that "image not found" error on the local kuik registry!

It looks like this:

Name:             raycluster-kuberay-workergroup-worker-69rsq
Namespace:        ray
Priority:         0
Service Account:  default
Node:             rtx/192.168.1.29
Start Time:       Sat, 16 Nov 2024 09:11:19 +0100
Labels:           app.kubernetes.io/created-by=kuberay-operator
                  app.kubernetes.io/instance=raycluster
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=kuberay
                  helm.sh/chart=ray-cluster-1.2.2
                  kuik.enix.io/managed=true
                  ray.io/cluster=raycluster-kuberay
                  ray.io/group=workergroup
                  ray.io/identifier=raycluster-kuberay-worker
                  ray.io/is-ray-node=yes
                  ray.io/node-type=worker
Annotations:      ad.datadoghq.com/ray.checks:
                    {
                      "ray": {
                        "instances": [
                          {
                            "openmetrics_endpoint": "http://%%host%%:8080"
                          }
                        ]
                      }
                    }
                  kuik.enix.io/rewrite-images: true
                  original-image-ray-worker: rayproject/ray:2.38.0-py311-gpu
                  original-init-image-wait-gcs-ready: rayproject/ray:2.38.0-py311-gpu
                  ray.io/ft-enabled: false
Status:           Pending
IP:               10.0.3.94
IPs:
  IP:           10.0.3.94
Controlled By:  RayCluster/raycluster-kuberay
Init Containers:
  wait-gcs-ready:
    Container ID:  
    Image:         localhost:7439/rayproject/ray:2.38.0-py311-gpu
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -lc
      --
    Args:
      
                            SECONDS=0
                            while true; do
                              if (( SECONDS <= 120 )); then
                                if ray health-check --address raycluster-kuberay-head-svc.ray.svc.cluster.local:6379 > /dev/null 2>&1; then
                                  echo "GCS is ready."
                                  break
                                fi
                                echo "$SECONDS seconds elapsed: Waiting for GCS to be ready."
                              else
                                if ray health-check --address raycluster-kuberay-head-svc.ray.svc.cluster.local:6379; then
                                  echo "GCS is ready. Any error messages above can be safely ignored."
                                  break
                                fi
                                echo "$SECONDS seconds elapsed: Still waiting for GCS to be ready. For troubleshooting, refer to the FAQ at https://github.com/ray-project/kuberay/blob/master/docs/guidance/FAQ.md."
                              fi
                              sleep 5
                            done
                          
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  256Mi
    Requests:
      cpu:     200m
      memory:  256Mi
    Environment:
      FQ_RAY_IP:  raycluster-kuberay-head-svc.ray.svc.cluster.local
      RAY_IP:     raycluster-kuberay-head-svc
    Mounts:
      /tmp/ray from log-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pftj7 (ro)
Containers:
  ray-worker:
    Container ID:  
    Image:         localhost:7439/rayproject/ray:2.38.0-py311-gpu
    Image ID:      
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /bin/bash
      -lc
      --
    Args:
      ulimit -n 65536; ray start  --num-cpus=1  --memory=5000000000  --num-gpus=1  --address=raycluster-kuberay-head-svc.ray.svc.cluster.local:6379  --metrics-export-port=8080  --block  --dashboard-agent-listen-port=52365 
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:             1
      memory:          5G
      nvidia.com/gpu:  1
    Requests:
      cpu:             1
      memory:          1G
      nvidia.com/gpu:  1
    Liveness:          exec [bash -c wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success] delay=30s timeout=2s period=5s #success=1 #failure=120
    Readiness:         exec [bash -c wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success] delay=10s timeout=2s period=5s #success=1 #failure=10
    Environment:
      FQ_RAY_IP:                            raycluster-kuberay-head-svc.ray.svc.cluster.local
      RAY_IP:                               raycluster-kuberay-head-svc
      RAY_CLUSTER_NAME:                      (v1:metadata.labels['ray.io/cluster'])
      RAY_CLOUD_INSTANCE_ID:                raycluster-kuberay-workergroup-worker-69rsq (v1:metadata.name)
      RAY_NODE_TYPE_NAME:                    (v1:metadata.labels['ray.io/group'])
      KUBERAY_GEN_RAY_START_CMD:            ray start  --num-cpus=1  --memory=5000000000  --num-gpus=1  --address=raycluster-kuberay-head-svc.ray.svc.cluster.local:6379  --metrics-export-port=8080  --block  --dashboard-agent-listen-port=52365 
      RAY_PORT:                             6379
      RAY_ADDRESS:                          raycluster-kuberay-head-svc.ray.svc.cluster.local:6379
      RAY_USAGE_STATS_KUBERAY_IN_USE:       1
      REDIS_PASSWORD:                       
      RAY_DASHBOARD_ENABLE_K8S_DISK_USAGE:  1
    Mounts:
      /dev/shm from shared-mem (rw)
      /tmp/ray from log-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pftj7 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 False 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  log-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  shared-mem:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1G
  kube-api-access-pftj7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/hostname=rtx
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  27m                  default-scheduler  Successfully assigned ray/raycluster-kuberay-workergroup-worker-69rsq to rtx
  Warning  Failed     26m                  kubelet            Failed to pull image "localhost:7439/rayproject/ray:2.38.0-py311-gpu": failed to pull and unpack image "localhost:7439/rayproject/ray:2.38.0-py311-gpu": failed to copy: httpReadSeeker: failed open: unexpected status code https://localhost:7439/v2/rayproject/ray/blobs/sha256:4bc954eb910af405a9ee95a0d504f14370aa97e028f19f50779604c01b4ea00b: 401 Unauthorized
  Warning  Failed     24m                  kubelet            Failed to pull image "localhost:7439/rayproject/ray:2.38.0-py311-gpu": failed to pull and unpack image "localhost:7439/rayproject/ray:2.38.0-py311-gpu": failed to copy: httpReadSeeker: failed open: unexpected status code https://localhost:7439/v2/rayproject/ray/blobs/sha256:a14a8a8a6ebc3813d37a448205bf2c059e7b0dde5dda741babfffc327f32638c: 401 Unauthorized
  Normal   Pulling    24m (x3 over 27m)    kubelet            Pulling image "localhost:7439/rayproject/ray:2.38.0-py311-gpu"
  Warning  Failed     21m (x4 over 26m)    kubelet            Error: ErrImagePull
  Warning  Failed     21m                  kubelet            Failed to pull image "localhost:7439/rayproject/ray:2.38.0-py311-gpu": failed to pull and unpack image "localhost:7439/rayproject/ray:2.38.0-py311-gpu": failed to resolve reference "localhost:7439/rayproject/ray:2.38.0-py311-gpu": unexpected status from HEAD request to http://localhost:7439/v2/rayproject/ray/manifests/2.38.0-py311-gpu: 504 Gateway Timeout
  Warning  Failed     20m (x8 over 26m)    kubelet            Error: ImagePullBackOff
  Normal   BackOff    111s (x70 over 26m)  kubelet            Back-off pulling image "localhost:7439/rayproject/ray:2.38.0-py311-gpu"

These errors show up in the events of the pod.

There are no relevant logs in kube-image-keeper itself.
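If it helps, a hedged sketch of the commands that could gather kuik-side context for these 401/504 pull errors (namespace, label selector and CRD name are assumptions and may need adjusting to the actual install):

# Logs from all kube-image-keeper components (proxy, controllers, registry)
kubectl logs -n kuik-system -l app.kubernetes.io/name=kube-image-keeper --all-containers --prefix --since=1h
# Check whether the Ray image was actually cached
kubectl get cachedimages.kuik.enix.io | grep rayproject/ray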
