Actual behaviour
Sometimes kubectl_manifest TF resources are incorrectly removed from TF state, which prevents Terraform from ever destroying them. This becomes even more problematic when a custom destroy provisioner is attached to the resource, because that provisioner then never fires.
Please consider the log snippet below, which captures part of the terraform destroy run (specifically, the point where the resource state is refreshed):
2024-06-13T14:59:09.982+0300 [TRACE] Completed graph transform *terraform.RootTransformer with new graph:
kubectl_manifest.karpenter_provisioner[0] - *terraform.NodePlannableResourceInstance
root - terraform.graphNodeRoot
kubectl_manifest.karpenter_provisioner[0] - *terraform.NodePlannableResourceInstance
------
2024-06-13T14:59:09.982+0300 [TRACE] vertex "kubectl_manifest.karpenter_provisioner (expand)": entering dynamic subgraph
2024-06-13T14:59:09.982+0300 [TRACE] vertex "kubectl_manifest.karpenter_provisioner[0]": starting visit (*terraform.NodePlannableResourceInstance)
2024-06-13T14:59:09.983+0300 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-06-13T14:59:09.985+0300 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/helm/2.10.1/darwin_amd64/terraform-provider-helm_v2.10.1_x5 pid=68043
2024-06-13T14:59:09.986+0300 [DEBUG] provider: plugin exited
2024-06-13T14:59:09.986+0300 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/helm\"] (close)": visit complete
2024-06-13T14:59:09.986+0300 [TRACE] readResourceInstanceState: reading state for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.986+0300 [TRACE] readResourceInstanceState: reading state for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.986+0300 [TRACE] upgradeResourceState: schema version of kubectl_manifest.karpenter_provisioner[0] is still 1; calling provider "kubectl" for any other minor fixups
2024-06-13T14:59:09.986+0300 [TRACE] GRPCProvider: UpgradeResourceState
2024-06-13T14:59:09.986+0300 [TRACE] upgradeResourceState: schema version of kubectl_manifest.karpenter_node_group_template[0] is still 1; calling provider "kubectl" for any other minor fixups
2024-06-13T14:59:09.986+0300 [TRACE] GRPCProvider: UpgradeResourceState
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to prevRunState for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to prevRunState for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to refreshState for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to refreshState for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for kubectl_manifest.karpenter_node_group_template[0]
kubectl_manifest.karpenter_provisioner[0]: Refreshing state... [id=/apis/karpenter.sh/v1alpha5/provisioners/default]
kubectl_manifest.karpenter_node_group_template[0]: Refreshing state... [id=/apis/karpenter.k8s.aws/v1alpha1/awsnodetemplates/default]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResourceInstance.refresh for kubectl_manifest.karpenter_provisioner[0]
2024-06-13T14:59:09.987+0300 [TRACE] NodeAbstractResourceInstance.refresh for kubectl_manifest.karpenter_node_group_template[0]
2024-06-13T14:59:09.987+0300 [TRACE] GRPCProvider: ReadResource
2024-06-13T14:59:09.987+0300 [TRACE] GRPCProvider: ReadResource
2024-06-13T14:59:09.988+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:09 [DEBUG] default Unstructed YAML: map[apiVersion:karpenter.k8s.aws/v1alpha1 kind:AWSNodeTemplate metadata:map[name:default] spec:map[instanceProfile:AmazonEKSTFKarpenterNodeRole-eks-dbaas-sivanovdestroytest securityGroupSelector:map[karpenter.sh/discovery:eks-dbaas-sivanovdestroytest] subnetSelector:map[karpenter.sh/discovery:eks-dbaas-sivanovdestroytest] tags:map[dbaas.nuodb.com/cluster:eks-dbaas-sivanovdestroytest]]]
2024-06-13T14:59:09.988+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:09 [DEBUG] default Unstructed YAML: map[apiVersion:karpenter.sh/v1alpha5 kind:Provisioner metadata:map[name:default] spec:map[consolidation:map[enabled:true] limits:map[resources:map[cpu:16 memory:32Gi]] providerRef:map[name:default] requirements:[map[key:node.kubernetes.io/instance-type operator:In values:[t3.medium t4g.medium t3a.medium]] map[key:karpenter.sh/capacity-type operator:In values:[spot]]] ttlSecondsUntilExpired:604800]]
2024-06-13T14:59:09.992+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:09 [DEBUG] default fetch from kubernetes
2024-06-13T14:59:09.994+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:09 [DEBUG] default fetch from kubernetes
2024-06-13T14:59:10.525+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:10 [WARN] kubernetes resource (/apis/karpenter.k8s.aws/v1alpha1/awsnodetemplates/default) not found, removing from state
2024-06-13T14:59:10.525+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:10 [WARN] kubernetes resource (/apis/karpenter.sh/v1alpha5/provisioners/default) not found, removing from state
2024-06-13T14:59:10.525+0300 [WARN] Provider "registry.terraform.io/gavinbunney/kubectl" produced an unexpected new value for kubectl_manifest.karpenter_node_group_template[0] during refresh.
- Root resource was present, but now absent
2024-06-13T14:59:10.525+0300 [WARN] Provider "registry.terraform.io/gavinbunney/kubectl" produced an unexpected new value for kubectl_manifest.karpenter_provisioner[0] during refresh.
- Root resource was present, but now absent
Notice that the TF resources kubectl_manifest.karpenter_provisioner[0] and kubectl_manifest.karpenter_node_group_template[0] are reported as not found and removed from state. They are never selected for destruction during the destroy phase, which leaves them behind in the cluster.
Expected behavior
The Kubernetes resource should not be left behind.
Troubleshooting
After doing terraform destroy, the Kubernetes resource is left behind.
$ kubectl get provisioners.karpenter.sh
NAME AGE
default 87m
The relevant code here treats 404 (Not Found) and 410 (Gone) errors identically, as "resource not found". I suspected this might be problematic and looked into the K8s audit logs.
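The decision that code makes can be illustrated with a minimal sketch (this is not the provider's actual Go code, just the behaviour described above): both status codes collapse into a single "missing" answer, and the state entry is removed for either.

```shell
# Hypothetical sketch of the status-code handling: 404 and 410 are
# treated the same, so either one causes removal from state.
is_missing() {
  case "$1" in
    404|410) return 0 ;;   # treated as "resource not found"
    *)       return 1 ;;   # anything else keeps the state entry
  esac
}

for code in 404 410 403; do
  if is_missing "$code"; then
    echo "$code: treated as not found, state entry removed"
  else
    echo "$code: kept in state"
  fi
done
```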
In the K8s API server audit trail in CloudWatch, there are no requests for this resource around 11:59:10 UTC (the terraform client runs in GMT+3), which makes me think this is either a client-side caching problem or that an incorrect URI is being used.
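One way to probe the caching/incorrect-URI hypothesis is to rebuild the URI exactly as the provider logs it and then fetch it directly from the API server. The components below are taken from the log lines above; the kubectl invocation assumes access to the affected cluster.

```shell
# Rebuild the URI exactly as it appears in the provider's WARN lines.
api_version="karpenter.sh/v1alpha5"   # group/version from the log
plural="provisioners"                 # lowercase plural of the kind
name="default"
uri="/apis/${api_version}/${plural}/${name}"
echo "$uri"   # → /apis/karpenter.sh/v1alpha5/provisioners/default

# With cluster access, bypass any client-side cache and ask the API
# server directly whether the object really is gone:
#   kubectl get --raw "$uri"
```

If `kubectl get --raw` returns the object while the provider simultaneously reports 404/410 for the same path, that would point at the client side rather than the API server.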
The resource was created at 10:46:46.530. The only NOK (404) response on record was issued before the resource had been created.
This doesn't seem to be specific to cluster-scoped resources, since I can find examples of other TF resources with the same behaviour.
$ grep "not found, removing from state" destroy.log
2024-06-13T14:55:26.408+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/kube-system/podlogses/kube) not found, removing from state
2024-06-13T14:55:26.409+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/loki/grafanaagents/agent) not found, removing from state
2024-06-13T14:55:26.508+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/loki/integrations/kube-events) not found, removing from state
2024-06-13T14:55:26.885+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/prometheus/podlogses/logs-prometheus) not found, removing from state
2024-06-13T14:55:26.885+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:26 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/platform-system/podlogses/logs-platform-system) not found, removing from state
2024-06-13T14:55:27.026+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:27 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/loki/logsinstances/loki) not found, removing from state
2024-06-13T14:55:27.026+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:27 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/karpenter/podlogses/logs-karpenter) not found, removing from state
2024-06-13T14:55:27.028+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:27 [WARN] kubernetes resource (/apis/monitoring.grafana.com/v1alpha1/namespaces/loki/podlogses/loki) not found, removing from state
2024-06-13T14:55:27.678+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:27 [WARN] kubernetes resource (/apis/snapshot.storage.k8s.io/v1/volumesnapshotclasses/snap-ebs-delete) not found, removing from state
2024-06-13T14:55:30.121+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:30 [WARN] kubernetes resource (/apis/cert-manager.io/v1/clusterissuers/letsencrypt) not found, removing from state
2024-06-13T14:55:30.313+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:55:30 [WARN] kubernetes resource (/apis/cert-manager.io/v1/namespaces/platform-system/certificates/haproxy-tls-cert) not found, removing from state
2024-06-13T14:59:10.525+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:10 [WARN] kubernetes resource (/apis/karpenter.k8s.aws/v1alpha1/awsnodetemplates/default) not found, removing from state
2024-06-13T14:59:10.525+0300 [DEBUG] provider.terraform-provider-kubectl_v1.14.0: 2024/06/13 14:59:10 [WARN] kubernetes resource (/apis/karpenter.sh/v1alpha5/provisioners/default) not found, removing from state
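The grep above can be tightened to emit just the API paths of everything that was dropped from state, which is handy for inspecting (or manually deleting) each leftover object. A small sketch, assuming the destroy output was saved to destroy.log:

```shell
# Extract the bare URI from each "not found, removing from state" WARN
# line in a saved terraform destroy log.
dropped_uris() {
  sed -n 's/.*kubernetes resource (\([^)]*\)) not found, removing from state.*/\1/p' "$1" | sort -u
}
# Usage: dropped_uris destroy.log
```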
Environment
EKS version: v1.29
gavinbunney kubectl version: 1.14.0
terraform: v1.5.7