diff --git a/README.md b/README.md
index f8056022..0653a967 100644
--- a/README.md
+++ b/README.md
@@ -192,6 +192,9 @@ The CR includes the following parameters:
 * `retrycount` - number of times to retry the fence agent in case of failure. The default is 5.
 * `retryinterval` - interval between retries in seconds. The default is "5s".
 * `timeout` - timeout for the fence agent in seconds. The default is "60s".
+* `remediationStrategy` - either `OutOfServiceTaint` or `ResourceDeletion`:
+  * `OutOfServiceTaint`: This remediation strategy implicitly causes the deletion of the pods and the detachment of the associated volumes on the node. It achieves this by placing the [`OutOfServiceTaint` taint](https://kubernetes.io/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service) on the node.
+  * `ResourceDeletion`: This remediation strategy deletes the pods on the node.
 
 The FenceAgentsRemediation CR is created by the administrator and is used to trigger the fence agent on a specific node. The CR includes an *agent* field for the fence agent name, *sharedparameters* field with all the shared, not specific to a node, parameters, and a *nodeparameters* field to specify the parameters for the fenced node.
 For better understanding please see the below example of FenceAgentsRemediation CR for node `worker-1` (see it also as the [sample FAR](https://github.com/medik8s/fence-agents-remediation/blob/main/config/samples/fence-agents-remediation_v1alpha1_fenceagentsremediation.yaml)):
@@ -220,6 +223,7 @@ spec:
       worker-0: "6233"
       worker-1: "6234"
       worker-2: "6235"
+  remediationStrategy: ResourceDeletion
 ```
 
 ## Tests
diff --git a/api/v1alpha1/fenceagentsremediation_types.go b/api/v1alpha1/fenceagentsremediation_types.go
index 89ef2a00..f0c21605 100644
--- a/api/v1alpha1/fenceagentsremediation_types.go
+++ b/api/v1alpha1/fenceagentsremediation_types.go
@@ -96,6 +96,7 @@ type FenceAgentsRemediationSpec struct {
 	// that enables automatic deletion of pv-attached pods on failed nodes, "out-of-service" taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version 4.13+.
 	// +kubebuilder:default:="ResourceDeletion"
 	// +kubebuilder:validation:Enum=ResourceDeletion;OutOfServiceTaint
+	// +operator-sdk:csv:customresourcedefinitions:type=spec
 	RemediationStrategy RemediationStrategyType `json:"remediationStrategy,omitempty"`
 }
 
diff --git a/bundle/manifests/fence-agents-remediation.clusterserviceversion.yaml b/bundle/manifests/fence-agents-remediation.clusterserviceversion.yaml
index 4b658842..34f82019 100644
--- a/bundle/manifests/fence-agents-remediation.clusterserviceversion.yaml
+++ b/bundle/manifests/fence-agents-remediation.clusterserviceversion.yaml
@@ -22,6 +22,7 @@ metadata:
                 "worker-2": "6235"
               }
             },
+            "remediationStrategy": "ResourceDeletion",
             "retrycount": 5,
             "retryinterval": "5s",
             "sharedparameters": {
@@ -83,6 +84,16 @@ spec:
           node that is fenced, since they are node specific
         displayName: Node Parameters
         path: nodeparameters
+      - description: RemediationStrategy is the remediation method for unhealthy nodes.
+          Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
+          ResourceDeletion will iterate over all pods related to the unhealthy node
+          and delete them. OutOfServiceTaint will add the out-of-service taint which
+          is a new well-known taint "node.kubernetes.io/out-of-service" that enables
+          automatic deletion of pv-attached pods on failed nodes, "out-of-service"
+          taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
+          4.13+.
+        displayName: Remediation Strategy
+        path: remediationStrategy
       - description: RetryCount is the number of times the fencing agent will be executed
         displayName: Retry Count
         path: retrycount
@@ -129,6 +140,16 @@ spec:
           node that is fenced, since they are node specific
         displayName: Node Parameters
         path: template.spec.nodeparameters
+      - description: RemediationStrategy is the remediation method for unhealthy nodes.
+          Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
+          ResourceDeletion will iterate over all pods related to the unhealthy node
+          and delete them. OutOfServiceTaint will add the out-of-service taint which
+          is a new well-known taint "node.kubernetes.io/out-of-service" that enables
+          automatic deletion of pv-attached pods on failed nodes, "out-of-service"
+          taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
+          4.13+.
+        displayName: Remediation Strategy
+        path: template.spec.remediationStrategy
       - description: RetryCount is the number of times the fencing agent will be executed
         displayName: Retry Count
         path: template.spec.retrycount
diff --git a/config/manifests/bases/fence-agents-remediation.clusterserviceversion.yaml b/config/manifests/bases/fence-agents-remediation.clusterserviceversion.yaml
index b1539a23..d75d98b4 100644
--- a/config/manifests/bases/fence-agents-remediation.clusterserviceversion.yaml
+++ b/config/manifests/bases/fence-agents-remediation.clusterserviceversion.yaml
@@ -39,6 +39,16 @@ spec:
           node that is fenced, since they are node specific
         displayName: Node Parameters
         path: nodeparameters
+      - description: RemediationStrategy is the remediation method for unhealthy nodes.
+          Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
+          ResourceDeletion will iterate over all pods related to the unhealthy node
+          and delete them. OutOfServiceTaint will add the out-of-service taint which
+          is a new well-known taint "node.kubernetes.io/out-of-service" that enables
+          automatic deletion of pv-attached pods on failed nodes, "out-of-service"
+          taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
+          4.13+.
+        displayName: Remediation Strategy
+        path: remediationStrategy
       - description: RetryCount is the number of times the fencing agent will be executed
         displayName: Retry Count
         path: retrycount
@@ -85,6 +95,16 @@ spec:
           node that is fenced, since they are node specific
         displayName: Node Parameters
         path: template.spec.nodeparameters
+      - description: RemediationStrategy is the remediation method for unhealthy nodes.
+          Currently, it could be either "OutOfServiceTaint" or "ResourceDeletion".
+          ResourceDeletion will iterate over all pods related to the unhealthy node
+          and delete them. OutOfServiceTaint will add the out-of-service taint which
+          is a new well-known taint "node.kubernetes.io/out-of-service" that enables
+          automatic deletion of pv-attached pods on failed nodes, "out-of-service"
+          taint is only supported on clusters with k8s version 1.26+ or OCP/OKD version
+          4.13+.
+        displayName: Remediation Strategy
+        path: template.spec.remediationStrategy
       - description: RetryCount is the number of times the fencing agent will be executed
         displayName: Retry Count
         path: template.spec.retrycount
diff --git a/config/samples/fence-agents-remediation_v1alpha1_fenceagentsremediation.yaml b/config/samples/fence-agents-remediation_v1alpha1_fenceagentsremediation.yaml
index 84f021fa..41002450 100644
--- a/config/samples/fence-agents-remediation_v1alpha1_fenceagentsremediation.yaml
+++ b/config/samples/fence-agents-remediation_v1alpha1_fenceagentsremediation.yaml
@@ -21,3 +21,4 @@ spec:
       worker-0: "6233"
       worker-1: "6234"
       worker-2: "6235"
+  remediationStrategy: ResourceDeletion
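
Illustrative note on the `remediationStrategy` semantics documented in this change: the kubebuilder markers restrict the field to `ResourceDeletion` or `OutOfServiceTaint` and default it to `ResourceDeletion`. The Go sketch below shows one way that contract could be mirrored in plain code. It is not taken from the repository: only the `RemediationStrategyType` name and the two string values come from the diff, while the constant names, `effectiveStrategy`, and `describe` are hypothetical, and treating an empty value as the default is an assumption (in a cluster the CRD schema would normally apply the default).

```go
package main

import "fmt"

// RemediationStrategyType mirrors the type name that appears in the diff.
type RemediationStrategyType string

// Hypothetical constant names; only the string values appear in the diff.
const (
	StrategyResourceDeletion  RemediationStrategyType = "ResourceDeletion"
	StrategyOutOfServiceTaint RemediationStrategyType = "OutOfServiceTaint"
)

// effectiveStrategy is a hypothetical helper. It falls back to the documented
// default for an empty value and rejects anything outside the documented enum.
func effectiveStrategy(s RemediationStrategyType) (RemediationStrategyType, error) {
	switch s {
	case "":
		return StrategyResourceDeletion, nil
	case StrategyResourceDeletion, StrategyOutOfServiceTaint:
		return s, nil
	default:
		return "", fmt.Errorf("unsupported remediationStrategy %q", s)
	}
}

// describe is a hypothetical helper that summarizes the README wording
// for each strategy.
func describe(s RemediationStrategyType) string {
	switch s {
	case StrategyOutOfServiceTaint:
		return "place the node.kubernetes.io/out-of-service taint on the node"
	case StrategyResourceDeletion:
		return "delete the pods on the node"
	default:
		return "unknown strategy"
	}
}

func main() {
	// "Reboot" is a deliberately invalid value used to show the error path.
	for _, in := range []RemediationStrategyType{"", "OutOfServiceTaint", "Reboot"} {
		s, err := effectiveStrategy(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q resolves to %s: %s\n", in, s, describe(s))
	}
}
```

Keeping the accepted values in one `switch` means an unexpected string fails loudly instead of silently selecting a strategy, which matches the intent of the `Enum` validation marker.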