Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CSV description #79

Merged
merged 8 commits into from
Jan 30, 2024
Merged

Conversation

razo7
Copy link
Member

@razo7 razo7 commented Aug 23, 2023

Improve FAR CSV description to include how FAR works, why it is needed, what makes it a good remediator, and a recommendation to use it with NHC.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 23, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@razo7 razo7 changed the title [WIP] Update CSV description Update CSV description Sep 19, 2023
@razo7
Copy link
Member Author

razo7 commented Sep 19, 2023

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

Improve FAR CSV short description to include that it provide HA and it is done automatically
@razo7 razo7 force-pushed the csv-description branch 2 times, most recently from 22ee7f4 to 2e587ae Compare September 19, 2023 12:55
@razo7
Copy link
Member Author

razo7 commented Sep 19, 2023

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

1 similar comment
@razo7
Copy link
Member Author

razo7 commented Sep 20, 2023

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

Improve FAR CSV description better introduction to FAR, other remediatiors and NHC usability, and some pros for using FAR
Use FAR TBA link rather than general medik8s.io
@@ -352,7 +360,7 @@ spec:
- baremetal
links:
- name: Fence Agents Remediation
url: https://medik8s.io
url: https://medik8s.io/fence-agents-remediation/fence-agents-remediation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is broken I'd probably wait to merge docs PR first
medik8s/docs#63


FAR is one of the remediator operators by [Medik8s](https://www.medik8s.io/remediation/remediation/),
such as [Self Node Remediation](https://github.com/medik8s/self-node-remediation) and [Machine Deletion Remediation](https://github.com/medik8s/machine-deletion-remediation),
that were designed to run with the Node HealthCheck Operator [(NHC)](https://github.com/medik8s/node-healthcheck-operator) as an external remediator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that external remediator in the context of NHC is confusing.
IIRC it is relevant for MHC who both have an "internal" remediation mechanism and can also use different "external" remediators.
Since NHC is not a remediator by itself it would probably be best not to use this terminology here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it a little bit, let me know if it looks better.

FAR is one of the remediator operators by [Medik8s](https://www.medik8s.io/remediation/remediation/),
such as [Self Node Remediation](https://github.com/medik8s/self-node-remediation) and [Machine Deletion Remediation](https://github.com/medik8s/machine-deletion-remediation),
that were designed to run with the Node HealthCheck Operator [(NHC)](https://github.com/medik8s/node-healthcheck-operator) as an external remediator.
It is recommended to use FAR with NHC for an easier and smoother experience, but it can be used as a standalone remediator for the more experienced user.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think the main role of NHC in this context is giving the user a fully automated remediation process

@razo7
Copy link
Member Author

razo7 commented Sep 29, 2023

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

@razo7
Copy link
Member Author

razo7 commented Sep 29, 2023

/test 4.14-openshift-e2e

hack/auotomate_far_cr.sh Outdated Show resolved Hide resolved
Replace 'availability of workloads' with 'workload availability'
@razo7
Copy link
Member Author

razo7 commented Oct 4, 2023

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

@razo7
Copy link
Member Author

razo7 commented Oct 5, 2023

/test 4.14-openshift-e2e

description: Fence Agents Remediation Operator uses well-known agents to fence
and remediate unhealthy nodes. In this process it can minimize downtime for
stateful applications, restores compute capacity if transient failures occur,
and increases the availability of workloads.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "availability" sounds like uncountable to me, so I'd say "improve", more than "increase"

smoother experience,\nbut it can also work as a standalonde remediator for the
more advanced user.\n"
description: |
Fence Agents Remediation (*FAR*) is a Kubernetes operator that *fence* and remediate unhealthy to healthy nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typos
"fence" -> "fences"
"remediate" -> "remediates"

more advanced user.\n"
description: |
Fence Agents Remediation (*FAR*) is a Kubernetes operator that *fence* and remediate unhealthy to healthy nodes.
Using a traditional Application Programming Interface (API) FAR runs uses well-known fence agents to remediate a node from an unhealthy state by power cycling the node,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: either runs or uses

description: |
Fence Agents Remediation (*FAR*) is a Kubernetes operator that *fence* and remediate unhealthy to healthy nodes.
Using a traditional Application Programming Interface (API) FAR runs uses well-known fence agents to remediate a node from an unhealthy state by power cycling the node,
and afterwards it delets the node's resources.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: delets -> deletes

Fence Agents Remediation (*FAR*) is a Kubernetes operator that *fence* and remediate unhealthy to healthy nodes.
Using a traditional Application Programming Interface (API) FAR runs uses well-known fence agents to remediate a node from an unhealthy state by power cycling the node,
and afterwards it delets the node's resources.
By doing so, FAR can minimize downtime for stateful applications, restores compute capacity if transient failures occur, and increases the availability of workloads.
Copy link
Contributor

@clobrano clobrano Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect. The sentence begins with "FAR can..." so all the other verbs shouldn't use the third form present tense (FAR can minimize, can restore, can increase).
Maybe, you could just remove the first "can" and begin with "FAR minimizes..."?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence begins with "FAR can..." so all the other verbs shouldn't use the third form present tense (FAR can minimize, can restore, can increase).

The can is intended as a best effort from FAR, I will just use V1 of the verbs.


FAR is recommended when a node becomes unhealthy, and we want to completely fence/isolate the node from a cluster, since we can not “trust” the unhealthy node,
to prevent it from accessing the shared resources like [RWO volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes).
Moreover, FAR is *robust* as it can remediate an unhealthy node using a traditional API (e.g., IPMI) while still keeping control plane connectivity,
Copy link
Contributor

@clobrano clobrano Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by "traditional API" 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As regular/known API. But maybe the traditional isn't useful here.

@slintes
Copy link
Member

slintes commented Jan 24, 2024

what about using what we already have here: https://www.medik8s.io/remediation/fence-agents-remediation/fence-agents-remediation/ ? That one went through review and discussion already 🙂

@razo7
Copy link
Member Author

razo7 commented Jan 25, 2024

what about using what we already have here: medik8s.io/remediation/fence-agents-remediation/fence-agents-remediation ? That one went through review and discussion already 🙂

Yes, I will use most of it with the addition of Compatability section for awareness of other Medik8s operators and a recommendation for using FAR with NHC

Copy link
Contributor

@clobrano clobrano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jan 25, 2024
@clobrano
Copy link
Contributor

/hold

give time to close other discussions

@razo7
Copy link
Member Author

razo7 commented Jan 25, 2024

The other discussions are old or by you, so I removing the draft for now, and waiting for more comments until all the tests are green (and the PR is still on hold).

@razo7 razo7 marked this pull request as ready for review January 25, 2024 15:18
FAR is one of the remediator operators by [Medik8s](https://www.medik8s.io/remediation/remediation/),
such as [Self Node Remediation](https://github.com/medik8s/self-node-remediation) and [Machine Deletion Remediation](https://github.com/medik8s/machine-deletion-remediation),
that were designed to run with the Node HealthCheck Operator [(NHC)](https://github.com/medik8s/node-healthcheck-operator) which detects an unhealthy node and creates remediation CR.
It is recommended to use FAR with NHC for an easier and smoother experience by fully automate the remediation process, but it can be used as a standalone remediator for the more experienced user.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/automate/automating

s/automate/automating
@openshift-ci openshift-ci bot removed the lgtm label Jan 28, 2024
@razo7
Copy link
Member Author

razo7 commented Jan 28, 2024

/test all

@razo7
Copy link
Member Author

razo7 commented Jan 29, 2024

/retest

1 similar comment
@razo7
Copy link
Member Author

razo7 commented Jan 29, 2024

/retest

Copy link
Member

@slintes slintes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel

Copy link
Contributor

openshift-ci bot commented Jan 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano, razo7, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [clobrano,razo7,slintes]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@razo7
Copy link
Member Author

razo7 commented Jan 30, 2024

/retest

@openshift-merge-bot openshift-merge-bot bot merged commit 2a1d3bd into medik8s:main Jan 30, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants