node: start moving the resource management docs to concepts #48797

Open
wants to merge 1 commit into
base: dev-1.32
from

Conversation

ffromani
Contributor

@ffromani ffromani commented Nov 21, 2024

Description

Move the CPU management policies and options docs from tasks to concepts.
xref: #48340 (review)

In the 1.32 cycle I only have capacity to move the CPU manager docs, which are the worst offender anyway. The other managers should follow suit, hopefully as early as the 1.33 cycle.

Issue

Closes: #38121 (albeit in a different and IMO better way)

@k8s-ci-robot k8s-ci-robot added this to the 1.32 milestone Nov 21, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 21, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nate-double-u for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the language/en Issues or PRs related to English language label Nov 21, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 21, 2024

netlify bot commented Nov 21, 2024

👷 Deploy Preview for kubernetes-io-vnext-staging processing.

Name Link
🔨 Latest commit 84d616d
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-vnext-staging/deploys/673f1cb37f4e95000808f924

@ffromani ffromani changed the title from "node: move resource management docs to concepts" to "node: start moving the resource management docs to concepts" Nov 21, 2024

We have reached a point where the existing CPU management task page is quite hard to follow.
Start moving the resource management concepts to the concepts page.

We begin with the CPU management policies, the worst offender right now.
Over time, the plan is to move all the concepts from the task pages into the
concepts page.

Signed-off-by: Francesco Romani <[email protected]>

netlify bot commented Nov 21, 2024

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit dbdd3ec
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/673f1c378d677600096cf597
😎 Deploy Preview https://deploy-preview-48797--kubernetes-io-main-staging.netlify.app


netlify bot commented Nov 21, 2024

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit 84d616d
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/673f1cb49f654200082aced6
😎 Deploy Preview https://deploy-preview-48797--kubernetes-io-main-staging.netlify.app

@tengqm
Contributor

tengqm commented Nov 21, 2024

Totally agree with this refactoring. However, I'd suggest we postpone it because 1.32 is about to be released. Propagating this change to the localization teams takes some time.

@ffromani
Contributor Author

> Totally agree with this refactoring. However, I'd suggest we postpone it because 1.32 is about to be released. Propagating this change to the localization teams takes some time.

cc @sftim: should we go ahead with #48340 as-is, then?

@ffromani
Contributor Author

Also note that this change will impact PRs #48469 and #48356, in addition to #48340.

@ffromani
Contributor Author

> Totally agree with this refactoring. However, I'd suggest we postpone it because 1.32 is about to be released. Propagating this change to the localization teams takes some time.

If this helps, what I want to do here is 90% content movement (which I honestly believe helps anyway). @sftim, could you please give your take on @tengqm's concern, also considering #48797 (comment)?

@sftim
Contributor

sftim commented Nov 22, 2024

> Totally agree with this refactoring. However, I'd suggest we postpone it because 1.32 is about to be released. Propagating this change to the localization teams takes some time.

The decision is about what to postpone:

  • adding the new code
  • fixing the documentation

In an ideal world, we would avoid adding features when we don't also have the capacity to document them well. That's what commercial product docs often aim for. In the same ideal world, we would have planned that refactoring in early.

See #48340 (comment) for a compromise option. How does that sound?

@sftim
Contributor

sftim commented Nov 22, 2024

If we can get the refactoring done within a few days of now, I'd prefer to land it and redo the feature PRs.

@ffromani
Contributor Author

> If we can get the refactoring done within a few days of now, I'd prefer to land it and redo the feature PRs.

OK, fair point. Let's timebox this attempt. I'll need some sig-node reviews, and I have asked for them: https://kubernetes.slack.com/archives/C0BP8PW9G/p1732284959539839
If I can get them and get #48797 into good shape, we can work out how to serialize the merges here. Otherwise I'll push to make it ready for an early 1.33 merge (or whenever the floodgates open).

Contributor

@sftim sftim left a comment

/lgtm

responsible for these optimizations. The the overall resource management process is governed using
its [policy](/docs/tasks/administer-cluster/topology-manager/).

## CPU Management Policies

I recommend a tweak on this one:

Suggested change
- ## CPU Management Policies
+ ## Policies for assigning CPUs to Pods

@@ -13,10 +13,234 @@ In order to support latency-critical and high-throughput workloads, Kubernetes o

<!-- body -->

The main manager, the Topology Manager, is a Kubelet component that co-ordinates the overall resource management process through its [policy](/docs/tasks/administer-cluster/topology-manager/).
## Hardware Topology Alignment policies

(extra optional change)

Suggested change
- ## Hardware Topology Alignment policies
+ ## Hardware topology alignment policies

Comment on lines +18 to +20
_Topology Manager_ is a kubelet component that aims to coordinate the set of components that are
responsible for these optimizations. The the overall resource management process is governed using
its [policy](/docs/tasks/administer-cluster/topology-manager/).

Extra optional change

Suggested change
_Topology Manager_ is a kubelet component that aims to coordinate the set of components that are
responsible for these optimizations. The the overall resource management process is governed using
its [policy](/docs/tasks/administer-cluster/topology-manager/).
the policy you specify. To learn more, read
[Control Topology Management Policies on a Node](/docs/tasks/administer-cluster/topology-manager/).

## CPU Management Policies

{{< feature-state for_k8s_version="v1.26" state="stable" >}}


Optional extra change

Suggested change
Once a Pod is bound to a Node, the kubelet on that node may need to either multiplex the existing
hardware (for example, sharing CPUs across multiple Pods) or allocate hardware by dedicating some
resource (for example, assigning one or more CPUs for a Pod's exclusive use).

Comment on lines +50 to +52
{{< note >}}
CPU Manager doesn't support offlining and onlining of CPUs at runtime.
{{< /note >}}

Optional extra change

Suggested change
- {{< note >}}
- CPU Manager doesn't support offlining and onlining of CPUs at runtime.
- {{< /note >}}
+ CPU Manager doesn't support offlining and onlining of CPUs at runtime.

policy and does not apply to hardware where the number of sockets is greater
than number of NUMA nodes.

##### distribute-cpus-across-cores

Optional extra change

Suggested change
- ##### distribute-cpus-across-cores
+ ##### `distribute-cpus-across-cores`

management policies to determine some placement preferences on the node.

### Configuration
## CPU Management Policies configuration

Optional extra change

Suggested change
- ## CPU Management Policies configuration
+ ## Configuring CPU management policies

state file `cpu_manager_state` in the kubelet root directory.
{{< /note >}}

#### None policy configuration

Suggested change
- #### None policy configuration
+ #### `none` policy configuration

?


Not sure if this is right.

container's resource limit for the CPU resource is an integer greater than or
equal to one. The `nginx` container is granted 2 exclusive CPUs.

#### Static policy options

Optional extra change

Suggested change
- #### Static policy options
+ #### Static policy options {#cpu-policy-static--options}

(sic)

#### Static policy options

The behavior of the static policy can be fine-tuned using the CPU Manager policy options.
The following policy options exist for the static `CPUManager` policy.

@sftim sftim Nov 23, 2024


Optional extra change

Suggested change
- The following policy options exist for the static `CPUManager` policy.
+ The behavior of the static policy can be fine-tuned using CPU manager policy options.
+ The following policy options exist for the static CPU management policy:
+
+ {{/* options in alphabetical order */}}
+
+ `align-by-socket` (alpha, hidden by default)
+ : Align CPUs by physical package / socket boundary, rather than logical NUMA boundaries (available since Kubernetes v1.25)
+
+ `distribute-cpus-across-cores` (alpha, hidden by default)
+ : Allocate virtual cores, sometimes called hardware threads, across different physical cores (available since Kubernetes v1.31)
+
+ `distribute-cpus-across-numa` (alpha, hidden by default)
+ : Spread CPUs across different NUMA domains, aiming for an even balance between the selected domains (available since Kubernetes v1.23)
+
+ `full-pcpus-only` (beta, visible by default)
+ : Always allocate full physical cores (available since Kubernetes v1.22)
+
+ You can toggle groups of options on and off based upon their maturity level
+ using the following feature gates:
+
+ * `CPUManagerPolicyBetaOptions` (default enabled). Disable to hide beta-level options.
+ * `CPUManagerPolicyAlphaOptions` (default disabled). Enable to show alpha-level options.
+
+ You will still have to enable each option using the `cpuManagerPolicyOptions` field in the
+ kubelet configuration file.
+
+ For more detail about the individual options you can configure, read on.
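
For readers following the suggestion above, here is a minimal sketch of how the feature gates and the `cpuManagerPolicyOptions` field might fit together in a kubelet configuration file. The choice of options and the reserved CPU IDs are assumptions made for illustration only; they are not taken from the PR and are not a recommendation.

```yaml
# Illustrative kubelet configuration fragment. The option choices and the
# reserved CPU IDs are assumptions for this example, not recommendations.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Needed here only because an alpha-level option is enabled below.
  CPUManagerPolicyAlphaOptions: true
cpuManagerPolicy: static
# The static policy requires a non-zero CPU reservation.
reservedSystemCPUs: "0,1"
cpuManagerPolicyOptions:
  # Beta option, visible by default.
  full-pcpus-only: "true"
  # Alpha option, visible only with the feature gate above enabled.
  distribute-cpus-across-numa: "true"
```

Note that the feature gate only makes the alpha-level group visible; each option still has to be switched on individually, which matches the wording of the suggested text.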

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 23, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: df88e29f6338dc3d61435521fba556715bf73816

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants