Feature: Balance based on assigned resources instead of current usage #16

Closed
daanbosch opened this issue Jul 11, 2024 · 11 comments · Fixed by #23

@daanbosch
Contributor

Overview

For my use case, virtual machines (VMs) often exhibit bursty behavior, and moving them is not always feasible due to business constraints. Therefore, I request the ability to balance load based on the assigned CPU and memory resources instead of the current usage metrics.

Task

Implement functionality in Proxmox that allows load balancing to consider the assigned CPU and memory resources for VMs, rather than relying solely on current usage values.

  • Modify the load balancing algorithm to incorporate the assigned CPU and memory resources of VMs.
  • Ensure the algorithm can dynamically allocate VMs to hosts based on these assigned resource values.

Configuration Options:

  • Provide configuration settings to toggle between using current usage and assigned resource values for load balancing.

@gyptazy
Owner

gyptazy commented Jul 12, 2024

Hey @daanbosch,

thanks for this feature request. I will check how many changes are required to implement this and whether it is doable for release 1.0.0 or 1.1.0. I will update this request soon with more information.

Thanks,
gyptazy

@gyptazy
Owner

gyptazy commented Jul 12, 2024

Hey @daanbosch

VM Rebalancing by Total Value

The new mode parameter in the config file defines whether rebalancing is done by used (default) or total resources.

This is currently available in PR #19 and should be merged soon. It will ship with release 1.0.0.
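
To illustrate the difference between the two modes, here is a minimal Python sketch. It is not ProxLB's actual code: the VM class, its field names, and node_memory_load are hypothetical, and it only assumes that "used" means current consumption and "total" means the amount assigned to the VM.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical record; field names are illustrative only.
    name: str
    node: str
    mem_used: int      # memory currently in use
    mem_assigned: int  # memory assigned to the VM


def node_memory_load(vms, mode="used"):
    """Sum VM memory per node, by current usage or by assignment."""
    if mode not in ("used", "total"):
        raise ValueError(f"unknown mode: {mode}")
    load = {}
    for vm in vms:
        value = vm.mem_used if mode == "used" else vm.mem_assigned
        load[vm.node] = load.get(vm.node, 0) + value
    return load


vms = [
    VM("a", "node1", mem_used=1, mem_assigned=8),
    VM("b", "node1", mem_used=2, mem_assigned=4),
    VM("c", "node2", mem_used=3, mem_assigned=16),
]
print(node_memory_load(vms, mode="used"))   # {'node1': 3, 'node2': 3}
print(node_memory_load(vms, mode="total"))  # {'node1': 12, 'node2': 16}
```

In this made-up example the nodes look balanced by usage but not by assignment, which is the situation bursty VMs can create.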

@daanbosch Can you please give it a try and let me know if I fully understood your request for this feature? Thanks!

Cheers,
gyptazy

@gyptazy gyptazy self-assigned this Jul 12, 2024
@gyptazy gyptazy added this to the Release 1.0.0 milestone Jul 12, 2024
@daanbosch
Contributor Author

Oh amazing! Going to test this right away!

@daanbosch
Contributor Author

Hmm, the numbers I'm getting are pretty odd:

<6> ProxLB: Info: [logger]: Logger verbosity got updated to: INFO.
<4> ProxLB: Warning: [api-connection]: API connection does not verify SSL certificate.
<6> ProxLB: Info: [api-connection]: API connection succeeded to host: <redacted>.
<6> ProxLB: Info: [node-statistics]: Added node node2.
<6> ProxLB: Info: [node-statistics]: Added node node1.
<6> ProxLB: Info: [node-statistics]: Added node node3.
<6> ProxLB: Info: [node-statistics]: Created node statistics.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb2.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb3.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb.
<6> ProxLB: Info: [vm-statistics]: Created VM statistics.
<6> ProxLB: Info: [rebalancing-calculator]: Rebalancing will be done for method: memory.
<6> ProxLB: Info: [rebalancing-calculator]: Rebalancing will be done by: total resources.
<6> ProxLB: Info: [rebalancing-calculator]: Balanciness is set to: 1.
<6> ProxLB: Info: [balancing-method-validation]]: Valid balancing method: memory
<6> ProxLB: Info: [balanciness-validation]: Rebalancing is for memory is not needed. Highest usage: 98% | Lowest usage: 98
<6> ProxLB: Info: [rebalancing-calculator]: Balancing calculations done.
<6> ProxLB: Info: [rebalancing-executor]: Starting dry-run to rebalance vms to their new nodes.
<6> ProxLB: Info: [rebalancing-executor]: No rebalancing needed according to the defined balanciness.
No rebalancing needed according to the defined balanciness.
<6> ProxLB: Info: [post-validations]: All post-validations succeeded.
<6> ProxLB: Info: [daemon]: Not running in daemon mode. Quitting.

Settings:

[proxmox]
api_host: <redacted>
api_user: <redacted>
api_pass: <redacted>
verify_ssl: 0
[balancing]
method: memory
ignore_nodes: none
ignore_vms: none
balanciness: 1
mode: total
[service]
daemon: 0
schedule: 24
log_verbosity: INFO
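
For reference, a small sketch of how a mode setting like the one above could be read, assuming the file is INI-like and parseable by Python's standard configparser (which accepts both "=" and ":" as delimiters). The function name read_balancing_mode is hypothetical, and the "used" fallback mirrors the default mentioned earlier in this thread.

```python
import configparser

SAMPLE = """
[balancing]
method: memory
balanciness: 1
mode: total
"""


def read_balancing_mode(config_text):
    """Return the balancing mode, falling back to 'used' when unset."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    mode = parser.get("balancing", "mode", fallback="used")
    if mode not in ("used", "total"):
        raise ValueError(f"unsupported mode: {mode}")
    return mode


print(read_balancing_mode(SAMPLE))  # total
```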

Also tried it with CPU:

<6> ProxLB: Info: [logger]: Logger verbosity got updated to: INFO.
<4> ProxLB: Warning: [api-connection]: API connection does not verify SSL certificate.
<6> ProxLB: Info: [api-connection]: API connection succeeded to host: <redacted>.
<6> ProxLB: Info: [node-statistics]: Added node node3.
<6> ProxLB: Info: [node-statistics]: Added node node1.
<6> ProxLB: Info: [node-statistics]: Added node node2.
<6> ProxLB: Info: [node-statistics]: Created node statistics.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb3.
<6> ProxLB: Info: [api-get-vm-tags]: Got VM comment from API.
<6> ProxLB: Info: [vm-statistics]: Added vm testproxlb2.
<6> ProxLB: Info: [vm-statistics]: Created VM statistics.
<6> ProxLB: Info: [rebalancing-calculator]: Rebalancing will be done for method: cpu.
<6> ProxLB: Info: [rebalancing-calculator]: Rebalancing will be done by: total resources.
<6> ProxLB: Info: [rebalancing-calculator]: Balanciness is set to: 1.
<6> ProxLB: Info: [balancing-method-validation]]: Valid balancing method: cpu
<6> ProxLB: Info: [balanciness-validation]: Rebalancing is for cpu is not needed. Highest usage: 100% | Lowest usage: 100
<6> ProxLB: Info: [rebalancing-calculator]: Balancing calculations done.
<6> ProxLB: Info: [rebalancing-executor]: Starting dry-run to rebalance vms to their new nodes.
<6> ProxLB: Info: [rebalancing-executor]: No rebalancing needed according to the defined balanciness.
No rebalancing needed according to the defined balanciness.
<6> ProxLB: Info: [post-validations]: All post-validations succeeded.
<6> ProxLB: Info: [daemon]: Not running in daemon mode. Quitting.

VMs:

+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+
| id        | type | cgroup-mode | content |   cpu |   disk | hastate | level | maxcpu |   maxdisk |     maxmem |        mem | name           | node         | plugintype | pool | status  | storage |      uptime | vmid |
+===========+======+=============+=========+=======+========+=========+=======+========+===========+============+============+================+==============+============+======+=========+=========+=============+======+
| qemu/100  | qemu |             |         | 0.04% | 0.00 B |         |       |     10 |  2.20 GiB | 195.78 GiB | 819.39 MiB | testproxlb     | node1        |            |      | running |         | 22h 44m 37s |  100 |
+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+
| qemu/101  | qemu |             |         | 0.12% | 0.00 B |         |       |      5 | 50.00 GiB | 195.78 GiB | 772.17 MiB | testproxlb2    | node1        |            |      | running |         |      4m 38s |  101 |
+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+
| qemu/102  | qemu |             |         | 0.03% | 0.00 B |         |       |     12 |  2.20 GiB | 195.78 GiB | 836.94 MiB | testproxlb3    | node1        |            |      | running |         | 22h 44m 30s |  102 |
+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+
| qemu/9000 | qemu |             |         | 0.00% | 0.00 B |         |       |      1 |  2.20 GiB |   2.00 GiB |     0.00 B | focal-template | node1        |            |      | stopped |         |          0s | 9000 |
+-----------+------+-------------+---------+-------+--------+---------+-------+--------+-----------+------------+------------+----------------+--------------+------------+------+---------+---------+-------------+------+

@gyptazy
Owner

gyptazy commented Jul 12, 2024

Thanks, I just pushed a fix. Can you give it a try, please?

It does not make any sense to validate the current resources for balanciness when using total values:
https://github.com/gyptazy/ProxLB/compare/ef60124c286d9e346690b45650700677d79a5b31..f14b94f7584377675022d740a98279a4e777d42f

This should work, but it still requires additional changes. The current disadvantage is that it will almost always rebalance the VMs.

I need to adjust the test cluster and integrate further changes.
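
As a rough sketch of a balanciness check based on assigned values: this assumes balanciness is the maximum allowed difference, in percentage points, between the most and least loaded node. That is an interpretation of the log output above, not confirmed ProxLB behaviour, and both function names are hypothetical.

```python
def assigned_percent(assigned_by_node, capacity_by_node):
    """Percentage of each node's capacity that is assigned to its VMs."""
    return {
        node: 100 * assigned_by_node.get(node, 0) / capacity
        for node, capacity in capacity_by_node.items()
    }


def needs_rebalancing(percent_by_node, balanciness):
    """True when the spread between nodes exceeds the allowed balanciness."""
    highest = max(percent_by_node.values())
    lowest = min(percent_by_node.values())
    return (highest - lowest) > balanciness


capacity = {"node1": 64, "node2": 64, "node3": 64}
assigned = {"node1": 48}  # all VMs currently sit on node1
percent = assigned_percent(assigned, capacity)
print(needs_rebalancing(percent, balanciness=1))  # True
```

Note that assigned percentages can exceed 100 when a node is overcommitted, so a check like this should not clamp the values.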

@daanbosch
Contributor Author

Hmm, now it wants to move every VM to node2 based on CPU in this scenario (testproxlb2 is already on node2).

            VM   Current Node   Rebalanced Node
    testproxlb   node1          node2
   testproxlb3   node1          node2

For the memory run:

            VM   Current Node   Rebalanced Node
   testproxlb2   node2          node1
   testproxlb3   node1          node2
    testproxlb   node1          node3

This would be correct; however, it does not really make sense to swap testproxlb2 and testproxlb3.

However, it seems to be going in the right direction! Thanks!
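
One way to avoid swaps like the one described above is a simple greedy heuristic that only moves a VM when doing so strictly narrows the gap between the most and least loaded node. The sketch below is an illustration under that assumption, not how ProxLB computes placements; plan_moves and its arguments are hypothetical, and the assigned values are arbitrary equal units.

```python
def plan_moves(placement, assigned, nodes):
    """Greedy plan: returns a list of (vm, source_node, target_node) moves.

    placement: vm name -> current node
    assigned:  vm name -> assigned resource amount
    nodes:     list of all node names
    """
    placement = dict(placement)  # work on a copy
    load = {node: 0 for node in nodes}
    for vm, node in placement.items():
        load[node] += assigned[vm]

    moves = []
    while True:
        src = max(nodes, key=lambda n: load[n])
        dst = min(nodes, key=lambda n: load[n])
        spread = load[src] - load[dst]
        # Only VMs smaller than the gap reduce it when moved.
        candidates = [
            vm for vm, node in placement.items()
            if node == src and 0 < assigned[vm] < spread
        ]
        if not candidates:
            break
        # Prefer the VM whose size is closest to half the gap.
        vm = min(candidates, key=lambda v: abs(assigned[v] - spread / 2))
        load[src] -= assigned[vm]
        load[dst] += assigned[vm]
        placement[vm] = dst
        moves.append((vm, src, dst))
    return moves


placement = {"testproxlb": "node1", "testproxlb3": "node1", "testproxlb2": "node2"}
assigned = {"testproxlb": 1, "testproxlb3": 1, "testproxlb2": 1}
print(plan_moves(placement, assigned, ["node1", "node2", "node3"]))
# [('testproxlb', 'node1', 'node3')]
```

With equally sized VMs, a single move is enough here and no VM is swapped between nodes that already hold the same load.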

@gyptazy
Owner

gyptazy commented Jul 12, 2024

This would be correct; however, it does not really make sense to swap testproxlb2 and testproxlb3.
However, it seems to be going in the right direction! Thanks!

Yeah, that is what I meant by:

This should work, but it still requires additional changes. The current disadvantage is that
it will almost always rebalance the VMs.

I'll probably have a look at this on Monday.

@gyptazy
Owner

gyptazy commented Jul 13, 2024

I had a look at it this morning and decided to integrate this properly. That requires more restructuring of the code than previously assumed, plus more validations, because it already killed a node in my test cluster ;)

I'm already working on that and will push it once it is in a usable state.

@gyptazy
Owner

gyptazy commented Jul 15, 2024

Hey @daanbosch,

maybe you can give #23 a try when you have time. There is still a small issue: it might need to do an initial rebalance first and only produces the desired result on the second run. This is something I'm still looking into...

Thanks,
gyptazy

@daanbosch
Contributor Author

Hi @gyptazy,

I just tested #23 and it works fine for me. There are indeed some small things that make it not the quickest path to the desired balance. However, it's already a great tool in its current state!

@gyptazy
Owner

gyptazy commented Jul 16, 2024

Hey @daanbosch,

I just tested #23 and it works fine for me. There are indeed some small things that make it not the quickest path to the desired balance. However, it's already a great tool in its current state!

Happy to hear! I'll add more improvements ASAP so that this also works on the first run. I encountered additional issues with the API, and I can only rely on the (updated) information in the API to recalculate the best placement for VMs. You might also see a race condition: if you retrigger the command too quickly, you may get inconsistent or outdated data from the API. Because ProxLB is stateless, this is an issue. It may be solvable by writing state files to the filesystem, since I would really like to avoid using a database for this small service.
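
A minimal sketch of the state-file idea, purely as an assumption about how it could look: a small JSON file records the time of the last run, and a new run is skipped while a cooldown is still active. The function name should_run, the file layout, and the cooldown value are all hypothetical and not part of ProxLB.

```python
import json
import time
from pathlib import Path


def should_run(state_file, cooldown_seconds, now=None):
    """Return True and record the run unless the last run was too recent."""
    now = time.time() if now is None else now
    path = Path(state_file)
    try:
        last_run = json.loads(path.read_text())["last_run"]
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        # Missing or unreadable state is treated as "never ran".
        last_run = None
    if last_run is not None and now - last_run < cooldown_seconds:
        return False
    path.write_text(json.dumps({"last_run": now}))
    return True
```

A caller could check should_run("/tmp/proxlb.state", 60) before querying the API, so that a quick retrigger does not act on data that has not been refreshed yet. The path and the 60 seconds are placeholders.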

Cheers,
gyptazy

gyptazy added a commit that referenced this issue Jul 16, 2024
…resources

feature: Add option to rebalance VMs by their assigned resources. [#16]