Make resource allocation ( total number of machines ) exact as requested numbers. #82

Open
q131172019 opened this issue Jul 13, 2022 · 3 comments

@q131172019
Collaborator

In a test with 500K nodes / 2 regions / 20 schedulers / 25K nodes per scheduler, each of the first 19 schedulers is successfully allocated slightly more than the 25K machines it requested because of allocation overhead, so fewer than 25K nodes remain. As a result, the 20th scheduler cannot be allocated its 25K requested machines and registration fails with "Not enough hosts":

I0711 18:53:40.867457   18611 installer.go:48] handle client registration
E0711 18:53:40.869402   18611 distributor.go:66] Error allocate resource for client. Error Not enough hosts
I0711 18:53:40.869424   18611 installer.go:79] error register client. error Not enough hosts
q131172019 changed the title from "The last scheduler cannot be allocated with requested machines due to overhead of other schedulers" to "The last scheduler cannot be allocated with 25k requested machines due to overhead of other schedulers" on Jul 13, 2022
yb01 added this to the 2022-0930 milestone on Jul 14, 2022
yb01 changed the title from "The last scheduler cannot be allocated with 25k requested machines due to overhead of other schedulers" to "P2: The last scheduler cannot be allocated with 25k requested machines due to overhead of other schedulers" on Jul 14, 2022
@yb01
Collaborator

yb01 commented Jul 14, 2022

Thanks for filing this issue to track the problem.
This happens because the distributor allocates machines in slices, so a client does not receive exactly the number of machines it requested. It receives slightly more; the excess depends on the size of the slices allocated to the client and can be ~30 or so.
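
To make the arithmetic concrete, here is a minimal Go sketch of slice-granular allocation. It is not the distributor's actual code: the function names, the uniform slice size of 30 hosts, and the rule that a request is rounded up to whole slices are all assumptions chosen for illustration. Under those assumptions it reproduces the failure from the 500K-node test above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotEnoughHosts reuses the message seen in the log; the variable itself is hypothetical.
var ErrNotEnoughHosts = errors.New("Not enough hosts")

// allocateInSlices grants whole slices only (assumed behavior), so the
// granted count is the request rounded up to a multiple of sliceSize.
func allocateInSlices(free, request, sliceSize int) (int, error) {
	slices := (request + sliceSize - 1) / sliceSize // round up to whole slices
	granted := slices * sliceSize
	if granted > free {
		return 0, ErrNotEnoughHosts
	}
	return granted, nil
}

// simulate issues one request per scheduler against a shared pool.
// It returns the 1-based index of the first scheduler that fails
// (0 if all succeed) and the number of hosts left in the pool.
func simulate(total, schedulers, request, sliceSize int) (failedAt, remaining int) {
	remaining = total
	for i := 1; i <= schedulers; i++ {
		granted, err := allocateInSlices(remaining, request, sliceSize)
		if err != nil {
			return i, remaining
		}
		remaining -= granted
	}
	return 0, remaining
}

func main() {
	// 500K hosts, 20 schedulers, 25K each, hypothetical slice size of 30.
	failedAt, remaining := simulate(500000, 20, 25000, 30)
	fmt.Println(failedAt, remaining) // prints "20 24620"
}
```

With a slice size of 30, each request for 25,000 is rounded up to 25,020, so 19 schedulers consume 475,380 hosts and only 24,620 remain for the 20th. Setting the slice size to 1 in the same sketch makes every grant exact and all 20 requests succeed.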

@yb01
Collaborator

yb01 commented Aug 8, 2022

  1. First, assume no new nodes or new RPs are added to a region.
  2. Then, add a new RP to a region.
  3. Assume no new node changes in an RP.

@yb01
Collaborator

yb01 commented Aug 8, 2022

Evaluate the cost of the first step for the 930 milestone.

yb01 changed the title from "P2: The last scheduler cannot be allocated with 25k requested machines due to overhead of other schedulers" to "Make resource allocation ( total number of machines ) exact as requested numbers." on Aug 11, 2022

3 participants