
Kubeshare prototyping and compute sharing deep dive #88

Open · Fizzbb opened this issue Jan 26, 2022 · 4 comments

Fizzbb (Collaborator) commented Jan 26, 2022

No description provided.

Fizzbb moved this to In Progress in Alnair 20220130 on Jan 26, 2022
pint1022 (Collaborator) commented Feb 2, 2022

Compared with the GaiaGPU solution, I think KubeShare's architecture has some good designs for monitoring and managing compute resources.

Fizzbb (Collaborator, Author) commented Feb 2, 2022

To do: run a pod with a 50% utilization request through KubeShare, and monitor how the GPU utilization fluctuates.
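
For that experiment, a minimal monitoring sketch (not from the KubeShare repo) could sample GPU utilization on the node with `nvidia-smi` and summarize the fluctuation. The sampling interval and sample count below are arbitrary choices.

```python
#!/usr/bin/env python3
"""Sample GPU utilization periodically and summarize its fluctuation.

Assumes nvidia-smi is on PATH and is run on the node hosting the
KubeShare pod while the 50%-utilization workload is active.
"""
import statistics
import subprocess
import time

SAMPLE_INTERVAL_S = 1.0   # seconds between samples (assumed)
NUM_SAMPLES = 120         # ~2 minutes of data (assumed)

def gpu_utilization(gpu_index: int = 0) -> float:
    """Return the instantaneous GPU utilization (%) reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        text=True,
    )
    return float(out.strip())

def main() -> None:
    samples = []
    for _ in range(NUM_SAMPLES):
        samples.append(gpu_utilization())
        time.sleep(SAMPLE_INTERVAL_S)
    # A large stdev or a wide min/max spread indicates heavy fluctuation
    # around the requested 50% share.
    print(f"samples={len(samples)} "
          f"mean={statistics.mean(samples):.1f}% "
          f"stdev={statistics.pstdev(samples):.1f}% "
          f"min={min(samples):.0f}% max={max(samples):.0f}%")

if __name__ == "__main__":
    main()
```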

pint1022 (Collaborator) commented

Gemini algorithm major features:

  1. The vGPU hook is assigned per container. It is event-driven and communicates with the GPU isolation module over IPC. Its main jobs: kernel watchdog, heartbeat, and interrupting overuse.
  2. The GPU isolation module is a daemon process. It is event-driven, handles IPC requests, creates new SharePods, and spawns monitor threads per pod. Its main algorithmic pieces: usage monitoring, a sliding window, a request queue, and priority management (a sketch of the sliding-window idea follows this list).
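
To make the usage-monitoring / sliding-window piece concrete, here is a minimal sketch of how an isolation module could track a pod's recent GPU busy time and decide whether it is still within its share. The class name, window length, and accounting granularity are illustrative assumptions, not Gemini's actual code.

```python
import collections
import time

class SlidingWindowQuota:
    """Track a pod's GPU busy time over a sliding window and enforce a share."""

    def __init__(self, share: float, window_s: float = 10.0):
        self.share = share                  # e.g. 0.5 for a 50% compute request
        self.window_s = window_s            # sliding window length in seconds
        self.records = collections.deque()  # (timestamp, busy_seconds) entries

    def record(self, busy_seconds: float) -> None:
        """Called by the per-pod monitor thread when a kernel burst finishes."""
        self.records.append((time.monotonic(), busy_seconds))

    def used_in_window(self) -> float:
        """GPU busy time accumulated inside the current window."""
        cutoff = time.monotonic() - self.window_s
        while self.records and self.records[0][0] < cutoff:
            self.records.popleft()           # slide the window forward
        return sum(busy for _, busy in self.records)

    def within_quota(self) -> bool:
        """True while the pod's usage is below its share of the window."""
        return self.used_in_window() < self.share * self.window_s


# A pod with a 50% share of a 10 s window: after 3 s of busy time it is
# still within quota; after 6 s it is over quota and its requests would
# be queued behind other pods.
quota = SlidingWindowQuota(share=0.5, window_s=10.0)
quota.record(3.0)
print(quota.within_quota())   # True  (3 s used < 5 s allowed)
quota.record(3.0)
print(quota.within_quota())   # False (6 s used > 5 s allowed)
```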

pint1022 (Collaborator) commented

The Gemini algorithm deserves to be explored and tested further; I like it very much. The next analysis step involves a lot of coding and testing. Two main questions:

  1. Will the GPU sit idle if one pod runs out of quota and no other pod/thread is asking for the GPU? Answer: normally it does not run into that situation, and even if it hits that unusual case, it should be avoidable by tuning parameters (see the sketch after this list).
  2. Does CUDA do a context switch when the active pod changes? That is a question for the vGPU driver, not for KubeShare. In my research, the vGPU driver should use one context shared among all the sharing threads.
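
On question 1: one common way to avoid the idle case is a work-conserving policy, i.e. an over-quota pod is still granted the GPU when nothing else is waiting, and the extra usage is simply charged against its window. A minimal sketch of that policy, with illustrative names rather than Gemini's code:

```python
import collections

def next_grant(request_queue: collections.deque, remaining_budget: dict):
    """Pick the next pod to run on the GPU.

    remaining_budget maps pod name -> seconds of GPU time left in the
    current sliding window. Pods with budget left are preferred; if every
    waiting pod is over quota, the head of the queue still runs so the
    GPU never sits idle (work-conserving).
    """
    if not request_queue:
        return None                       # nothing to run at all
    for pod in request_queue:
        if remaining_budget.get(pod, 0.0) > 0.0:
            request_queue.remove(pod)
            return pod                    # in-quota pod goes first
    return request_queue.popleft()        # work-conserving fallback

# pod-a has exhausted its budget but is the only requester, so it still
# gets the GPU instead of leaving the device idle.
queue = collections.deque(["pod-a"])
print(next_grant(queue, {"pod-a": 0.0}))   # -> pod-a
```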
