Issues: CentaurusInfra/alnair
Add prometheus export to report process-level GPU utilization and memory used size
#131 opened Jun 20, 2022 by Fizzbb

vgpu-server container failed to start, "run/nvidia-persistenced/socket" no such device or address
#119 opened Apr 3, 2022 by Fizzbb

intercept lib launched through LD_PRELOAD cannot intercept cuda driver API calls with pytorch version >=1.10
#114 opened Mar 30, 2022 by Fizzbb

Add pre-start hook to all containers in container runtime to support GPU access
#110 opened Mar 15, 2022 by Fizzbb

create an exporter to export burst, overuse and window-size metrics to prometheus.
#108 opened Mar 8, 2022 by pint1022

setup tf-serving testing environment for kubeshare throughput testing
#106 opened Mar 8, 2022 by pint1022

horovod mnist.py has higher utilization number. what does it do?
#105 opened Mar 8, 2022 by pint1022

GPU sharing corner case: vGPUs spread to two or more physical GPUs
#98 opened Feb 18, 2022 by Fizzbb

revise alnair devicepluginserver to connect the running pod/container info with the device
#92 opened Feb 2, 2022 by YHDING23

fairseq multihead_attention, torch.cat cause RuntimeError: CUDA out of memory
#83 opened Jan 25, 2022 by Fizzbb