
Deep Learning Training Service v1.2.0

Released by @Anbang-Hu on 07 Oct 20:15
Commit 8767c65

JobManager

  • Job scheduling with priorities adjustable by VC (virtual cluster) admins
  • Job pausing and resuming
  • Per-user quota control at the VC level
  • InfiniBand topology-aware scheduling
  • Support for the inference job type
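The notes do not describe how priorities and per-user VC quotas interact during scheduling. As a rough illustration only, the Python sketch below shows one way the two could combine in a single scheduling pass. The `Job` record, the `schedule` helper, and all field names are hypothetical and are not taken from the JobManager code; users missing from `user_quota` are treated as having a quota of 0.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    user: str
    gpus: int
    priority: int      # hypothetical: higher values are scheduled first
    submit_order: int  # tie-breaker: earlier submissions go first


def schedule(jobs, free_gpus, user_quota, used_by_user):
    """Pick jobs to start in one scheduling pass (hypothetical sketch).

    Jobs are visited by descending priority, then by submission order. A job
    is started only if it fits in the remaining free GPUs and keeps its user
    within that user's per-VC GPU quota.
    """
    used = dict(used_by_user)  # copy so the caller's dict is not mutated
    started = []
    for job in sorted(jobs, key=lambda j: (-j.priority, j.submit_order)):
        if job.gpus > free_gpus:
            continue  # does not fit right now; smaller jobs may still backfill
        if used.get(job.user, 0) + job.gpus > user_quota.get(job.user, 0):
            continue  # would exceed this user's quota in the VC
        started.append(job.job_id)
        free_gpus -= job.gpus
        used[job.user] = used.get(job.user, 0) + job.gpus
    return started


jobs = [
    Job("a", "alice", gpus=4, priority=1, submit_order=0),
    Job("b", "bob", gpus=2, priority=5, submit_order=1),
    Job("c", "alice", gpus=4, priority=5, submit_order=2),
]
print(schedule(jobs, free_gpus=8, user_quota={"alice": 4, "bob": 8}, used_by_user={}))
# ['b', 'c']
```

Skipping a job that does not fit, instead of stopping there, lets lower-priority jobs backfill idle GPUs. The trade-off is that large high-priority jobs can wait longer. A stricter scheduler would stop at the first blocked job.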

WebPortal

  • New web portal built with ReactJS and Koa
  • GPU fragmentation histogram on job submission page
  • Per-user idle GPU count, monthly booked GPU hours, and monthly idle GPU hours
  • Per-VC GPU usage dashboard
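A fragmentation histogram shows how free GPUs are spread across nodes. For example, a node with 3 free GPUs can host a 2-GPU job but not an 8-GPU job, so the total free count alone can be misleading. The Python sketch below computes such a histogram along with booked and idle GPU hours. It assumes hypothetical inputs: a map from node name to free-GPU count, and periodic per-GPU utilization samples. The "idle" definition (utilization at or below a threshold) is also an assumption and may not match how the portal computes it.

```python
from collections import Counter


def fragmentation_histogram(free_gpus_per_node):
    """Map 'free GPUs on a node' -> 'number of nodes with that many free'."""
    return dict(sorted(Counter(free_gpus_per_node.values()).items()))


def booked_and_idle_gpu_hours(samples, interval_hours, idle_threshold=0.0):
    """Return (booked, idle) GPU hours for one user (hypothetical sketch).

    samples: one list per sampling tick, holding the utilization (0-100)
    of each GPU the user had booked at that tick. A booked GPU counts as
    idle for a tick when its utilization is <= idle_threshold.
    """
    booked_ticks = sum(len(tick) for tick in samples)
    idle_ticks = sum(1 for tick in samples for util in tick if util <= idle_threshold)
    return booked_ticks * interval_hours, idle_ticks * interval_hours


print(fragmentation_histogram({"node1": 0, "node2": 3, "node3": 3, "node4": 8}))
# {0: 1, 3: 2, 8: 1}
print(booked_and_idle_gpu_hours([[0, 50], [0, 0]], interval_hours=0.5))
# (2.0, 1.5)
```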

Fundamental

  • Linux kernel upgrade from 4.x to 5.x
  • Kubernetes (K8s) upgrade from v1.9.0 to v1.15.2
  • NVIDIA driver upgrade to 430 series