(Deprecated) Development Notes
June 9, 2020
A discussion with the development team revolved around scheduling. Currently scheduling is a complex service that manages resources, resource allocation, job state transitions, and job launching.
Resources are managed via the ResourceManager abstract interface.
Jobs are managed via the JobManager interface.
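As a rough illustration of what an abstract resource interface can look like, the sketch below defines a hypothetical ResourceManager with one in-memory implementation. The method names (allocate, release), the CPU-only resource model, and the InMemoryResourceManager class are assumptions made for this example, not the project's actual API.

```python
from abc import ABC, abstractmethod
from typing import Optional


class ResourceManager(ABC):
    """Hypothetical interface shape; method names are assumptions."""

    @abstractmethod
    def allocate(self, cpus: int) -> Optional[dict]:
        """Return an allocation record, or None if the request cannot be met."""

    @abstractmethod
    def release(self, allocation: dict) -> None:
        """Return previously allocated resources to the pool."""


class InMemoryResourceManager(ResourceManager):
    """Toy implementation that tracks a single pool of CPUs in memory."""

    def __init__(self, total_cpus: int) -> None:
        self.free_cpus = total_cpus

    def allocate(self, cpus: int) -> Optional[dict]:
        # Refuse invalid requests and requests larger than the free pool.
        if cpus <= 0 or cpus > self.free_cpus:
            return None
        self.free_cpus -= cpus
        return {"cpus": cpus}

    def release(self, allocation: dict) -> None:
        self.free_cpus += allocation["cpus"]
```

Keeping the interface abstract means the scheduling code only depends on allocate and release, so a store-backed implementation could be swapped in without touching the callers.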
The SchedulerService is responsible for listening for request messages as well as job state update messages.
It also has an added async_task for manage_job_processing, attached from the service's constructed JobManager.
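The arrangement described above, a service that handles incoming messages while a long-running job-processing coroutine runs alongside it, can be sketched with asyncio. This is a minimal sketch under stated assumptions: the add_async_task, handle_request, and shutdown methods, the queue-based hand-off, and the string messages are all hypothetical stand-ins, not the real service code.

```python
import asyncio


class JobManager:
    """Stand-in for the real JobManager; only the loop shape is illustrated."""

    def __init__(self) -> None:
        self.pending: asyncio.Queue = asyncio.Queue()
        self.processed: list = []

    async def manage_job_processing(self) -> None:
        # Runs until cancelled, handling jobs as they arrive.
        while True:
            job = await self.pending.get()
            self.processed.append(job)
            self.pending.task_done()


class SchedulerService:
    """Hypothetical service wrapper that owns the background tasks."""

    def __init__(self, job_manager: JobManager) -> None:
        self.job_manager = job_manager
        self._tasks: list = []

    def add_async_task(self, coro) -> None:
        # Must be called while an event loop is running.
        self._tasks.append(asyncio.create_task(coro))

    async def handle_request(self, message: str) -> None:
        # A request message simply enqueues work for the job manager.
        await self.job_manager.pending.put(message)

    async def shutdown(self) -> None:
        for task in self._tasks:
            task.cancel()
        # Collect the cancellations so they do not surface as errors.
        await asyncio.gather(*self._tasks, return_exceptions=True)


async def demo() -> list:
    manager = JobManager()
    service = SchedulerService(manager)
    service.add_async_task(manager.manage_job_processing())
    await service.handle_request("job-1")
    await service.handle_request("job-2")
    await manager.pending.join()  # wait until both jobs were handled
    await service.shutdown()
    return manager.processed
```

Running `asyncio.run(demo())` drives the request handler and the background task on the same event loop, which is the coupling the notes go on to question.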
A valid SchedulerRequestMessage creates a RequestedJob object. At some point the manage_job_processing async task will execute, and the new job will be recognized as ready to allocate. manage_job_processing implements semantics for a simple priority queue with starvation prevention using dynamic priority migration. Once an active job is eligible for allocation, the ResourceManager is asked to allocate the virtual resources. Once the resources are successfully allocated, the job manager's request_scheduling is responsible for launching the job and, if the launch succeeds, marks the job state as SCHEDULED and saves it to the backing store.
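The priority queue with starvation prevention can be illustrated with a simple aging scheme: every pass that a job waits counts toward a priority boost, so a low-priority job eventually outranks newer high-priority arrivals. The sketch below is one possible reading of "dynamic priority migration", not the actual manage_job_processing logic. The Job fields, the AWAITING_ALLOCATION status name, the boost_after parameter, and the CPU-only allocation are assumptions; only the SCHEDULED state comes from the notes, and launching and persistence are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    job_id: str
    priority: int  # higher value is served first (an assumption)
    cpus: int
    waited: int = 0
    status: str = "AWAITING_ALLOCATION"  # hypothetical state name


class AgingQueue:
    """Toy priority queue that boosts jobs the longer they wait."""

    def __init__(self, free_cpus: int, boost_after: int = 2) -> None:
        self.free_cpus = free_cpus
        self.boost_after = boost_after  # passes waited per priority bump
        self.jobs: List[Job] = []

    def submit(self, job: Job) -> None:
        self.jobs.append(job)

    def step(self) -> Optional[Job]:
        """One pass: pick the best job, try to allocate, age the rest."""
        waiting = [j for j in self.jobs if j.status == "AWAITING_ALLOCATION"]
        if not waiting:
            return None
        # max() returns the first maximal element, so ties go to the
        # earliest submission.
        best = max(waiting, key=lambda j: j.priority)
        scheduled = None
        if best.cpus <= self.free_cpus:
            self.free_cpus -= best.cpus
            best.status = "SCHEDULED"
            scheduled = best
        # Every job still waiting ages; periodic boosts prevent starvation.
        for job in waiting:
            if job is scheduled:
                continue
            job.waited += 1
            if job.waited % self.boost_after == 0:
                job.priority += 1
        return scheduled
```

With the default boost_after of 2, a job submitted at priority 1 reaches priority 2 after two passes and then wins the tie against a newer priority 2 job because it was submitted first.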
A question of scalability arises with this service definition, as well as questions around separation of concerns. Advanced job scheduling semantics were also discussed. In particular, if we want to include the Async IO implementation of the advanced python scheduler, what would that look like, and how would we maintain consistent job states in the backing store, as well as potential communication between the job manager and the scheduler?