External events:

* When we start the scheduler,
    we get the stored framework ID [1] (FrameworkState.getFrameworkID, via FrameworkInfoFactory.getBuilder),
      in order to build the FrameworkInfo.

* When Mesos acknowledges our framework registration request,
    we get the task list [1],
      in order to start monitoring each task.

* When Mesos makes us N offers,
    for each of the offers,
      we get the task list [N],
        in order to test whether the offered host is already running a task.
      if we accept the offer,
        we get the task list [N],
          in order to test whether the task already exists.
            FIXME this is stupid. It doesn't exist, because we're creating it. (See the sketch after this list.)
    
* When Mesos informs us about a lost executor (or rather, when our healthchecks inform us, because Mesos does not implement this yet),
    we get the task list [1],
      in order to get the task that was running on that executor (WHY)

* When Mesos sends the scheduler an update on a task,
    we get the task list [1],
      in order to test whether the task exists.
        FIXME this is stupid. It should exist, because we're getting an update on it.
    we then do the same thing again:
      get the task list [1],
        in order to test whether the task exists.
          FIXME this is even more stupid.
      

* When we receive an HTTP request for /_search,
    we get the task list [1],
      then for each of the N tasks in the list, we get its status, which means:
        getting the task list again [N],
      in order to find a suitable host to forward the request to.
        
* When we receive an HTTP request for /_cluster/stats,
    we get the task list [1],
      then for each of the N tasks in the list, we get its status, which means:
        getting the task list again [N],
      in order to find a suitable host to forward the request to.

* Every (configuration.getExecutorHealthDelay() / 2) milliseconds, when we run the executor healthcheck,
    we get the task status (ESTaskStatus.getStatus, see below),
      in order to check whether the executor is still healthy.


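Here is a minimal sketch of the per-offer pattern flagged in the offers bullet above. The class and method names echo the call graph below, but the bodies are simplified stand-ins invented for illustration (FakeZkState, ClusterStateSketch, and the `<host>/<task>` id format are assumptions, not the project's real code); the point is only that one accepted offer already costs two full task-list GETs before anything is written.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for SerializableState: one getTaskList() == one ZooKeeper GET of the task list key.
class FakeZkState {
    int gets = 0;
    List<String> taskList = new ArrayList<>();

    List<String> getTaskList() {
        gets++;
        return new ArrayList<>(taskList);
    }
}

// Rough stand-in for ClusterState / OfferStrategy, chaining the calls as described above.
class ClusterStateSketch {
    private final FakeZkState state;

    ClusterStateSketch(FakeZkState state) { this.state = state; }

    // OfferStrategy.isHostAlreadyRunningTask: reads the task list once per offer.
    // (For this sketch, assume task ids look like "<hostname>/<task>".)
    boolean isHostAlreadyRunningTask(String hostname) {
        return state.getTaskList().stream().anyMatch(t -> t.startsWith(hostname + "/"));
    }

    // ClusterState.exists: reads the task list again.
    boolean exists(String taskId) {
        return state.getTaskList().contains(taskId);
    }

    // ClusterState.addTask: checks existence (another GET) before writing the new list.
    void addTask(String taskId) {
        if (exists(taskId)) {
            // removeTask + re-add would go here
        }
        state.taskList.add(taskId); // setTaskInfoList(...) would then do the ZK SET
    }
}

public class OfferReadAudit {
    public static void main(String[] args) {
        FakeZkState zk = new FakeZkState();
        ClusterStateSketch cluster = new ClusterStateSketch(zk);

        // One offer that we accept:
        cluster.isHostAlreadyRunningTask("slave-1"); // GET #1: is the host already used?
        cluster.addTask("slave-1/task-0");           // GET #2 via exists(), pointless as the FIXME notes
        System.out.println("Task-list GETs for one accepted offer: " + zk.gets); // prints 2
    }
}
```
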
SerializableState.get called by:
  ClusterState.getTaskList (one ZK GET)
    OfferStrategy.isHostAlreadyRunningTask with a Protos.Offer
    In our second acceptanceRule
      OfferStrategy.evaluate given a Protos.Offer
        ElasticsearchScheduler.resourceOffers, for each Offer
          XXX When Mesos offers us some resources
    a callback in new ClusterMonitor
      FrameworkState.markRegistered
        ElasticsearchScheduler.registered
          XXX When Mesos acknowledges our framework registration request.
    ClusterState.getGuiTaskList calls this once
      ElasticsearchScheduler.getTasks
        SearchProxyController.search
          XXX When we get an HTTP request to /_search
        SearchProxyController.stats
          XXX When we get an HTTP request to /_cluster/stats
    ClusterState.getTask for an ExecutorID
      ElasticsearchScheduler.executorLost for a given ExecutorID.
        XXX When Mesos informs us about a lost executor (or rather, when our healthchecks inform us, because Mesos does not implement this yet.)
    ClusterState.getTask for a TaskID
      ClusterState.exists for a TaskID
        ClusterState.addTask for a TaskInfo
          ClusterState.addTask for a ESTaskStatus
            ElasticsearchScheduler.resourceOffers when we accept an offer (potentially many)
              XXX When Mesos offers us resources
        ClusterState.update for a TaskStatus
          ClusterState.updateTask for a TaskStatus
            FrameworkState.announceStatusUpdate with a new TaskStatus
              ElasticsearchScheduler.statusUpdate with a new TaskStatus
                XXX When Mesos sends the scheduler an update on a task.
        ClusterState.updateTask for TaskStatus
          XXX see above
      ClusterState.getStatus for a TaskID
        ClusterState.getGuiTaskList calls this for every task in the task list.
          ElasticsearchScheduler.getTasks
            SearchProxyController.search
              XXX When we get an HTTP request to /_search
            SearchProxyController.stats
              XXX When we get an HTTP request to /_cluster/stats
        ClusterState.taskInError for a TaskStatus gets the task, then gets whether the task is in error.
          ClusterState.updateTask for a TaskStatus
            FrameworkState.announceStatusUpdate with a new TaskStatus
              ElasticsearchScheduler.statusUpdate with a new TaskStatus
                XXX When Mesos sends the scheduler an update on a task.
        ClusterState.update with a TaskStatus
          see above
      ClusterState.updateTask for a TaskStatus
        see above
    ClusterState.removeTask for a TaskInfo
      ClusterState.addTask(TaskInfo) calls this before re-adding it if the task already exists
      ClusterState.updateTask(TaskInfo) calls this if the task is set to be "in error"
  ESTaskStatus.getStatus
    ExecutorHealth.run
      Runs every (configuration.getExecutorHealthDelay() / 2) milliseconds
    ClusterState.getGuiTaskList calls this for every task in the list
      see above
    `new ESTaskStatus` calls this when writing a log line
      ClusterMonitor.startMonitoringTask creates a new ESTaskStatus.
        FrameworkState.announceNewTask
          ElasticsearchScheduler.resourceOffers for a task, if we accept it.
      ClusterState.getStatus for a given TaskInfo.
        ClusterState.getStatus for a TaskID.
          see above
        ClusterState.removeTask(TaskInfo)
          see above
      ElasticsearchScheduler.resourceOffers if we accept an offer
        XXX When Mesos offers us some resources.
    ESTaskStatus.taskInError
      see above
    ESTaskStatus.toString calls this twice!
      ????
  FrameworkState.getFrameworkID
    ElasticsearchScheduler.resourceOffers, for each offer that we accept
    FrameworkInfoFactory.setFrameworkId gets the previous ID to log it before changing it
      FrameworkInfoFactory.getBuilder
        ElasticsearchScheduler.run
          XXX When we start the scheduler
    TaskInfoFactory.newExecutorInfo(Configuration) calls this once to set on the new ExecutorInfo
      TaskInfoFactory.createTask
        ElasticsearchScheduler.resourceOffers if we accept an offer        
    ClusterMonitor.startMonitoringTask(TaskInfo) calls this once to determine which framework the task is a member of ... this doesn't seem right
      lambda in `new ClusterMonitor` calls this once for each task
        XXX When the framework is registered
    ClusterState.getKey calls this to determine the key in ZK for the tasks
    ClusterState.getStatus(TaskInfo) calls this once
  StatePath.exists
    StatePath.mkdir
      ClusterState.setTaskInfoList
        ClusterState.addTask with a TaskInfo
          see above
        ClusterState.removeTask(TaskInfo)
          see above
      ESTaskStatus.setStatus
        ClusterState.update with a new TaskStatus
          ClusterState.updateTask with a TaskStatus
            see above
        When we call `new ESTaskStatus`, we test whether it is stored in ZK already; if not, we set a default TaskStatus.
          see above
      FrameworkState.markRegistered
        see above

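To make the /_search and /_cluster/stats cost concrete, here is the getGuiTaskList path in the same simplified style (CountingStore, the "/tasks" and "/status/<id>" keys, and the class bodies are illustrative assumptions, not the real implementation): one GET for the task list, then for each task another GET of the list (via getTask) plus a GET of its status key, i.e. 1 + 2N reads per HTTP request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in store: every read of a key is one ZooKeeper GET.
class CountingStore {
    int gets = 0;
    Map<String, Object> data = new LinkedHashMap<>();

    Object get(String key) {
        gets++;
        return data.get(key);
    }
}

// Rough shape of the ClusterState.getGuiTaskList chain described above.
class GuiTaskListSketch {
    private final CountingStore store;

    GuiTaskListSketch(CountingStore store) { this.store = store; }

    @SuppressWarnings("unchecked")
    List<String> getTaskList() {
        List<String> list = (List<String>) store.get("/tasks"); // one GET of the task list key
        return list == null ? new ArrayList<>() : list;
    }

    // ClusterState.getTask(TaskID): re-reads the whole task list just to find one entry.
    String getTask(String taskId) {
        return getTaskList().stream().filter(taskId::equals).findFirst().orElse(null);
    }

    // ClusterState.getStatus(TaskID): getTask (a GET) plus a read of the task's status key.
    Object getStatus(String taskId) {
        getTask(taskId);
        return store.get("/status/" + taskId);
    }

    // ClusterState.getGuiTaskList: one GET for the list, then two GETs per task.
    Map<String, Object> getGuiTaskList() {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String taskId : getTaskList()) {
            result.put(taskId, getStatus(taskId));
        }
        return result;
    }
}

public class SearchRequestAudit {
    public static void main(String[] args) {
        CountingStore zk = new CountingStore();
        List<String> tasks = List.of("task-0", "task-1", "task-2");
        zk.data.put("/tasks", tasks);
        tasks.forEach(t -> zk.data.put("/status/" + t, "TASK_RUNNING"));

        new GuiTaskListSketch(zk).getGuiTaskList(); // what SearchProxyController.search/stats trigger
        System.out.println("ZooKeeper GETs for one /_search request: " + zk.gets); // prints 7 = 1 + 2*3
    }
}
```
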
SerializableState.set called by:
  ClusterState.setTaskInfoList sets the task list key
    ClusterState.addTask calls this once
      see above
    ClusterState.removeTask calls this once
      see above
  ESTaskStatus.setStatus(TaskStatus) sets the task status key for a given task
    ClusterState.update(TaskStatus)
      see above
    `new ESTaskStatus` calls this once if the task status does not exist in ZK
      see above
  FrameworkState.markRegistered sets the frameworkId key
    see above
  StatePath.mkdir makes each "directory" in order from the root
    see above

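StatePath.mkdir shows up under both the get callers (via StatePath.exists) and the set callers above because it walks the path from the root and creates each missing "directory", so a single write such as setTaskInfoList, setStatus or markRegistered also pays one existence check per path segment. A rough sketch with an assumed interface shape (SimpleState and StatePathSketch are illustrative, not the real SerializableState API):

```java
// Assumed minimal interface: exists() is one ZK GET, set() is one ZK SET.
interface SimpleState {
    boolean exists(String path);
    void set(String path, byte[] value);
}

// Rough sketch of StatePath.mkdir: create each path segment in order from the root.
class StatePathSketch {
    private final SimpleState state;

    StatePathSketch(SimpleState state) { this.state = state; }

    void mkdir(String path) {
        StringBuilder current = new StringBuilder();
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue;                        // skip the leading "/"
            }
            current.append("/").append(segment);
            String dir = current.toString();
            if (!state.exists(dir)) {            // one GET per segment
                state.set(dir, new byte[0]);     // one SET per missing segment
            }
        }
    }
}
```

So, for example, a three-segment path costs three exists() checks on every write that goes through mkdir.
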
SerializableState.delete called by:
  ESTaskStatus.destroy
    ClusterState.removeTask
      see above
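
For completeness, the removal path in the same simplified style (WriteCountingStore and the key names are illustrative assumptions): removing a task rewrites the task list (a SET via setTaskInfoList) and then deletes that task's status key (the DELETE via ESTaskStatus.destroy).

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in: one set() == one ZooKeeper SET, one delete() == one ZooKeeper DELETE.
class WriteCountingStore {
    int sets = 0;
    int deletes = 0;

    void set(String key, Object value) { sets++; }
    void delete(String key) { deletes++; }
}

public class RemoveTaskSketch {
    private final WriteCountingStore store;
    private final List<String> taskList = new ArrayList<>(List.of("task-0", "task-1"));

    RemoveTaskSketch(WriteCountingStore store) { this.store = store; }

    // Rough shape of ClusterState.removeTask: rewrite the list, then destroy the status.
    void removeTask(String taskId) {
        taskList.remove(taskId);
        store.set("/tasks", taskList);      // setTaskInfoList -> SerializableState.set
        store.delete("/status/" + taskId);  // ESTaskStatus.destroy -> SerializableState.delete
    }

    public static void main(String[] args) {
        WriteCountingStore zk = new WriteCountingStore();
        new RemoveTaskSketch(zk).removeTask("task-0");
        System.out.println("SETs: " + zk.sets + ", DELETEs: " + zk.deletes); // prints SETs: 1, DELETEs: 1
    }
}
```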