Distributed Node Operation #175
Comments
I would be one of those supporting containers. When you work across nodes there are so many chances for things to go wrong (the node goes down, that portion of the flowgraph crashes, network throughput issues, etc.) that I think it would be really hard to make something that "just works" without leveraging existing technologies. I have been looking at Argo Dataflow (which is built on top of Kubernetes): https://github.com/argoproj-labs/argo-dataflow . It provides for distributed processing across containers and nodes and checks many (but not all) of the boxes you have above. One feature of particular interest is that it can autoscale the number of containers available based upon your workload. (Imagine a spectrum survey application where, once we detect a signal, it starts processing it. Argo could reduce/increase the processing resources available based upon the amount of traffic coming in.) The biggest drawback right now is the amount of boilerplate that must be written: a flowgraph has to be split into smaller flowgraphs, and we have to create Dockerfiles and Kubernetes manifests. I think that most of the boilerplate could be automated. From the discussion at GRCon, I had in mind a Kubernetes, Argo, or docker-compose custom scheduler. Behind the scenes, it could split up the flowgraph, write the Dockerfiles and manifests, and launch them (see the partitioning sketch below). To use the custom scheduler, you would have to install the underlying technology and add some configuration, such as available nodes, image registry locations, etc. But with that info, I think it would be pretty straightforward to switch between launching the flowgraph with the normal scheduler vs. the distributed scheduler.
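As a rough illustration of the boilerplate automation described above, the scheduler-side partitioning step might look something like the following minimal Python sketch. The `Block`/`Flowgraph` types, the per-block node-affinity tag, and the `partition` helper are all hypothetical, invented for illustration; they are not an existing newsched or Argo API.

```python
# Hypothetical partitioner sketch: split a flowgraph into per-node subgraphs
# by a user-supplied affinity tag, so each piece can be containerized.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    node: str                      # affinity tag: which node should run this

@dataclass
class Flowgraph:
    blocks: list[Block]
    edges: list[tuple[str, str]]   # (src block name, dst block name)

def partition(fg: Flowgraph) -> dict[str, Flowgraph]:
    """Group blocks by node. Edges that cross a node boundary are dropped
    here; the scheduler must wire those up as network connections."""
    by_node = defaultdict(list)
    for b in fg.blocks:
        by_node[b.node].append(b)
    owner = {b.name: b.node for b in fg.blocks}
    parts = {}
    for node, blocks in by_node.items():
        local = [e for e in fg.edges
                 if owner[e[0]] == node and owner[e[1]] == node]
        parts[node] = Flowgraph(blocks, local)
    return parts
```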
@jsallay - thank you for the comments. I agree that leveraging existing technologies will be absolutely necessary for anything other than proving some toy example. Let's run with the container example, even with fixed resources. If I can build a Docker image that has the GR libraries installed and will contain part of my flowgraph, I still need to get the container set up to instantiate that flowgraph portion. Right now in newsched there is a …
I was thinking about the problem from a slightly different perspective. Take the simplest example of using docker-compose. The scheduler would convert the flowgraph into a docker-compose file. For the time being, let's assume that the images needed already exist. The docker-compose file would essentially say: launch a container of image x and run command y, which would execute a portion of the flowgraph. If we wanted to run with Kubernetes, the same principles apply, but it would create manifest files rather than a docker-compose file. With Argo Dataflow, it would create a dataflow template. So, to now answer your question: the flowgraph portions would be communicated to the Kubernetes instance. Kubernetes (or Argo) would then be told to launch a set of images. It would handle which portions get put on which nodes in the cluster, and it would set up everything for them to be able to communicate. A sketch of the compose-generation step follows.
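Continuing the hypothetical partitioner above, the docker-compose path could be as simple as emitting one service per partition. The image name, runner command, and volume layout below are placeholder assumptions for illustration, not real artifacts:

```python
# Emit one docker-compose service per flowgraph partition (illustrative only:
# the image name and "run-subgraph" command are placeholders, not real tools).
import yaml  # pip install pyyaml

def to_compose(parts: dict) -> str:
    services = {
        node: {
            "image": "ghcr.io/example/newsched-runtime:latest",  # assumed image
            "command": ["run-subgraph", f"/graphs/{node}.json"],  # assumed runner
            "volumes": ["./graphs:/graphs:ro"],
            "networks": ["flowgraph"],
        }
        for node in parts
    }
    return yaml.safe_dump({"services": services, "networks": {"flowgraph": {}}})

# Usage with the partitioner sketched earlier:
# with open("docker-compose.yml", "w") as f:
#     f.write(to_compose(partition(fg)))
```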
My gut feeling would be that you'd not serialize the graph partition going to a "worker" node on the "controller" node (which knows the full fg), but that the workers run a daemon. For example, the worker's firecracker VMs / containers / kubelets / pods described above would have their own daemon waiting for instructions like

```
create_port("to control host", { "type": "accel/ethernet", "parameters": { "remote": 0x002091abcdef, "vlan": 0 }})
create_block("random source", { "name": "xoroshiro", "accel": "force", "parameters": {"seed": 0xdeadbeef}})
create_block("random source 2", { "name": "xoroshiro", "accel": "force", "parameters": {"seed": 0xcafe}})
create_block("adder", { "name": "saturating integer addition", "accel": "optional", "parameters": {"inputs": 2, "bounds": [0, 1000]}})
connect_blocks("my beautiful subgraph", [[["random source", 0], ["adder", 0]]])
connect_blocks("my beautiful subgraph", [[["random source 2", 0], ["adder", 1]]])
connect_blocks("my beautiful subgraph", [[["adder", 0], ["to control host"]]])
```

and answer synchronously with futures that already contain the name specified as the first argument, or simply answer asynchronously. (The thing is that creating "random source", for example, might require the worker to trigger an FPGA image fragment synthesis, and that can take a long time, so you don't want to block on it.)

Advantage is that this RPC framework would be good to have anyway, especially in case we'd want to reconfigure later on. Also, the fact that there are individual instructions that can go wrong, rather than just a whole FG description that "doesn't work", makes the logging a bit easier (not much, probably). As a logical extension, the nodes themselves might decide to delegate things, like "hm, I don't have a hardware-accelerated random number generator, but one of these 20 FPGA-accelerator-equipped servers over there could have one, so I just spin up a kubelet on one of them", forward the block creation to that, and play proxy for all control-plane RPC responses.
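To make the daemon idea concrete, here is a minimal sketch of what such a worker-side RPC loop could look like. The verbs mirror the pseudocode above, but the ZeroMQ transport, the JSON framing, and the pending-futures bookkeeping are all assumptions for illustration, not an existing newsched interface:

```python
# Hypothetical worker daemon: accepts create_port/create_block/connect_blocks
# over a ZeroMQ REP socket with JSON framing (both assumed for illustration).
import concurrent.futures
import json
import time

import zmq  # pip install pyzmq

pool = concurrent.futures.ThreadPoolExecutor()
futures = {}   # block name -> Future; slow setup (e.g. FPGA synthesis) runs async
graph = {"ports": {}, "blocks": {}, "edges": []}

def _instantiate(spec):
    time.sleep(0.1)  # stand-in for slow work such as accelerator image synthesis
    return {"impl": spec.get("name"), "accel": spec.get("accel", "none")}

def handle(cmd, args):
    if cmd == "create_port":
        name, spec = args
        graph["ports"][name] = spec
        return {"name": name, "status": "created"}
    if cmd == "create_block":
        name, spec = args
        # Answer immediately with a pending status; the controller can poll
        # (or await) completion instead of blocking on slow instantiation.
        futures[name] = pool.submit(_instantiate, spec)
        return {"name": name, "status": "pending"}
    if cmd == "connect_blocks":
        subgraph, edges = args
        graph["edges"].extend(edges)
        return {"name": subgraph, "status": "connected"}
    return {"error": f"unknown command: {cmd}"}

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://*:5555")
while True:
    cmd, args = json.loads(sock.recv())
    sock.send_string(json.dumps(handle(cmd, args)))
```

One nice property of per-instruction replies like this is exactly the logging point made above: each create/connect step succeeds or fails individually, so errors point at a specific block or port rather than at the whole subgraph.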
So then, from a practical perspective, getting this type of RPC set up seems like the logical next step. It can also be done apart from any containerization; though we will want containers eventually, that seems to be a separable work item.
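For completeness, a controller-side counterpart to the daemon sketch above might look like this (same assumed ZeroMQ/JSON framing; "worker-a" is a placeholder hostname):

```python
# Hypothetical controller side: sends one instruction at a time to a worker
# daemon and reads back its status reply (transport/framing as assumed above).
import json

import zmq  # pip install pyzmq

ctx = zmq.Context()
worker = ctx.socket(zmq.REQ)
worker.connect("tcp://worker-a:5555")  # placeholder worker hostname

def rpc(cmd, *args):
    worker.send_string(json.dumps([cmd, list(args)]))
    return json.loads(worker.recv())

print(rpc("create_block", "random source",
          {"name": "xoroshiro", "accel": "force",
           "parameters": {"seed": 0xDEADBEEF}}))
```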
The goal of this feature is to set up flowgraphs that can span multiple nodes.
A first pass at this feature, or proof of concept, I think involves the following:
Other ideas for doing this have been proposed using containers.
Not sure yet what this would mean for embedded host environments.