-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup new Node class to improve consistency of node usage in dlg-engine
#283
Conversation
…node description throughout dlg-engine.
- rest.py acts as the interface between the JSON/string representation of nodes, and the DropManagers, which should only use the Node class from now on. Note: This will not successfully deploy a graph, as the translator is not functional with the new information.
- Test with HelloWorld-Simple and ArrayLoop graphs. - Unittests appear to work as well
Reviewer's Guide by SourceryThis pull request introduces a new File-Level Changes
Tips
|
dlg-engine
dlg-engine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @myxie - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider making the
Node
class immutable to prevent unexpected modifications. This could be achieved by using properties with only getters or by making it adataclass
withfrozen=True
. - Add more robust error handling and validation in the
Node
class constructor, such as checking if ports are within valid ranges. - Increase unit test coverage for the
Node
class to ensure correct behavior under various input scenarios and edge cases.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟡 Testing: 3 issues found
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
EVENTS_PORT = 2 | ||
RPC_PORT = 3 | ||
|
||
class Node: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider adding validation for port numbers in Node class
The Node class is a good addition for standardizing node information. Consider adding validation for port numbers to ensure they are within valid ranges. This could prevent potential issues with invalid port configurations.
class Node:
def __init__(self, events_port=EVENTS_PORT, rpc_port=RPC_PORT):
if not 1 <= events_port <= 65535 or not 1 <= rpc_port <= 65535:
raise ValueError("Port numbers must be between 1 and 65535")
self.events_port = events_port
self.rpc_port = rpc_port
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback, we will generate more comments like this in the future according to the following instructions:
- Include comments that suggest improvements or enhancements to the code, such as adding validation checks.
- Ensure the comment provides a clear rationale for the suggestion, explaining potential issues that could be avoided.
- Provide specific examples or code snippets to illustrate the suggestion, making it easier for the developer to understand and implement the change.
daliuge-engine/dlg/manager/rest.py
Outdated
with NodeManagerClient(host=node.host, port=node.port) as dm: | ||
return dm.session_status(sessionId) | ||
except ValueError as e: | ||
raise Exception(f"{node_str} not in current list of nodes:", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error
)
raise Exception(f"{node_str} not in current list of nodes:", e) | |
raise Exception(f"{node_str} not in current list of nodes:", e) from e |
daliuge-engine/dlg/manager/rest.py
Outdated
with NodeManagerClient(host=node.host, port=node.port) as dm: | ||
return dm.graph(sessionId) | ||
except ValueError as e: | ||
raise Exception(f"{node_str} not in current list of nodes:", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error
)
raise Exception(f"{node_str} not in current list of nodes:", e) | |
raise Exception(f"{node_str} not in current list of nodes:", e) from e |
daliuge-engine/dlg/manager/rest.py
Outdated
return dm.graph_status(sessionId) | ||
|
||
except ValueError as e: | ||
raise Exception(f"{node_str} not in current list of nodes:", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error
)
raise Exception(f"{node_str} not in current list of nodes:", e) | |
raise Exception(f"{node_str} not in current list of nodes:", e) from e |
if isinstance(host, Node): | ||
endpoint = (host.host, port) | ||
else: | ||
endpoint = (host, port) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Replace if statement with if expression (assign-if-exp
)
if isinstance(host, Node): | |
endpoint = (host.host, port) | |
else: | |
endpoint = (host, port) | |
endpoint = (host.host, port) if isinstance(host, Node) else (host, port) |
if return_tuple: | ||
return "localhost", 5553 + n, 6666 + n | ||
else: | ||
return Node(f"localhost:{8000}:{5553+n}:{6666+n}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Simplify unnecessary nesting, casting and constant values in f-strings (simplify-fstring-formatting
)
return Node(f"localhost:{8000}:{5553+n}:{6666+n}") | |
return Node(f"localhost:8000:{5553 + n}:{6666 + n}") |
Hi @awicenec - I finally got the multiple-NodeManagers working locally, by using this approach to fix the default-ports issue. Hopefully there is enough information in the PR description; please let me know if you need me to clarify things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the commented lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty massive amount of changes, but I think we are going in the right direction. For an even more robust solution we will need some registration DB, like Ray is using, which keeps information about all managers involved in the deployment.
Issue
In #277 I added additional port options to the CLI so that it is possible to specify non-default RPC and Event ports. The intention of this was to allow for:
The motivating factor for this was that we had difficulties identifying what was causing issues getting Setonix deployment to work (LIU-377 (#271)).
One of the things that came out of discussions about the issue is the use of a unit test that would catch that sort of issue when partitioning across multiple nodes, using some of the additions in #277 when adding non-default port support. This is currently a WIP in #281.
I was unable to progress work on the unittest in #281 because I stumbled on the following limitations in DALiuGE:
daliuge/daliuge-engine/dlg/manager/node_manager.py
Lines 434 to 436 in ea6cc06
This method (
node_manager.add_node_subscriptions
,) updates the drop-relationships in a given session to make sure that we can send events between drops on different nodes; this is essential to when notifying drops that, say, data has been written and you can start consuming that from another node.The issue is line 436:
daliuge/daliuge-engine/dlg/manager/node_manager.py
Line 436 in ea6cc06
Regardless of what ports the related host is actually running on, we will only ever be subscribing to events on the default port.
We do this too for RPC subscriptions in the session, as well - this causes the DropProxies to be incorrectly managed as well:
daliuge/daliuge-engine/dlg/manager/session.py
Lines 526 to 534 in ea6cc06
Summary
We need to find a way to get non-default RPC/Event port information passed between nodes. Ideally, we don't want to be relying on the default event port as the only port on which we communicate, especially if we want to have the flexibility to configure it in the future. Additionally, it would be ideal to not see the
DEFAULTS_*
variables anywhere except a default value in a constructor, as this can lead to situations where all behaviour follows the default behaviour.The challenge here is that our current approach to manager information about hosts and ports is strings; if we don't specify the port, then we use the defaults, and the way we detect if we haven't used defaults is splitting string to get the port number:
daliuge/daliuge-engine/dlg/manager/rest.py
Lines 536 to 538 in ea6cc06
This approach is fine when we only want to keep track of up to 1 port, but if we need to send around up to 3 ports, that's a lot of splitting and conditionals.
I propose a solution that (hopefully) improves on the current node-host-port-management situation, and also allows us to send complete information about the setup of the hosts.
Solution
This PR introduces the
manager_data.Node
class, which contains in it the 'Daliuge Node Protocol' (not it's actual name). The node protocol is incredibly basic and just extends the existing pattern:The idea is this class stores all information about the host, and performs all the necessary splits in the constructor only once, rather than having to do it every time we need it. Then, when the data needs to be stored or communicated 'over-the-wire', we convert it to a string and send it the way we normally have.
In my application of the
Nodes
class, I have tried to ensure is the job of the Manager REST APIs to do the conversion to/fromNode
, and then the*Managers
can use theNode
class as expected. This reduces the need for consistent checking of if we have a port; theNode
object will always have a port, and it will often be the default port (but now we don't have to worry about it!).I have confirmed that this works using the
ArrayLoop
graph and have ensured that all unittests are passing.