Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function to guarantee PUB/SUB channel in SYNC_PUB #151

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

Flood1993
Copy link
Member

@Flood1993 Flood1993 commented Jun 27, 2017

This function aims to guarantee the PUB/SUB channel to be already up and running after returning, ready for sending data right away without it being dropped (client will be guaranteed to be at the other end of the PUB socket).

@codecov
Copy link

codecov bot commented Jun 27, 2017

Codecov Report

Merging #151 into master will decrease coverage by 0.07%.
The diff coverage is 98.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #151      +/-   ##
==========================================
- Coverage   98.39%   98.31%   -0.08%     
==========================================
  Files          25       25              
  Lines        3049     3150     +101     
  Branches      232      233       +1     
==========================================
+ Hits         3000     3097      +97     
- Misses         37       41       +4     
  Partials       12       12
Impacted Files Coverage Δ
osbrain/agent.py 95.95% <100%> (+0.07%) ⬆️
osbrain/tests/test_helper.py 100% <100%> (ø) ⬆️
osbrain/tests/test_agent.py 100% <100%> (ø) ⬆️
osbrain/tests/test_agent_pubsub_topics.py 100% <100%> (ø) ⬆️
osbrain/helper.py 97.26% <87.5%> (-2.74%) ⬇️
osbrain/tests/test_agent_transport.py 94.87% <0%> (-1.24%) ⬇️
osbrain/tests/test_agent_async_requests.py 99.22% <0%> (-0.78%) ⬇️
osbrain/tests/test_agent_sync_publications.py 100% <0%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0bb9de7...97d204e. Read the comment docs.

@Flood1993
Copy link
Member Author

Flood1993 commented Jun 27, 2017

I added two more commits, refactoring the function a bit. I have also been thinking about whether the connection should be done in the synchronize_sync_pub function or not, and I came to the conclussion that it should NOT be done there.

Pending things before merging:

  • Change the docstring of synchronize_sync_pub.
  • Change Agent functions _subscribe and UGLY_get_handler names into more specific ones.
  • Add tests for the new non-private Agent functions: missing subscribe, get_handler.
  • Solve the issue with commit 7b7bd53
  • Squash commits.

Important: It is interesting that I got an error in test_agent.py::test_get_uuid_used_as_alias_for_sub_in_sync_pub_async. Might be related to the issue with 7b7bd53

Commit message of 7b7bd53 (so it doesn't get lost after rebasing):

Add tests for the uuid in SYNC_SUB sockets. Read below

However, there is a chance that the trace of the exception is printed
for the test of ASYNC_REP (line 446 in test_agent). This is because
we are doing something wrong, probably the connection has no time to
be stablished before calling the function or something.


Ideally, this should only be called for alias that represent a
SUB socket.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring Parameters and Returns sections are missing. 😉

@@ -912,7 +922,7 @@ def _handle_async_requests(self, data):
else:
handler(self, response)

def _subscribe(self, alias: str, handlers: Dict[Union[bytes, str], Any]):
def subscribe(self, alias: str, handlers: Dict[Union[bytes, str], Any]):
"""
Subscribe the agent to another agent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to complete de docstring here as well now that the method is "user accessible".


Make sure they have stablished the PUB/SUB communication within the
SYNC_PUB/SYNC_SUB channel before returning. This will guarantee that
no PUB messages are lost.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing "Parameters" section from the docstring.

# Restore the original handler, now that the connection is guaranteed
client.subscribe(uuid, original_handler)

client.del_attr('_tmp_attr')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this implementation. Worst-case there might messages not received by the client when you change the handler. That is really bad. Also, that requires the .subscribe() call to be executed again, which I'm not sure is the best thing either.

Would it make sense to ensure synchronization with SYNC-PUB requests? i.e.:

  • First connect from the subscriber.
  • Then make requests from the subscriber each X seconds until one response is received.
  • That is all.

You achieve synchronization while avoiding double .subcribe() call and avoid risks of the SUB handler receiving messages sent for the "syncrhonization" process.

You may want to specify an on_error=lambda x: pass when sending the request to avoid warning log messages during synchronization.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems interesting, I will take a look into it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would not be possible to synchronize them through requests. I am thinking about the case in which a request might change the internal state of the SYNC_PUB agent, and for those cases, an extra hack around would have to be made, I think (changing the handler of the SYNC_PUB momentarily for the requests).

Perhaps what we could do is explain that some messages might be lost in the process or that it should only be used for testing purposes. Now I'm thinking that perhaps the function does not even belong there. In the end, I came up with it to make some tests more consistent.

Any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm.

Yeah, one possibility would be to take that away from there and keep it just for the tests.

Another possibility would be to make all SYNC_PUB sockets implement heart-beating. This is something we should do at some point anyway, hopefully Soon™.

Heart-beating would mean that a SYNC_PUB socket would publish a "heart-beat" (like an "I'm alive" message) periodically, to all subscribed agents. Clients can use this to know if they should reconnect (maybe the server went down).

What we could do for now is to implement just a part of the heart-beating: allow clients to request a heart-beat at any time (i.e.: "are you alive?"). This would be a request from the SYNC_SUB, although it would be treated as a special case and all SYNC_PUB channels should handle it the same way, without taking into account the user-defined handler for other requests. This means adding a "special case" in the protocol, but I don't think it is bad, as requests do not need to be very fast and are infrequent.

We could then use those heart-beat requests from the tests to ensure synchronization.

If you feel like that is too much for now, just take it away from there and open a new issue so we try to improve it for version 1.0.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe heart-beat messages could be published with a special topic (so that clients may decide to subscribe or not to this heart-beat).

Maybe we could use that for now for the tests: from the publisher, we add a timer that publishes heart-beats with a special topic, from the subscriber, when we want to make sure we are fully connected, we just subscribe to this topic and wait for the first heart-beat.

What you think?

Maybe we should anyway open an issue and move this discussion there (we're probably going to discuss these things again when we start working on a good heart-beat implementation).

assert message_01 not in a5.get_attr('received')
assert b'fooWorld' in a5.get_attr('received')
assert b'foobarFOO' in a5.get_attr('received')
assert b'foBAR' in a5.get_attr('received')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes related to receive, received_list, etc. should go in a separate pull-request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I got them a bit messed up 😅 Will fix that before merging (if it is ever about to be merged).

Guillermo Alonso added 9 commits June 29, 2017 12:18
This function will guarantee the PUB/SUB channel is already up and
running after returning, ready for sending data right away.
However, there is a chance that the trace of the exception is printed
for the test of ASYNC_REP (line 446 in test_agent). This is because
we are doing something wrong, probably the connection has no time to
be stablished before calling the function or something.
@Flood1993
Copy link
Member Author

I managed to get around this issue in osmarkets without this function. I think we can close this PR and implement the heartbeat system in the future. What do you think @Peque ?

@Peque Peque added this to the 1.0.0 milestone Feb 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants