High-Latency networks: "generator already executing" when calling a service #1351
Comments
I do not really find the logical reason why #1308 can fix this... can you explain your thought a bit more? IMO, this is because the same generator is entered a second time while it is still running. The reason why we have this problem only with the extra 200 ms delay is that the 1st service call is still waiting to be completed because of this delay, and the 2nd call then comes in on top of it. (Without this delay, the service call completes much quicker, so the 1st call is already finished before the 2nd arrives.)

@mmatthebi thanks for the detailed information. Can you try the following patch in your application and see if that works?

```python
from rclpy.executors import SingleThreadedExecutor, MultiThreadedExecutor
...
se = SingleThreadedExecutor()  # HERE
rclpy.spin_until_future_complete(self, send_goal_future, executor=se)  # HERE
```
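To make the patch easier to try, here is a self-contained sketch of how it could be wired into a simple action client. The node name, the action name, and the use of the Fibonacci action from example_interfaces are assumptions for illustration only; the patch itself is just the two lines marked HERE above.

```python
# Hypothetical minimal client; names and the Fibonacci action are stand-ins.
import rclpy
from rclpy.action import ActionClient
from rclpy.executors import SingleThreadedExecutor
from rclpy.node import Node
from example_interfaces.action import Fibonacci


class PatchedClient(Node):
    def __init__(self):
        super().__init__('patched_client')
        self._client = ActionClient(self, Fibonacci, 'fibonacci')

    def call_action_blocking(self, order=5):
        self._client.wait_for_server()
        goal = Fibonacci.Goal()
        goal.order = order
        send_goal_future = self._client.send_goal_async(goal)
        # Workaround from the comment above: wait on the future with a
        # dedicated executor instead of the global/default one, so an
        # already-running outer executor is not re-entered.
        se = SingleThreadedExecutor()
        rclpy.spin_until_future_complete(self, send_goal_future, executor=se)
        return send_goal_future.result()


def main():
    rclpy.init()
    node = PatchedClient()
    goal_handle = node.call_action_blocking()
    node.get_logger().info(f'goal accepted: {goal_handle.accepted}')
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```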
Thank you all for your help! The hint from #1123 did not fix it, unfortunately. However, when running with Rolling the error does not occur, so it seems to have been fixed somewhere internally already. Huge, huge thanks to @fujitatomoya for providing a workaround! I have implemented it in the test case and the error goes away. I will try this in production and see if it helps there as well. I'll let you know. Thanks again!
@mmatthebi thanks for checking that.
One question: did you build the source code or just use the released package? Can you share the version/commit hash of either the package or the source code?
I ran the tests again. With Rolling, the service call still blocks sometimes, but a subsequent call does not crash with the "generator already executing" bug. However, the action calls to the action server do not get through smoothly, only in bursts. In contrast, when using your proposed solution with the dedicated executor, the service calls do not block and the action calls also go through more smoothly. I used the packaged Rolling version from the ros2:rolling docker container:
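The version output that followed is not preserved here. For anyone reproducing the check inside the ros2:rolling image, two ways to read the installed rclpy version (assuming the Debian packages are used) are:

```sh
# Version recorded in rclpy's package.xml
ros2 pkg xml rclpy | grep -m1 '<version>'

# Version of the installed Debian package
dpkg -s ros-rolling-rclpy | grep '^Version'
```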
Bug report
We have observed a strange behaviour when high network latency occurs between an action client and server. Such a setup can happen, for example, when two robots are connected via an LTE link. I could replicate the error with two docker containers and corresponding traffic shaping.
Required Info:
Steps to reproduce issue
See the attached zip file bugreport.zip, which contains a docker-compose script along with the source code for an action server and an action client. The architecture is as follows (a sketch is given after the list):

- `action_server.py` provides a basic action.
- `action_client.py` provides
  - a service `/mode` to enable/disable calling the action in the server, and
  - a subscription on `/data` which triggers the action call upon each reception on this topic, once the action calls have been enabled by the service.
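As a rough illustration of that structure, a single-node sketch could look like the following. It is not the code from bugreport.zip: the class name and the Fibonacci stand-in action are invented, and it collapses the two-node setup described under "Additional information" into one node.

```python
# Hypothetical sketch of the described client architecture (not the attached code).
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node
from std_msgs.msg import String
from std_srvs.srv import SetBool
from example_interfaces.action import Fibonacci  # stand-in action type


class SketchActionClient(Node):
    def __init__(self):
        super().__init__('sketch_action_client')
        self._enabled = False
        self._action_client = ActionClient(self, Fibonacci, 'fibonacci')
        # /mode enables or disables the action calls
        self.create_service(SetBool, 'mode', self._on_mode)
        # every message on /data triggers one action call while enabled
        self.create_subscription(String, 'data', self._on_data, 10)

    def _on_mode(self, request, response):
        self._enabled = request.data
        response.success = True
        return response

    def _on_data(self, msg):
        if not self._enabled:
            return
        goal = Fibonacci.Goal()
        goal.order = 3
        # fire-and-forget here; the real client waits for the result
        self._action_client.send_goal_async(goal)


def main():
    rclpy.init()
    node = SketchActionClient()
    rclpy.spin(node)


if __name__ == '__main__':
    main()
```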
Expected behavior
The behaviour is as expected when there is a low-latency connection between the containers:
1. `docker-compose up -d`
2. Start `tmux`, move to `/app` and call `. /opt/ros/humble/setup.bash`
3. `python3 action_server.py`
4. `python3 action_client.py`
5. `ros2 service call /mode std_srvs/srv/SetBool '{data: true}'`
6. `ros2 topic pub /data std_msgs/msg/String -r 10`
From this point on, the action_server is constantly performing actions and the client is reporting on the console. Now, stop the action calls by calling the service:
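The command itself is not shown above; presumably it is the same service call as before with the flag flipped:

```sh
ros2 service call /mode std_srvs/srv/SetBool '{data: false}'
```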
The client stops executing the actions as expected.
Actual behavior
Now, we add network latency between the containers using traffic shaping (a typical command is sketched below). With the latency in place, the service call to stop the actions blocks, and the crash with the "generator already executing" error from the issue title occurs.
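The exact traffic-shaping command is not preserved here; a typical netem invocation that adds the 200 ms delay discussed in the comments (the interface name eth0 is an assumption) would be:

```sh
# requires NET_ADMIN inside the container (e.g. docker run --cap-add NET_ADMIN)
tc qdisc add dev eth0 root netem delay 200ms
```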
Additional information
The blocking of the service call depends on the rate of the `/data` publisher. Essentially, it seems that if the service call to stop the actions arrives while a packet transmission is ongoing, the call blocks forever and the crash occurs.

I know that the architecture of `action_client.py` is a bit strange, as there are two nodes in the process where one creates the other. However, we have seen such an architecture in the wild where this bug occurred. It would be of great help to understand what the cause is and how to fix it.
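As a closing aside on the error message itself: "generator already executing" is the ValueError CPython raises when a generator is resumed while it is already running, which fits the workaround above of waiting on a dedicated executor rather than re-entering one that is already spinning. A tiny, ROS-free illustration of where the message comes from (not the actual rclpy code path):

```python
# Minimal reproduction of the Python-level error message (unrelated to ROS APIs).
def looping():
    while True:
        next(gen)   # re-enter the generator that is currently executing
        yield

gen = looping()
next(gen)  # raises ValueError: generator already executing
```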