Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Routing issue: somehow packets get dropped communicating with ground station #189

Open
CodeTriangle opened this issue Aug 7, 2023 · 9 comments

Comments

@CodeTriangle
Copy link

This is a bit of a weird issue that we've been having. I was hesitant to classify it as an issue fit to be published here, but I have been told that @podhrmic at least is willing to look it over.

I've been working with rust-mavlink over the past few days to create a minimal MAVLink router (ideally one that is capable of replacing mavlink-router but will run in embedded and lower-memory environments). The program is very simple. It spawns a thread per endpoint which waits for a message on that endpoint and sends it to all the other endpoints.

Now we get to the crux of the issue: we're having some transmission problems. We have QGroundControl hooked into one side of the router and a PX4 aircraft in the other side. After about 20 or so messages, the endpoint connected to QGroundControl becomes unresponsive because QGroundControl switches to MAVLink 1. That happens because it doesn't think it's received an AUTOPILOT_VERSION message (requested by QGroundControl on startup with MAV_CMD_REQUEST_MESSAGE). My code prints every message received on all endpoints. According to these logs, the router does receive the message so it should be sent on to QGroundControl, but QGC is clearly missing them when they do come in.

This error only happens under the following conditions:

  • QGroundControl is running on a different machine than the router
  • The router is connected to QGroundControl over UDP Broadcast

Here's the repository if anyone is willing to try to reprodice this: https://github.com/AggieAir/rust-mavlink-router

@podhrmic
Copy link
Collaborator

podhrmic commented Aug 7, 2023

@CodeTriangle I haven't used rust-mavlink in a while, but based on what you are describing this sounds more like a network problem:

  • is MAV_CMD_REQUEST_MESSAGE the only message that is being broadcasted? Or can you positively capture other broadcasted messages?
  • and if so, is MAV_CMD_REQUEST_MESSAGE the first message that is being broadcasted? Maybe your receiver drops the first broadcast packet?
  • you say the router receives the message, but do you know that the message is being sent (do you have a sent message log)?
  • are you doing packet capture with wireshark on all endpoints to see that the packets are delivered before being passed to the applications?

@CodeTriangle
Copy link
Author

  • is MAV_CMD_REQUEST_MESSAGE the only message that is being broadcasted? Or can you positively capture other broadcasted messages?

  • and if so, is MAV_CMD_REQUEST_MESSAGE the first message that is being broadcasted? Maybe your receiver drops the first broadcast packet?

Far from it. Heartbeats are verifiably sent every second, pings also come in occasionally.

  • you say the router receives the message, but do you know that the message is being sent (do you have a sent message log)?

I don't suppose I do, but it hasn't been needed yet because...

  • are you doing packet capture with wireshark on all endpoints to see that the packets are delivered before being passed to the applications?

Yes, I have done this. The packets exist on both sides.

@podhrmic
Copy link
Collaborator

podhrmic commented Aug 7, 2023

Based on what you said, it sounds like QGroundControl doesn't accept the AUTOPILOT_VERSION packet (assuming you have a proof that packet indeed reaches the ground station) - would you agree? I don't know what is your endpoint config - is rust-mavlink somehow involved beyond being a part of the router?

@CodeTriangle
Copy link
Author

Based on what you said, it sounds like QGroundControl doesn't accept the AUTOPILOT_VERSION packet (assuming you have a proof that packet indeed reaches the ground station) - would you agree?

QGC is certainly not accepting the package for some reason or another. It's not the packet's fault, though. The same AUTOPILOT_VERSION packet is accepted under different stacks than the one detailed here.

I don't know what is your endpoint config - is rust-mavlink somehow involved beyond being a part of the router?

It is involved in that when rust-mavlink copies the messages to other ports things start falling apart whereas things work as expected when they're just sent directly via serial link or through mavlink-router.

@podhrmic
Copy link
Collaborator

podhrmic commented Aug 8, 2023

It is involved in that when rust-mavlink copies the messages to other ports things start falling apart whereas things work as expected when they're just sent directly via serial link or through mavlink-router.

@CodeTriangle do you have a minimal working example that uses rust-mavlink? E.g. something that doesn't involve your router? It would be helpful if you knew that rust-mavlink works in non-routed situation (GCS <-> UAV) and provides the correct packets.

If you suspect that one of the packets is not (de)serialized correctly (I suspect you mean AUTOPILOT_VERSION) then the best would be if you capture a working AUTOPILOT_VERSION packet, and try to (de)serialize it with rust-mavlink and see if you get the same data at the end.

I admit I didn't study much what is expected from the mavlink-router - but you seem to modify the packets during processing, see this line. The mavlink router also seems to have some requirements about UDP broadcast vs unicast so you might have to implement that behavior as well.

@CodeTriangle
Copy link
Author

It is involved in that when rust-mavlink copies the messages to other ports things start falling apart whereas things work as expected when they're just sent directly via serial link or through mavlink-router.

@CodeTriangle do you have a minimal working example that uses rust-mavlink? E.g. something that doesn't involve your router? It would be helpful if you knew that rust-mavlink works in non-routed situation (GCS <-> UAV) and provides the correct packets.

I can try to strip the program down even more, but honestly I don't see what I can take out. What you see is the bare minimum. It spawns up some number of threads, blocks on each until they receive a message, then immediately acts on the message, sending it to the other endpoints.

If you suspect that one of the packets is not (de)serialized correctly (I suspect you mean AUTOPILOT_VERSION) then the best would be if you capture a working AUTOPILOT_VERSION packet, and try to (de)serialize it with rust-mavlink and see if you get the same data at the end.

Sure, I can try this.

I admit I didn't study much what is expected from the mavlink-router - but you seem to modify the packets during processing, see this line.

No, this line of code runs after the message is resent. Modifying the message would be impossible since it's marked as immutable. What is modified is my_map, which maps sysid/compid pairs to the last seen sequence id. This is useful for accounting for packet loss.

The mavlink router also seems to have some requirements about UDP broadcast vs unicast so you might have to implement that behavior as well.

Avoiding mavlink-router's restrictions on UDP broadcast is one of the motivations of this project. We want to be able to broadcast to any number of ground station instances on the same port.

@podhrmic
Copy link
Collaborator

podhrmic commented Aug 8, 2023

Avoiding mavlink-router's restrictions on UDP broadcast is one of the motivations of this project. We want to be able to broadcast to any number of ground station instances on the same port.

I suspect that QGroundControl might have some restrictions on the shape and form of the communication, presumably the mavlink-router's restrictions are there for a reason. My immediate concern would be how to ensure that only one GCS can command the UAV etc. Unless the GCS has some listening only mode, multiple GCS might not even be possible.

If you agree that is indeed the problem, perhaps best to close this issue and post your question at https://github.com/mavlink/qgroundcontrol/issues ?

@patrickelectric
Copy link
Member

@CodeTriangle we merged a series of fixed on master right now related to serialization / deserialization. Can you check now if it works ? I'll release a 0.12.2 soon.

@patrickelectric
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants