Broflake: investigate NAT traversal failure #163

noahlevenson · 2023-07-12T21:15:18Z

Now that we've got an abundance of censored peers on the network, the problem of NAT traversal has revealed itself to be quite significant. If you start a widget and pop open the console, you'll see a nonstop torrent of attempted connections, nearly all of which result in NAT traversal failure. I'd estimate that fewer than 1% of attempted connections succeed at NAT traversal.

This is critical-ish path for the MVP, because the widget -- from the perspective of the user -- still doesn't really do anything. It acquires connections so infrequently that it sorta seems broken. If there are any light lifts we can perform to improve the traversal rate, we should do them now. And if there aren't any light lifts, then at least we should know why.

Thus, the topic of this issue:

Let's aggressively instrument our NAT traversal functions. We just need to understand more about what happens when NAT traversal fails. Who does it fail for? Where do they live? Are they on desktop or mobile? Were they able to gather ICE candidates, and if so, what do their ICE candidates look like? Do we think they're behind CGNATs?
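To make "what do their ICE candidates look like" concrete, here is a minimal sketch of the kind of per-candidate summary a trace could carry. It assumes candidates are available as standard SDP candidate strings; `candidateSummary` and `summarizeCandidate` are hypothetical names, not existing Broflake code. The CGNAT flag just checks the RFC 6598 shared address space (100.64.0.0/10), so treat it as a hint rather than proof.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// cgnatPrefix is the shared address space reserved for carrier-grade NAT (RFC 6598).
var cgnatPrefix = netip.MustParsePrefix("100.64.0.0/10")

// candidateSummary is a hypothetical record to attach to a traversal trace.
type candidateSummary struct {
	Type    string // host, srflx, prflx, or relay
	Address string // connection address exactly as it appeared in the candidate
	CGNAT   bool   // address falls inside 100.64.0.0/10
	Private bool   // address is in a private range (e.g. RFC 1918)
}

// summarizeCandidate parses one SDP candidate string of the form
// "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...".
func summarizeCandidate(line string) (candidateSummary, error) {
	fields := strings.Fields(strings.TrimPrefix(line, "a="))
	if len(fields) < 8 || fields[6] != "typ" {
		return candidateSummary{}, fmt.Errorf("malformed candidate: %q", line)
	}
	s := candidateSummary{Type: fields[7], Address: fields[4]}
	// mDNS hostnames (*.local) do not parse as IPs; both flags stay false for them.
	if addr, err := netip.ParseAddr(fields[4]); err == nil {
		addr = addr.Unmap()
		s.CGNAT = cgnatPrefix.Contains(addr)
		s.Private = addr.IsPrivate()
	}
	return s, nil
}

func main() {
	s, err := summarizeCandidate("candidate:1 1 udp 2122260223 100.72.13.5 46154 typ host")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```

A host candidate inside 100.64.0.0/10 would suggest the device sits directly on a carrier's shared address space, which is common on mobile networks. Some overlay VPNs also use that range, so the flag should be read alongside the other trace fields.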

Maybe we add NAT behavior discovery (RFC 5780) so that clients can report their NAT type at failure time? It should be relatively trivial with Pion: https://pkg.go.dev/github.com/pion/stun/cmd/stun-nat-behaviour#section-readme

We may find it necessary to know the NAT types of both parties for each traversal failure, so as to determine whether we're dealing with an unworkable network composition. However, given the very controlled quantity of uncensored peers presently on the network -- I think it's just a few peers we've daemonized on DO, plus 3 or 4 Lantern employees -- it might just be easier for all of us to manually determine our NAT types and factor them into the research here.
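As a rough illustration of what "unworkable network composition" could mean once both NAT types are known, here is a simplified rule of thumb using RFC 4787 terminology. It is an assumption for discussion, not a statement of how ICE behaves in every case: it ignores port prediction, hairpinning, and TURN relays, and it lumps address-dependent mapping together with address-and-port-dependent mapping. `natProfile` and `likelyUnworkable` are hypothetical names.

```go
package main

import "fmt"

// natProfile is a hypothetical two-bit summary of one peer's NAT.
type natProfile struct {
	EndpointIndependentMapping bool // same external port regardless of destination
	PortDependentFiltering     bool // inbound allowed only from an exact IP:port already contacted
}

// likelyUnworkable reports whether direct UDP hole punching between a and b
// is unlikely to succeed under the simplified rule described above.
func likelyUnworkable(a, b natProfile) bool {
	// x's external port changes per destination, so y cannot learn it in
	// advance. The connection then only works if y has a stable mapping AND
	// y's filter admits packets from an unexpected port on x's address.
	hard := func(x, y natProfile) bool {
		return !x.EndpointIndependentMapping &&
			(!y.EndpointIndependentMapping || y.PortDependentFiltering)
	}
	return hard(a, b) || hard(b, a)
}

func main() {
	publicHost := natProfile{EndpointIndependentMapping: true}
	symmetric := natProfile{EndpointIndependentMapping: false, PortDependentFiltering: true}

	fmt.Println("public host vs symmetric:", likelyUnworkable(publicHost, symmetric))
	fmt.Println("symmetric vs symmetric:", likelyUnworkable(symmetric, symmetric))
}
```

Under this rule, a peer with a public address or an endpoint-independent NAT with permissive filtering pairs with anything, which is why knowing the uncensored side's NAT types may be enough to narrow things down.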

noahlevenson · 2023-07-20T00:40:14Z

Just dumping an update:

We instrumented NAT traversal and added NAT behavior discovery. Traces will start arriving when the new Flashlight builds go out. We'll probably only need a few hours' worth of traces to be able to deduce what's going on.

noahlevenson · 2023-08-17T23:22:24Z

Another update:

I'm still waiting for NAT traces to appear in Honeycomb. I'm assuming that application updates just haven't been pushed yet, or haven't been pushed widely enough. Once the trace data appears, we can start debugging and developing a plan.

In the meantime, as a ham-fisted workaround, we disabled Broflake for all mobile users. The effect is positive -- it seems that desktop users can pierce their NATs at a greater rate, which produces more activity in widget users' clients.

I'd like to keep this issue open until we're able to view those NAT traces and come up with a hypothesis.

noahlevenson self-assigned this Jul 12, 2023

oxtoacart transferred this issue from another repository Aug 21, 2023

noahlevenson mentioned this issue Aug 21, 2023

Ensure that the balancer is using Broflake in the way that Broflake ought to be used #174

Open

jay-418 unassigned noahlevenson Feb 5, 2024
