Broflake: investigate NAT traversal failure #163

Open
noahlevenson opened this issue Jul 12, 2023 · 2 comments

@noahlevenson
Contributor

Now that we've got an abundance of censored peers on the network, the problem of NAT traversal has revealed itself to be quite significant. If you start a widget and pop open the console, you'll see a nonstop torrent of attempted connections, all of which result in NAT traversal failure. I'd estimate that < 1% of attempted connections succeed at NAT traversal.

This is critical-ish path for the MVP, because the widget -- from the perspective of the user -- still doesn't really do anything. It acquires connections so infrequently that it sorta seems broken. If there are any light lifts we can perform to improve the traversal rate, we should do them now. And if there aren't any light lifts, then at least we should know why.

Thus, the topic of this issue:

Let's aggressively instrument our NAT traversal functions. We just need to understand more about what happens when NAT traversal fails. Who does it fail for? Where do they live? Are they on desktop or mobile? Were they able to gather ICE candidates, and if so, what do their ICE candidates look like? Do we think they're behind CGNATs?
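As a rough illustration of the candidate-level instrumentation described above, here is a minimal sketch that summarizes a peer's gathered ICE candidates by type and flags a possible CGNAT. It assumes candidates arrive as standard `candidate:...` attribute strings; the names `CandidateSummary` and `SummarizeCandidates` are hypothetical and not part of Broflake or Pion. The CGNAT hint is only a heuristic: it checks for addresses in the RFC 6598 shared range (100.64.0.0/10), which some VPNs also use, and many CGNAT deployments use other private ranges.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// cgnatRange is the shared address space that RFC 6598 reserves for carrier-grade NAT.
var cgnatRange = netip.MustParsePrefix("100.64.0.0/10")

// CandidateSummary is a hypothetical record to attach to a traversal-failure trace.
type CandidateSummary struct {
	Host, Srflx, Prflx, Relay int
	CGNATHint                 bool // some candidate address fell inside 100.64.0.0/10
}

// inCGNAT reports whether s parses as an IP inside the CGNAT range.
// mDNS hostnames (e.g. "abc.local") fail to parse and count as "no".
func inCGNAT(s string) bool {
	addr, err := netip.ParseAddr(s)
	return err == nil && cgnatRange.Contains(addr)
}

// SummarizeCandidates counts candidates by type and sets the CGNAT hint.
// Expected layout of each line:
//
//	candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> [raddr <addr> rport <port>]
func SummarizeCandidates(lines []string) CandidateSummary {
	var s CandidateSummary
	for _, line := range lines {
		f := strings.Fields(strings.TrimPrefix(line, "a="))
		if len(f) < 8 || !strings.HasPrefix(f[0], "candidate:") || f[6] != "typ" {
			continue // not a candidate line we understand
		}
		switch f[7] {
		case "host":
			s.Host++
		case "srflx":
			s.Srflx++
		case "prflx":
			s.Prflx++
		case "relay":
			s.Relay++
		}
		// Check the candidate's own address, then any related (base) address.
		if inCGNAT(f[4]) {
			s.CGNATHint = true
		}
		for i := 8; i+1 < len(f); i++ {
			if f[i] == "raddr" && inCGNAT(f[i+1]) {
				s.CGNATHint = true
			}
		}
	}
	return s
}

func main() {
	s := SummarizeCandidates([]string{
		"candidate:1 1 udp 2122260223 100.72.14.9 50000 typ host",
		"candidate:2 1 udp 1686052607 203.0.113.7 50000 typ srflx raddr 100.72.14.9 rport 50000",
	})
	fmt.Printf("%+v\n", s)
}
```

A peer that reports only host candidates (zero `srflx`) most likely never reached a STUN server, which is a different failure from having gathered server-reflexive candidates and still failing connectivity checks.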

Maybe we add NAT behavior discovery (RFC 5780) so that clients can report their NAT type at failure time? It should be relatively trivial with Pion: https://pkg.go.dev/github.com/pion/stun/cmd/stun-nat-behaviour#section-readme
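For reporting purposes, the two results an RFC 5780 probe produces (mapping behavior and filtering behavior) can be collapsed into the familiar "cone / symmetric" labels. The sketch below shows that conventional correspondence only; it does not perform any STUN tests and does not use Pion's API. The type and function names are hypothetical.

```go
package main

import "fmt"

// Behavior is one of the three behaviors that RFC 4787 defines for both
// mapping and filtering, and that RFC 5780 tests try to discover.
type Behavior int

const (
	EndpointIndependent Behavior = iota
	AddressDependent
	AddressAndPortDependent
)

// ClassicNATType maps a (mapping, filtering) pair onto the older RFC 3489
// vocabulary. Any mapping that is not endpoint-independent is reported as
// "symmetric", whatever the filtering behavior is.
func ClassicNATType(mapping, filtering Behavior) string {
	if mapping != EndpointIndependent {
		return "symmetric"
	}
	switch filtering {
	case EndpointIndependent:
		return "full cone"
	case AddressDependent:
		return "restricted cone"
	default:
		return "port-restricted cone"
	}
}

func main() {
	fmt.Println(ClassicNATType(EndpointIndependent, AddressAndPortDependent))
	fmt.Println(ClassicNATType(AddressAndPortDependent, AddressAndPortDependent))
}
```

Recording the raw mapping and filtering values alongside the label is probably safer than recording the label alone, since the classic names lose information.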

We may need to know the NAT types of both parties for each traversal failure, so that we can tell whether we're dealing with an unworkable network composition. However, given the very small number of uncensored peers presently on the network -- I think it's just a few peers we've daemonized on DO, plus 3 or 4 Lantern employees -- it might be easier for all of us to manually determine our NAT types and factor them into the research here.
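To make "unworkable network composition" concrete, here is a sketch of the usual rule of thumb for UDP hole punching without a relay: it tends to work when both sides have endpoint-independent mapping, tends to fail when neither does, and in the mixed case depends on whether the endpoint-independent side filters by address and port. This is a simplification under stated assumptions (no TURN, no port prediction, no hairpinning or multi-layer NAT effects), and the names `NAT` and `HolePunchLikely` are hypothetical.

```go
package main

import "fmt"

// Behavior is one of the three RFC 4787 behaviors for mapping or filtering.
type Behavior int

const (
	EndpointIndependent Behavior = iota
	AddressDependent
	AddressAndPortDependent
)

// NAT describes one peer's NAT as discovered by an RFC 5780 style probe.
type NAT struct {
	Mapping   Behavior
	Filtering Behavior
}

// HolePunchLikely applies a rule of thumb for direct UDP traversal.
func HolePunchLikely(a, b NAT) bool {
	aEI := a.Mapping == EndpointIndependent
	bEI := b.Mapping == EndpointIndependent
	switch {
	case aEI && bEI:
		// Each side's server-reflexive address is valid for the other peer.
		return true
	case !aEI && !bEI:
		// Neither side can learn the port the other will actually use.
		return false
	case !aEI:
		// a's packets arrive from an unexpected port; b must not filter on port.
		return b.Filtering != AddressAndPortDependent
	default:
		return a.Filtering != AddressAndPortDependent
	}
}

func main() {
	symmetric := NAT{AddressAndPortDependent, AddressAndPortDependent}
	portRestricted := NAT{EndpointIndependent, AddressAndPortDependent}
	fmt.Println(HolePunchLikely(symmetric, portRestricted))
	fmt.Println(HolePunchLikely(portRestricted, portRestricted))
}
```

If traces show that most failures pair a symmetric-style NAT with a port-restricted or symmetric one, that would point toward needing relays rather than a tuning fix.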

@noahlevenson noahlevenson self-assigned this Jul 12, 2023
@noahlevenson
Contributor Author

Just dumping an update:

We instrumented NAT traversal and added NAT behavior discovery. Traces will start arriving when the new Flashlight builds go out. We'll probably need only a few hours' worth of traces to deduce what's going on.

@noahlevenson
Contributor Author

Another update:

I'm still waiting for NAT traces to appear in Honeycomb. I'm assuming that application updates just haven't been pushed yet, or haven't been pushed widely enough. Once the trace data appears, we can start debugging and developing a plan.

In the meantime, as a ham-fisted workaround, we disabled Broflake for all mobile users. The effect is positive -- it seems that desktop users can pierce their NATs at a greater rate, which produces more activity in widget users' clients.

I'd like to keep this issue open until we're able to view those NAT traces and come up with a hypothesis.
