Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request for Comments #23

Closed
wants to merge 83 commits into from
Closed

Request for Comments #23

wants to merge 83 commits into from

Conversation

hackergrrl
Copy link
Member

@hackergrrl hackergrrl commented Jan 31, 2024

The Cable Protocol is a new, proposed protocol specification. It's made up of two smaller protocols: the Cable Wire Protocol and the Cable Handshake Protocol.

The purpose of the Cable Protocol is to facilitate the setup of a secure connection between two members of a cabal chat, and the creation and sync of that cabal, by allowing peers to exchange cryptographically signed documents with each other, such as chat messages, spread across various user-defined channels.

For small edits, please consider using the github feature for "suggested changes": https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/commenting-on-a-pull-request

Thank you for reviewing our work!

Copy link
Contributor

@AljoschaMeyer AljoschaMeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick reviewing pass on the handshake protocol. No compensation needed 💜

I hope github lets me open a second review for the wire protocol, about to find out.

README.md Outdated Show resolved Hide resolved
handshake.md Outdated Show resolved Hide resolved

The pre-shared key MUST be mixed into the handshake state as per the rules in
*9. Pre-shared symmetric keys* of the Noise specification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If me and you both share an interest in 4 cabals, how do we know which connection attempts match to which cabal? Do we just attempt 16 different connections on different transport channels? What if we both are interested in 100 cabals each, 10 of which are shared?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah feels like this won't scale great for a bunch of cabals being replicated on one machine. I assume you'd need separate ports for each advertisement in the peer discovery process?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's like @RangerMauve says: you'd need to be listening and doing discovery for each cabal you are interested in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that simply make the ports you use part of the shared secret? That might allow me to probe the ports of several machines to find out whether they belong to the same cabals or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand this. Could you rephrase? Are you describing an attack vector? What do you mean by port as part of the shared secret?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hackergrrl Sorry, missed your reply there.

Suppose each cabal was identified only by its key, but there was no canonic information on where (i.e., on which port) peers expect connection attempts for that cabal. Suppose further, there is a peer that is interested in five cabals. That peer probably listens for incoming connections on five different ports. Given that there is no "official" port for each cabal, and they want to keep secret which cabals they belong to, they only advertise "you can connect to my cabals on ports 8000 to 8004". Now, if I come along, interested in only a single cabal, and want to connect to this peer, I have to try out all five ports. And if I'm interested in 20 cabals, none of which the other peer also has (which I don't know though), I make 5*20 connection attempts.

The most obvious way to restore efficiency with the current protocol is to have all members in a cabal agree on a port that they all will use for that cabal. But at that point, the port becomes an identifying secret, just like the actual secret key - if I'm not connecting to the correct port, I cannot join, just like if I didn't know the secret key. And it is a secret that a nosy participant can snoop by probing people which ports they expose (for cable, if that is detectable); the nosy peer can then obtain information on who shares cabals with whom.

Does that clarify things?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok! I can see your understanding now.

Something not mentioned in this specification, but that is planned, is having a discovery mechanism for finding other members of cabals, which will include both an IP and port. This will resolve any uncertainty about what port to find which cabal on.

handshake.md Outdated Show resolved Hide resolved
handshake.md Outdated Show resolved Hide resolved
handshake.md Outdated
these steps:

1. Compute the total length of all of the cipertexts fragments, in bytes, as
`len`, a 32-bit unsigned little endian integer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the "total length of all ciphertext fragments" just plaintext.len? If yes, write that instead. If no, what is it and how do I compute it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each ciphertext also carries a 16 byte MAC tag for authentication. That's also why the maximum plaintext length isn't 65535 (== u16::MAX) but 65519 (==u16::MAX - 16). So I suppose the total length of all ciphertexts is plaintext.len + 16 * ceil(plaintext.len/65519), or when using integer division, plaintext.len + 16 * (1 + plaintext.len/65519).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon encrypting a plaintext to a ciphertext, Noise adds 16 bytes of authentication data to each resulting ciphertext, so the total length here depends on the number of fragments. I agree about making the computation explicit though!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a brief explanation of the calculation. It is shown in the WriteMsg pseudocode, also.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this section to be more clear. I'm curious what y'all think.

handshake.md Outdated Show resolved Hide resolved
handshake.md Outdated Show resolved Hide resolved
handshake.md Outdated Show resolved Hide resolved
handshake.md Outdated Show resolved Hide resolved
Copy link
Contributor

@AljoschaMeyer AljoschaMeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the wire spec. Pedantic to the point of obnoxiousness in parts, but hopefully helpful nonetheless.

Overall, this was an enjoyable read, kudos. I have to run for now, but enjoy =)

wire.md Outdated Show resolved Hide resolved
wire.md Outdated Show resolved Hide resolved
wire.md Outdated Show resolved Hide resolved
wire.md Outdated Show resolved Hide resolved
wire.md Outdated Show resolved Hide resolved
wire.md Outdated
requested time range, though they may desire not to in certain circumstances,
particularly if a channel has a very long history and the responding client
lacks sufficient resources at the time to return thousands or hundreds of
thousands of chat message hashes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any recommendation on which messages to transmit then (probably the newest ones?)?

Copy link
Member Author

@hackergrrl hackergrrl Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate this catch. Specifying this makes pagination possible!

A responder is RECOMMENDED to send posts in reverse chronological order by post timestamp, so that, if the requester sees that they received a number of hashes equal to their limit, they can be assured that they now have all posts newer than the oldest post hash the responder did send, and can make subsequent requests to paginate backwards in time.


If `future = 0`, only the latest state posts will be included, and the request
MUST NOT be held open.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is future boolean and not simply a limit int analogous to that of Channel Time Range Requests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a limit would be useful here without a means of paginating through the response set of hashes.

skipping the first `offset` entries).

The `offset` field can be combined with the `limit` field to allow clients to
paginate through the list of all channel names known by a peer.
Copy link
Contributor

@AljoschaMeyer AljoschaMeyer Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To paginate, they must be consistently sorted. The spec could require this? It might even require a particular ordering, to make pagination consistent irrespective of the peer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mention the stable sort requirement in the Channel List Response message, where it's used. I'll think about whether it makes sense to specify a sort method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, and agree that a sort should be specified.

The channel names SHOULD be sorted lexicographically in ascending order, so that requesters can effectively sent subsequent requests to paginate through the results.


A responder MUST send a Hash Response message with `hash_count = 0` to indicate
that they do not intend to return any further hashes for the given `req_id` and
they have concluded the request on their side.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes responses that consist of zero hashes indistinguishable from there-might-be-data-but-I-wont-deliver-it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is true more generally -- the recipient can't distinguish the "i don't want to send you more data" from the "i don't have any more data to send you" case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AljoschaMeyer Is there a case where knowing the difference would be helpful?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair =D I can't immediately come up with one.

wire.md Outdated Show resolved Hide resolved
@keks
Copy link
Contributor

keks commented Feb 3, 2024

Agreed, this was a nice read! I have left a few comments inline.

Regarding the version negotiation: I think right now it's not great that there is no cryptographic agreement on the major version, and I suggest using the Noise prologue to achieve it. And as long as there are no features to be negotiated, maybe it's not needed? Either way, the minor version negotiation could theoretically happen in the ciphertext now.

Regarding the paper I posted on fragmentation: I only skimmed it a long time ago, and the only thing I took away from it was "boundary-hiding is hard than just encrypting the length prefixes", because you usually can see where ciphertexts start and end by watching transmission patterns. It's definitely a vector of leakage, but when I think about what the actual concern is, then I think it boils down to fingerprinting: I see that you received a lot of ciphertexts that were the same size as that other persons, so you are likely in a chat group. If I recall correctly, the InterMAC stuff was a bit complicated to implement, so I'd understand if you consider this out of scope. But maybe you can add some padding inside the ciphertext, so messages look more uniform?

Looking forward to any follow-ups :)

Oh and yeah, no compensation please.

@hackergrrl
Copy link
Member Author

@AljoschaMeyer @keks Thank you so much for your notes, wow! It's going to take me a bit to work through all of this, but I feel tremendous gratitude for y'all taking the time. 💜

@hackergrrl hackergrrl changed the title [DRAFT] Request for Comments Request for Comments Feb 6, 2024
Copy link

@RangerMauve RangerMauve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I like how it's focused on gossip. It'd be neat to compare performance against the hypercore version.

handshake.md Outdated Show resolved Hide resolved
handshake.md Show resolved Hide resolved

The pre-shared key MUST be mixed into the handshake state as per the rules in
*9. Pre-shared symmetric keys* of the Noise specification.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah feels like this won't scale great for a bunch of cabals being replicated on one machine. I assume you'd need separate ports for each advertisement in the peer discovery process?

wire.md Show resolved Hide resolved
wire.md Outdated
cause ordering problems if a client's hardware clock is skewed, or a timestamp
is spoofed.

Implementations are RECOMMENDED to set and utilize links on chat messages

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind elaborating on why it's recommended instead of mandatory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea with recommended implementation is because it adds to the load of work implementors would need to do, which is an accessibility concern, making it possible for more minimal implementations to exist.

wire.md Outdated Show resolved Hide resolved
wire.md Outdated Show resolved Hide resolved
@hackergrrl
Copy link
Member Author

Thanks very much for the notes @RangerMauve!

Copy link
Contributor

@keks keks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my god I just realized somehow my review hasn't been posted?? I thought I posted it with my comment! One of these days I'll learn to github...

handshake.md Outdated Show resolved Hide resolved
handshake.md Outdated Show resolved Hide resolved
handshake.md Outdated Show resolved Hide resolved
Protocol must be encoded and decoded.

At a high level, all Cable Wire Protocol messages need to be passed through
Noise for encryption, and then prefixed with an encrypted length indicator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then prefixed with an encrypted length indicator

I don't think this came up again, or did I miss it? I only saw plaintext length prefixes, and only for fragmentation, not for the simple case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see the confusion. I originally used encryption for length prefixes, but then removed it... and now have re-added it. 😅

handshake.md Outdated
these steps:

1. Compute the total length of all of the cipertexts fragments, in bytes, as
`len`, a 32-bit unsigned little endian integer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each ciphertext also carries a 16 byte MAC tag for authentication. That's also why the maximum plaintext length isn't 65535 (== u16::MAX) but 65519 (==u16::MAX - 16). So I suppose the total length of all ciphertexts is plaintext.len + 16 * ceil(plaintext.len/65519), or when using integer division, plaintext.len + 16 * (1 + plaintext.len/65519).

- `varint`: a variable-length unsigned integer.

### 6.2 Post Formats

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also found this pretty confusing.

wire.md Outdated

Regarding the above section (5.2.3 Limits), hashes that are deduplicated by an
intermediary peer, and thus not transmitted back to the requester, do not count
against the `limit`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the wording I would assume that is the case. They can request the posts/hashes of a time range that does not contain the message they just received 100 times.

wire.md Outdated Show resolved Hide resolved

A responder MUST send a Hash Response message with `hash_count = 0` to indicate
that they do not intend to return any further hashes for the given `req_id` and
they have concluded the request on their side.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is true more generally -- the recipient can't distinguish the "i don't want to send you more data" from the "i don't have any more data to send you" case.

handshake.md Outdated Show resolved Hide resolved
@hackergrrl
Copy link
Member Author

After consideration, I've decided to remove TTL from this version of the protocol. It's not yet clear that it's a necessary feature, and its removal simplifies implementation logic.

@hackergrrl
Copy link
Member Author

@AljoschaMeyer @keks Hey y'all! How would you like to be credited as contributors (if at all)? If you can give me the verbatim name you'd like to use I can joyfully add it. :)

@hackergrrl
Copy link
Member Author

Hey y'all! I'm going to consider feedback concluded on 8am Pacific on Wednesday April 24th, and merge in this work. :)

@AljoschaMeyer
Copy link
Contributor

@AljoschaMeyer @keks Hey y'all! How would you like to be credited as contributors (if at all)? If you can give me the verbatim name you'd like to use I can joyfully add it. :)

Huh, "contributor" feels a bit strong to me. If you want to mention me (as "Aljoscha Meyer") for "providing feedback", or something along those lines, that works.

@hackergrrl
Copy link
Member Author

hackergrrl commented Apr 23, 2024

@AljoschaMeyer

Huh, "contributor" feels a bit strong to me. If you want to mention me (as "Aljoscha Meyer") for "providing feedback", or something along those lines, that works.

For me, Contributor feels right. You definitely contributed. I see a lot of value in your comments, and the spec gained tremendously from them in my opinion. So far "Contributor" has included everyone who has contributed anything, including fixes to typos. It all counts to me!

If you still feel strongly about it though, I will of course honour your request and add an additional category for you called "Providing Feedback". 💙

@AljoschaMeyer
Copy link
Contributor

In that case, "contributor" is fine for me =)

@keks
Copy link
Contributor

keks commented Apr 24, 2024

If you feel like I contributed, I'll take it, but I don't have strong feelings either way. If you want to mention me (in whatever role), you can call me "Jan Winkelmann".

@hackergrrl
Copy link
Member Author

Merged! Thank you so much y'all for your valuable time and comments. If y'all end up having thoughts re: any of the revision comments I made, please feel free to respond here still, or open a new issue.

@hackergrrl hackergrrl closed this Apr 25, 2024
@cblgh
Copy link
Member

cblgh commented Apr 25, 2024

🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼!!!!

@AljoschaMeyer
Copy link
Contributor

Woop, congratulations!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants