-
-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request for Comments #23
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick reviewing pass on the handshake protocol. No compensation needed 💜
I hope github lets me open a second review for the wire protocol, about to find out.
|
||
The pre-shared key MUST be mixed into the handshake state as per the rules in | ||
*9. Pre-shared symmetric keys* of the Noise specification. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If me and you both share an interest in 4 cabals, how do we know which connection attempts match to which cabal? Do we just attempt 16 different connections on different transport channels? What if we both are interested in 100 cabals each, 10 of which are shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah feels like this won't scale great for a bunch of cabals being replicated on one machine. I assume you'd need separate ports for each advertisement in the peer discovery process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's like @RangerMauve says: you'd need to be listening and doing discovery for each cabal you are interested in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that simply make the ports you use part of the shared secret? That might allow me to probe the ports of several machines to find out whether they belong to the same cabals or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand this. Could you rephrase? Are you describing an attack vector? What do you mean by port as part of the shared secret?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hackergrrl Sorry, missed your reply there.
Suppose each cabal was identified only by its key, but there was no canonic information on where (i.e., on which port) peers expect connection attempts for that cabal. Suppose further, there is a peer that is interested in five cabals. That peer probably listens for incoming connections on five different ports. Given that there is no "official" port for each cabal, and they want to keep secret which cabals they belong to, they only advertise "you can connect to my cabals on ports 8000 to 8004". Now, if I come along, interested in only a single cabal, and want to connect to this peer, I have to try out all five ports. And if I'm interested in 20 cabals, none of which the other peer also has (which I don't know though), I make 5*20 connection attempts.
The most obvious way to restore efficiency with the current protocol is to have all members in a cabal agree on a port that they all will use for that cabal. But at that point, the port becomes an identifying secret, just like the actual secret key - if I'm not connecting to the correct port, I cannot join, just like if I didn't know the secret key. And it is a secret that a nosy participant can snoop by probing people which ports they expose (for cable, if that is detectable); the nosy peer can then obtain information on who shares cabals with whom.
Does that clarify things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok! I can see your understanding now.
Something not mentioned in this specification, but that is planned, is having a discovery mechanism for finding other members of cabals, which will include both an IP and port. This will resolve any uncertainty about what port to find which cabal on.
handshake.md
Outdated
these steps: | ||
|
||
1. Compute the total length of all of the cipertexts fragments, in bytes, as | ||
`len`, a 32-bit unsigned little endian integer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the "total length of all ciphertext fragments" just plaintext.len
? If yes, write that instead. If no, what is it and how do I compute it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each ciphertext also carries a 16 byte MAC tag for authentication. That's also why the maximum plaintext length isn't 65535 (== u16::MAX) but 65519 (==u16::MAX - 16). So I suppose the total length of all ciphertexts is plaintext.len + 16 * ceil(plaintext.len/65519)
, or when using integer division, plaintext.len + 16 * (1 + plaintext.len/65519)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon encrypting a plaintext to a ciphertext, Noise adds 16 bytes of authentication data to each resulting ciphertext, so the total length here depends on the number of fragments. I agree about making the computation explicit though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a brief explanation of the calculation. It is shown in the WriteMsg
pseudocode, also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this section to be more clear. I'm curious what y'all think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the wire spec. Pedantic to the point of obnoxiousness in parts, but hopefully helpful nonetheless.
Overall, this was an enjoyable read, kudos. I have to run for now, but enjoy =)
wire.md
Outdated
requested time range, though they may desire not to in certain circumstances, | ||
particularly if a channel has a very long history and the responding client | ||
lacks sufficient resources at the time to return thousands or hundreds of | ||
thousands of chat message hashes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any recommendation on which messages to transmit then (probably the newest ones?)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate this catch. Specifying this makes pagination possible!
A responder is RECOMMENDED to send posts in reverse chronological order by post timestamp, so that, if the requester sees that they received a number of hashes equal to their
limit
, they can be assured that they now have all posts newer than the oldest post hash the responder did send, and can make subsequent requests to paginate backwards in time.
|
||
If `future = 0`, only the latest state posts will be included, and the request | ||
MUST NOT be held open. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is future boolean and not simply a limit
int analogous to that of Channel Time Range Requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a limit would be useful here without a means of paginating through the response set of hashes.
skipping the first `offset` entries). | ||
|
||
The `offset` field can be combined with the `limit` field to allow clients to | ||
paginate through the list of all channel names known by a peer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To paginate, they must be consistently sorted. The spec could require this? It might even require a particular ordering, to make pagination consistent irrespective of the peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mention the stable sort requirement in the Channel List Response message, where it's used. I'll think about whether it makes sense to specify a sort method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, and agree that a sort should be specified.
The channel names SHOULD be sorted lexicographically in ascending order, so that requesters can effectively sent subsequent requests to paginate through the results.
|
||
A responder MUST send a Hash Response message with `hash_count = 0` to indicate | ||
that they do not intend to return any further hashes for the given `req_id` and | ||
they have concluded the request on their side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes responses that consist of zero hashes indistinguishable from there-might-be-data-but-I-wont-deliver-it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is true more generally -- the recipient can't distinguish the "i don't want to send you more data" from the "i don't have any more data to send you" case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AljoschaMeyer Is there a case where knowing the difference would be helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair =D I can't immediately come up with one.
Agreed, this was a nice read! I have left a few comments inline. Regarding the version negotiation: I think right now it's not great that there is no cryptographic agreement on the major version, and I suggest using the Noise prologue to achieve it. And as long as there are no features to be negotiated, maybe it's not needed? Either way, the minor version negotiation could theoretically happen in the ciphertext now. Regarding the paper I posted on fragmentation: I only skimmed it a long time ago, and the only thing I took away from it was "boundary-hiding is hard than just encrypting the length prefixes", because you usually can see where ciphertexts start and end by watching transmission patterns. It's definitely a vector of leakage, but when I think about what the actual concern is, then I think it boils down to fingerprinting: I see that you received a lot of ciphertexts that were the same size as that other persons, so you are likely in a chat group. If I recall correctly, the InterMAC stuff was a bit complicated to implement, so I'd understand if you consider this out of scope. But maybe you can add some padding inside the ciphertext, so messages look more uniform? Looking forward to any follow-ups :) Oh and yeah, no compensation please. |
@AljoschaMeyer @keks Thank you so much for your notes, wow! It's going to take me a bit to work through all of this, but I feel tremendous gratitude for y'all taking the time. 💜 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! I like how it's focused on gossip. It'd be neat to compare performance against the hypercore version.
|
||
The pre-shared key MUST be mixed into the handshake state as per the rules in | ||
*9. Pre-shared symmetric keys* of the Noise specification. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah feels like this won't scale great for a bunch of cabals being replicated on one machine. I assume you'd need separate ports for each advertisement in the peer discovery process?
wire.md
Outdated
cause ordering problems if a client's hardware clock is skewed, or a timestamp | ||
is spoofed. | ||
|
||
Implementations are RECOMMENDED to set and utilize links on chat messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind elaborating on why it's recommended instead of mandatory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea with recommended implementation is because it adds to the load of work implementors would need to do, which is an accessibility concern, making it possible for more minimal implementations to exist.
Co-authored-by: Aljoscha Meyer <[email protected]>
Co-authored-by: Aljoscha Meyer <[email protected]>
Co-authored-by: Aljoscha Meyer <[email protected]>
Co-authored-by: Aljoscha Meyer <[email protected]>
Co-authored-by: Aljoscha Meyer <[email protected]>
Co-authored-by: Aljoscha Meyer <[email protected]>
Co-authored-by: Aljoscha Meyer <[email protected]>
Thanks very much for the notes @RangerMauve! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my god I just realized somehow my review hasn't been posted?? I thought I posted it with my comment! One of these days I'll learn to github...
Protocol must be encoded and decoded. | ||
|
||
At a high level, all Cable Wire Protocol messages need to be passed through | ||
Noise for encryption, and then prefixed with an encrypted length indicator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and then prefixed with an encrypted length indicator
I don't think this came up again, or did I miss it? I only saw plaintext length prefixes, and only for fragmentation, not for the simple case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see the confusion. I originally used encryption for length prefixes, but then removed it... and now have re-added it. 😅
handshake.md
Outdated
these steps: | ||
|
||
1. Compute the total length of all of the cipertexts fragments, in bytes, as | ||
`len`, a 32-bit unsigned little endian integer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each ciphertext also carries a 16 byte MAC tag for authentication. That's also why the maximum plaintext length isn't 65535 (== u16::MAX) but 65519 (==u16::MAX - 16). So I suppose the total length of all ciphertexts is plaintext.len + 16 * ceil(plaintext.len/65519)
, or when using integer division, plaintext.len + 16 * (1 + plaintext.len/65519)
.
- `varint`: a variable-length unsigned integer. | ||
|
||
### 6.2 Post Formats | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also found this pretty confusing.
wire.md
Outdated
|
||
Regarding the above section (5.2.3 Limits), hashes that are deduplicated by an | ||
intermediary peer, and thus not transmitted back to the requester, do not count | ||
against the `limit`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the wording I would assume that is the case. They can request the posts/hashes of a time range that does not contain the message they just received 100 times.
|
||
A responder MUST send a Hash Response message with `hash_count = 0` to indicate | ||
that they do not intend to return any further hashes for the given `req_id` and | ||
they have concluded the request on their side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is true more generally -- the recipient can't distinguish the "i don't want to send you more data" from the "i don't have any more data to send you" case.
After consideration, I've decided to remove TTL from this version of the protocol. It's not yet clear that it's a necessary feature, and its removal simplifies implementation logic. |
@AljoschaMeyer @keks Hey y'all! How would you like to be credited as contributors (if at all)? If you can give me the verbatim name you'd like to use I can joyfully add it. :) |
Hey y'all! I'm going to consider feedback concluded on 8am Pacific on Wednesday April 24th, and merge in this work. :) |
Huh, "contributor" feels a bit strong to me. If you want to mention me (as "Aljoscha Meyer") for "providing feedback", or something along those lines, that works. |
For me, Contributor feels right. You definitely contributed. I see a lot of value in your comments, and the spec gained tremendously from them in my opinion. So far "Contributor" has included everyone who has contributed anything, including fixes to typos. It all counts to me! If you still feel strongly about it though, I will of course honour your request and add an additional category for you called "Providing Feedback". 💙 |
In that case, "contributor" is fine for me =) |
If you feel like I contributed, I'll take it, but I don't have strong feelings either way. If you want to mention me (in whatever role), you can call me "Jan Winkelmann". |
4b59a57
to
56a2ef8
Compare
Merged! Thank you so much y'all for your valuable time and comments. If y'all end up having thoughts re: any of the revision comments I made, please feel free to respond here still, or open a new issue. |
🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼🌼!!!! |
Woop, congratulations! |
The Cable Protocol is a new, proposed protocol specification. It's made up of two smaller protocols: the Cable Wire Protocol and the Cable Handshake Protocol.
The purpose of the Cable Protocol is to facilitate the setup of a secure connection between two members of a cabal chat, and the creation and sync of that cabal, by allowing peers to exchange cryptographically signed documents with each other, such as chat messages, spread across various user-defined channels.
For small edits, please consider using the github feature for "suggested changes": https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/commenting-on-a-pull-request
Thank you for reviewing our work!