sync v2 #272

dshulyak · 2023-10-03T07:17:30Z

Description

The syncv2 protocol uses recursive set reconciliation to keep nodes in sync.
It will replace the existing sync, providing the following benefits:

avoid expensive database queries that happen very often with current ATX syncer, namely, ax/1 epoch queries
make it possible to split large sync requests between multiple peers ("torrent style")
make it very cheap to verify if 2 nodes are in sync
help resolve small differences between nodes with very little network traffic and resource usage
make it more practical to sync from scratch (this may require some additional database optimizations)
eventually, replace gossip for ATXs and malfeasance proofs

The plan is to use syncv2 for ATXs only initially, later adding support for malfeasance proofs, then extending the use of the protocol to active set synchronization, etc.

The recursive set reconciliation protocol is based on the "Range-Based Set Reconciliation" paper by Aljoscha Meyer.
The multi-peer reconciliation approach is loosely based on the "SREP: Out-Of-Band Sync of Transaction Pools for Large-Scale Blockchains" paper by Novak Boškov, Sevval Simsek, Ari Trachtenberg, and David Starobinski.
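For readers unfamiliar with the range-based approach, here is a minimal sketch of the idea in Go. It is not the go-spacemesh implementation: both sets live in one process, items are plain uint64 keys instead of ATX IDs, and the names (fingerprint, mix, reconcile, maxSend) are made up for illustration. Fingerprints are also recomputed linearly here, whereas a real implementation would presumably use a tree structure to make them cheap.

```go
package main

import (
	"fmt"
	"sort"
)

// fingerprint summarizes all items of one set that fall into a range.
type fingerprint struct {
	xor   uint64 // XOR of the mixed items
	count int    // number of items in the range
}

// mix is a cheap, non-cryptographic stand-in for a real hash function.
func mix(x uint64) uint64 {
	x ^= x >> 30
	x *= 0xbf58476d1ce4e5b9
	x ^= x >> 27
	x *= 0x94d049bb133111eb
	x ^= x >> 31
	return x
}

// rangeOf returns the items of a sorted slice with lo <= item < hi.
func rangeOf(sorted []uint64, lo, hi uint64) []uint64 {
	i := sort.Search(len(sorted), func(k int) bool { return sorted[k] >= lo })
	j := sort.Search(len(sorted), func(k int) bool { return sorted[k] >= hi })
	return sorted[i:j]
}

// fp computes the fingerprint of a slice of items.
func fp(items []uint64) fingerprint {
	f := fingerprint{count: len(items)}
	for _, it := range items {
		f.xor ^= mix(it)
	}
	return f
}

// diff returns the items of sorted x that are not present in sorted y.
func diff(x, y []uint64) []uint64 {
	var out []uint64
	j := 0
	for _, v := range x {
		for j < len(y) && y[j] < v {
			j++
		}
		if j == len(y) || y[j] != v {
			out = append(out, v)
		}
	}
	return out
}

// reconcile compares sorted sets a and b within [lo, hi). Ranges with equal
// fingerprints are skipped, small ranges are exchanged item by item, and
// everything else is split in half and handled recursively.
func reconcile(a, b []uint64, lo, hi uint64, maxSend int) (missingA, missingB []uint64) {
	ra, rb := rangeOf(a, lo, hi), rangeOf(b, lo, hi)
	if fp(ra) == fp(rb) {
		return nil, nil // assumed in sync, nothing to transfer
	}
	if len(ra) <= maxSend || len(rb) <= maxSend || hi-lo <= 1 {
		return diff(rb, ra), diff(ra, rb)
	}
	mid := lo + (hi-lo)/2
	la, lb := reconcile(a, b, lo, mid, maxSend)
	ha, hb := reconcile(a, b, mid, hi, maxSend)
	return append(la, ha...), append(lb, hb...)
}

func main() {
	a := []uint64{1, 5, 9, 12, 20, 33, 47, 58}
	b := []uint64{1, 9, 12, 20, 33, 40, 47, 58}
	missingA, missingB := reconcile(a, b, 0, 64, 2)
	fmt.Println("a is missing:", missingA, "b is missing:", missingB)
}
```

When the two sets are identical, the very first fingerprint comparison matches and nothing else is exchanged, which is what makes the "are we in sync" check cheap.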

Full description of the syncv2 protocol

TODO

Original issue contents

this is a placeholder (not a spec) for all known sync improvements:

drop custom communication protocol for fetcher and use stream per request

this was implemented way before project switched to libp2p and had other requirements in mind. right now this code has no benefits.

efficient sync from scratch

the goal is to have an implementation that makes good use of hardware resources:

sync speed should be limited only by the network throughput it can get from peers. a good metric is how fast state is synced relative to that throughput. for example, if it can get 10MB/s from peers and total unique data is 1GB, then the whole sync should take ~100s
downloading data should not be blocked by validation and writing to disk. this is straightforward to solve with pipelining, but will require some changes in the way sync interacts with validators
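The two points above could be sketched roughly like this. It is only an illustration under assumed names (blob, download, validate, runPipeline, syncLowerBoundSeconds are all hypothetical, not existing go-spacemesh code): the download stage keeps filling a buffered channel while validation workers and the store stage drain it, so a slow validator does not stall the network until the buffer is full.

```go
package main

import (
	"fmt"
	"sync"
)

// blob is a placeholder for any synced object (ATX, ballot, ...).
type blob struct {
	id   int
	data []byte
}

// syncLowerBoundSeconds is the best-case sync time if the network is the
// only bottleneck. bytesPerSecond must be non-zero.
func syncLowerBoundSeconds(totalBytes, bytesPerSecond uint64) uint64 {
	return totalBytes / bytesPerSecond
}

// download stands in for fetching blobs from peers.
func download(ids []int, out chan<- blob) {
	defer close(out)
	for _, id := range ids {
		out <- blob{id: id, data: []byte{byte(id)}}
	}
}

// validate runs several workers so that validation does not serialize
// behind a single slow item. The check itself is a placeholder.
func validate(in <-chan blob, out chan<- blob, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range in {
				if len(b.data) > 0 {
					out <- b
				}
			}
		}()
	}
	wg.Wait()
	close(out)
}

// runPipeline wires the stages together and returns how many blobs
// reached the store stage.
func runPipeline(ids []int, buf, workers int) int {
	downloaded := make(chan blob, buf)
	validated := make(chan blob, buf)
	go download(ids, downloaded)
	go validate(downloaded, validated, workers)
	stored := 0
	for range validated { // the store stage would write to disk here
		stored++
	}
	return stored
}

func main() {
	fmt.Println("lower bound, seconds:", syncLowerBoundSeconds(1_000_000_000, 10_000_000))
	fmt.Println("stored blobs:", runPipeline([]int{1, 2, 3, 4, 5}, 2, 3))
}
```

The buffer size bounds memory use: once it is full the download stage blocks, which gives natural backpressure towards peers.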
dos resistance

current implementation lacks protection against abuse. some queries are expensive and we should rate limit them in a way that allows honest nodes to get data that they need, and prevents abusers from causing harm.

additionally every sync endpoint should be fuzzed. it needs to be taken into account when writing code for it.
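One possible shape for the rate limiting mentioned in this section is a per-peer token bucket where expensive queries cost more tokens. The sketch below is an assumption, not the limiter used in go-spacemesh; PeerLimiter, Allow, and the helper at are hypothetical names, and the time is passed in explicitly only to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket holds the token balance of one peer.
type bucket struct {
	tokens float64
	last   time.Time
}

// PeerLimiter is a per-peer token bucket: every peer may burst up to
// `burst` tokens and regains `rate` tokens per second.
type PeerLimiter struct {
	mu      sync.Mutex
	rate    float64
	burst   float64
	buckets map[string]*bucket
}

func NewPeerLimiter(rate, burst float64) *PeerLimiter {
	return &PeerLimiter{rate: rate, burst: burst, buckets: map[string]*bucket{}}
}

// Allow reports whether peer may spend cost tokens at time now.
// Expensive queries can be given a higher cost than cheap ones.
func (l *PeerLimiter) Allow(peer string, cost float64, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[peer]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[peer] = b
	}
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * l.rate
		if b.tokens > l.burst {
			b.tokens = l.burst
		}
		b.last = now
	}
	if b.tokens < cost {
		return false
	}
	b.tokens -= cost
	return true
}

// at builds a fixed timestamp so the example does not depend on the clock.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	l := NewPeerLimiter(1, 2) // 1 token/s, burst of 2
	fmt.Println(l.Allow("peerA", 1, at(0))) // within burst
	fmt.Println(l.Allow("peerA", 1, at(0))) // within burst
	fmt.Println(l.Allow("peerA", 1, at(0))) // burst exhausted
	fmt.Println(l.Allow("peerA", 1, at(1))) // one token refilled
	fmt.Println(l.Allow("peerB", 2, at(0))) // other peers are unaffected
}
```

Keeping the budget per peer means an abusive peer exhausts only its own bucket, while honest peers still get the data they need.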

hash resolution that also downloads late data (atxs/ballot/malfeasance proofs)

existing implementation may miss out-of-order data. catching such data is required long term for consensus correctness.
the implementation idea was to index data by received timestamp, and send such data to peers according to the timestamp.
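A rough sketch of that indexing idea, assuming an in-memory index and hypothetical names (ReceivedIndex, Add, Since). A real implementation would more likely be a database index on a received-timestamp column. Timestamps are unix seconds here.

```go
package main

import (
	"fmt"
	"sort"
)

// entry records when an object was first received by this node.
type entry struct {
	received int64 // unix seconds
	id       string
}

// ReceivedIndex keeps entries ordered by received time.
type ReceivedIndex struct {
	entries []entry
}

// Add inserts id at its position by received time. Items with equal
// timestamps keep their insertion order.
func (ix *ReceivedIndex) Add(id string, received int64) {
	i := sort.Search(len(ix.entries), func(k int) bool {
		return ix.entries[k].received > received
	})
	ix.entries = append(ix.entries, entry{})
	copy(ix.entries[i+1:], ix.entries[i:])
	ix.entries[i] = entry{received: received, id: id}
}

// Since returns up to limit ids received at or after ts, oldest first.
// A peer can page through late data by repeating the call with the
// timestamp of the last item it got.
func (ix *ReceivedIndex) Since(ts int64, limit int) []string {
	i := sort.Search(len(ix.entries), func(k int) bool {
		return ix.entries[k].received >= ts
	})
	var out []string
	for _, e := range ix.entries[i:] {
		if len(out) == limit {
			break
		}
		out = append(out, e.id)
	}
	return out
}

func main() {
	ix := &ReceivedIndex{}
	ix.Add("atx-a", 10)
	ix.Add("atx-c", 30)
	ix.Add("atx-b", 20) // arrived with an older timestamp
	fmt.Println(ix.Since(15, 10))
}
```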

from the list above it would be good to implement the first 3 items within 3 months, and the 4th item in 6 months or so


…5264) long term solution for sync protocol limits is to refactor code to use stream and enforce limits in a way that doesn't require re-compilation (spacemeshos/pm#272). for cache size, we will soon switch to smarter cache in atxsdata module, but until then we have to keep all atxs in memory.

pigmej · 2023-12-15T16:03:15Z

We should also make it so that requests are evenly split, because right now they're not.

I understand the need for recent layers (yellow), but all others are DDoSing nodes.

lrettig · 2023-12-15T17:33:56Z

> We should also make it so that requests are evenly split, because right now they're not.
>
> I understand the need for recent layers (yellow), but all others are DDoSing nodes.

I don't understand any of what I'm looking at here. What does this mean? What are the colors? What are the axes?

fasmat · 2023-12-18T09:04:41Z

@lrettig: Requests for layer data come in bursts, roughly every 5 minutes. For recent layers (yellow - recent 10 layers) this is OK, but for older layers (other colours, e.g. blue - last 2000 layers) the requests should be spread more evenly over a larger timeframe to avoid spikes of requests / traffic.

pigmej · 2023-12-18T09:39:03Z

very good summary from @fasmat, that's exactly what it is. X: time, Y: number, colors and legend: "layers behind CurrentLayer at the time of asking".

ivan4th · 2024-04-09T13:11:57Z

Current code related to the syncv2 effort is here: spacemeshos/go-spacemesh#5769
The generic peer-to-peer set reconciliation algorithm based on the "monoid tree" structure is already implemented.
Work is underway on "SREP"-style multi-peer sync that can eventually replace gossip.

There's also another proposal for a potentially simpler sync approach, but it assumes that transferring e.g. a month's worth of ATX IDs is and always will be cheap. While this assumption may (or may not) hold for a while after the ATX merge, it is not guaranteed to hold in the long run.

ivan4th · 2024-05-06T12:03:56Z

In order to improve the efficiency of the set reconciliation algorithm, we need LayerID field in ATXv2. It is NOT critical to be able to validate that field, b/c some bad actors using some invalid LayerID values in ATXs cannot cause noticeable performance degradation, given the honest majority.
Discussion: spacemeshos/go-spacemesh#5785 (comment)

ivan4th · 2024-05-18T06:00:12Z

After research discussion:
While it might not be easy or efficient to add LayerID to the ATXs, the efficiency of sync can be improved in the following manner: a time filter is applied to the set-based reconciler, so that only items which were received via gossip some time ago (e.g. a minute) are reconciled.
Additional note: this might even work with the current gossip, so that the set reconciler quickly and efficiently fixes any data missed by the gossip mechanism.
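To make the time filter idea concrete, here is a small sketch with hypothetical names (item, settled, minGossipAge). It only shows the filtering step that would run before items are handed to the reconciler, and the one-minute value is just the example figure from above.

```go
package main

import (
	"fmt"
	"time"
)

// minGossipAge is how long an item must have been known locally before
// it takes part in reconciliation (example value).
const minGossipAge = time.Minute

// item is a locally known object together with its receive time.
type item struct {
	id       string
	received time.Time
}

// at builds a fixed timestamp so the example is deterministic.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// settled returns the ids of items received at least minAge before now.
// Newer items are left to gossip, so reconciliation does not chase data
// that is still propagating through the network.
func settled(items []item, now time.Time, minAge time.Duration) []string {
	cutoff := now.Add(-minAge)
	var out []string
	for _, it := range items {
		if !it.received.After(cutoff) {
			out = append(out, it.id)
		}
	}
	return out
}

func main() {
	items := []item{
		{id: "atx-a", received: at(10)},
		{id: "atx-b", received: at(50)},
		{id: "atx-c", received: at(95)},
	}
	// At t=100s only items received at or before t=40s are old enough.
	fmt.Println(settled(items, at(100), minGossipAge))
}
```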

ivan4th · 2024-05-18T06:05:54Z

Current progress: the multipeer reconciliation code has been added with support for both split-sync ("serve") and full-sync mode.
It appears that for efficient review of the syncv2 code the big PR #5769 needs to be split into smaller parts.
After a bit of stabilization and refactoring, the following PRs will be produced:

SyncTree
pairwise reconciler
multi-peer reconciler
integration with the existing syncer for ATXs and malfeasant identities

syncv2 will be feature-gated, initially. At some point later, we'll first enable syncv2 server across the network that will not initiate any sync by itself, and then, after some time, full syncv2 implementation will replace the current sync.

ivan4th · 2024-09-25T11:13:25Z

Pairwise sync part: spacemeshos/go-spacemesh#6350

ivan4th · 2024-11-18T18:02:35Z

syncv2 related PRs: https://github.com/spacemeshos/go-spacemesh/pulls?q=is%3Apr+label%3Aarea%2Fsyncv2

ivan4th · 2024-11-25T20:09:11Z

Updated issue description for easier tracking of this issue.

dshulyak added feat/sync v2 area/sync labels Oct 3, 2023

dshulyak added this to Dev team kanban Oct 3, 2023

dshulyak moved this to 📋 Backlog in Dev team kanban Oct 3, 2023

lrettig mentioned this issue Oct 17, 2023

Improve sync/gossip #280

Open

5 tasks

This was referenced Oct 19, 2023

[Merged by Bors] - sync: enable rate limiting for servers spacemeshos/go-spacemesh#5151

Closed

post distributed verification spacemeshos/go-spacemesh#5185

Closed

dshulyak mentioned this issue Nov 14, 2023

[Merged by Bors] - sync, cache: update sync atxs limits and increase cache size again spacemeshos/go-spacemesh#5264

Closed

pigmej assigned ivan4th Apr 9, 2024
