Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backfiller #3263

Merged
merged 4 commits into from
Jan 20, 2022
Merged

Backfiller #3263

merged 4 commits into from
Jan 20, 2022

Conversation

arnetheduck
Copy link
Member

Backfilling is the process of downloading historical blocks via P2P that
are required to fulfill GetBlocksByRange duties - this happens during
both trusted node and finalized checkpoint syncs.

In particular, backfilling happens after syncing to head, such that
attestation work can start as soon as possible.

This PR is captures the part of #3209 that still needs more work.

@arnetheduck arnetheduck marked this pull request as draft January 9, 2022 21:10
arnetheduck added a commit that referenced this pull request Jan 9, 2022
Trusted node sync, aka checkpoint sync, allows syncing tyhe chain from a
trusted node instead of relying on a full sync from genesis.

Features include:

* sync from any slot, including the latest finalized slot
* backfill blocks either from the REST api (default) or p2p (#3263)

Future improvements:

* top up blocks between head in database and some other node - this
makes for an efficient backup tool
* recreate historical state to enable historical queries
@arnetheduck arnetheduck mentioned this pull request Jan 9, 2022
arnetheduck added a commit that referenced this pull request Jan 9, 2022
Trusted node sync, aka checkpoint sync, allows syncing tyhe chain from a
trusted node instead of relying on a full sync from genesis.

Features include:

* sync from any slot, including the latest finalized slot
* backfill blocks either from the REST api (default) or p2p (#3263)

Future improvements:

* top up blocks between head in database and some other node - this
makes for an efficient backup tool
* recreate historical state to enable historical queries
@github-actions
Copy link

github-actions bot commented Jan 10, 2022

Unit Test Results

     12 files  ±0     794 suites  ±0   35m 57s ⏱️ - 2m 8s
1 602 tests ±0  1 556 ✔️ ±0  46 💤 ±0  0 ±0 
9 457 runs  ±0  9 361 ✔️ ±0  96 💤 ±0  0 ±0 

Results for commit 59aab00. ± Comparison against base commit e493953.

♻️ This comment has been updated with latest results.

arnetheduck added a commit that referenced this pull request Jan 11, 2022
Trusted node sync, aka checkpoint sync, allows syncing tyhe chain from a
trusted node instead of relying on a full sync from genesis.

Features include:

* sync from any slot, including the latest finalized slot
* backfill blocks either from the REST api (default) or p2p (#3263)

Future improvements:

* top up blocks between head in database and some other node - this
makes for an efficient backup tool
* recreate historical state to enable historical queries
arnetheduck added a commit that referenced this pull request Jan 12, 2022
Trusted node sync, aka checkpoint sync, allows syncing tyhe chain from a
trusted node instead of relying on a full sync from genesis.

Features include:

* sync from any slot, including the latest finalized slot
* backfill blocks either from the REST api (default) or p2p (#3263)

Future improvements:

* top up blocks between head in database and some other node - this
makes for an efficient backup tool
* recreate historical state to enable historical queries
arnetheduck added a commit that referenced this pull request Jan 17, 2022
* Trusted node sync

Trusted node sync, aka checkpoint sync, allows syncing tyhe chain from a
trusted node instead of relying on a full sync from genesis.

Features include:

* sync from any slot, including the latest finalized slot
* backfill blocks either from the REST api (default) or p2p (#3263)

Future improvements:

* top up blocks between head in database and some other node - this
makes for an efficient backup tool
* recreate historical state to enable historical queries

* fixes

* load genesis from network metadata
* check checkpoint block root against state
* fix invalid block root in rest json decoding
* odds and ends

* retry looking for epoch-boundary checkpoint blocks
@arnetheduck arnetheduck marked this pull request as ready for review January 18, 2022 07:24
except CancelledError:
if not(isNil(peer)):
man.pool.release(peer)
true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not simple.. break here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because I do not want to investigate one more for/while+exception+break issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it wouldn't be bad to add tests for this in chronos - basically, a few simple for/while/finally etc sanity tests

let forked =
res.map() do (blcks: seq[phase0.SignedBeaconBlock]) -> auto:
blcks.mapIt(ForkedSignedBeaconBlock.init(it))
return forked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a template is worth it, so many wasted lines/mindspace

template debugLog(msg: string): untyped {.dirty.} =
  debug msg, 
            peer = peer, slot = req.slot, count = req.count,
            step = req.step, peer_speed = peer.netKbps(),
            direction = man.direction, error = $res.error(),
            topics = "syncman"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically, this is solved with logScope, but logscope is unfortunately unreliable (cc @zah can we do anything about that?)

@@ -303,10 +341,6 @@ proc syncStep[A, B](man: SyncManager[A, B], index: int, peer: A) {.async.} =
peer_speed = peer.netKbps(), index = index,
direction = man.direction, topics = "syncman"
peer.updateScore(PeerScoreUseless)

# Give the peer time to do some syncing
await sleepAsync(StatusExpirationTime div 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we avoid dos-ing peers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think we DOS-ing peers? We descored peer which is not useful for us and we return it back to the PeerPool. So it is not supposed to be returned for next step.

peer = peer, index = index,
peer_score = peer.getScore(), peer_speed = peer.netKbps(),
errName = exc.name, errMsg = exc.msg, direction = man.direction,
topics = "syncman"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the sync_manager needs a debugLog template to avoid littering the codebase with those. (though that might cause issue with filename:linenumber logic from Chronicles?

arnetheduck and others added 2 commits January 19, 2022 22:32
Backfilling is the process of downloading historical blocks via P2P that
are required to fulfill `GetBlocksByRange` duties - this happens during
both trusted node and finalized checkpoint syncs.

In particular, backfilling happens after syncing to head, such that
attestation work can start as soon as possible.
Remove usage of `awaitne`.
Add cancellation support.
Remove unneeded `sleepAsync()` if peer's head is older than needed.
Add `direction` field to all logs.
Fix syncmanager wedge issue.
Add proper resource cleaning procedure on backward sync finish.
@arnetheduck arnetheduck merged commit 570379d into unstable Jan 20, 2022
@arnetheduck arnetheduck deleted the chaindag-backfill branch January 20, 2022 07:25
etan-status added a commit to etan-status/nimbus-eth2 that referenced this pull request Feb 13, 2022
When initializing backfill sync, the implementation intends to start at
the first unknown slot (`1` before tail). However, an incorrect variable
is passed, and backfill sync actually starts at the tail slot instead.
This patch corrects this by passing the intended variable. The problem
was introduced with the original backfill implementation at status-im#3263.
zah pushed a commit that referenced this pull request Feb 14, 2022
When initializing backfill sync, the implementation intends to start at
the first unknown slot (`1` before tail). However, an incorrect variable
is passed, and backfill sync actually starts at the tail slot instead.
This patch corrects this by passing the intended variable. The problem
was introduced with the original backfill implementation at #3263.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants