add instructions on a state snapshot recovery #92

Open
wants to merge 1 commit into
base: main

posvyatokum
Member

@wacban I created draft instructions for restoring a state snapshot. You are the one who actually did the restoring, so please review this, feel free to make any changes, and merge it whenever it is done. You should have permission to do so without approval.


render bot commented Mar 11, 2024

Contributor

wacban left a comment


LGTM

2. Create a support directory anywhere on the node. We will refer to it as `$OTHER_HOME`.
3. Copy the config to the new directory:
```bash
cp "$NEAR_HOME/config.json" "$OTHER_HOME/config.json"
```
Contributor


I also copied the genesis file and the node key. I'm not sure whether both were required, but at least one of them was.
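A minimal sketch of steps 2 and 3 that also copies the two files mentioned in the comment above. It assumes the default nearcore home layout, where the genesis and node key sit next to `config.json` as `genesis.json` and `node_key.json`; the helper name `copy_support_files` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepare a support directory with copies of the files
# neard may need. ASSUMPTION: default nearcore home layout
# (config.json, genesis.json, node_key.json in the home directory).
copy_support_files() {
  local src="$1" dst="$2" f
  mkdir -p "$dst"                      # step 2: create the support directory
  for f in config.json genesis.json node_key.json; do
    if [ -f "$src/$f" ]; then
      cp "$src/$f" "$dst/$f"           # step 3, plus genesis and node key
    else
      echo "warning: $src/$f not found, skipping" >&2
    fi
  done
}
```

Usage would be `copy_support_files "$NEAR_HOME" "$OTHER_HOME"`. Missing files only produce a warning, since it is not yet known which of them are strictly required.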


## Terminology {#terminology}
A state snapshot is different from a DB snapshot.
A state snapshot is a checkpoint of some columns of the full DB, taken at the epoch boundary.
Contributor


Technically it's a checkpoint of the whole DB with some unneeded columns deleted. At the end of the day we have hardlinks to all of the SST files (at least without compaction). I would keep it simple and just say it's a checkpoint of the full DB, without mentioning that only some columns are kept. This is also in line with the expected size of the snapshot.
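The hardlink point can be checked directly by comparing the SST files in a snapshot with those in the live DB. This is a hypothetical sketch: it assumes both directories contain RocksDB `*.sst` files under matching names, and it relies on the bash `-ef` test, which is true when two paths refer to the same device and inode.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count how many *.sst files in a snapshot directory are
# hardlinks to the file of the same name in the live DB directory.
# Prints "<shared>/<total>".
count_shared_sst() {
  local db="$1" snap="$2" shared=0 total=0 f
  for f in "$snap"/*.sst; do
    [ -e "$f" ] || continue            # glob matched nothing
    total=$((total + 1))
    # -ef: same device and inode, i.e. both names point at the same data
    if [ "$f" -ef "$db/${f##*/}" ]; then
      shared=$((shared + 1))
    fi
  done
  echo "$shared/$total"
}
```

SST files that compaction has since replaced in the live DB have no counterpart there any more, so they show up as unshared and account for the extra space the snapshot uses.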

5. Change the `$OTHER_HOME` config to work with the state snapshot:
```bash
cat <<< "$(jq '.archive = false | .cold_store = null | .store.path = "test-data"' "$OTHER_HOME/config.json")" > "$OTHER_HOME/config.json"
```
Contributor


I'll trust your bash scripting :)
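For the step 5 config change, a variant that avoids reading and truncating the same file in one command is to write to a temporary file and rename it. This is a sketch, not the method in the PR; the helper name `patch_snapshot_config` is hypothetical, and it applies exactly the same three `jq` assignments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the step-5 edits to "$1/config.json".
# Writes to a temp file in the same directory, then renames it over the
# original, so a failed jq run leaves config.json untouched.
# Note: mktemp creates the temp file with mode 0600, so the result is 0600.
patch_snapshot_config() {
  local cfg="$1/config.json" tmp
  tmp="$(mktemp "$cfg.XXXXXX")" || return 1
  if jq '.archive = false | .cold_store = null | .store.path = "test-data"' \
      "$cfg" > "$tmp"; then
    mv "$tmp" "$cfg"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would be `patch_snapshot_config "$OTHER_HOME"`.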

@@ -60,6 +60,16 @@ If you observe problems with block production or resharding performance, you can
This does not require a node restart; you can send a signal to the neard process to make it load the new config.
Read more [on github](https://github.com/near/nearcore/blob/master/docs/architecture/how/resharding.md#monitoring).
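A sketch of sending that signal, assuming `neard` reloads its config on `SIGHUP`. That signal is an assumption here; the linked resharding doc is the authority on which signal to send. The helper name is hypothetical, and `pgrep -o -x neard` picks the oldest process named exactly `neard`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ask a running neard to reload its config.
# ASSUMPTION: neard reloads its config on SIGHUP; verify against the docs.
# Optional $1: the PID to signal; otherwise look up the oldest "neard".
reload_neard_config() {
  local pid="${1:-}"
  if [ -z "$pid" ]; then
    pid="$(pgrep -o -x neard)" || { echo "neard is not running" >&2; return 1; }
  fi
  kill -HUP "$pid"
}
```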

### Mitigating state snapshot issues {#state-snapshot}
The node must have a state snapshot in order to run resharding.
A state snapshot is a checkpoint of the whole DB taken at the epoch boundary.
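Because resharding needs a snapshot, an operator might check that one was actually captured before relying on it. This is a hypothetical sketch: the default location `$NEAR_HOME/data/state_snapshot/<hash>` is an assumption about the node's layout, so pass the real snapshot directory if it differs.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed if the snapshot directory contains at least one
# snapshot subdirectory.
# ASSUMPTION: snapshots live under $NEAR_HOME/data/state_snapshot/<hash>.
has_state_snapshot() {
  local dir="${1:-${NEAR_HOME:-$HOME/.near}/data/state_snapshot}" d
  for d in "$dir"/*/; do
    [ -d "$d" ] && return 0            # found a snapshot subdirectory
  done
  echo "no state snapshot found in $dir" >&2
  return 1
}
```

A non-zero exit here would indicate the situation where manual recovery is needed before resharding can proceed.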
Contributor


I would also add something like:
"If the node fails to capture a snapshot at the epoch boundary it will not be able to proceed with resharding. In this case manual recovery will be needed."
