add instructions on a state snapshot recovery #92
base: main
Your Render PR Server URL is https://node-docs-pr-92.onrender.com. Follow its progress at https://dashboard.render.com/static/srv-cnnnp16v3ddc73fht5dg.
LGTM
2. Create a support directory anywhere on the node. We will refer to it as `$OTHER_HOME`.
3. Copy config to the new directory
```bash
cp $NEAR_HOME/config.json $OTHER_HOME/config.json
```
I also copied genesis and node key. I'm not sure if both were required but at least one of them was.

## Terminology {#terminology}
State Snapshot is different from DB snapshot.
State Snapshot is a checkpoint of some columns of the full DB taken at the epoch boundary.

Technically it's a checkpoint of the whole DB with some unneeded columns deleted. At the end of the day we have hard links to all of the SST files (at least without compaction). I would keep it simple and just say it's a checkpoint of the full DB, and not mention being selective about columns. This is in line with the expected size of the snapshot too.

5. Change `$OTHER_HOME` config to work with state snapshot
```bash
cat <<< $(jq '.archive = false | .cold_store = null | .store.path = "test-data"' $OTHER_HOME/config.json) > $OTHER_HOME/config.json
```
I'll trust your bash scripting :)
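For what it's worth, steps 3 and 5 could be wrapped in two small helpers. This is only a sketch, not what the PR ships: the function names `copy_node_files` and `update_snapshot_config` are made up for illustration, and the file names `genesis.json` and `node_key.json` are an assumption based on the review comment about also copying genesis and the node key. The jq filter is the one from the diff. Writing to a temporary file first means a failing `jq` does not leave `config.json` overwritten with an empty result, which the one-liner would do.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Step 3: copy config.json into the support directory. genesis.json and
# node_key.json are copied only if they exist; those two file names are
# assumptions taken from the review comment, not from the PR text.
copy_node_files() {
  local near_home="$1" other_home="$2" name
  mkdir -p "$other_home"
  cp "$near_home/config.json" "$other_home/config.json"
  for name in genesis.json node_key.json; do
    if [ -f "$near_home/$name" ]; then
      cp "$near_home/$name" "$other_home/$name"
    fi
  done
}

# Step 5: apply the jq filter from the diff, going through a temp file so
# the original config is replaced only when jq succeeds. Note that mktemp
# creates the file with mode 0600, which the replaced config inherits.
update_snapshot_config() {
  local config="$1" tmp
  tmp="$(mktemp "${config}.XXXXXX")"
  if jq '.archive = false | .cold_store = null | .store.path = "test-data"' "$config" > "$tmp"; then
    mv "$tmp" "$config"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would be `copy_node_files "$NEAR_HOME" "$OTHER_HOME"` followed by `update_snapshot_config "$OTHER_HOME/config.json"`.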
@@ -60,6 +60,16 @@ If you observe problems with block production or resharding performance, you can
This does not require a node restart; you can send a signal to the neard process to load the new config.
Read more [on GitHub](https://github.com/near/nearcore/blob/master/docs/architecture/how/resharding.md#monitoring).

### Mitigating state snapshot issue {#state snapshot}
A node must have a state snapshot in order for resharding to run.
A state snapshot is a smaller checkpoint of the whole DB taken at the epoch boundary.
I would also add something like:
"If the node fails to capture a snapshot at the epoch boundary, it will not be able to proceed with resharding. In this case, manual recovery will be needed."
@wacban I created draft instructions on restoring a state snapshot. You are the one who actually did the restoring, so please review this, feel free to make any changes, and merge it whenever it is done. You should have permission to do so without approval.
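Since a node that failed to capture a snapshot at the epoch boundary needs manual recovery, a quick pre-flight check might help operators decide whether these instructions apply. The sketch below only tests that a directory exists and is non-empty. The helper name `has_state_snapshot` is hypothetical, and the snapshot location is deliberately left as an argument: something like `$NEAR_HOME/data/state_snapshot` is an assumption that should be checked against the node's actual store configuration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds if the given directory exists and contains
# at least one entry. It does not validate the snapshot contents in any way.
has_state_snapshot() {
  local snapshot_dir="$1"
  [ -d "$snapshot_dir" ] && [ -n "$(ls -A "$snapshot_dir")" ]
}
```

For example, `has_state_snapshot "$SNAPSHOT_DIR" || echo "no state snapshot found, manual recovery may be needed"`, where `SNAPSHOT_DIR` is whatever path the node is configured to use.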