display actionable error message when sync does not start #4042

egasimus · 2024-11-18T10:29:19Z

Recently, we began to frequently encounter cases where our local "resync" (pseudo-archival) nodes successfully credit initial balances, and then do not begin to sync from the provided persistent_peers.

INFO namada_node::shell::init_chain: Crediting X nam tokens to Y
INFO namada_node::shell::init_chain: Crediting A nam tokens to Z
# ... repeat for thousands of lines ...
# and then crickets 🦗🦗🦗

Sometimes this is due to misconfiguring the persistent peers, e.g. wrong hostname, wrong port, wrong docker networking config...
Sometimes it might be due to the peers allowing only 1 connection per source IP (a problem if you want to do e.g. a blue/green deploy from the same IP).
Sometimes, confusingly, you wait for 2 minutes and it does begin to sync.
Sometimes, even more confusingly, everything is fine, except that the node has stopped retrying, and the sync only begins after a manual restart of the node.
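
The first cause (misconfigured peers) can be ruled out before starting the node by checking that every configured peer accepts a TCP connection. Below is a minimal sketch in Python, assuming persistent_peers is a comma-separated list of `id@host:port` entries with an optional `tcp://` prefix. The function names and the example addresses are hypothetical, and a successful TCP connect does not prove that the peer will accept the P2P handshake (e.g. it says nothing about a per-IP connection limit).

```python
import socket


def parse_peers(persistent_peers):
    """Split a persistent_peers string into (node_id, host, port) tuples."""
    peers = []
    for entry in persistent_peers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Drop an optional scheme such as "tcp://".
        if "://" in entry:
            entry = entry.split("://", 1)[1]
        # The node id is optional; everything after the last "@" is the address.
        node_id, _, address = entry.rpartition("@")
        host, _, port = address.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"malformed peer entry: {entry!r}")
        # Bracketed IPv6 literals ("[::1]") must be unwrapped for the socket API.
        peers.append((node_id, host.strip("[]"), int(port)))
    return peers


def check_peer(host, port, timeout=3.0):
    """Return None if a TCP connection succeeds, else a short error string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as err:  # covers DNS failures, refusals and timeouts
        return f"{type(err).__name__}: {err}"


def report(persistent_peers):
    """Print one reachability line per configured peer."""
    for node_id, host, port in parse_peers(persistent_peers):
        error = check_peer(host, port)
        status = "reachable" if error is None else f"UNREACHABLE ({error})"
        print(f"{host}:{port} (id {node_id or 'n/a'}): {status}")


if __name__ == "__main__":
    # Hypothetical value; substitute the one from your node config.
    report("tcp://abc123@peer-1.example.com:26656,def456@10.0.0.2:26656")
```

Running this from the same host or container as the node catches a wrong hostname, a wrong port, or broken docker networking, since each shows up as a distinct error (name resolution failure, connection refused, timeout).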

These have different root causes, yet in all cases there is zero feedback from namadan as to what is wrong. This makes it difficult to determine and take the appropriate next step in a timely manner, which puts unreasonable strain on our DevOps resources.

It would be immensely helpful if the state of "failure to begin sync" resulted in an explanatory error message being emitted at INFO, WARNING, or ERROR level.

Looking at the way run_aux launches multiple sub-tasks on an asynchronous basis, I'd venture a guess that it will also be necessary to repeat that message periodically, so that it doesn't get lost in the scrollback from the crediting tokens messages.
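
To illustrate the kind of periodic message I have in mind, here is a rough sketch in Python. It is not Namada code: `SyncWatchdog`, its methods, the 30 second interval, and the message wording are all hypothetical, and how the node would learn the number of connected peers is left open. The idea is only that some task polls this on a timer and logs whatever it returns.

```python
import time


class SyncWatchdog:
    """Produce a reminder at most once per `interval` seconds until sync starts."""

    def __init__(self, interval=30.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.started_at = clock()
        self.last_warned_at = None
        self.synced = False

    def mark_synced(self):
        """Call once the first block has been received; silences the watchdog."""
        self.synced = True

    def poll(self, connected_peers):
        """Return a warning string if one is due, else None."""
        if self.synced:
            return None
        now = self.clock()
        # Measure from the last warning, or from startup if none was emitted yet.
        reference = self.started_at if self.last_warned_at is None else self.last_warned_at
        if now - reference < self.interval:
            return None
        self.last_warned_at = now
        waited = int(now - self.started_at)
        if connected_peers == 0:
            reason = "no persistent peer is connected; check hostnames, ports and networking"
        else:
            reason = f"{connected_peers} peer(s) connected but no block received yet"
        return f"WARN sync has not started after {waited}s: {reason}"
```

Because the warning repeats every interval rather than firing once, it would still be visible after the thousands of crediting lines have scrolled past.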


sug0 · 2024-11-18T21:27:25Z

All network code is handled by CometBFT. Namada hides its output by default, but you can export NAMADA_CMT_STDOUT=true and CMT_LOG_LEVEL=info or CMT_LOG_LEVEL=debug to see what's going on at the P2P level. Be warned that setting CometBFT's log level to debug generates incredibly noisy output.

egasimus added the enhancement New feature or request label Nov 18, 2024
