Recently, we have frequently been encountering cases where our local "resync" (pseudo-archival) nodes successfully credit initial balances, but then do not begin to sync from the provided persistent_peers.
INFO namada_node::shell::init_chain: Crediting X nam tokens to Y
INFO namada_node::shell::init_chain: Crediting A nam tokens to Z
# ... repeat for thousands of lines ...
# and then crickets 🦗🦗🦗
Sometimes this is due to misconfiguring the persistent peers, e.g. wrong hostname, wrong port, wrong docker networking config...
Sometimes it may be because the peers allow only 1 connection per source IP (bad if you want to do e.g. a blue/green deploy from the same IP).
Sometimes, confusingly, you wait for 2 minutes and it does begin to sync.
Sometimes, even more confusingly, everything is fine, except that the node has stopped retrying, and the sync only begins after a manual restart of the node.
These have different root causes, yet in all cases there is zero feedback from namadan as to what is wrong. This makes it difficult to determine and take the appropriate next step in a timely manner, which puts unreasonable strain on our DevOps resources.
It would be immensely helpful if the state of "failure to begin sync" resulted in an explanatory error message being emitted at INFO, WARNING, or ERROR level.
Looking at the way run_aux launches multiple sub-tasks on an asynchronous basis, I'd venture a guess that it will also be necessary to repeat that message periodically, so that it doesn't get lost in the scrollback from the crediting tokens messages.
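To illustrate the kind of periodic message I mean, here is a minimal standalone sketch. It is not based on Namada's actual code: `watch_sync_start`, the shared `height` counter, and the message wording are all hypothetical, and it uses plain threads rather than whatever async runtime run_aux uses. The only point is that the warning repeats on an interval for as long as no block has arrived.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical watchdog (not a Namada API): while `height` is still 0,
/// print a warning every `interval`, until `give_up_after` has elapsed.
/// Returns the number of warnings that were emitted.
fn watch_sync_start(
    height: Arc<AtomicU64>,
    interval: Duration,
    give_up_after: Duration,
) -> u32 {
    let started = Instant::now();
    let mut warnings = 0;
    while height.load(Ordering::Relaxed) == 0 && started.elapsed() < give_up_after {
        thread::sleep(interval);
        // Re-check after sleeping so we stay quiet if sync began meanwhile.
        if height.load(Ordering::Relaxed) == 0 {
            warnings += 1;
            eprintln!(
                "WARN no blocks received {:?} after init; check persistent_peers and peer connectivity",
                started.elapsed()
            );
        }
    }
    warnings
}

fn main() {
    let height = Arc::new(AtomicU64::new(0));

    // Simulate a node whose first block only arrives after a delay.
    let writer = Arc::clone(&height);
    let syncer = thread::spawn(move || {
        thread::sleep(Duration::from_millis(350));
        writer.store(1, Ordering::Relaxed);
    });

    let warnings = watch_sync_start(
        height,
        Duration::from_millis(100),
        Duration::from_secs(5),
    );
    syncer.join().unwrap();
    println!("warnings emitted before sync began: {warnings}");
}
```

Because the warning recurs, it would still be visible after thousands of "Crediting" lines have scrolled past, and it stops by itself once the height moves.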
All network code is handled by CometBFT. Namada hides its output by default, but you can export NAMADA_CMT_STDOUT=true and CMT_LOG_LEVEL=info or CMT_LOG_LEVEL=debug to see what's going on at the P2P level. Be warned that setting CometBFT's log level to debug generates incredibly noisy output.
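Before turning on the noisy CometBFT logs, the misconfiguration cases (wrong hostname, wrong port) can be ruled out with a plain TCP check against each peer. The sketch below is a standalone helper, not part of Namada or CometBFT. It assumes each persistent_peers entry looks like `<node-id>@<host>:<port>`, optionally prefixed with `tcp://`; `parse_peer` and `check_peer` are hypothetical names, and the peer string in `main` is a placeholder. A successful connect only shows the port is reachable, it says nothing about the P2P handshake or per-IP connection limits.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Split one peer entry of the assumed form `[tcp://]<node-id>@<host>:<port>`
/// into (host, port). Returns None if the entry does not match that shape.
fn parse_peer(entry: &str) -> Option<(String, u16)> {
    let entry = entry.trim();
    let entry = entry.strip_prefix("tcp://").unwrap_or(entry);
    let (_node_id, addr) = entry.split_once('@')?;
    let (host, port) = addr.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    Some((host.to_string(), port.parse().ok()?))
}

/// Resolve the peer's host and try a TCP connect to each resolved address.
fn check_peer(entry: &str, timeout: Duration) -> Result<(), String> {
    let (host, port) =
        parse_peer(entry).ok_or_else(|| format!("malformed peer entry: {entry}"))?;
    let addrs = (host.as_str(), port)
        .to_socket_addrs()
        .map_err(|e| format!("cannot resolve {host}: {e}"))?;
    let mut last_err = format!("{host} resolved to no addresses");
    for addr in addrs {
        match TcpStream::connect_timeout(&addr, timeout) {
            Ok(_) => return Ok(()),
            Err(e) => last_err = format!("cannot connect to {addr}: {e}"),
        }
    }
    Err(last_err)
}

fn main() {
    // Placeholder value; substitute the persistent_peers from your own config.
    let peers = "tcp://0123456789abcdef@peer.example.com:26656";
    for entry in peers.split(',') {
        match check_peer(entry, Duration::from_secs(3)) {
            Ok(()) => println!("reachable:   {entry}"),
            Err(why) => println!("unreachable: {entry} ({why})"),
        }
    }
}
```

If every peer is reachable and the node still sits silent, that points away from hostname, port, or Docker networking mistakes and toward something that only the CometBFT logs will show.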