Our node does not perform as good as other nodes #940

Open
Tracked by #818
volovyks opened this issue Nov 26, 2024 · 5 comments

Labels: Emerging Tech, Near BOS

Comments

@volovyks
Collaborator

Description

image

This behavior was observed on both Testnet and Mainnet. It can lead to failures in all protocols.

@volovyks added the Near BOS and Emerging Tech labels on Nov 26, 2024
@volovyks
Collaborator Author

PS: this is a modified dashboard; I will add it soon.

@auto-mausx
Collaborator

I did notice that this started when we moved our node over. I'm not sure whether it is related to the fact that our node has technically been running for a shorter time than the others, since we destroyed our node and rebuilt it. I attributed it to that, so perhaps it is the way the metric is exported.

Just for clarity's sake, this node has exactly the same machine size, disk size, and networking configuration as the rest of the partner nodes. I mirrored the environment from Pagoda one-for-one to avoid any issues.

@auto-mausx
Collaborator

Here's my theory:

This line of code controls how that metric's count is incremented:

crate::metrics::PROTOCOL_ITER_CNT
    .with_label_values(&[my_account_id.as_str()])
    .inc();

I hypothesize that Grafana calculates the rate per hour (increase()) by dividing the total count by 60 minutes. Since our node is "newer" than the other nodes, there would be a significant difference between its total number of iterations and theirs: the other nodes have months of iterations, and we only have about 27 days' worth.
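
One way to test this hypothesis is to compare the two calculations on made-up data. The sketch below is a simplified model, not Prometheus's or Grafana's implementation: `windowed_increase` takes the difference between the last and first samples inside the window (it ignores the extrapolation and counter-reset handling that the real `increase()` performs), while `lifetime_total_over_60` is the hypothesized "total count divided by 60". The `Sample` type, all function names, and all numbers are hypothetical.

```rust
/// One scraped sample of a counter: (unix timestamp in seconds, counter value).
type Sample = (u64, f64);

/// Simplified model of a windowed increase: last value minus first value
/// among the samples with `end - window_secs <= t <= end`.
/// Returns None when fewer than two samples fall inside the window.
fn windowed_increase(samples: &[Sample], end: u64, window_secs: u64) -> Option<f64> {
    let start = end.saturating_sub(window_secs);
    let mut in_window = samples.iter().filter(|(t, _)| *t >= start && *t <= end);
    let first = in_window.next()?;
    let last = in_window.last()?;
    Some(last.1 - first.1)
}

/// The hypothesized calculation: lifetime total divided by 60.
fn lifetime_total_over_60(samples: &[Sample]) -> Option<f64> {
    samples.last().map(|(_, v)| v / 60.0)
}

/// Build hypothetical samples for a node that has been up for `uptime_hours`,
/// scraped once a minute, doing `iters_per_min` iterations per minute.
fn simulate_node(uptime_hours: u64, iters_per_min: f64) -> Vec<Sample> {
    (0..=uptime_hours * 60)
        .map(|minute| (minute * 60, minute as f64 * iters_per_min))
        .collect()
}
```

With two simulated nodes running at the same 10 iterations per minute, one up for 90 days and one for 27 days, `windowed_increase(&samples, end, 3600)` gives 600 for both, while `lifetime_total_over_60` gives 21600 and 6480. So node age would only explain a gap on the dashboard if the panel really divided the lifetime total; a windowed increase does not depend on how long the node has been up.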

That would also explain why the other nodes are not exactly aligned with each other, since it took about a week for all of our partners to update.

@volovyks
Collaborator Author

Let's see how it behaves after the release. I hope increase() means the number of new iterations that happened in the last hour.

@auto-mausx
Collaborator

That is what the docs say it means, so maybe we do have an issue. I am not sure what it might be, though.

https://prometheus.io/docs/prometheus/latest/querying/functions/#increase
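
As documented at that link, `increase()` works on the samples inside the range window and adjusts for breaks in monotonicity, such as a counter reset when a target restarts; it also extrapolates to cover the full window. The following is a minimal sketch of the reset adjustment only, written to illustrate the idea rather than to mirror Prometheus's code. The function name and sample values are hypothetical, and extrapolation is deliberately left out.

```rust
/// Sum the growth between consecutive counter values, treating any drop
/// as a counter reset (the counter restarted from zero).
fn increase_with_resets(values: &[f64]) -> f64 {
    let mut total = 0.0;
    for pair in values.windows(2) {
        let (prev, curr) = (pair[0], pair[1]);
        if curr >= prev {
            // Normal case: the counter grew (or stayed flat).
            total += curr - prev;
        } else {
            // Reset: everything counted since the restart is new growth.
            total += curr;
        }
    }
    total
}
```

For example, the values `[100.0, 110.0, 120.0, 5.0, 15.0]` yield 35: 20 before the restart, then 5 + 10 after it. Under this simplified model, rebuilding a node shows up as a single reset inside one window rather than as a permanently lower hourly increase.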

Projects
Status: Backlog