kvserver,admission: per tenant metrics in the shared kv/storage server #136030

Open
sumeerbhola opened this issue Nov 22, 2024 · 0 comments
Labels
A-admission-control A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs

Comments

sumeerbhola commented Nov 22, 2024

We seem to lack per-tenant metrics in the shared server. This hinders troubleshooting.
Examples of useful metrics:

  • Admission WorkQueue latency histograms per tenant
  • Intent resolution rate per tenant
  • TODO: add more here

Per-tenant metrics need to account for the fact that a shared server process may see 1000s of tenants over its lifetime. Based on past experience with multi-tenant systems, the set of tenants active at any given time on a server is often in the 10s. The approach that works is to keep in-memory metric storage only for the currently active tenants, so only those get exported as timeseries. This still results in 1000s of timeseries over a long interval (say 30d), but for the interval of interest (say an hour) the number of timeseries is small. Some approaches I have seen in the past:

  • Gauge metrics: In-memory state for points with a 0 value is GC'd.
  • Cumulative metrics: One of two approaches:
    • GC cumulative points that are not changing.
    • Export as deltas: Deltas that are 0 are GC'd and not exported.
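
To make the "export as deltas" option concrete, here is a minimal Go sketch of a per-tenant counter that keeps in-memory state only for recently active tenants. It is an illustration under assumptions, not a proposal for the actual kvserver or admission code: `TenantID`, `TenantCounters`, `Inc`, `Export`, and the `maxIdle` grace period are all hypothetical names, and a real implementation would plug into the existing metric registry rather than return a map.

```go
package main

import (
	"fmt"
	"sync"
)

// TenantID is a hypothetical stand-in for the server's tenant identifier.
type TenantID uint64

// tenantCounter is the in-memory state kept for one active tenant.
type tenantCounter struct {
	delta     int64 // increments accumulated since the last export
	idleTicks int   // consecutive exports that saw a zero delta
}

// TenantCounters tracks one cumulative metric per tenant and exports it as
// deltas. State exists only for tenants that were active recently.
type TenantCounters struct {
	mu      sync.Mutex
	maxIdle int // assumption: exports with a zero delta tolerated before GC
	tenants map[TenantID]*tenantCounter
}

// NewTenantCounters returns an empty set of per-tenant counters.
func NewTenantCounters(maxIdle int) *TenantCounters {
	return &TenantCounters{maxIdle: maxIdle, tenants: make(map[TenantID]*tenantCounter)}
}

// Inc adds n to the tenant's counter, creating its state on first use.
func (t *TenantCounters) Inc(id TenantID, n int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	c, ok := t.tenants[id]
	if !ok {
		c = &tenantCounter{}
		t.tenants[id] = c
	}
	c.delta += n
}

// Export returns the non-zero deltas since the previous call and resets them.
// Zero deltas are not exported, and a tenant whose delta stayed zero for
// maxIdle consecutive exports has its in-memory state removed.
func (t *TenantCounters) Export() map[TenantID]int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make(map[TenantID]int64)
	for id, c := range t.tenants {
		if c.delta == 0 {
			c.idleTicks++
			if c.idleTicks >= t.maxIdle {
				delete(t.tenants, id) // deleting during range is safe in Go
			}
			continue
		}
		out[id] = c.delta
		c.delta = 0
		c.idleTicks = 0
	}
	return out
}

// Len reports how many tenants currently have in-memory state.
func (t *TenantCounters) Len() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.tenants)
}

func main() {
	tc := NewTenantCounters(2)
	tc.Inc(10, 3)
	tc.Inc(20, 1)
	fmt.Println(tc.Export(), tc.Len()) // both tenants are exported
	tc.Inc(10, 2)
	fmt.Println(tc.Export(), tc.Len()) // only tenant 10 is exported
	fmt.Println(tc.Export(), tc.Len()) // nothing exported; tenant 20 is GC'd
}
```

With `maxIdle` set to 1, a tenant's state is dropped at the first export that sees a zero delta, which is the closest match to "deltas that are 0 are GC'd and not exported". A larger value trades a little memory for less churn when a tenant is only briefly idle. The gauge variant would follow the same shape, dropping entries whose value is 0 instead of entries whose delta is 0.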

cc: @dhartunian

Jira issue: CRDB-44829

@sumeerbhola sumeerbhola added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs A-kv Anything in KV that doesn't belong in a more specific category. A-admission-control labels Nov 22, 2024