kvserver,admission: per tenant metrics in the shared kv/storage server #136030
Labels
A-admission-control
A-kv
Anything in KV that doesn't belong in a more specific category.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
We seem to lack per-tenant metrics in the shared server. This hinders troubleshooting.
Examples of useful metrics:
Per-tenant metrics need to account for the fact that, over its lifetime, a shared server process may see 1000s of distinct tenants become active. Based on past experience with multi-tenant systems, the set of tenants active on a server at any given time is often in the 10s. The approach that works is to keep in-memory metric storage only for the active tenants, so that only those are exported as timeseries. This still produces 1000s of timeseries over a long interval (say 30d), but over the interval of interest (say an hour) the number of timeseries is small. Some approaches I have seen in the past:
cc: @dhartunian
Jira issue: CRDB-44829