This document describes best practices for using histograms in code and for documenting them for the dashboards. There are three general types of histograms: enumerated histograms, count histograms (for arbitrary numbers), and sparse histograms (for anything where precision is important over a wide range and/or the range cannot be specified a priori).
[TOC]
Measure exactly what you want, whether that's the time used for a function call, the number of bytes transmitted to fetch a page, the number of items in a list, etc. Do not assume you can calculate what you want from other histograms, as most ways of doing this are incorrect.
For example, suppose you want to measure the runtime of a function that just calls two subfunctions, each of which is instrumented with histogram logging. You might assume that you can simply sum the histograms for those two functions to get the total time, but that results in misleading data. If we knew which emissions came from which calls, we could pair them up and derive the total time for the function. However, histograms are pre-aggregated client-side, which means that there's no way to recover which emissions should be paired up. If you simply add up the two histograms to get a total duration histogram, you're implicitly assuming the two histograms' values are independent, which may not be the case.
Directly measure what you care about; don't try to derive it from other data.
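To see why summing fails, consider a toy model in plain C++ (not Chromium's base/metrics code) where each call produces a pair of subfunction durations. Adding two pre-aggregated histograms combines every value of one with every value of the other, which implicitly assumes the two are independent. That result differs from the directly measured totals whenever the durations are correlated. All names below are illustrative.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// A toy "histogram": exact value -> number of samples.
using Counts = std::map<int, int>;

// What you get by measuring each call's total duration directly.
Counts DirectTotals(const std::vector<std::pair<int, int>>& calls) {
  Counts totals;
  for (const auto& [a, b] : calls) totals[a + b]++;
  return totals;
}

// Client-side pre-aggregation of one subfunction's durations.
Counts Marginal(const std::vector<std::pair<int, int>>& calls, bool first) {
  Counts counts;
  for (const auto& [a, b] : calls) counts[first ? a : b]++;
  return counts;
}

// "Adding" two pre-aggregated histograms: every value of one is paired with
// every value of the other, i.e. independence is assumed.
Counts CombineAssumingIndependence(const Counts& first, const Counts& second) {
  Counts totals;
  for (const auto& [a, count_a] : first)
    for (const auto& [b, count_b] : second)
      totals[a + b] += count_a * count_b;
  return totals;
}
```

Suppose two calls take (1, 3) and (3, 1) units. Every call really takes 4 units, but combining the marginals suggests totals of 2, 4, and 6.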
When defining a new metric, think ahead about how you will analyze the data. Often, this will require providing context in order for the data to be interpretable.
For enumerated histograms in particular, that often means including a bucket that can be used as a baseline for understanding the data recorded to other buckets: see the enumerated histogram section.
Histograms are taxonomized into categories, using dot (.) characters as separators. Thus, histogram names should be in the form Category.Name or Category.Subcategory.Name, etc., where each category organizes related histograms.
It should be quite rare to introduce new top-level categories into the existing taxonomy. If you're tempted to do so, please look through the existing categories to see whether any matches the metric(s) that you are adding. To create a new category, the CL must be reviewed by [email protected].
Prefer the helper functions defined in histogram_functions.h. These functions take a lock and perform a map lookup, but the overhead is generally insignificant. However, when recording metrics on the critical path (e.g. called in a loop or logged multiple times per second), use the macros in histogram_macros.h instead. These macros cache a pointer to the histogram object for efficiency, though this comes at the cost of increased binary size: 130 bytes/macro usage sounds small but quickly adds up.
These logging macros and functions have long names and sometimes include extra parameters (defining the number of buckets for example). Use a helper function if possible. This leads to shorter, more readable code that's also more resilient to problems that could be introduced when making changes. (One could, for example, erroneously change the bucketing of the histogram in one call but not the other.)
When using histogram macros (calls such as UMA_HISTOGRAM_ENUMERATION), you're not allowed to construct your string dynamically so that it can vary at a callsite. At a given callsite (preferably you have only one), the string should be the same every time the macro is called. If you need to use dynamic names, use the functions in histogram_functions.h instead of the macros.
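The restriction exists because the macros cache the histogram they look up the first time a callsite runs. The toy below models that caching with a function-local static. TOY_COUNT and Registry() are hypothetical, and the real macros in histogram_macros.h differ in detail.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy registry standing in for the global histogram registry.
std::map<std::string, int>& Registry() {
  static std::map<std::string, int> registry;
  return registry;
}

// Toy macro: looks the histogram up once per callsite, then reuses the pointer.
// (std::map references stay valid as new keys are inserted.)
#define TOY_COUNT(name)                     \
  do {                                      \
    static int* cached = &Registry()[name]; \
    ++*cached;                              \
  } while (0)

// One callsite that (incorrectly) passes a dynamically built name.
void RecordForType(const std::string& type) {
  TOY_COUNT("Example." + type);
}

// Function-style alternative: looks the name up on every call.
void RecordForTypeWithFunction(const std::string& type) {
  ++Registry()["Example." + type];
}
```

Calling RecordForType("A") and then RecordForType("B") puts both samples into Example.A, and Example.B is never created. The function-style version records to the intended histogram.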
If you must use the histogram name in multiple places, use a compile-time constant of appropriate scope that can be referenced everywhere. Using inline strings in multiple places can lead to errors: if you ever need to revise the name, you might update one location and forget another.
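Here is a minimal sketch that combines a shared constant with a single recording helper. FakeRecorder is a hypothetical stand-in for a real function from histogram_functions.h, and the histogram name and bucket parameters are illustrative. Because one helper owns the bucketing arguments, callers cannot drift apart.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a histogram function: name -> (sample -> count).
struct FakeRecorder {
  std::map<std::string, std::map<int, int>> samples;
  void RecordCount(const std::string& name, int sample, int min, int max,
                   int buckets) {
    (void)min;
    (void)max;
    (void)buckets;  // A real histogram would use these to pick a bucket.
    samples[name][sample]++;
  }
};

// One compile-time constant for the name, shared by every user.
constexpr char kTabCountHistogram[] = "Example.TabCount";

// One helper owns the bucketing parameters.
void RecordTabCount(FakeRecorder& recorder, int tab_count) {
  recorder.RecordCount(kTabCountHistogram, tab_count, /*min=*/1, /*max=*/500,
                       /*buckets=*/50);
}
```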
Generally, don't be concerned about the processing cost of emitting to a histogram (unless you're using sparse histograms). The normal histogram code is highly optimized. If you are recording to a histogram in particularly performance-sensitive or "hot" code, make sure you're using the histogram macros; see reasons above.
Enumerated histograms are most appropriate when you have a list of connected / related states that should be analyzed jointly. For example, the set of actions that can be done on the New Tab Page (use the omnibox, click a most visited tile, click a bookmark, etc.) would make a good enumerated histogram. If the total count of your histogram (i.e. the sum across all buckets) is something meaningful (as it is in this example), that is generally a good sign. However, the total count does not have to be meaningful for an enum histogram to still be the right choice.
Enumerated histograms are also appropriate for counting events. Use a simple boolean histogram. It's usually best if you have a comparison point in the same histogram. For example, if you want to count pages opened from the history page, it might be a useful comparison to have the same histogram record the number of times the history page was opened.
In rarer cases, it's okay if you only log to one bucket (say, true). However, think about whether this will provide enough context. For example, suppose we want to understand how often users interact with a button. Just knowing that users clicked this particular button 1 million times in a day is not very informative on its own: the size of Chrome's user base is constantly changing, only a subset of users have consented to metrics reporting, different platforms have different sampling rates for metrics reporting, and so on. The data would be much easier to make sense of if it included a baseline: how often is the button shown?
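As an illustration, assume a hypothetical two-bucket enum that records both impressions and clicks. The ratio between the buckets stays interpretable even though the absolute counts depend on population size and sampling:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
enum class ButtonEvent {
  kShown = 0,
  kClicked = 1,
  kMaxValue = kClicked,
};

using BucketCounts =
    std::array<long, static_cast<std::size_t>(ButtonEvent::kMaxValue) + 1>;

// Clicks per impression; 0 when the button was never shown.
double ClickThroughRate(const BucketCounts& counts) {
  long shown = counts[static_cast<std::size_t>(ButtonEvent::kShown)];
  long clicked = counts[static_cast<std::size_t>(ButtonEvent::kClicked)];
  return shown == 0 ? 0.0 : static_cast<double>(clicked) / shown;
}
```

Consider two days with {1000 shown, 250 clicked} and {4000 shown, 1000 clicked}. Both give the same rate, so the fourfold rise in clicks reflects more impressions rather than a change in behavior.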
If only a few buckets are emitted to, consider using a sparse histogram.
Enums logged in histograms must:
- be prefixed with the comment:
  // These values are persisted to logs. Entries should not be renumbered and
  // numeric values should never be reused.
- be numbered starting from 0. Note this bullet point does not apply for enums logged with sparse histograms.
- have enumerators with explicit values (= 0, = 1, = 2) to make it clear that the actual values are important. This also makes it easy to match the values between the C++/Java definition and histograms.xml.
- not renumber or reuse enumerator values. When adding a new enumerator, append the new enumerator to the end. When removing an unused enumerator, comment it out, making it clear the value was previously used.
If your enum histogram has a catch-all / miscellaneous bucket, put that bucket first (= 0). This makes the bucket easy to find on the dashboard if additional buckets are added later.
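In C++, define an enum class with a kMaxValue enumerator:

```cpp
// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
enum class NewTabPageAction {
  kUseOmnibox = 0,
  kClickTitle = 1,
  // kUseSearchbox = 2,  // no longer used, combined into omnibox
  kOpenBookmark = 3,
  kMaxValue = kOpenBookmark,
};
```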
kMaxValue is a special enumerator that must share the highest enumerator value, typically done by aliasing it with the enumerator with the highest value: clang automatically checks that kMaxValue is correctly set for enum class.
The histogram helpers use the kMaxValue convention, and the enum may be logged with:

```cpp
UMA_HISTOGRAM_ENUMERATION("NewTabPageAction", action);
```

or:

```cpp
UmaHistogramEnumeration("NewTabPageAction", action);
```
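Conceptually, the kMaxValue convention lets a helper derive the histogram's exclusive upper bound from the enum itself as kMaxValue + 1. The template below is a hypothetical illustration of that idea, not the code in base/metrics. It also shows that a retired value (here, 2) still occupies a bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

enum class NewTabPageAction {
  kUseOmnibox = 0,
  kClickTitle = 1,
  // kUseSearchbox = 2,  // no longer used, combined into omnibox
  kOpenBookmark = 3,
  kMaxValue = kOpenBookmark,
};

// Exclusive upper bound: one bucket for every value 0..kMaxValue.
template <typename Enum>
constexpr std::size_t ExclusiveMax() {
  return static_cast<std::size_t>(
             static_cast<std::underlying_type_t<Enum>>(Enum::kMaxValue)) + 1;
}

// Toy enumerated histogram with one bucket per possible enum value.
template <typename Enum>
struct ToyEnumHistogram {
  std::vector<int> buckets = std::vector<int>(ExclusiveMax<Enum>(), 0);
  void Record(Enum value) { buckets[static_cast<std::size_t>(value)]++; }
};

static_assert(ExclusiveMax<NewTabPageAction>() == 4,
              "the retired value 2 still occupies a bucket");
```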
Logging histograms from Java should look similar:
```java
// These values are persisted to logs. Entries should not be renumbered and
// numeric values should never be reused.
@IntDef({NewTabPageAction.USE_OMNIBOX, NewTabPageAction.CLICK_TITLE,
        NewTabPageAction.OPEN_BOOKMARK})
private @interface NewTabPageAction {
    int USE_OMNIBOX = 0;
    int CLICK_TITLE = 1;
    // int USE_SEARCHBOX = 2;  // no longer used, combined into omnibox
    int OPEN_BOOKMARK = 3;
    int COUNT = 4;
}

// Using a helper function is optional, but avoids some boilerplate.
private static void logNewTabPageAction(@NewTabPageAction int action) {
    RecordHistogram.recordEnumeratedHistogram(
            "NewTabPageAction", action, NewTabPageAction.COUNT);
}
```
Note: this method of defining histogram enums is deprecated. Do not use this for new enums in C++.
Many legacy enums define a kCount sentinel, relying on the compiler to automatically update it when new entries are added:

```cpp
enum class NewTabPageAction {
  kUseOmnibox = 0,
  kClickTitle = 1,
  // kUseSearchbox = 2,  // no longer used, combined into omnibox
  kOpenBookmark = 3,
  kCount,
};
```
These enums must be recorded using the legacy helpers:

```cpp
UMA_HISTOGRAM_ENUMERATION("NewTabPageAction", action, NewTabPageAction::kCount);
```

or:

```cpp
UmaHistogramEnumeration("NewTabPageAction", action, NewTabPageAction::kCount);
```
When adding a new flag in about_flags.cc, you need to add a corresponding entry to enums.xml. This is automatically verified by the AboutFlagsHistogramTest unit test.
To add a new entry:
- Edit enums.xml, adding the feature to the LoginCustomFlags enum section, with any unique value (just make one up, although whatever it is needs to appear in sorted order; pretty_print.py can do this for you).
- Build unit_tests, then run unit_tests --gtest_filter='AboutFlagsHistogramTest.*' to compute the correct value.
- Update the entry in enums.xml with the correct value, and move it so the list is sorted by value (pretty_print.py can do this for you).
- Re-run the test to ensure the value and ordering are correct.
You can also use tools/metrics/histograms/validate_format.py to check the ordering (but not that the value is correct).
Don't remove entries when removing a flag; they are still used to decode data from previous Chrome versions.
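The ordering requirement in the steps above comes down to one invariant: values are unique and appear in increasing order. The sketch below shows a hypothetical check over an in-memory list of entries. FlagEntry is illustrative only. In practice, rely on AboutFlagsHistogramTest and the scripts named above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory view of one entry in the LoginCustomFlags enum.
struct FlagEntry {
  std::int64_t value;
  std::string label;
};

// Strictly increasing values imply both "sorted by value" and "unique".
bool IsSortedAndUnique(const std::vector<FlagEntry>& entries) {
  for (std::size_t i = 1; i < entries.size(); ++i)
    if (entries[i - 1].value >= entries[i].value) return false;
  return true;
}
```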
histogram_macros.h provides macros for some common count types such as memory or elapsed time, in addition to general count macros. These have reasonable default values; you seldom need to choose the number of buckets or histogram min. However, you still need to choose the histogram max (use the advice below).
If none of the default macros work well for you, please thoughtfully choose a min, max, and bucket count for your histogram using the advice below.
For histogram max, choose a value such that very few emissions to the histogram exceed the max. If a metric emission is above the max value, it will get put into an "overflow" bucket. If this bucket is too large, it can be difficult to compute statistics. One rule of thumb is at most 1% of samples should be in the overflow bucket (and ideally, less). This allows analysis of the 99th percentile. Err on the side of too large a range versus too short a range. (Remember that if you choose poorly, you'll have to wait for another release cycle to fix it.)
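One way to apply the 1% rule of thumb before picking a max is to test a candidate max against sample data you already have, such as local measurements. Both helpers below are hypothetical. This sketch counts a sample equal to max as overflow, on the assumption that the last bucket starts at max. That is the conservative reading.

```cpp
#include <cassert>
#include <vector>

// Fraction of samples that would land in the overflow bucket (>= max here).
double OverflowFraction(const std::vector<int>& samples, int max) {
  if (samples.empty()) return 0.0;
  int overflow = 0;
  for (int sample : samples)
    if (sample >= max) ++overflow;
  return static_cast<double>(overflow) / samples.size();
}

// At most 1% of samples in overflow keeps the 99th percentile analyzable.
bool MaxLeavesRoomForP99(const std::vector<int>& samples, int max) {
  return OverflowFraction(samples, max) <= 0.01;
}
```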
For histogram min, if you care about all possible values (zero and above), choose a min of 1. All histograms have an underflow bucket for emitted zeros, so a min of 1 is appropriate. Otherwise, choose the min appropriate for your particular situation.
Choose the smallest number of buckets that give you the granularity you need. By default, count histogram bucket sizes scale exponentially so you can get fine granularity when the numbers are small yet still reasonable resolution for larger numbers. The macros default to 50 buckets (or 100 buckets for histograms with wide ranges), which is appropriate for most purposes. Because histograms pre-allocate all the buckets, the number of buckets selected directly dictates how much memory is used. Do not exceed 100 buckets without good reason (and consider whether sparse histograms might work better for you in that case—they do not pre-allocate their buckets).
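To show why exponential buckets give fine granularity for small values and coarser resolution for large ones, here is a simplified sketch written from scratch. It is not copied from base/metrics, and the boundaries base actually computes may differ. Each boundary spreads the remaining log-range evenly over the remaining buckets. No bucket is ever narrower than 1. Underflow and overflow buckets sit at the ends.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns bucket_count + 1 boundaries: [0] starts the underflow bucket,
// [1] == min, [bucket_count - 1] is normally max (start of the overflow
// bucket), and [bucket_count] == INT_MAX is an upper sentinel.
std::vector<int> ExponentialBoundaries(int min, int max, std::size_t bucket_count) {
  assert(min >= 1 && max > min && bucket_count >= 3);
  std::vector<int> ranges(bucket_count + 1, 0);
  const double log_max = std::log(static_cast<double>(max));
  int current = min;
  ranges[1] = current;
  for (std::size_t i = 2; i < bucket_count; ++i) {
    double log_current = std::log(static_cast<double>(current));
    // Spread the remaining log-range evenly across the remaining buckets.
    double log_ratio =
        (log_max - log_current) / static_cast<double>(bucket_count - i);
    int next = static_cast<int>(std::round(std::exp(log_current + log_ratio)));
    // Never create an empty bucket: each bucket is at least 1 wide.
    current = next > current ? next : current + 1;
    ranges[i] = current;
  }
  ranges[bucket_count] = INT_MAX;
  return ranges;
}
```

With min = 1, max = 100 and 10 buckets, the boundaries come out as 0, 1, 2, 3, 5, 9, 16, 29, 54, 100 and INT_MAX. The buckets are single units wide near the bottom and span dozens of units near the top.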
You can easily emit a time duration (time delta) using UMA_HISTOGRAM_TIMES, UMA_HISTOGRAM_MEDIUM_TIMES, UMA_HISTOGRAM_LONG_TIMES macros, and their friends, as well as helpers like SCOPED_UMA_HISTOGRAM_TIMER. Many timing histograms are used for performance monitoring; if this is the case for you, please read this document about how to structure timing histograms to make them more useful and actionable.
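SCOPED_UMA_HISTOGRAM_TIMER records the time that elapses until the end of the enclosing scope. The RAII idea behind such helpers can be sketched in plain C++. ScopedToyTimer and its callback are hypothetical and do not reproduce the real macro. A monotonic clock is used so that wall-clock adjustments cannot produce negative durations.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Reports the time between construction and destruction to a sink.
class ScopedToyTimer {
 public:
  using Sink = std::function<void(std::chrono::milliseconds)>;

  explicit ScopedToyTimer(Sink sink)
      : sink_(std::move(sink)), start_(std::chrono::steady_clock::now()) {}
  ScopedToyTimer(const ScopedToyTimer&) = delete;
  ScopedToyTimer& operator=(const ScopedToyTimer&) = delete;

  ~ScopedToyTimer() {
    sink_(std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start_));
  }

 private:
  Sink sink_;
  std::chrono::steady_clock::time_point start_;
};
```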
You can easily emit a percentage histogram using the UMA_HISTOGRAM_PERCENTAGE macro provided in histogram_macros.h. You can also easily emit any ratio as a linear histogram (for equally sized buckets).
For such histograms, you want each value recorded to cover approximately the same span of time. This typically means emitting values periodically at a set time interval, such as every 5 minutes. We do not recommend recording a ratio at the end of a video playback, as video lengths vary greatly.
It is okay to emit at the end of an animation sequence when what's being animated is fixed / known. In this case, each value represents roughly the same span of time.
Why? You typically cannot make decisions based on histograms whose values are recorded in response to an event that varies in length because such metrics can conflate heavy usage with light usage. It's easier to reason about metrics that avoid this source of bias.
Many developers have been bitten by this. For example, it was previously common to emit an actions-per-minute ratio whenever Chrome was backgrounded. Precisely, these metrics computed the number of uses of a particular action during a Chrome session, divided by length of time Chrome had been open. Sometimes, the recorded rate was based on a short interaction with Chrome–a few seconds or a minute. Other times, the recorded rate was based on a long interaction, tens of minutes or hours. These two situations are indistinguishable in the UMA logs–the recorded values can be identical.
The inability to distinguish these two qualitatively different settings makes such histograms effectively uninterpretable and not actionable. Emitting at a regular interval avoids the issue. Each value represents the same amount of time (e.g., one minute of video playback).
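Here is a minimal sketch of interval-based emission. It assumes a hypothetical PeriodicRateReporter that the caller ticks once per fixed interval, for example from a repeating timer. Each emitted sample covers exactly one interval, so a short session and a long session contribute samples that mean the same thing.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Counts actions and emits one "actions per interval" sample per tick.
class PeriodicRateReporter {
 public:
  explicit PeriodicRateReporter(std::function<void(int)> emit)
      : emit_(std::move(emit)) {}

  void OnAction() { ++actions_in_interval_; }

  // Call once per fixed interval (e.g. every minute of active use).
  void OnIntervalElapsed() {
    emit_(actions_in_interval_);
    actions_in_interval_ = 0;
  }

 private:
  std::function<void(int)> emit_;
  int actions_in_interval_ = 0;
};
```

In this sketch, a partial interval at shutdown is simply dropped rather than emitted as a shorter-span sample. That is one reasonable way to keep every value covering the same span of time.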
Histograms can be added via Local macros. These still record locally, but are not uploaded to UMA and are therefore not available for analysis. This can be useful for metrics only needed for local debugging. We don't recommend using local histograms outside of that scenario.
It is common to be interested in logging multidimensional data–where multiple pieces of information need to be logged together. For example, a developer may be interested in the counts of features X and Y based on whether a user is in state A or B. In this case, they want to know the count of X under state A, as well as the other three permutations.
There is no general purpose solution for this type of analysis. We suggest using the workaround of using an enum of length MxN, where you log each unique pair {state, feature} as a separate entry in the same enum. If this causes a large explosion in data (i.e. >100 enum entries), a sparse histogram may be appropriate. If you are unsure of the best way to proceed, please contact someone from the OWNERS file.
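The MxN workaround can be sketched by packing each {state, feature} pair into one bucket as state * N + feature. The enums below are hypothetical stand-ins for the states A/B and features X/Y in the example above.

```cpp
#include <cassert>

// Hypothetical dimensions. These values are persisted to logs. Entries
// should not be renumbered and numeric values should never be reused.
enum class UserState { kStateA = 0, kStateB = 1, kMaxValue = kStateB };
enum class Feature { kFeatureX = 0, kFeatureY = 1, kMaxValue = kFeatureY };

constexpr int kStateCount = static_cast<int>(UserState::kMaxValue) + 1;
constexpr int kFeatureCount = static_cast<int>(Feature::kMaxValue) + 1;

// Packs a {state, feature} pair into a single bucket: state * N + feature.
constexpr int PackedBucket(UserState state, Feature feature) {
  return static_cast<int>(state) * kFeatureCount + static_cast<int>(feature);
}

// Recovers the pair from a bucket, e.g. when writing enums.xml labels.
constexpr UserState StateOf(int bucket) {
  return static_cast<UserState>(bucket / kFeatureCount);
}
constexpr Feature FeatureOf(int bucket) {
  return static_cast<Feature>(bucket % kFeatureCount);
}

// Exclusive max of the combined enum: M x N buckets.
constexpr int kCombinedExclusiveMax = kStateCount * kFeatureCount;
```

The packing depends on N. Adding a value to either dimension therefore changes the meaning of existing buckets, which conflicts with the rule that persisted values must never be renumbered. Either leave room in N up front or treat the change as a new histogram.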
Histogram expiry is specified by the expires_after attribute in histogram descriptions in histograms.xml. The attribute can be specified as a date in YYYY-MM-DD format or as a Chrome milestone in M* format (e.g. M68). In the latter case, the actual expiry date is about 12 weeks after that branch is cut, or basically when it is replaced on the "stable" channel by the following release.
After a histogram expires, it ceases to be displayed on the dashboard. Follow these directions to extend it.
Once a histogram has expired, the code that records it becomes dead code and should be removed from the codebase along with marking the histogram definition as obsolete.
In rare cases, the expiry can be set to "never". This is used to denote metrics of critical importance that are, typically, used for other reports. For example, all metrics of the "heartbeat" are set to never expire. All metrics that never expire must have an XML comment describing why so that it can be audited in the future. Setting an expiry to "never" must be reviewed by [email protected].
<!-- expires-never: "heartbeat" metric (internal: go/uma-heartbeats) -->
For all new histograms, setting the expiry attribute is strongly encouraged and enforced by the Chrome Metrics team through reviews.
If you are adding a histogram to evaluate a feature launch, set an expiry date consistent with the expected feature launch date. Otherwise, we recommend choosing 3-6 months.
Here are some guidelines for common scenarios:
- If the listed owner moved to a different project, find a new owner.
- If neither the owner nor the team uses the histogram, remove it.
- If the histogram is not in use now, but might be useful in the far future, remove it.
- If the histogram is not in use now, but might be useful in the near future, pick ~3 months or ~2 milestones ahead.
- If the histogram is actively in use now and is useful in the short term, pick 3-6 months or 2-4 milestones ahead.
- If the histogram is actively in use and seems useful for an indefinite time, pick 1 year.
We also have a tool that automatically extends expiry dates. The 80% most frequently accessed histograms are pushed out every Tuesday, to 6 months from the date of the run. Googlers can view the design doc.
You can revive an expired histogram by setting the expiration date to a date in the future.
There's some leeway here. A client may continue to send data for that histogram for some time after the official expiry date, so simply bumping the 'expires_after' date at HEAD may be sufficient to resurrect it without any data discontinuity.
If a histogram expired more than a month ago (for histograms with an expiration date) or more than one milestone ago (for histograms with expiration milestones; this means top-of-tree is two or more milestones away from expired milestone), then you may be outside the safety window. In this case, when extending the histogram add to the histogram description a message: "Warning: this histogram was expired from DATE to DATE; data may be missing." (For milestones, write something similar.)
When reviving a histogram outside the safety window, keep in mind that the histograms.xml change that revives it rolls out with the binary release, so it takes some time to reach the stable channel.
If you need to revive it faster, the histogram can be re-enabled by adding it to the expired histogram allowlist.
The expired histogram notifier notifies histogram owners before their histograms expire by creating crbugs, which are assigned to owners. This allows owners to extend the lifetime of their histograms, if needed, or deprecate them. The notifier regularly checks all histograms across the histograms.xml files and identifies expired or soon-to-be expired histograms. It then creates or updates crbugs accordingly.
If a histogram expires but turns out to be useful, you can add the histogram's name to the allowlist until the updated expiration date reaches the stable channel. When doing so, update the histogram's summary to document the period during which the histogram's data is incomplete. To add a histogram to the allowlist, see the internal documentation: Histogram Expiry.
Test your histograms using chrome://histograms. Make sure they're being emitted to when you expect and not emitted to at other times. Also check that the values emitted are correct. Finally, for count histograms, make sure that buckets capture enough precision for your needs over the range.
Pro tip: You can filter the set of histograms shown on chrome://histograms by specifying a prefix. For example, chrome://histograms/Extensions.Load shows only histograms whose names match the pattern "Extensions.Load*".
In addition to testing interactively, you can have unit tests examine the values emitted to histograms. See histogram_tester.h for details.
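Use histogram_tester.h for real tests. The toy below only sketches one idea that makes such tests robust: record a baseline when the tester is constructed, then assert on the delta, so that samples emitted earlier in the process do not interfere. ToyRecord and ToyHistogramTester are hypothetical and are not the HistogramTester API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy global registry: histogram name -> (sample -> count).
using ToySamples = std::map<int, int>;
std::map<std::string, ToySamples>& ToyRegistry() {
  static std::map<std::string, ToySamples> registry;
  return registry;
}

void ToyRecord(const std::string& name, int sample) {
  ++ToyRegistry()[name][sample];
}

// Snapshots the registry at construction; queries report only new samples.
class ToyHistogramTester {
 public:
  ToyHistogramTester() : baseline_(ToyRegistry()) {}

  int BucketCountDelta(const std::string& name, int sample) const {
    return CountIn(ToyRegistry(), name, sample) - CountIn(baseline_, name, sample);
  }

 private:
  static int CountIn(const std::map<std::string, ToySamples>& registry,
                     const std::string& name, int sample) {
    auto histogram = registry.find(name);
    if (histogram == registry.end()) return 0;
    auto bucket = histogram->second.find(sample);
    return bucket == histogram->second.end() ? 0 : bucket->second;
  }

  std::map<std::string, ToySamples> baseline_;
};
```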
The top of go/uma-guide has good advice on how to go about analyzing and interpreting the results of UMA data uploaded by users. If you're reading this page, you've probably just finished adding a histogram to the Chromium source code and you're waiting for users to update their version of Chrome to a version that includes your code. In this case, the best advice is a reminder that users who update frequently / quickly are biased. It's best to take the initial statistics with a grain of salt; they're probably mostly right but not entirely so.
When changing the semantics of a histogram (when it's emitted, what the buckets represent, the bucket range or number of buckets, etc.), create a new histogram with a new name. Otherwise analysis that mixes the data pre- and post- change may be misleading. If the histogram name is still the best name choice, the recommendation is to simply append a '2' to the name. See Cleaning Up Histogram Entries for details on how to handle the XML changes.
Please delete code that emits to histograms that are no longer needed. Histograms take up memory. Cleaning up histograms that you no longer care about is good! But see the note below on Cleaning Up Histogram Entries.
Document histograms in histograms.xml. There is also a google-internal version of the file for the rare case in which the histogram is confidential (added only to Chrome code, not Chromium code; or, an accurate description about how to interpret the histogram would reveal information about Google's plans).
If possible, please add the histograms.xml description in the same changelist in which you add the histogram-emitting code. This has several benefits. One, it sometimes happens that the histograms.xml reviewer has questions or concerns about the histogram description that reveal problems with interpretation of the data and call for a different recording strategy. Two, it allows the histogram reviewer to easily review the emission code to see if it comports with these best practices and to look for other errors.
Histogram descriptions should be roughly understandable to someone not familiar with your feature. Please add a sentence or two of background if necessary.
Note any caveats associated with your histogram in the summary. For example, if the set of supported platforms is surprising, such as if a desktop feature is not available on Mac, the summary should explain where it is recorded. It is also common to have caveats along the lines of "this histogram is only recorded if X" (e.g., upon a successful connection to a service, a feature is enabled by the user).
Histogram descriptions should clearly state when the histogram is emitted (profile open? network request received? etc.).
Some histograms record error conditions. These should be clear about whether all errors are recorded or only the first. If only the first, the histogram description should have text like:
In the case of multiple errors, only the first reason encountered is recorded. Refer
to Class::FunctionImplementingLogic() for details.
For enumerated histograms, including boolean and sparse histograms, provide an enum= attribute mapping enum values to semantically contentful labels. Define the <enum> in enums.xml if none of the existing enums are a good fit. Use labels whenever they would be clearer than raw numeric values.
For non-enumerated histograms, include a units= attribute. Be specific: e.g. distinguish "MB" vs. "MiB", refine generic labels like "counts" to more precise labels like "pages", etc.
Histograms need owners, who are the experts on the metric and the points of contact for any questions or maintenance tasks, such as extending a histogram's expiry or deprecating the metric.
Histograms must have a primary owner and may have secondary owners. A primary owner is a Googler with an @google.com or @chromium.org email address, e.g. [email protected], who is ultimately responsible for maintaining the metric. Secondary owners may be other individuals, team mailing lists, e.g. [email protected], or paths to OWNERS files, e.g. src/directory/OWNERS.
It's a best practice to list multiple owners, so that there's no single point of failure for histogram-related questions and maintenance tasks. If you are using a metric heavily and understand it intimately, feel free to add yourself as an owner.
Notably, owners are asked to determine whether histograms have outlived their usefulness. When a histogram is nearing expiry, a robot files a reminder bug in Monorail. It's important that somebody familiar with the histogram notices and triages such bugs!
Tip: When removing someone from the owner list for a histogram, it's a nice courtesy to ask them for approval.
Histograms may be associated with components, which can help make sure that histogram expiry bugs don't fall through the cracks.
There are two ways in which components may be associated with a histogram. The first and recommended way is to add a tag to a histogram or histogram suffix, e.g. UI>Shell. The second way is to specify an OWNERS file as a secondary owner for a histogram. If the OWNERS file has an adjacent DIR_METADATA file that contains a component, then that component is associated with the histogram. If there isn't a parallel DIR_METADATA file with such a component, but a parent directory has one, then the parent directory's component is used.
Do not delete histograms from histograms.xml files or move them to obsolete_histograms.xml. Instead, mark unused histograms as obsolete and annotate them with the date or milestone in the <obsolete> tag entry. They will later get moved to obsolete_histograms.xml via tooling.
If deprecating only some variants of a patterned histogram, mark each deprecated <variant> as obsolete as well. Similarly, if the histogram used histogram suffixes, mark the suffix entry for the histogram as obsolete.
If the histogram is being replaced by a new version:
- Note in the <obsolete> message the name of the replacement histogram.
- Make sure the descriptions of the original and replacement histogram are different. It's never appropriate for them to be identical. Either the old description was wrong, and it should be revised to explain what it actually measured, or the old histogram was measuring something not as useful as the replacement, in which case the new histogram is measuring something different and needs to have a new description.
A changelist that marks a histogram as obsolete should be reviewed by all current owners.
Deleting histogram entries would be bad if someone accidentally reused your old histogram name and thereby corrupted new data with whatever old data is still coming in. It's also useful to keep obsolete histogram descriptions in histograms.xml—that way, if someone is searching for a histogram to answer a particular question, they can learn if there was a histogram at some point that did so even if it isn't active now.
Exception: It is ok to delete the metadata for any histogram that has never been recorded to. For example, it's fine to correct a typo where the histogram name in the metadata does not match the name in the Chromium source code.
It is sometimes useful to record several closely related metrics, which measure the same type of data, with some minor variations. You can declare the metadata for these concisely using patterned histograms. For example:
```xml
<histogram name="Pokemon.{Character}.EfficacyAgainst{OpponentType}"
    units="multiplier" expires_after="M95">
  <owner>[email protected]</owner>
  <owner>[email protected]</owner>
  <summary>
    The efficacy multiplier for {Character} against an opponent of
    {OpponentType} type.
  </summary>
  <token key="Character">
    <variant name="Bulbasaur"/>
    <variant name="Charizard"/>
    <variant name="Mewtwo"/>
  </token>
  <token key="OpponentType">
    <variant name="Dragon" summary="dragon"/>
    <variant name="Flying" summary="flappity-flap"/>
    <variant name="Psychic" summary="psychic"/>
    <variant name="Water" summary="water"/>
  </token>
</histogram>
```
This example defines metadata for 12 (= 3 x 4) concrete histograms, such as
```xml
<histogram name="Pokemon.Charizard.EfficacyAgainstWater"
    units="multiplier" expires_after="M95">
  <owner>[email protected]</owner>
  <owner>[email protected]</owner>
  <summary>
    The efficacy multiplier for Charizard against an opponent of water type.
  </summary>
</histogram>
```
Note that each token <variant> defines what text should be substituted for it, both in the histogram name and in the summary text. As shorthand, a <variant> that omits the summary attribute substitutes the value of the name attribute in the histogram's <summary> text as well.
*** promo
Tip: You can declare an optional token by listing an empty name: <variant name="" summary="aggregated across all breakdowns"/>. This can be useful when recording a "parent" histogram that aggregates across a set of breakdowns.
***
You can use the <variants> tag to define a set of <variant>s out-of-line. This is useful for token substitutions that are shared among multiple families of histograms. See histograms.xml for examples.
By default, a <variant> inherits the owners declared for the patterned histogram. Each variant can optionally override the inherited list with custom owners:

```xml
<variant name="SubteamBreakdown" ...>
  <owner>[email protected]</owner>
  <owner>[email protected]</owner>
</variant>
```
As with histogram entries, never delete variants. If the variant expansion is no longer used, mark it as <obsolete>.
*** promo
Tip: You can run print_expanded_histograms.py --pattern= to show all generated histograms by patterned histograms or histogram suffixes including their summaries and owners. For example, this can be run (from the repo root) as:

```
./tools/metrics/histograms/print_expanded_histograms.py --pattern=^UMA.A.B
```
***
*** promo
Tip: You can run print_histogram_names.py --diff to enumerate all the histogram names that are generated by a particular CL. For example, this can be run (from the repo root) as:

```
./tools/metrics/histograms/print_histogram_names.py --diff origin/main
```
***
For documentation about the <histogram_suffixes> syntax, which is deprecated, see https://chromium.googlesource.com/chromium/src/+/refs/tags/87.0.4270.1/tools/metrics/histograms/one-pager.md#histogram-suffixes-deprecated-in-favor-of-pattern-histograms
Sparse histograms are well-suited for recording counts of exact sample values that are sparsely distributed over a large range. They can be used with enums as well as regular integer values. It is often valuable to provide labels in enums.xml.
The implementation uses a lock and a map, whereas other histogram types use a vector and no lock. It is thus more costly to add values to, and each value stored has more overhead, compared to the other histogram types. However it may be more efficient in memory if the total number of sample values is small compared to the range of their values.
Please talk with the metrics team if there are more than a thousand possible different values that you could emit.
For more information, see sparse_histogram.h.
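To make the memory trade-off concrete, here is a toy contrast (not Chromium's SparseHistogram class). A dense layout pre-allocates a counter for every possible value. A sparse layout stores only the values that were actually recorded, in a map guarded by a lock, as the text above describes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>

// Toy sparse histogram: only recorded values occupy memory.
class ToySparseHistogram {
 public:
  void Add(int value) {
    std::lock_guard<std::mutex> lock(mutex_);
    ++counts_[value];
  }

  int Count(int value) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = counts_.find(value);
    return it == counts_.end() ? 0 : it->second;
  }

  std::size_t DistinctValues() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return counts_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::map<int, int> counts_;
};
```

Recording 404 twice and 1000000 once leaves two map entries. A dense array covering 0..1,000,000 would hold a million counters for the same three samples, although each map entry carries more per-entry overhead than an array slot.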
Any Chromium committer who is also a Google employee is eligible to become a metrics reviewer. Please follow the instructions at go/reviewing-metrics. This consists of reviewing our training materials and passing an informational quiz. Since metrics have a direct impact on internal systems and have privacy considerations, we're currently only adding Googlers into this program.
If you are a metric OWNER, you have the serious responsibility of ensuring Chrome's data collection is following best practices. If there's any concern about an incoming metrics changelist, please escalate by assigning to [email protected].
When reviewing metrics CLs, look at the following, listed in approximate order of importance:
- Does anything tickle your privacy senses? (Googlers, see go/uma-privacy for guidelines.)
  - Please escalate if there's any doubt!
- Is the metadata clear enough for all Chromies to understand what the metric is recording? Consider the histogram name, description, units, enum labels, etc.
  - It's really common for developers to forget to list when the metric is recorded. This is particularly important context, so please remind developers to clearly document it.
  - Note: Clarity is a bit less important for very niche metrics used only by a couple of engineers. However, it's hard to assess the metric design and correctness if the metadata is especially unclear.
- Does the metric definition make sense?
  - Will the resulting data be interpretable at analysis time?
- Is the histogram being recorded correctly?
  - Does the bucket layout look reasonable?
    - The metrics APIs like base::UmaHistogram* have some sharp edges, especially for the APIs that require specifying the number of buckets. Check for off-by-one errors and unused buckets.
    - Is the bucket layout efficient? Typically, push back if there are >50 buckets -- this can be ok in some cases, but make sure that the CL author has consciously considered the tradeoffs here and is making a reasonable choice.
    - For timing metrics, do the min and max bounds make sense for the duration that is being measured?
  - The base::UmaHistogram* functions are generally preferred over the UMA_HISTOGRAM_* macros. If using the macros, remember that names must be runtime constants!
  - Also, related to clarity: Does the client logic correctly implement the metric described in the XML metadata? Some common errors to watch out for:
    - The metric is only emitted within an if-stmt (e.g., only if some data is available) and this restriction isn't mentioned in the metadata description.
    - The metric description states that it's recorded when X happens, but it's actually recorded when X is scheduled to occur, or only emitted when X succeeds (but omitted on failure), etc.

    When the metadata and the client logic do not match, the appropriate solution might be to update the metadata, or it might be to update the client logic. Guide this decision by considering what data will be more easily interpretable and what data will have hidden surprises/gotchas.
- Is the CL adding a reasonable number of metrics/buckets?
  - When reviewing a CL that is trying to add many metrics at once, guide the CL author toward an appropriate solution for their needs. For example, multidimensional metrics can be recorded via UKM, and we are currently building support for structured metrics in UMA.
  - There's no hard rule, but anything above 20 separate histograms should be escalated by being assigned to [email protected].
  - Similarly, any histogram with more than 100 possible buckets should be escalated by being assigned to [email protected].
- Are expiry dates being set appropriately?
This document describes many other nuances that are important for defining and recording useful metrics. Check CLs for these other types of issues as well.
And, as you would with a language style guide, periodically re-review the doc to stay up to date on the details.
When working with histograms.xml, verify whether you require fully expanded OWNERS files. Many scripts in this directory process histograms.xml, and sometimes OWNERS file paths are expanded and other times they are not. OWNERS paths are expanded when scripts make use of merge_xml's function MergeFiles; otherwise, they are not.