Feature: OpenTelemetry #111
Comments
There might be a way to build traceroute metrics with Prometheus, but I'm not sure if I like it.
The following traceroute CLI output would map to Prometheus metrics like so:
$ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 MyRouter (68.87.192.226) 0.345 ms
2 68.87.192.227 (68.87.192.227) 9.291 ms
3 dns.google (8.8.8.8) 9.231 ms
I'm not sure how easy it is to create a nice visualization from this, though. Let me know what your thoughts on this are @lvlcn-t @puffitos @y-eight
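To make the idea concrete, here is a minimal Go sketch of one possible mapping from the hop lines above to samples in the Prometheus text format. The metric name `traceroute_hop_latency_ms` and the labels `target`, `hop` and `addr` are assumptions for illustration only, not names used by the project. Note that every distinct hop address becomes its own series.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hop is one parsed line of traceroute output.
type Hop struct {
	TTL       int
	Name      string
	Addr      string
	LatencyMs float64
}

// parseHop parses a line such as "1 MyRouter (68.87.192.226) 0.345 ms".
// It only handles the single-probe format shown in the output above.
func parseHop(line string) (Hop, error) {
	f := strings.Fields(line)
	if len(f) != 5 || f[4] != "ms" {
		return Hop{}, fmt.Errorf("unexpected hop line: %q", line)
	}
	ttl, err := strconv.Atoi(f[0])
	if err != nil {
		return Hop{}, fmt.Errorf("parse ttl: %w", err)
	}
	lat, err := strconv.ParseFloat(f[3], 64)
	if err != nil {
		return Hop{}, fmt.Errorf("parse latency: %w", err)
	}
	return Hop{TTL: ttl, Name: f[1], Addr: strings.Trim(f[2], "()"), LatencyMs: lat}, nil
}

// metricLine renders one hop as a sample in the Prometheus text format.
// Metric and label names are hypothetical.
func metricLine(target string, h Hop) string {
	return fmt.Sprintf(`traceroute_hop_latency_ms{target=%q,hop="%d",addr=%q} %g`,
		target, h.TTL, h.Addr, h.LatencyMs)
}

func main() {
	lines := []string{
		"1 MyRouter (68.87.192.226) 0.345 ms",
		"2 68.87.192.227 (68.87.192.227) 9.291 ms",
		"3 dns.google (8.8.8.8) 9.231 ms",
	}
	for _, l := range lines {
		h, err := parseHop(l)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(metricLine("8.8.8.8", h))
	}
}
```

With this shape, a route change produces new `addr` label values, so the number of series grows with every distinct path that is observed.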
I don't think it is a good idea, especially when the route changes often. The metrics do not capture how the hops relate to each other over time. Even if we cleared the last state, the problem would just move to the next layer (Prometheus). Update: I am not 100% sure; it might be fine for a start to clear the state and expose only the last traceroute as metrics, especially since that is easy to implement compared to OTel. In the end we need to get feedback from the experts.
That's what I thought too. We would be generating a lot of metrics with different labels this way. In the worst case we'd get
Let's forget Prometheus for this one. The metrics will explode if we go that way. Let's make a simple API response available first and then concentrate on offering OTel traces, which can be visualised in the preferred monitoring tool, Grafana.
@niklastreml IMO we shouldn't use Prometheus metrics for something they're not intended for and hardly capable of. If the traceroute check were the only check implementing OTel traces, we could consider building this workaround with Prometheus metrics, but since the other checks will also write their own traces, we should focus on implementing OTel. @puffitos I don't know about dropping Prometheus completely for the traceroute check. I think we should still expose the number of hops as a Prometheus metric for the traceroute check.
When implementing the traces for the DNS check, we should keep this in mind: #81 (comment)
@lvlcn-t sorry, I communicated my intention incorrectly; we shouldn't use Prometheus metrics for each trace, as that would make the metrics' cardinality rise quickly. We can use metrics for the number of hops to the target, totalDuration, meanDuration and so on, if those metrics make any sense in a traceroute scenario. I haven't used the check in a while, so I'm not exactly sure what's critical information and what's not.
I've started to implement the first step of this issue in feat/otel-provider. I've added an OTel provider setup that depends on the user's (startup) configuration. The following configuration options will be available to the user (naming is not final):
# Configures the telemetry exporter.
telemetry:
  # The telemetry exporter to use.
  # Options:
  # grpc: Exports telemetry using OTLP via gRPC.
  # http: Exports telemetry using OTLP via HTTP.
  # stdout: Prints telemetry to stdout.
  # noop | "": Disables telemetry.
  exporter: grpc
  # The address to export telemetry to.
  url: localhost:4317
  # The token to use for authentication.
  # If the exporter does not require a token, this can be left empty.
  token: ""
  # The path to the tls certificate to use.
  # To disable tls, either set this to an empty string or set it to insecure.
  certPath: ""
Since the OpenTelemetry Protocol (OTLP) is a standard, the user can choose any collector that supports it. The user can also choose the stdout exporter to see the traces in the console (for debugging purposes) or the noop exporter to disable telemetry. For an external collector, the user can provide a bearer token for authentication and a path to a TLS certificate for secure communication.
Next, I'm wondering if I should already open the PR even though the actual traces are not yet implemented. This way we can split the issue into smaller parts and everyone can instrument one part of the codebase. What do you think?
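As a rough illustration of how these options could be represented and checked at startup, here is a minimal Go sketch. The type, constant and method names are assumptions, not the code in feat/otel-provider, and requiring `url` for the grpc and http exporters is also an assumption. The sketch only validates the values and does not create an exporter.

```go
package main

import (
	"errors"
	"fmt"
)

// Exporter names one of the exporter options from the configuration above.
type Exporter string

const (
	ExporterGRPC   Exporter = "grpc"
	ExporterHTTP   Exporter = "http"
	ExporterStdout Exporter = "stdout"
	ExporterNoop   Exporter = "noop"
)

// Config mirrors the keys under `telemetry` (hypothetical field names).
type Config struct {
	Exporter Exporter
	URL      string
	Token    string
	CertPath string
}

// Enabled reports whether telemetry is exported at all.
// Both "noop" and the empty string disable it.
func (c Config) Enabled() bool {
	return c.Exporter != ExporterNoop && c.Exporter != ""
}

// TLSEnabled reports whether a certificate path was configured.
// An empty string or "insecure" disables TLS.
func (c Config) TLSEnabled() bool {
	return c.CertPath != "" && c.CertPath != "insecure"
}

// Validate rejects unknown exporters and network exporters without a URL.
func (c Config) Validate() error {
	switch c.Exporter {
	case ExporterGRPC, ExporterHTTP:
		if c.URL == "" {
			return errors.New("telemetry: url is required for the grpc and http exporters")
		}
		return nil
	case ExporterStdout, ExporterNoop, "":
		return nil
	default:
		return fmt.Errorf("telemetry: unknown exporter %q", c.Exporter)
	}
}

func main() {
	cfg := Config{Exporter: ExporterGRPC, URL: "localhost:4317"}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("enabled:", cfg.Enabled(), "tls:", cfg.TLSEnabled())
}
```

Validating early keeps the provider setup itself simple: by the time an exporter is constructed, the exporter name is known to be one of the four options.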
Is there an existing feature request for this?
Problem Description
The Prometheus format is not sufficient for exporting data from the traceroute check beyond the number of hops. To provide more complex data, it might be necessary to support a second data format like OpenTelemetry.
Solution Description
Integrate the OTel library and inject it into the checks, so they can write their own traces. Traceroute can use this to collect more detailed data about how every single invocation of the check behaves, essentially allowing a user to visualize how packets move from sparrow to their target.
Who can address the issue?
No response
Additional Context
https://github.com/open-telemetry/opentelemetry-go
https://opentelemetry.io/docs/languages/go/
https://www.jaegertracing.io/docs/1.54/client-libraries/#go
https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/traces/