Feature: Implement Traceroute Check #82

Closed
1 task done
lvlcn-t opened this issue Jan 19, 2024 · 2 comments · Fixed by #113

Labels
area/checks Issues/PRs related to Checks request/internal Indicates an internal feature request

@lvlcn-t
Member

lvlcn-t commented Jan 19, 2024

Is there an existing feature request for this?

  • I have searched the existing issues

Problem Description

Features

We want to gather data similar to the output of traceroute. We want to declare a list of targets and sparrow should collect the hops through which the packets travel to reach that destination.

Metrics

For now, we want to collect the following metrics for every target:

  • num hops
  • path taken
    • Hop number
    • IP address

Solution Description

Features

Unlike traditional traceroute, the check will use TCP. This avoids requiring root permissions or the CAP_NET_RAW capability. I propose a config that looks like this:

checks:
  traceroute:
    retry:
      delay: 10s
      count: 3
    timeout: 30s
    targets:
      - https://google.com
      - https://bing.com:80
      - https://myservice.com:12345

Metrics

Num Hops should be a simple counter:

For example: sparrow_traceroute_hops_count{target="https://google.com"} 12

The path taken is a bit more difficult, as we can't really convey the graph-like nature of the data to Prometheus. My suggestion is that we export these metrics in an OpenTelemetry-compatible format. We could use the OTel SDK to create the metrics and then ship them off to a trace aggregator like Jaeger. This makes it easy to adopt sparrow, as Grafana already has a native Jaeger data source, so there would be no need to hack together our own Grafana data source.

While we can't collect traces in Prometheus, we can at least link a time series to a trace using Prometheus exemplars. This is not a requirement, but it makes the UX nicer when viewing the data in Grafana.

@lvlcn-t lvlcn-t added request/internal Indicates an internal feature request area/checks Issues/PRs related to Checks labels Jan 19, 2024
@lvlcn-t lvlcn-t added this to the 0.4.0 milestone Jan 19, 2024
@niklastreml
Contributor

niklastreml commented Jan 19, 2024

Did some investigation on this. There are a few ways to set the TTL and similar fields on a UDP packet.

  1. Use a raw socket through Linux syscalls and implement it ourselves
  2. Use "golang.org/x/net/icmp", which does a fair amount of the work for us

Both options have the same caveat: we need permission to use a raw socket for these kinds of low-level operations. We can either run the binary as root (for example via sudo) or grant the CAP_NET_RAW capability to the binary, so the kernel allows it to create raw sockets. This is something we need to keep in mind when deploying sparrow to environments such as Kubernetes, where capabilities have to be allowed explicitly.

In the case of Kubernetes, this means adding the following to the securityContext:

securityContext:
  allowPrivilegeEscalation: true
  capabilities:
    add: ["NET_RAW"]

@niklastreml
Contributor

Did some more research on this. We can use syscalls from golang.org/x/sys/unix to use the OS's TCP/IP stack without any special permissions. The setsockopt syscall allows us to set things like the TTL on an IP packet, which lets us implement traceroute-like functionality. This shows how to open a TCP socket, but from that point on we have to figure the rest out. Once I'm familiar with which syscalls to use, I'll give implementing the check a shot.

@lvlcn-t lvlcn-t linked a pull request Feb 21, 2024 that will close this issue
2 tasks
2 participants