OpenTelemetry Starter Pack NDC {Oslo} 2024
- Docker is running? Run `docker ps`. You should not get any errors; `error during connect` means Rancher Desktop/Podman Desktop/Docker Desktop has not been started.
- Docker Compose is installed? Run `docker-compose -v`.
- Dotnet is installed? Run `dotnet --version`. This should display the version.
- Start by running `docker compose up`. Add `-d` to run detached (just start it without displaying all logs).
- To clean up the Docker containers, run `docker-compose down` from this folder.
Run it with `dotnet run` or inside an IDE. This will give you more information about the URLs you can visit.
Go to localhost:8080/test
- A tool for building and running containers, e.g. Docker Desktop, Podman or Rancher Desktop, including Compose.
- Dotnet 8
flowchart LR
A["Application(example)"] --> B("OpenTelemetry Collector \n :4317 (gRPC)")
H["dependency1"] --> B
I["dependency2"] --> B
J["dependency3"] --> B
B --> D("Loki \n :3100") --> G("Grafana")
B --> E("Tempo \n :3200") --> G
B --> F(Prometheus \n :9090)--> G
K(k6)-->|remote write|F
style A fill:red
style B fill:green
style H fill:blue
style I fill:blue
style J fill:blue
style D fill:yellow
style E fill:orange
style F fill:cyan
style G fill:purple
- Your application, called example, sends data directly to the OTEL Collector over gRPC
- The "legacy" dependencies use auto-instrumentation and do the same
- Prometheus scrapes data from the OTEL Collector
- The collector writes data to Loki and Tempo
- Grafana uses the three sources to display data
- k6 writes to Prometheus using remote write
Versions are defined in `.env`.
Note 09.06.2024: Tempo is currently pinned to version 2.4.2. The latest version, 2.5.0, has breaking changes related to the ownership of `/var/tempo`, which was causing issues. Refer to https://github.com/grafana/tempo/releases/tag/v2.5.0 for more info.
Grafana has released a simplified setup with a single container. It can be found here: https://github.com/grafana/docker-otel-lgtm/.
There are two options for setting up OpenTelemetry in .NET applications:
- Setup in code, with the option of using both manual and automatic instrumentation
- No-code setup of automatic instrumentation; manual instrumentation cannot be added
The example app uses the first option, setup in code. Refer to SetupOpentelemetry.cs to see how this may be done. For more information and examples, refer to:
The self-paced tasks focus on the manual setup.
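A minimal, hedged sketch of what such a code-based setup can look like in Program.cs (the endpoint, service name and chosen instrumentations below are illustrative; the actual wiring lives in SetupOpentelemetry.cs):

```csharp
// Sketch of a code-based OpenTelemetry setup for an ASP.NET Core app.
// Requires OpenTelemetry.Extensions.Hosting, OpenTelemetry.Exporter.OpenTelemetryProtocol
// and the ASP.NET Core / HttpClient instrumentation packages.
using OpenTelemetry;
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService("exampleApi"))   // illustrative service name
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317")))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317")));

builder.Logging.AddOpenTelemetry(logging =>
{
    logging.IncludeFormattedMessage = true;
    logging.AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317"));
});

var app = builder.Build();
app.Run();
```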
This setup includes 3 containers (dependency1..3) with automatic instrumentation. The two main solutions for doing this are:
- Download and run `otel-dotnet-auto-install.sh` or `OpenTelemetry.DotNet.Auto.psm1`, or
- Include the NuGet package `OpenTelemetry.AutoInstrumentation`
We have used the latter approach. Refer to dependency1 in docker-compose.yaml for an example. This example includes ENV variables for easier debugging.
For more information refer to:
.
├── exampleAPI.http <- To run HTTP commands. An alternative to using Swagger or the browser
├── MapRoutesExtensions.cs <- Sets up the routes
├── SetupOpentelemetry.cs <- All OpenTelemetry setup for Logging, Tracing and Metrics
└── Program.cs... <- All the normal stuff
Start the app and go to http://localhost:5000/
Grafana shows the data using 3 data sources:
- Tempo for tracing
- Prometheus for metrics
- Loki for Logging
PS: The message `Failed to authenticate request` might appear. This should not have any impact, but it is noisy. Go to `Sign in` in the top right corner and sign in with username `admin` and password `admin`.
Before looking at data, you need to populate some data. You can do that by running some of the HTTP requests in exampleAPI.http.
Then open Grafana at http://localhost:3000
flowchart LR
A["Grafana Home"] --> B("Explore") --> C("Select source 'Tempo' - Default data source is 'Loki'") --> D("Select 'Search' - Default is 'TraceQL'")
style A fill:blue
style B fill:Yellow
style C fill:Green
style D fill:Red
Read more about TraceQL here: https://grafana.com/docs/tempo/latest/traceql/
Should look something like this:
flowchart LR
A["Grafana Home"] --> B("Explore") --> C("Select source 'Prometheus' - Default data source is 'Loki'") --> D("Metrics browser'")
style A fill:blue
style B fill:Yellow
style C fill:Green
style D fill:Red
Prometheus uses PromQL as a query language. Here are some examples: https://prometheus.io/docs/prometheus/latest/querying/examples/
You should be able to run `http_client_request_duration_seconds_count{}`. Send some HTTP requests first, e.g. from exampleAPI.http, to generate data.
Should look something like this:
flowchart LR
A["Grafana Home"] --> B("Explore") --> C("Default data source is 'Loki'") --> D("Add a LogQL")
style A fill:blue
style B fill:Yellow
style C fill:Green
style D fill:Red
Loki uses LogQL. Refer to https://grafana.com/docs/loki/latest/query/.
You can start with the LogQL query `{exporter="OTLP"}`. This will show all log records exported by the OpenTelemetry Collector.
Should look something like this:
- Start the infrastructure as described in this section
- Start the `ExampleApi`
- Verify that everything is up and running with http://localhost:8080/test
- Open the webpage at http://localhost:5000/
- Go to Loki, last 5 min
- Confirm that the service name has not been set. It should be `unknown_service:dotnet` (at this stage)
- Understand the basics of the setup. Refer to SetupOpentelemetry.cs
- Add `"OTEL_SERVICE_NAME": "exampleApiSetInEnv"` to the environment, e.g. in launchSettings.json
- Uncomment the code for configuring resources in SetupOpentelemetry.cs (a sketch of such a call follows below)
- Did it work as expected? How do you fix it?
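For reference, a hedged sketch of what a resource configuration in code can look like (the service name below is illustrative, and the commented-out code in SetupOpentelemetry.cs may differ). Which value wins, the environment variable or the code, depends on how and in which order the resource is built, which is exactly what the last question above is getting at:

```csharp
using OpenTelemetry;
using OpenTelemetry.Resources;

// Sketch: on the WebApplicationBuilder, set the service name (and version) on the
// resource in code instead of relying on the OTEL_SERVICE_NAME environment variable.
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService(
        serviceName: "exampleApiSetInCode",   // illustrative name
        serviceVersion: "1.0.0"));
```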
- Add
- Verify that `ExampleApi` is also sending Metrics and Traces to the OpenTelemetry Collector by checking Prometheus and Tempo in Grafana. Refer to the section "Grafana and some fundamentals for viewing data"
- Locate the tracing RequestFilter in SetupOpentelemetry.cs
- Send a request to http://localhost:5000/remove
- Update the request filter (input to SetupOpentelemetry) to exclude this endpoint from tracing (see the sketch below)
- Verify in Tempo that you succeeded.
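A hedged sketch of what such a filter can look like when it is passed to AddAspNetCoreInstrumentation (in this repo the filter is an input to SetupOpentelemetry, so the exact shape may differ; the `/remove` path matches the endpoint used above):

```csharp
using OpenTelemetry.Trace;

// Sketch: this goes inside the existing .WithTracing(...) call.
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation(options =>
        {
            // Returning false means the request is not traced at all.
            options.Filter = httpContext =>
                !httpContext.Request.Path.StartsWithSegments("/remove");
        }));
```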
- Send requests to the APIs
- Is it easy to understand what is causing the delay? Check if you understand it in Tempo
- Find out why tracing is missing in `ThisNeedsToBeTraced`. Note that you need to start the activity, not only create it (see the sketch below).
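The hint is about the difference between creating and starting an activity. A hedged sketch (the ActivitySource name is illustrative):

```csharp
using System.Diagnostics;

var activitySource = new ActivitySource("ExampleApi.SuperService");   // illustrative name

// CreateActivity only builds the Activity object; it is never started,
// so nothing is recorded or exported.
// StartActivity (or calling activity.Start() yourself) is what produces a span.
using var activity = activitySource.StartActivity("ThisNeedsToBeTraced");
activity?.SetTag("example.tag", "value");
```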
- Send requests to
- Become familiar with the different ways of tracing errors. The mapping is done in MapRoutesExtensions.cs (a sketch of two common approaches follows after this list)
- Look at the trace in Tempo
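A hedged sketch of two common ways to mark a span as failed (the names are illustrative; MapRoutesExtensions.cs may demonstrate other variants as well):

```csharp
using System;
using System.Diagnostics;
using OpenTelemetry.Trace;   // RecordException extension method

var source = new ActivitySource("ExampleApi");   // illustrative name

using var activity = source.StartActivity("failing-operation");
try
{
    throw new InvalidOperationException("Something went wrong");
}
catch (Exception ex)
{
    // Option 1: set the span status to Error, which shows up as a failed span in Tempo.
    activity?.SetStatus(ActivityStatusCode.Error, ex.Message);

    // Option 2: attach the exception as a span event with type, message and stack trace.
    activity?.RecordException(ex);
}
```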
- I have added a custom metric to `SuperServiceWithMetrics` (SuperServiceWithMetrics.cs)
- Use the endpoint http://localhost:5000/metric/inc/10 to increment the custom metric
- Are you seeing any data in the Grafana dashboard? No? Why?
- PS: the counter `super_service_counter` produces the metric `super_service_counter_total`
- Verify that you can track the metric from the OpenTelemetry Collector to Grafana:
  - Find the metric in the output of the OpenTelemetry Collector: http://localhost:8889/metrics
  - Find the metric in Prometheus: http://localhost:9090/graph
  - Find the metric in Grafana: http://localhost:3000/explore
- There is a bug in the custom metrics counter. Can you spot it? Fix it.
- Look into: AddAspNetCoreInstrumentation. It is called in SetupOpentelemetry.cs
- You can also see it here: AspNetCoreInstrumentationMeterProviderBuilderExtensions.cs
- Read more about core metrics https://learn.microsoft.com/en-us/dotnet/core/diagnostics/built-in-metrics-aspnetcore
- Generally, it is a bit hard to keep track of which metrics are added. Use what you learned from task M1 to see what data is actually present.
- The metrics have broken the dashboards more than once.
- Add another counter to the setup
- Verify that this has been added to your metrics
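A hedged sketch of adding a counter (the meter and counter names are illustrative; a new meter name must also be registered with AddMeter in the metrics setup, just like the existing one):

```csharp
using System.Diagnostics.Metrics;

// Illustrative names; pick whatever fits, and register the meter with
// .WithMetrics(m => m.AddMeter("ExampleApi.SuperService")) if it is a new meter.
var meter = new Meter("ExampleApi.SuperService");
var anotherCounter = meter.CreateCounter<long>("another_counter");

// Increment it somewhere in the request handling code.
anotherCounter.Add(1);

// PS: as with super_service_counter, Prometheus exposes this as another_counter_total.
```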
- Open `/parallel`
- Open Loki with the LogQL query `{exporter="OTLP"}`
- Get to know the log record. Some key fields are:
  - body
  - severity
  - attributes
  - resources
  - instrumentation scope name
- Open `/metric/inc/10`
- How did the instrumentation scope change?
- Stop the application
- Look at the log record with body "Application is shutting down...". Does it have spanid and traceid? Why?
- PS: the LogQL query `{exporter="OTLP"} | json | line_format "{{.body}}"` gives a clean printout of the log record body. Try this. Observe how the log records are flattened.
- Send a request to http://localhost:5000/parallel
- Go to the log lines created while processing this request, and find the trace ID
- Go to the trace in Tempo.
- Go to SuperService.cs and add `logger.BeginScope` (a sketch of the call follows after the advanced items below)
- What do you see in Loki? What and where is the scope state added in the log record?
- Advanced/optional:
  - Try to duplicate the scope call
  - Do you see the duplicated entry?
  - What do you get if you mutate the value (and not the key) in `logger.BeginScope`?
  - Look at the log record in the custom processor in SetupOpentelemetry.cs
  - And look in the OtlpLogRecordTransformer. Can you find the duplicated attribute key?
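A hedged sketch of the BeginScope call referenced above (the key and value are illustrative; how the state shows up in Loki also depends on whether IncludeScopes is enabled in the logging setup):

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

// `logger` is the ILogger already injected into SuperService.
// Every log record written inside the using block carries the scope state,
// which the exporter maps to attributes on the log record (when IncludeScopes is enabled).
using (logger.BeginScope(new Dictionary<string, object> { ["order.id"] = 42 }))
{
    logger.LogInformation("Processing order");

    // For the advanced task: a nested scope with the same key is an easy way to
    // produce a duplicated attribute key and see how it is handled downstream.
    using (logger.BeginScope(new Dictionary<string, object> { ["order.id"] = 43 }))
    {
        logger.LogInformation("Nested scope with a duplicated key");
    }
}
```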
- Go to SetupOpentelemetry.cs and set `IncludeFormattedMessage = false` in the logging configuration (see the sketch below)
- Open `/parallel`
- Go to Loki and observe how the body changes.
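A hedged sketch of where that flag lives (the surrounding logging setup in SetupOpentelemetry.cs may look somewhat different):

```csharp
using OpenTelemetry.Logs;

// On the WebApplicationBuilder:
builder.Logging.AddOpenTelemetry(options =>
{
    // With IncludeFormattedMessage = true the exported body is the rendered message;
    // with false you typically see the original message template instead,
    // with the structured values only present as attributes.
    options.IncludeFormattedMessage = false;
    options.IncludeScopes = true;
    options.AddOtlpExporter();
});
```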
- Install k6, e.g. by running this installer: https://dl.k6.io/msi/k6-latest-amd64.msi
- Docs: https://k6.io/docs/
- Run the test with `K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true k6 run -o experimental-prometheus-rw test/script.js`
- View the result in the dashboard called `k6 Prometheus`
- Update SetupOpentelemetry.cs to use signal-specific (Logging, Tracing, Metrics) setup of the OTLP exporter (see the sketch below).
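A hedged sketch of signal-specific OTLP exporter configuration, as opposed to a single cross-signal exporter registration (the endpoint is illustrative, and per-signal endpoints differ if you switch from gRPC to HTTP/protobuf):

```csharp
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

// Sketch: on the WebApplicationBuilder, configure the OTLP exporter separately per signal.
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317")))    // traces
    .WithMetrics(metrics => metrics
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317")));   // metrics

builder.Logging.AddOpenTelemetry(logging =>
    logging.AddOtlpExporter(o => o.Endpoint = new Uri("http://localhost:4317")));  // logs
```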
- Add the debug exporter to the pipelines in the OTEL collector config:
  - traces
  - logs
  - metrics
You can, for example:
- Test with sampling https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/probabilisticsamplerprocessor/README.md
- Check out frontend instrumentation using Faro https://github.com/grafana/faro-web-sdk
- Replace the OTEL collector with Alloy https://grafana.com/docs/alloy/latest/
- Check out the .NET Aspire dashboard.
This setup is originally based on: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/examples/demo