Support elasticsearch rollover #68

pavolloffay · 2019-07-24T14:14:46Z

Support Elasticsearch rollover indices: we need to read data from jaeger-span-read and not from daily indices.

Writes could still go to daily indices, since we do not support rollover aliases for the dependency index at the moment.

frittentheke · 2020-03-11T07:43:40Z

I just ran into this issue after successfully configuring all of Jaeger to use ES rollover.

Using rollover (and also ILM) is much more sensible when dealing with the ever-changing amounts of data that tracing might (or might not) produce.
It would be great if the Spark job worked with it.

frittentheke · 2020-03-11T12:21:05Z

There is also the Flink-based implementation, but it currently lacks ES support altogether (jaegertracing/jaeger-analytics-flink#7).

What is the intended way forward for creating the dependencies data?

pavolloffay · 2020-03-11T12:35:25Z

There hasn't been much work on the Flink job, and there are no plans to support ES in the Flink project. The whole project hasn't been "productized".

It makes sense to add support for rollover here. However, it will need some changes to how the data is loaded, since we cannot read all the data from the read alias. Maybe we could use the timestamps from spans instead.

frittentheke · 2020-03-11T13:02:16Z

@pavolloffay while peeking into the code, I expected the only change required would be to address the read and write aliases directly in the run method (i.e. when the env variable ES_USE_ALIASES is set) instead of calling the indexDate method (see: https://github.com/jaegertracing/spark-dependencies/blob/master/jaeger-spark-dependencies-elasticsearch/src/main/java/io/jaegertracing/spark/dependencies/elastic/ElasticsearchDependenciesJob.java#L203).

pavolloffay · 2020-03-11T13:08:39Z

The read alias might point to multiple indices covering an extended period of time, e.g. a week or two. We cannot load that much data into memory.

Also, the dependencies reader in Jaeger expects derived dependencies for the current day (or the previous one).

frittentheke · 2020-03-11T13:46:30Z

@pavolloffay I see, my bad. But thank you for taking the time to think about this issue.

But in any case, a limit on the number of docs requested from ES makes sense. Even a single daily index could contain up to 2 billion docs (ES / Lucene limit).

How about simply applying a filter via startTimeMillis when querying jaeger-span-read?
Certainly the same would need to be done in the UI when showing the dependencies from the jaeger-dependencies-read alias. But ES is very good with range queries on date fields.
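
As a rough illustration of the filter suggested here, the sketch below builds a one-day range query on startTimeMillis in Python. It assumes the field holds epoch milliseconds and that the job processes one UTC day per run. The function name day_range_query is hypothetical and is not part of spark-dependencies.

```python
from datetime import date, datetime, timedelta, timezone


def day_range_query(day: date) -> dict:
    """Build an Elasticsearch query body matching spans that started on `day` (UTC)."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {
        "query": {
            "range": {
                # Assumption: startTimeMillis is stored as epoch milliseconds.
                "startTimeMillis": {
                    "gte": int(start.timestamp() * 1000),  # inclusive start of day
                    "lt": int(end.timestamp() * 1000),     # exclusive start of next day
                }
            }
        }
    }
```

The half-open interval (gte / lt) keeps consecutive daily runs from counting a span on the day boundary twice.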

Potentially, using the terms query built into ES could speed things up even more (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html#query-dsl-terms-lookup), although this would need to be done in chunks of 65k terms.
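
The chunking mentioned above could look roughly like the following Python sketch. The thread does not say which field or which values would be used, so the field name passed in is purely a placeholder. The limit of 65,536 is the Elasticsearch default for index.max_terms_count, and terms_queries is a hypothetical helper name.

```python
from typing import Iterator, Sequence

# Elasticsearch default for the index.max_terms_count setting.
MAX_TERMS = 65536


def terms_queries(field: str, values: Sequence[str], limit: int = MAX_TERMS) -> Iterator[dict]:
    """Yield one terms query body per chunk of at most `limit` values."""
    for i in range(0, len(values), limit):
        yield {"query": {"terms": {field: list(values[i:i + limit])}}}
```

For example, 70,000 values would produce two queries, one with 65,536 terms and one with the remaining 4,464.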

pavolloffay · 2020-03-11T14:05:10Z

This is exactly what I have already proposed: use the span timestamp for the query and then store the data in the daily dependency index, as we do now. It does not require any changes on the Jaeger side.
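
A minimal sketch of the index selection this implies, written in Python for brevity. It is not the code that was later merged; resolve_indices is a hypothetical name, the index names follow the ones quoted in this thread, and any custom index prefix is ignored.

```python
from datetime import date
from typing import Tuple


def resolve_indices(day: date, use_aliases: bool) -> Tuple[str, str]:
    """Return (read_target, write_target) for one run of the dependencies job."""
    suffix = day.isoformat()  # e.g. "2022-12-04"
    # With aliases enabled, read from the rollover read alias; spans must then
    # be narrowed to the target day by a timestamp filter in the query.
    read_target = "jaeger-span-read" if use_aliases else f"jaeger-span-{suffix}"
    # Dependencies are still written to a daily index in both modes.
    write_target = f"jaeger-dependencies-{suffix}"
    return read_target, write_target
```

Keeping the write target daily is what lets the Jaeger dependencies reader stay unchanged.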

frittentheke · 2020-03-11T14:40:55Z

Yeah, that sounds like the simplest approach. The only downside of having a daily dependency index is that it is quite wasteful in terms of the number of shards, but that is potentially not really an issue when talking about a few days' worth of indices.

lgenasi-gocity · 2022-12-04T12:35:28Z

@frittentheke I know this is an old issue now, but I tried setting ES_USE_ALIASES: true and it is still not using the aliases. Am I missing something obvious?

   jaeger-jaeger-operator-jaeger-spark-dependencies:
    Image:      jaegertracing/spark-dependencies
    Port:       <none>
    Host Port:  <none>
    Environment:
      STORAGE:         elasticsearch
      ES_NODES:        http://elasticsearch-master.elastic-system:9200
      ES_USE_ALIASES:  true

22/12/04 08:38:30 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2022-12-04T00:00Z, reading from jaeger-span-2022-12-04 index, result storing to jaeger-dependencies-2022-12-04
22/12/04 08:38:31 INFO ElasticsearchDependenciesJob: Done, 0 dependency objects created

It still seems to be attempting to read an index with a date suffix:
reading from jaeger-span-2022-12-04 index

The container image ID being used seems to be the latest on Docker Hub:
docker.io/jaegertracing/spark-dependencies@sha256:08dca989f4c7de0af8940ab3466e9fcc69e4c159ddb23be28ffab378ea66e03b

Any help understanding what's going on would be much appreciated, thanks.

frittentheke · 2023-04-05T19:48:44Z

Sorry @lgenasi-gocity for never responding.
I honestly don't know if there was a container release after my PR was merged.

@albertteoh ?

sergeykad · 2024-10-29T15:21:41Z

@frittentheke The latest release is at ghcr.io/jaegertracing/spark-dependencies/spark-dependencies according to the readme.

It worked for me with the following configuration.

spark:
  enabled: true
  image:
    registry: ghcr.io
    repository: jaegertracing/spark-dependencies/spark-dependencies
  extraEnv:
    - name: ES_USE_ALIASES
      value: "true"

The only issue is that it is configured differently from the rest of the Jaeger services, but it's not critical.

frittentheke mentioned this issue Mar 25, 2020

Add support for ElasticSearch alias to read spans from (Resolves #68) #86

Merged

pavolloffay mentioned this issue Mar 26, 2020

Use single ElasticSearch index to store dependencies jaegertracing/jaeger#2143

Closed
