[fix][broker] Fix reading entries failed due to max in-flight reading #23524

Technoboy- · 2024-10-28T13:39:38Z

Motivation

If the estimatedReadSize is larger than managedLedgerMaxReadsInFlightSizeInMB, reading entries will fail with a time-out while acquiring permits:

2024-10-10T14:31:46,939+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] ERROR org.apache.bookkeeper.mledger.impl.cache.RangeEntryCacheImpl - Time-out elapsed while acquiring enough permits on the memory limiter to read from ledger 114226, public/default/persistent/test-partition-0, estimated read size 120875300 bytes for 100 entries (check managedLedgerMaxReadsInFlightSizeInMB)
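
To illustrate why such a read can never succeed, here is a minimal sketch of a fixed-budget limiter. It is not Pulsar's InflightReadsLimiter: the class FixedBudgetLimiter, its methods, and the 100 MB limit are all hypothetical and chosen only for illustration. The estimated read size is the value from the log line above.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a fixed-budget in-flight read limiter (not Pulsar's class). */
final class FixedBudgetLimiter {
    private final long maxBytes;             // total budget, fixed at construction
    private final AtomicLong remainingBytes; // permits currently available

    FixedBudgetLimiter(long maxBytes) {
        this.maxBytes = maxBytes;
        this.remainingBytes = new AtomicLong(maxBytes);
    }

    /** True if the request could fit once every other read has released its permits. */
    boolean canEverFit(long requestedBytes) {
        return requestedBytes <= maxBytes;
    }

    /** Takes the permits atomically if enough remain; never blocks. */
    boolean tryAcquire(long requestedBytes) {
        while (true) {
            long current = remainingBytes.get();
            if (requestedBytes > current) {
                return false;
            }
            if (remainingBytes.compareAndSet(current, current - requestedBytes)) {
                return true;
            }
        }
    }

    /** Returns permits to the budget after a read completes. */
    void release(long bytes) {
        remainingBytes.addAndGet(bytes);
    }

    public static void main(String[] args) {
        long assumedLimit = 100L * 1024 * 1024;  // assumed limit of 100 MB
        long estimatedReadSize = 120_875_300L;   // from the log line above
        FixedBudgetLimiter limiter = new FixedBudgetLimiter(assumedLimit);

        // Even with the whole budget free, the request is larger than the budget,
        // so waiting for other reads to finish cannot help.
        System.out.println(limiter.canEverFit(estimatedReadSize)); // false
        System.out.println(limiter.tryAcquire(estimatedReadSize)); // false
    }
}
```

Under these assumptions, retrying until a time-out only delays the failure, because no amount of released permits brings the budget above the request.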

Documentation

doc
doc-required
doc-not-needed
doc-complete

managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/RangeEntryCacheImpl.java

eolivelli

We should not update a global variable this way. The update will affect all of the ledgers/topics without control.

Also, we should add a test that reproduces the problem and ensures that it is fixed

@lhotari wdyt?

eolivelli · 2024-11-07T07:21:33Z

managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/InflightReadsLimiter.java

-    private final long maxReadsInFlightSize;
+    @Setter
+    @Getter
+    private long maxReadsInFlightSize;


This variable should stay final
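
The concern about a shared, mutable limit can be sketched as follows. This is an assumed pattern, since the exact change to RangeEntryCacheImpl is not shown in this conversation; SharedReadLimit, TopicReadPath, and prepareRead are hypothetical names, and only the field name maxReadsInFlightSize comes from the diff above.

```java
/** Hypothetical holder for a limit shared by all topics; mirrors only the mutable field in the diff. */
final class SharedReadLimit {
    private long maxReadsInFlightSize; // mutable, as in the proposed change

    SharedReadLimit(long maxReadsInFlightSize) {
        this.maxReadsInFlightSize = maxReadsInFlightSize;
    }

    synchronized long getMaxReadsInFlightSize() {
        return maxReadsInFlightSize;
    }

    synchronized void setMaxReadsInFlightSize(long value) {
        this.maxReadsInFlightSize = value;
    }
}

/** Hypothetical per-topic read path that holds a reference to the shared limit. */
final class TopicReadPath {
    private final SharedReadLimit limit;

    TopicReadPath(SharedReadLimit limit) {
        this.limit = limit;
    }

    /** Assumed pattern: raise the shared limit when one read is estimated above it. */
    void prepareRead(long estimatedReadSize) {
        if (estimatedReadSize > limit.getMaxReadsInFlightSize()) {
            limit.setMaxReadsInFlightSize(estimatedReadSize);
        }
    }

    long effectiveLimit() {
        return limit.getMaxReadsInFlightSize();
    }
}

final class SharedLimitDemo {
    public static void main(String[] args) {
        SharedReadLimit shared = new SharedReadLimit(100L * 1024 * 1024); // assumed 100 MB
        TopicReadPath topicA = new TopicReadPath(shared);
        TopicReadPath topicB = new TopicReadPath(shared);

        topicA.prepareRead(120_875_300L); // one large read on topic A

        // Topic B now sees the raised limit although it never asked for it.
        System.out.println(topicB.effectiveLimit()); // 120875300
    }
}
```

In this sketch a single large read silently raises the memory budget for every topic, which is the uncontrolled side effect the review objects to. Keeping the field final rules this out by construction.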

lhotari · 2024-11-07T09:07:00Z

@Technoboy- I'm working on a broader refactoring to address the issue. There are multiple challenges. I have reported multiple issues, #23482, #23504, #23505 and #23506. I'm working to resolve them soon.
While fixing the issues, I have discovered more. For example, when reads get de-duplicated by PendingReadsManager and there are partial matches, the permits will get acquired also for the partial reads. I have a large change set in progress, which I will split into smaller pull requests once I have addressed the issues with the refactoring and the improvements and I have tests passing.
Since my main goal is to improve caching for Key_Shared subscriptions, this has revealed more gaps in addressing that. In the current solution, replay reads aren't cached at all. I noticed that @eolivelli has reported a related issue #16421 about that for Shared subscriptions. The comment #16421 (comment) is relevant. Messages in the replay queues shouldn't be discarded from the cache. I'm also trying to address that in my experiments. That's why the changes have expanded to also address broker cache shortcomings.

lhotari · 2024-11-07T10:02:04Z

One reason why the read size could exceed managedLedgerMaxReadsInFlightSizeInMB is #23482. I'm also addressing that in my WIP changes (example commit, part of the WIP changes).

Technoboy- · 2024-11-08T01:30:19Z

OK, I will close this patch.

Add log to track issue

519da6f

Technoboy- self-assigned this Oct 28, 2024

Technoboy- added ready-to-test release/3.3.3 release/3.0.8 release/4.0.1 labels Oct 28, 2024

Technoboy- added this to the 4.1.0 milestone Oct 28, 2024

github-actions bot added the doc-not-needed Your PR changes do not impact docs label Oct 28, 2024

Technoboy- requested a review from poorbarcode October 28, 2024 13:43

eolivelli requested changes Nov 5, 2024

address comment

540fcdc

Technoboy- requested a review from eolivelli November 6, 2024 06:27

eolivelli requested changes Nov 7, 2024

Technoboy- closed this Nov 8, 2024
