Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Fix reading entries failed due to max in-flight reading #23524

Closed
wants to merge 2 commits into from

Conversation

Technoboy-
Copy link
Contributor

Motivation

If the estimatedReadSize larger than the managedLedgerMaxReadsInFlightSizeInMB, reading entries will fail.

2024-10-10T14:31:46,939+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] ERROR org.apache.bookkeeper.mledger.impl.cache.RangeEntryCacheImpl - Time-out elapsed while acquiring enough permits on the memory limiter to read from ledger 114226, public/default/persistent/test-partition-0, estimated read size 120875300 bytes for 100 entries (check managedLedgerMaxReadsInFlightSizeInMB)

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not update a global variable this way. The update will affect all of the ledgers/topics without control.

Also, we should add a test that reproduces the problem and ensures that it is fixed

@lhotari wdyt?

private final long maxReadsInFlightSize;
@Setter
@Getter
private long maxReadsInFlightSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable should stay final

@lhotari
Copy link
Member

lhotari commented Nov 7, 2024

@Technoboy- I'm working on a broader refactoring to address the issue. There are multiple challenges. I have reported multiple issues, #23482, #23504, #23505 and #23506. I'm working to resolve them soon.
While fixing the issues, I have discovered more. For example, when reads get de-duplicated by PendingReadsManager and there are partial matches, the permits will get acquired also for the partial reads. I have a large change set in progress, which I will split into smaller pull requests once I have addressed the issues with the refactoring and the improvements and I have tests passing.
Since my main goal is to improve caching for Key_Shared subscriptions, this has revealed more gaps in addressing that. In the current solution replay reads aren't cached at all. I noticed that @eolivelli has reported a related issue #16421 about that for Shared subscriptions. The comment #16421 (comment) is relevant. Messages in the replay queues shouldn't be discarded from the cache. I'm also trying to address that in my experiments. That's why the changes have expanded to also address broker cache short comings.

@lhotari
Copy link
Member

lhotari commented Nov 7, 2024

One reason why the read size could exceed managedLedgerMaxReadsInFlightSizeInMB is #23482, that's what I'm also addressing in my changes that are WIP (example commit, part of the WIP changes).

@Technoboy-
Copy link
Contributor Author

@Technoboy- I'm working on a broader refactoring to address the issue. There are multiple challenges. I have reported multiple issues, #23482, #23504, #23505 and #23506. I'm working to resolve them soon. While fixing the issues, I have discovered more. For example, when reads get de-duplicated by PendingReadsManager and there are partial matches, the permits will get acquired also for the partial reads. I have a large change set in progress, which I will split into smaller pull requests once I have addressed the issues with the refactoring and the improvements and I have tests passing. Since my main goal is to improve caching for Key_Shared subscriptions, this has revealed more gaps in addressing that. In the current solution replay reads aren't cached at all. I noticed that @eolivelli has reported a related issue #16421 about that for Shared subscriptions. The comment #16421 (comment) is relevant. Messages in the replay queues shouldn't be discarded from the cache. I'm also trying to address that in my experiments. That's why the changes have expanded to also address broker cache short comings.

ok, i will close this patch.

@Technoboy- Technoboy- closed this Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants