
HDDS-10338. Implement a Client Datanode API to stream a block #6613

Open · wants to merge 77 commits into master
Conversation

@chungen0126 (Contributor) commented Apr 30, 2024

What changes were proposed in this pull request?

To reduce round trips between the Client and the Datanode when reading a block, we need a new read API.

Client -> block(offset, length) -> Datanode
Client <- chunkN <- Datanode
Client <- chunkN+1 <- Datanode
..
Client <- chunkLast <- Datanode

This uses gRPC's ability to send bidirectional traffic so that the server can pipeline the chunks to the client without waiting for individual ReadChunk API calls. It also spares the client from creating multiple chunk stream clients and should somewhat simplify the read path on the client side.
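A minimal sketch of how the server side might carve the requested (offset, length) range into chunk-sized pieces before streaming them back; `ChunkPlanner`, `planChunks`, and the chunk-boundary rule here are illustrative assumptions, not Ozone's actual classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a requested (offset, length) range into the
// sequence of chunk-aligned pieces a Datanode could stream back one by one.
public class ChunkPlanner {

  // Returns {chunkOffset, chunkLength} pairs covering [offset, offset + length).
  static List<long[]> planChunks(long offset, long length, long chunkSize) {
    List<long[]> chunks = new ArrayList<>();
    long end = offset + length;
    long pos = offset;
    while (pos < end) {
      // Each piece ends at the next chunk-size boundary, or at the range end.
      long chunkEnd = Math.min(((pos / chunkSize) + 1) * chunkSize, end);
      chunks.add(new long[] {pos, chunkEnd - pos});
      pos = chunkEnd;
    }
    return chunks;
  }

  public static void main(String[] args) {
    // A 10 MB read starting at 3 MB with 4 MB chunks yields four pieces.
    for (long[] c : planChunks(3L << 20, 10L << 20, 4L << 20)) {
      System.out.println(c[0] + "," + c[1]);
    }
  }
}
```

In the streaming scheme above, the server would emit one response per planned piece, with no client round trip in between.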

Please describe your PR in detail:

  • Add new logic on both the client and server side to read a block as a stream of chunks.
  • Add a new StreamBlockInput on the client side, called from KeyInputStream, to read a block from the container.
  • Add unit tests and integration tests for `StreamBlockInput`.
  • Add a new Datanode version for compatibility: when a new client reads blocks from an old server, it falls back to reading blocks via BlockInputStream.
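The compatibility fallback in the last bullet can be sketched as a simple version gate; `STREAM_BLOCK_VERSION` and the class name are hypothetical stand-ins, not the actual Ozone version constants:

```java
// Hypothetical sketch: pick the streaming read path only when the Datanode
// advertises a protocol version that supports the new ReadBlock API.
public class ReadPathSelector {

  // Assumed version at which streaming reads become available.
  static final int STREAM_BLOCK_VERSION = 2;

  static String selectReader(int datanodeVersion) {
    if (datanodeVersion >= STREAM_BLOCK_VERSION) {
      return "StreamBlockInput";   // new streaming read path
    }
    return "BlockInputStream";     // fallback for old servers
  }

  public static void main(String[] args) {
    System.out.println(selectReader(1)); // old server
    System.out.println(selectReader(2)); // new server
  }
}
```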

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10338

How was this patch tested?

There are existing tests for reading data.

@devabhishekpal (Contributor) left a comment

Thanks for taking on this effort @chungen0126.
I just had a few questions and some nits.

@fenixjin commented:

Tests conducted on our cluster (3 DNs / HDD / 10-gigabit network) show this improvement can boost read speed by at least 30%.

In a single-threaded read, streaming cut read time from 7.3–7.4 s to 4.8–5.2 s.
In the freon ozone-client-one-key-reader test, streaming increased read bandwidth from 427 MB/s to 586 MB/s.

BlockID blockID = BlockID.getFromProtobuf(readBlock.getBlockID());
// This is a new API; the block should always be checked.
BlockUtils.verifyReplicaIdx(kvContainer, blockID);
A Contributor commented:

What would happen if the replicaIndex of the block changes because of the ContainerBalancer running in the background? We would need to take some kind of lock here to ensure the block data does not change after this point.

@swamirishi (Contributor) commented Oct 16, 2024

Or we would need to validate that the replicaIndex and bcsID are the same on every readChunk call on the file (basically, move this check inside the loop). Take a look at HDDS-10983 for context.
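A minimal sketch of what "move this check inside the loop" could look like on the server side; the validator interface and names are hypothetical, not the actual Ozone code:

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of the suggested change: re-validate replicaIndex and
// bcsID before sending each chunk, instead of only once before the loop.
public class PerChunkValidation {

  // Returns how many chunks were streamed before validation failed (or all).
  static int streamChunks(List<String> chunks,
                          BiPredicate<Long, Long> stillValid,
                          long replicaIdx, long bcsId) {
    int sent = 0;
    for (String chunk : chunks) {
      if (!stillValid.test(replicaIdx, bcsId)) {
        break; // abort the stream early instead of sending stale data
      }
      sent++;  // stand-in for responseObserver.onNext(chunkResponse)
    }
    return sent;
  }
}
```

The point of the check inside the loop is the early `break`: the server stops streaming as soon as the replica no longer matches, instead of letting the client discover the mismatch later.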

@chungen0126 (Contributor, Author) commented:

> Or we would need to validate that the replicaIndex and bcsID are the same on every readChunk call on the file (basically, move this check inside the loop). Take a look at HDDS-10983 for context.

@swamirishi thanks for your review.
I'm still confused about the problem. We validate the replicaIndex and bcsID at the start of readBlock, and all the readChunks belong to the same block. Why do we need to validate again for every readChunk? If the replicaIndex of the block changes during readBlock, a mismatch can still happen after the validation and before the readChunk.

@swamirishi (Contributor) commented Oct 22, 2024

It could happen that the container replica index changes between two read chunks. You are right that the replica can still fail, but we can save unnecessary round trips between the client and server. It is about narrowing down the possibility, a minor optimization. We also need to ensure that the checksums of the chunks match on the client side. I am still looking through the client-side code; I just wanted to understand whether we do a checksum verification for each and every chunk read on the client side.
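Per-chunk checksum verification on the client side can be sketched like this; CRC32 is just a stand-in for whichever checksum type Ozone was configured with for the key, and the class name is hypothetical:

```java
import java.util.zip.CRC32;

// Illustrative sketch: the client verifies each streamed chunk against the
// checksum carried in the response before accepting its bytes. CRC32 stands
// in for the actual configured checksum type.
public class ChunkChecksum {

  static long checksumOf(byte[] chunk) {
    CRC32 crc = new CRC32();
    crc.update(chunk, 0, chunk.length);
    return crc.getValue();
  }

  // Returns true only when the chunk's computed checksum matches the
  // expected value sent by the server; a mismatch means the chunk must
  // be rejected (e.g. re-read from another replica).
  static boolean verify(byte[] chunk, long expected) {
    return checksumOf(chunk) == expected;
  }
}
```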

@chungen0126 (Contributor, Author) commented:

Done; added replica index validation.

@swamirishi (Contributor) left a comment

@chungen0126 Thanks for working on the patch. I am still reviewing the PR. Posting my first level review comments.


7 participants