[V1] VLM prefix caching: Add hashing of images #10497
base: main
Conversation
@@ -101,6 +131,9 @@ def add_request(self, request: EngineCoreRequest):
        # take 10-50 ms, which can cause a spike in the latency. We should
        # consider moving this to a separate thread.
        if req.mm_data:
Thoughts on doing this on the frontend engine process (i.e. v1/engine/processor.py::Processor) before sending to the EngineCore?
IIUC: this add_request is called on the EngineCore process, meaning it's sync blocking the model executor too?
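For illustration, here is a minimal sketch of what hashing on the frontend could look like, i.e. computing the image hashes in the frontend processor before the request is handed to the EngineCore. The process_inputs method, hash_mm_data helper, mm_hashes field, and the shape of request.mm_data are assumptions made for this sketch, not the PR's actual implementation:

```python
import blake3
from PIL import Image


def hash_mm_data(images: list[Image.Image]) -> list[str]:
    """Hash each image's raw pixel bytes with BLAKE3."""
    return [blake3.blake3(image.tobytes()).hexdigest() for image in images]


class Processor:
    """Stand-in for v1/engine/processor.py::Processor (frontend process)."""

    def process_inputs(self, request):
        # ... existing tokenization / multimodal preprocessing ...
        if request.mm_data:
            # Compute the hashes here, on the frontend, so the EngineCore's
            # add_request() never has to touch the raw image bytes.
            request.mm_hashes = hash_mm_data(request.mm_data["image"])
        return request
```

Doing the hashing here would keep the tobytes() + hash work off the EngineCore process, so it would not block the model executor.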
Yea, this is already planned. Eventually the multimodal data processor will live on the frontend, together with the input token sequence processor. #10044 is working towards this direction.
@rickyyx I think it is a good idea, I can try it.
As part of V1 VLM prefix caching, we need to support hashing of images. This PR adds logic to hash images and pipes the hashes down to the model runner (if needed). Currently, it uses a cryptographic hash, so the match between image and hash is precise; however, it is also possible to use a less precise hash to match "similar" images. The library used for hashing is blake3, which seems to be quite efficient.
As an example, hashing a 1770x1180 RGB PIL image takes 1.6 ms for image.tobytes() and 0.8 ms to hash all of the image bytes (1770 * 1180 * 3 = 6,265,800 bytes).
As a reference, running the HF mapper/preprocessor may take 10-50 ms per image.
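Below is a minimal, self-contained sketch of the measurement described above, hashing the raw bytes of a PIL image with the blake3 package. The synthetic image and variable names are illustrative, not taken from the PR:

```python
# Requires `pip install blake3 pillow`.
import time

import blake3
from PIL import Image

# 1770 x 1180 RGB image -> 1770 * 1180 * 3 = 6,265,800 bytes of pixel data.
image = Image.new("RGB", (1770, 1180))

t0 = time.perf_counter()
data = image.tobytes()                     # serialize the pixels (~1.6 ms in the PR's measurement)
t1 = time.perf_counter()
digest = blake3.blake3(data).hexdigest()   # hash the bytes (~0.8 ms in the PR's measurement)
t2 = time.perf_counter()

print(f"tobytes: {(t1 - t0) * 1e3:.2f} ms, "
      f"blake3: {(t2 - t1) * 1e3:.2f} ms, "
      f"digest: {digest[:16]}...")
```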