
Generic NPU optimizing notebook #2531

Open
SRai22 opened this issue Nov 18, 2024 · 5 comments

Comments

@SRai22

SRai22 commented Nov 18, 2024

Could someone help me with optimizing models for NPU? I want to optimize Florence2 for NPU, and I also want to study whether there can be a generic method for optimizing a model to run on NPUs.

@brmarkus

Can you provide more details, please?

Are you talking about this model, e.g. "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/florence2/florence2.ipynb" (where only CPU and AUTO can be selected as inference device, but not GPU or NPU)?

Do you already see this model performing inference on your NPU (which one?), but want to "optimize" it in terms of reducing latency, increasing throughput, increasing accuracy, or tailoring it for your input and/or output?

Or do you mean enabling it for NPU for the first time?

@eaidova
Collaborator

eaidova commented Nov 19, 2024

@brmarkus It is not true that GPU cannot be selected; that is just the current widget state with saved output, because the notebook was executed on a machine without a GPU. NPU, if it is available in the system, can also be selected, but I do not think it will work: model inference requires dynamic shapes support, which unfortunately is not supported on the NPU plugin side.

@SRai22
Author

SRai22 commented Nov 19, 2024

As @eaidova mentioned, I'm facing the issue with dynamic shapes support when selecting NPU as the device. I saw that https://github.com/openvinotoolkit/openvino_notebooks/blob/2ca15e460fac1fd2be18c81f5530f03505a4fd03/utils/notebook_utils.py#L702C1-L715C71 creates a model optimized for NPU. I wanted to understand whether there is a generic method for achieving this.

@eaidova
Collaborator

eaidova commented Nov 19, 2024

No, this transformation pass is a workaround for a specific plugin issue, used in the RAG notebook to make the embeddings model more NPU-friendly; it cannot be treated as a general solution.

Unfortunately, it may be complicated to adapt the full Florence2 model for NPU inference, as the model generates tokens one by one, incrementing the model's input shape at each step. It would require its own pipeline implementation that works with padding to overcome the dynamic shapes issue, similar to the LLM case. cc @TolyaTalamanov @dmatveev
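To illustrate the padding idea mentioned above (this is a minimal sketch, not code from the notebooks; `MAX_LEN` and `PAD_ID` are assumed values you would pick when compiling the model with a static shape): each decoding step appends one token, and the sequence is then right-padded back to the fixed length with an attention mask, so the compiled static-shape model can be reused for every step.

```python
import numpy as np

MAX_LEN = 64   # assumed static sequence length chosen at compile time
PAD_ID = 0     # hypothetical pad token id

def pad_to_static(input_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Right-pad token ids to a fixed length and build an attention mask,
    so a model compiled with static shapes can run each decoding step."""
    ids = np.asarray(input_ids, dtype=np.int64)
    if ids.shape[-1] > max_len:
        raise ValueError("sequence exceeds the static length the model was compiled for")
    pad_width = max_len - ids.shape[-1]
    padded = np.pad(ids, (0, pad_width), constant_values=pad_id)
    mask = np.concatenate([np.ones(ids.shape[-1], dtype=np.int64),
                           np.zeros(pad_width, dtype=np.int64)])
    return padded, mask

# After each generated token, re-pad to the same static shape and run again:
ids, mask = pad_to_static([101, 2054, 2003])
```

The attention mask ensures the padded positions do not influence the result, which is the same trick used in static-shape LLM pipelines.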

For now, I can suggest trying to run the model on NPU partially: the image encoder part may easily work with a static input shape and be inferred on NPU. You just need to reshape it before compilation in this place:

self.image_embedding = core.compile_model(model_dir / IMAGE_EMBEDDING_NAME, device, ov_config)

ov_model = core.read_model(model_dir / IMAGE_EMBEDDING_NAME)
ov_model.reshape({"pixel_values": [1, 3, 768, 768]})  # fix the dynamic dims to a static shape
self.image_embedding = core.compile_model(ov_model, "NPU", ov_config)

The remaining model parts can be inferred on CPU or GPU at the same time.
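Once the encoder is reshaped to `[1, 3, 768, 768]`, every input image must be brought to exactly that shape. A minimal NumPy-only sketch of the preprocessing (the real notebook uses the model's own processor for resizing and normalization; the nearest-neighbor resize and 1/255 scaling here are illustrative assumptions):

```python
import numpy as np

def to_static_input(image_hwc):
    """Convert an HxWx3 uint8 image into the static [1, 3, 768, 768] float
    tensor the reshaped encoder expects (nearest-neighbor resize sketch)."""
    h, w, _ = image_hwc.shape
    ys = np.arange(768) * h // 768          # source row index per output row
    xs = np.arange(768) * w // 768          # source column index per output column
    resized = image_hwc[ys][:, xs]          # nearest-neighbor resize to 768x768
    chw = resized.transpose(2, 0, 1).astype(np.float32) / 255.0  # HWC -> CHW, scale
    return chw[None]                        # add batch dim -> [1, 3, 768, 768]
```

Whatever preprocessing is used, the resulting tensor shape has to match the static shape baked in at compile time, or NPU inference will fail.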

@SRai22
Author

SRai22 commented Nov 20, 2024

This change compiles the model successfully on the NPU, but the outputs are incorrect: CAPTION returns wrong text, and object detection returns an empty list.
