Generic NPU optimizing notebook #2531
Can you provide more details, please? Are you talking about this model, e.g. "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/florence2/florence2.ipynb" (where only CPU and AUTO can be selected as the inference device, but not GPU or NPU)? Do you already see this model performing inference on your NPU (which one?), but want to "optimize" it in terms of reducing latency, increasing throughput, increasing accuracy, or tailoring it for your input and/or output? Or do you mean enabling it for NPU for the first time?
@brmarkus It is not true that GPU cannot be selected; that is just the current widget state with saved output, because the notebook was executed on a machine without a GPU. NPU, if it is available in the system, can also be selected, but I do not think it will work: model inference requires dynamic shapes support, which unfortunately is not supported on the NPU plugin side.
As @eaidova mentioned, I'm facing the issue with dynamic shapes support when selecting NPU as the device. I saw that https://github.com/openvinotoolkit/openvino_notebooks/blob/2ca15e460fac1fd2be18c81f5530f03505a4fd03/utils/notebook_utils.py#L702C1-L715C71 creates a model optimized for NPU. I wanted to understand whether there is a generic method for achieving this?
No, this transformation pass is a workaround for a specific plugin issue, used in the RAG notebook to make the embeddings model more NPU-friendly; it cannot be treated as a general solution. Unfortunately, it may be complicated to adapt the full Florence-2 model for NPU inference, because the model generates tokens one by one, incrementing the model input shape at each step. It would require its own pipeline implementation that works with padding to overcome the dynamic shapes issue, similar to the LLM case. cc @TolyaTalamanov @dmatveev For now, I can suggest running the model on NPU partially: the image encoder part can easily work with a static input shape and be inferred on NPU. You just need to reshape it before compilation in this place:
The remaining part of the model can be inferred on CPU or GPU at the same time.
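For the text-generation part, the padding idea mentioned above can be sketched in plain Python: instead of growing the input by one token each decode step (a dynamic shape), the sequence is right-padded to a fixed length with an attention mask, so a statically reshaped model sees the same input shape at every step. The helper name, pad id, and fixed length here are illustrative, not the notebook's API:

```python
MAX_LEN = 8  # the static sequence length the model would be reshaped to

def pad_to_static(token_ids, max_len=MAX_LEN, pad_id=0):
    """Right-pad token ids to a fixed length and build the attention mask."""
    if len(token_ids) > max_len:
        raise ValueError("sequence exceeds the static length")
    padding = max_len - len(token_ids)
    input_ids = token_ids + [pad_id] * padding
    attention_mask = [1] * len(token_ids) + [0] * padding
    return input_ids, attention_mask

# Each generation step appends one token, but the padded shape stays constant:
tokens = [101, 7592, 2088]
for step in range(2):
    ids, mask = pad_to_static(tokens)
    assert len(ids) == MAX_LEN and len(mask) == MAX_LEN
    tokens.append(200 + step)  # pretend the model produced a new token
```

This is the same trick used for static-shape LLM inference: the mask tells the model which positions are real, so the padded positions do not affect the result.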
This change compiles the model successfully on the NPU, but the outputs are incorrect: CAPTION is wrong, and object detection returns an empty list.
Could someone help me with optimizing models for NPU? I want to optimize Florence-2 for NPU, and I also want to study whether there can be a generic method to optimize a model to run on NPUs.