Generic NPU optimizing notebook #2531
Can you provide more details, please? Are you talking about this model, e.g. "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/florence2/florence2.ipynb" (where only CPU and AUTO can be selected as the inference device, but not GPU or NPU)? Do you already see this model performing inference on your NPU (which one?), but want to "optimize" it in terms of reducing latency, increasing throughput, increasing accuracy, or tailoring it for your input and/or output? Or do you mean enabling it for NPU for the first time?
@brmarkus It is not true that GPU cannot be selected; that is just the current widget state with saved output, because the notebook was executed on a machine without a GPU. NPU, if it is available in the system, can also be selected, but I do not think it will work: model inference requires dynamic shapes support, which unfortunately is not supported on the NPU plugin side.
As @eaidova mentioned, I'm facing the issue with dynamic shapes support when selecting NPU as the device. I saw that https://github.com/openvinotoolkit/openvino_notebooks/blob/2ca15e460fac1fd2be18c81f5530f03505a4fd03/utils/notebook_utils.py#L702C1-L715C71 creates a model optimized for NPU. I wanted to understand whether there is a generic method for achieving this?
No, this transformation pass is a workaround for a specific plugin issue, used in the RAG notebook to make the embeddings model more NPU-friendly; it cannot be treated as a general solution. Unfortunately, it may be complicated to adapt the full Florence-2 model for NPU inference, because the model generates tokens one by one, incrementing the model input shape at each step. It would require its own pipeline implementation that works with padding to overcome the dynamic shapes issue, similar to the LLM case. cc @TolyaTalamanov @dmatveev For now, I can suggest running the model on NPU partially: the image encoder part can easily work with a static input shape and be inferred on NPU. You just need to reshape it before compilation in this place:
The remaining part of the model can be inferred on CPU or GPU at the same time.
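For the text-generation part, the padding idea mentioned above can be sketched in plain Python: instead of growing the input by one token each decode step (a dynamic shape), the sequence is right-padded to a fixed length with an attention mask, so a statically reshaped model sees the same input shape at every step. The helper name, pad id, and fixed length here are illustrative, not the notebook's API:

```python
MAX_LEN = 8  # the static sequence length the model would be reshaped to

def pad_to_static(token_ids, max_len=MAX_LEN, pad_id=0):
    """Right-pad token ids to a fixed length and build the attention mask."""
    if len(token_ids) > max_len:
        raise ValueError("sequence exceeds the static length")
    padding = max_len - len(token_ids)
    input_ids = token_ids + [pad_id] * padding
    attention_mask = [1] * len(token_ids) + [0] * padding
    return input_ids, attention_mask

# Each generation step appends one token, but the padded shape stays constant:
tokens = [101, 7592, 2088]
for step in range(2):
    ids, mask = pad_to_static(tokens)
    assert len(ids) == MAX_LEN and len(mask) == MAX_LEN
    tokens.append(200 + step)  # pretend the model produced a new token
```

This is the same trick used for static-shape LLM inference: the mask tells the model which positions are real, so the padded positions do not affect the result.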
This change compiles the model successfully on the NPU, but the outputs are incorrect: CAPTION is wrong, and object detection returns an empty list.
Could someone help me with optimizing models for NPU? I want to optimize Florence-2 for NPU, and I also want to study whether there can be a generic method to optimize a model to run on NPUs.