Update MarkDequantization transformation #27406

Open · wants to merge 22 commits into master

Conversation

@itikhono (Contributor) commented Nov 5, 2024

Details:

Original issue description:

Inside MarkDequantizationSubgraph, we try to find the dequantization subgraph and mark it with the DisableConstantFolding and KeepConstPrecision attributes.
The dequantization subgraph has a relaxed structure and allows "any_input" as the input of the Convert operation.

From MarkDequantizationSubgraph:
[image: the pattern definition in MarkDequantizationSubgraph]

In the current case this is:
(any input: Constant -> Reshape) -> (required Convert)

Constant -> Reshape will be constant-folded. ConstantFolding doesn't copy data for a Reshape operation, so this is a valid scenario.

The MarkDequantizationSubgraph transformation marks the Reshape op with the "KeepConstPrecision" attribute.
But after ConstantFolding, the KeepConstPrecision attribute won't be copied to the resulting Constant, because the attribute is non-copyable:
[image: the KeepConstPrecision attribute definition]
so the whole dequantization subgraph will be constant-folded.
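
A minimal sketch of the scenario, assuming the standard OpenVINO opset and the public rt_info helpers (illustrative, not the exact code from this PR):

#include "openvino/core/model.hpp"
#include "openvino/opsets/opset8.hpp"
#include "openvino/pass/constant_folding.hpp"
#include "transformations/rt_info/keep_const_precision.hpp"

using namespace ov;

std::shared_ptr<Model> build_marked_subgraph() {
    // (any input: Constant -> Reshape) -> Convert -> Multiply
    auto weights = opset8::Constant::create(element::u8, Shape{4}, {1, 2, 3, 4});
    auto shape = opset8::Constant::create(element::i64, Shape{2}, {2, 2});
    auto reshape = std::make_shared<opset8::Reshape>(weights, shape, false);
    auto convert = std::make_shared<opset8::Convert>(reshape, element::f32);
    auto scale = opset8::Constant::create(element::f32, Shape{}, {0.1f});
    auto multiply = std::make_shared<opset8::Multiply>(convert, scale);

    // MarkDequantizationSubgraph marks the Convert's input (here: the Reshape)
    // and disables folding of the Convert itself.
    enable_keep_const_precision(reshape);
    disable_constant_folding(convert);
    return std::make_shared<Model>(OutputVector{multiply}, ParameterVector{});
}

// pass::ConstantFolding then folds Constant -> Reshape into a new Constant, but
// the non-copyable KeepConstPrecision attribute is lost in the process, so
// later passes are free to fold the whole dequantization subgraph.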

Changes:

  • MarkDequantizationSubgraph logic was split into two transformations: MarkDequantization and KeepConstPrecision

Tickets:

@itikhono itikhono requested review from a team as code owners November 5, 2024 12:19
@itikhono itikhono requested review from Lyamin-Roman and removed request for a team November 5, 2024 12:19
@github-actions github-actions bot added the "category: transformations" and "category: LP transformations" labels Nov 5, 2024
@itikhono itikhono added this to the 2025.0 milestone Nov 5, 2024
@v-Golubev (Contributor) left a comment:

My main concern is that we start propagating the rt_info which shouldn't be propagated. I have an alternative idea on how the described issue can be fixed.

Currently, MarkDequantizationSubgraph does several markups including:

  1. disable_constant_folding markup for converts
  2. enable_keep_const_precision markup for constants
  3. The rest of the markup (mark_as_dequantization_node or unmark_as_decompression)

What if we move the enable_keep_const_precision markup into a separate matcher pass, and organize the markup pipeline in the following way (a sketch follows the list)?

  1. The first pass will have the same matcher as MarkDequantizationSubgraph currently has, and will perform only the disable_constant_folding markup.
  2. Then ConstantFolding will be called, and all the intermediate layers between the weights and the Convert will be folded. After that, the subgraph will have the canonical form (low-precision weights -> Convert -> (optional) Subtract -> Multiply).
  3. At the end, the enable_keep_const_precision pass will match subgraphs with the canonical form and mark the constant nodes in this pattern.
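
A rough sketch of this ordering (pass names are taken from the Changes section above; header path and constructor signatures are assumptions, not verified against the final PR code):

#include "openvino/pass/constant_folding.hpp"
#include "openvino/pass/manager.hpp"
#include "transformations/low_precision/mark_dequantization_subgraph.hpp"

void run_markup(const std::shared_ptr<ov::Model>& model) {
    const ov::element::TypeVector precisions{ov::element::i8, ov::element::u8,
                                             ov::element::i4, ov::element::u4};
    ov::pass::Manager manager;
    // 1. Only disable constant folding on the dequantization Converts.
    manager.register_pass<ov::pass::MarkDequantization>(precisions);
    // 2. Fold the intermediate ops between the weights and the Convert,
    //    bringing the subgraph to its canonical form.
    manager.register_pass<ov::pass::ConstantFolding>();
    // 3. Match the canonical form (weights -> Convert -> (Subtract) -> Multiply)
    //    and mark only its Constants with KeepConstPrecision.
    manager.register_pass<ov::pass::KeepConstPrecision>(precisions);
    manager.run_passes(model);
}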

What do you think?

@itikhono (Contributor, Author) commented Nov 6, 2024

> [quotes @v-Golubev's comment above in full]

Yes, it's possible. But wouldn't that complicate the already complicated pipeline?
As far as I can see, KeepConstPrecision is used only for Constants in the ConvertPrecision transformation; the flag is ignored on any other marked op. So copying this attribute is unsafe precisely for Constants, not for other ops.

e.g.

  1. Initial subgraph: Parameter -> Reshape (with a "target_shape" Const as an input) -> Convert
  2. The Reshape is marked with the KeepConstPrecision flag
  3. Some transformation replaces the Reshape: Const -> Reshape (the new "target_shape" Const gets the copied KeepConstPrecision)

The "target_shape" Const will keep its original precision and won't be affected by the ConvertPrecision transformation.
This may cause some implicit issues.
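
A hypothetical sketch of that leak (the helper rebuild_reshape and the shapes are illustrative only; the rt_info helpers are the public OpenVINO ones):

#include "openvino/core/graph_util.hpp"  // copy_runtime_info, replace_node
#include "openvino/opsets/opset8.hpp"

// If KeepConstPrecision were copyable, a transformation that rebuilds a marked
// Reshape would leak the attribute onto its new "target_shape" Constant.
void rebuild_reshape(const std::shared_ptr<ov::opset8::Reshape>& old_reshape) {
    auto data = old_reshape->input_value(0);
    auto new_shape = ov::opset8::Constant::create(ov::element::i64, ov::Shape{2}, {2, 2});
    auto new_reshape = std::make_shared<ov::opset8::Reshape>(data, new_shape, false);
    // Copying rt_info from replaced nodes to new ones is standard practice in
    // transformations; a copyable KeepConstPrecision would land on new_shape,
    // so ConvertPrecision would later skip this Constant (e.g. keep i64 where
    // i32 is expected), which is the implicit issue described above.
    ov::copy_runtime_info(old_reshape, {new_reshape, new_shape});
    ov::replace_node(old_reshape, new_reshape);
}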

I believe we can expect constant subgraphs to be folded, so these issues would not arise.
Do we have non-constant (Parameter-dependent) subgraphs matching this pattern in real scenarios?
Can this flag somehow spread beyond the dequantization subgraph?

@v-Golubev (Contributor) replied:

> Do we have non-constant (Parameter-dependent) subgraphs matching this pattern in real scenarios?

Theoretically, subgraphs marked by MarkDequantizationSubgraph can appear on a non-constant path. For example, LPT supports the following subgraphs coming from some frontends (a picture from the LPT docs):
[image: quantize/dequantize subgraph on the data path, from the LPT documentation]

In this case, the dequantization subgraph has the canonical form, and the f32->u8 Convert allows MarkDequantizationSubgraph to match on this Convert as the data node by its precision. Such subgraphs are placed both on the data flow and on the weights.

> Can this flag somehow spread beyond the dequantization subgraph?

It seems that rt_info from Convert ops is not propagated through the graph during LPT. However, LPT is not the only interaction point with such canonical subgraphs: decompression-related passes can also fuse ops from the subgraph into a separate operation, e.g. GatherCompressed. On your branch, we get the following graph after the fusion:
[image: graph after the GatherCompressed fusion]

So the keep_const_precision attribute ends up on the non-constant flow after the fusion. Currently we have no transformations which work with GatherCompressed, but there is no guarantee that such optimizations will not appear in the future, and with such unwanted rt_info propagation we could hit the problems you described in your example.

In conclusion, we will most likely not hit any problems from a copyable KeepConstPrecision attribute in the current pipeline, but such a change lays the foundation for potential future problems.

@itikhono (Contributor, Author) commented Nov 7, 2024

> [quotes @v-Golubev's reply above in full]

Ok, in this case I will split the transformation into two:

  1. disable_constant_folding markup
  2. enable_keep_const_precision markup

@itikhono itikhono requested review from a team as code owners November 13, 2024 11:51
@itikhono itikhono requested review from tadamczx and removed request for a team November 13, 2024 11:51
@itikhono itikhono changed the title from "Make KeepConstPrecision attribute copyable" to "Update MarkDequantization transformation" Nov 13, 2024
@itikhono (Contributor, Author):

The CPU functional test failure is not related to this PR.
The GPU test failure is under investigation.

Additional performance validation will be triggered for this PR.


On the test diff:

using namespace ov;

- TEST_F(TransformationTestsF, MarkDequantizationSubgraphTransformation) {
+ TEST_F(TransformationTestsF, KeepConstPrecision) {
A Contributor commented:

Can we also compare nodes' rt_info in this test?

@itikhono (Contributor, Author) replied:

done
but it looks like the rt_info comparison doesn't work, the same with other tests; I will double-check
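
For reference, a likely way to enable that comparison, assuming TransformationTestsF exposes the standard FunctionsComparator from the common test utils:

// Enables comparison of nodes' rt_info between the actual and reference models.
comparator.enable(FunctionsComparator::CmpValues::RUNTIME_KEYS);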

@github-actions github-actions bot added the "category: Core" label Nov 21, 2024
On the diff where the enableInt8 check was commented out:

std::vector<ov::element::Type>{ov::element::i8, ov::element::u8, ov::element::i4, ov::element::u4});
}

// if (enableInt8) {

Why do we need this check? According to line 378, we did this marking anyway.
@itikhono (Contributor, Author):

todo: clarify this with the GPU team

@itikhono (Contributor, Author):

need a confirmation from the GPU team

@v-Golubev (Contributor) left a comment:

No comments left from my side (except minor existing ones). Good job 👍

@itikhono (Contributor, Author):

The fix for the Or pattern was moved to a separate PR: #27721

@github-actions github-actions bot removed the "category: Core" label Nov 26, 2024
Labels: category: CPU, category: docs, category: GPU, category: LP transformations, category: ONNX FE, category: transformations, under_perf_check