WIP: [CPU][ARM] Weights compression f32->f16 is moved to CPU Plug-in side #21080
Conversation
@itikhono, may I ask you to review the transformation changes?
src/plugins/intel_cpu/tests/functional/subgraph_tests/src/arm/matmul_compress_convert.cpp (outdated, resolved)
LGTM from the tests side.
@@ -43,4 +49,19 @@ class TRANSFORMATIONS_API Decompression : public RuntimeAttribute {
    }
};

class TRANSFORMATIONS_API Compression : public RuntimeAttribute {
Could you please add a comment explaining why we need this rt_info, so that it is clear why we cannot use the existing attributes and need a new one?
    Compression() = default;

    bool visit_attributes(AttributeVisitor& visitor) override {
        return true;
Is it really necessary to store this rt_info in the IR?
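For context, this is roughly the shape such an rt_info attribute takes in OpenVINO, following the pattern of the existing Decompression attribute. The helper bodies below are an illustrative sketch rather than the exact code from this PR, and TRANSFORMATIONS_API is omitted for brevity:

```cpp
// Sketch of a presence-only runtime attribute and its mark/query helpers,
// modelled on the existing Decompression attribute (illustrative, not the PR code).
#include <memory>

#include <openvino/core/attribute_visitor.hpp>
#include <openvino/core/node.hpp>
#include <openvino/core/runtime_attribute.hpp>

namespace ov {

class Compression : public RuntimeAttribute {
public:
    OPENVINO_RTTI("compression", "0");

    Compression() = default;

    // Empty payload: the attribute carries no data, its presence on a node is the mark.
    bool visit_attributes(AttributeVisitor& visitor) override {
        return true;
    }
};

inline void mark_as_compression(const std::shared_ptr<Node>& node) {
    auto& rt_info = node->get_rt_info();
    rt_info[Compression::get_type_info_static()] = Compression();
}

inline bool is_compression(const std::shared_ptr<Node>& node) {
    return node->get_rt_info().count(Compression::get_type_info_static()) != 0;
}

}  // namespace ov
```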
@@ -48,6 +49,7 @@ bool ov::pass::AlignMixedFP32FP16Types::run_on_model(const std::shared_ptr<ov::Model>& model) {
        copy_runtime_info(incoming_node, convert);
        input.replace_source_output(convert);
        disable_fp16_compression(convert);
        mark_as_compression(convert);
These converts are decompression converts: they upcast to fp32 for precision-sensitive subgraphs. Is it possible to rename mark_as_compression to avoid confusion? Since mark_as_compression is used only for converts that are inserted to align types between the f16 and f32 parts, can we name it e.g. mark_type_aligning_convert?
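For illustration, once both marks exist, a consumer such as the CPU plug-in could tell the two kinds of Convert apart roughly as follows. is_compression refers to the helper sketched above, the function names are hypothetical, and nothing here is code from the PR itself:

```cpp
// Illustrative sketch: distinguishing type-aligning Converts (marked by
// AlignMixedFP32FP16Types in this PR) from genuine weight-decompression Converts
// (marked with the pre-existing Decompression attribute).
#include <memory>

#include <openvino/core/node.hpp>
#include <openvino/core/type.hpp>
#include <openvino/op/convert.hpp>
#include <transformations/rt_info/decompression.hpp>

bool is_type_aligning_convert(const std::shared_ptr<ov::Node>& node) {
    return ov::as_type_ptr<ov::op::v0::Convert>(node) != nullptr &&
           ov::is_compression(node);   // mark added by this PR (or mark_type_aligning_convert, if renamed)
}

bool is_weights_decompression_convert(const std::shared_ptr<ov::Node>& node) {
    return ov::as_type_ptr<ov::op::v0::Convert>(node) != nullptr &&
           ov::is_decompression(node); // Convert that upcasts compressed weights back to f32
}
```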
This PR will be closed in a week because of 2 weeks of no activity.
This PR was closed because it has been stalled for 2 weeks with no activity.
Details:
The PR disables fp32->fp16 weights compression on the ngraph side and instead passes the weights to the CPU plug-in in fp32 precision, moving the compression to the plug-in side. This allows us to improve memory consumption on ARM64 platforms. The change affects only MatMul nodes.
PR to oneDNN fork: openvinotoolkit/oneDNN#220
Tickets:
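For readers unfamiliar with what moving the compression to the plug-in side means in practice, here is a minimal, purely illustrative sketch of compressing a MatMul weight constant from f32 to f16 inside a plug-in. This is not the actual CPU plug-in or oneDNN code from this PR:

```cpp
// Illustrative only: plug-in-side f32 -> f16 compression of a weight Constant.
// Real code would additionally handle values outside the f16 range and avoid
// extra copies of large weight buffers.
#include <memory>
#include <vector>

#include <openvino/core/shape.hpp>
#include <openvino/core/type/float16.hpp>
#include <openvino/op/constant.hpp>

std::shared_ptr<ov::op::v0::Constant> compress_weights_to_f16(
        const std::shared_ptr<ov::op::v0::Constant>& f32_weights) {
    const float* src = f32_weights->get_data_ptr<float>();
    const ov::Shape shape = f32_weights->get_shape();

    std::vector<ov::float16> dst(ov::shape_size(shape));
    for (size_t i = 0; i < dst.size(); ++i) {
        dst[i] = ov::float16(src[i]);  // narrowing conversion, element by element
    }
    return std::make_shared<ov::op::v0::Constant>(ov::element::f16, shape, dst);
}
```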