WIP: [CPU][ARM] Weights compression f32->f16 is moved to CPU Plug-in side #21080
Conversation
@itikhono, may I ask you to review the transformation changes?
src/plugins/intel_cpu/tests/functional/subgraph_tests/src/arm/matmul_compress_convert.cpp (outdated, resolved)
LGTM from the tests side.
@@ -43,4 +49,19 @@ class TRANSFORMATIONS_API Decompression : public RuntimeAttribute {
    }
};

class TRANSFORMATIONS_API Compression : public RuntimeAttribute {
Could you please add a comment explaining why we need this rt_info, so that it is clear why we cannot use the existing attributes and need a new one?
    Compression() = default;

    bool visit_attributes(AttributeVisitor& visitor) override {
        return true;
Is it really necessary to store this rt_info in the IR?
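For context, this is roughly the shape such an rt_info attribute takes in OpenVINO, following the pattern of the existing Decompression attribute. The helper bodies below are an illustrative sketch rather than the exact code from this PR, and TRANSFORMATIONS_API is omitted for brevity:

```cpp
// Sketch of a presence-only runtime attribute and its mark/query helpers,
// modelled on the existing Decompression attribute (illustrative, not the PR code).
#include <memory>

#include <openvino/core/attribute_visitor.hpp>
#include <openvino/core/node.hpp>
#include <openvino/core/runtime_attribute.hpp>

namespace ov {

class Compression : public RuntimeAttribute {
public:
    OPENVINO_RTTI("compression", "0");

    Compression() = default;

    // Empty payload: the attribute carries no data, its presence on a node is the mark.
    bool visit_attributes(AttributeVisitor& visitor) override {
        return true;
    }
};

inline void mark_as_compression(const std::shared_ptr<Node>& node) {
    auto& rt_info = node->get_rt_info();
    rt_info[Compression::get_type_info_static()] = Compression();
}

inline bool is_compression(const std::shared_ptr<Node>& node) {
    return node->get_rt_info().count(Compression::get_type_info_static()) != 0;
}

}  // namespace ov
```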
@@ -48,6 +49,7 @@ bool ov::pass::AlignMixedFP32FP16Types::run_on_model(const std::shared_ptr<ov::Model>& model) {
        copy_runtime_info(incoming_node, convert);
        input.replace_source_output(convert);
        disable_fp16_compression(convert);
        mark_as_compression(convert);
These converts are decompression converts: they upcast to fp32 for precision-sensitive subgraphs. Is it possible to rename mark_as_compression to avoid confusion? Since mark_as_compression is used only for converts that are inserted to align types between the f16 and f32 parts, can we name it e.g. mark_type_aligning_convert?
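For illustration, once both marks exist, a consumer such as the CPU plug-in could tell the two kinds of Convert apart roughly as follows. is_compression refers to the helper sketched above, the function names are hypothetical, and nothing here is code from the PR itself:

```cpp
// Illustrative sketch: distinguishing type-aligning Converts (marked by
// AlignMixedFP32FP16Types in this PR) from genuine weight-decompression Converts
// (marked with the pre-existing Decompression attribute).
#include <memory>

#include <openvino/core/node.hpp>
#include <openvino/core/type.hpp>
#include <openvino/op/convert.hpp>
#include <transformations/rt_info/decompression.hpp>

bool is_type_aligning_convert(const std::shared_ptr<ov::Node>& node) {
    return ov::as_type_ptr<ov::op::v0::Convert>(node) != nullptr &&
           ov::is_compression(node);   // mark added by this PR (or mark_type_aligning_convert, if renamed)
}

bool is_weights_decompression_convert(const std::shared_ptr<ov::Node>& node) {
    return ov::as_type_ptr<ov::op::v0::Convert>(node) != nullptr &&
           ov::is_decompression(node); // Convert that upcasts compressed weights back to f32
}
```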
This PR will be closed in a week because of 2 weeks of no activity.
This PR was closed because it has been stalled for 2 weeks with no activity.
Details:
The PR disables fp32->fp16 weights compression on the ngraph side and instead passes the weights to the CPU plug-in in fp32 precision, moving the compression to the plug-in side. This allows us to improve memory consumption on ARM64 platforms. The change affects only MatMul nodes.
PR to oneDNN fork: openvinotoolkit/oneDNN#220
Tickets:
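For readers unfamiliar with what moving the compression to the plug-in side means in practice, here is a minimal, purely illustrative sketch of compressing a MatMul weight constant from f32 to f16 inside a plug-in. This is not the actual CPU plug-in or oneDNN code from this PR:

```cpp
// Illustrative only: plug-in-side f32 -> f16 compression of a weight Constant.
// Real code would additionally handle values outside the f16 range and avoid
// extra copies of large weight buffers.
#include <memory>
#include <vector>

#include <openvino/core/shape.hpp>
#include <openvino/core/type/float16.hpp>
#include <openvino/op/constant.hpp>

std::shared_ptr<ov::op::v0::Constant> compress_weights_to_f16(
        const std::shared_ptr<ov::op::v0::Constant>& f32_weights) {
    const float* src = f32_weights->get_data_ptr<float>();
    const ov::Shape shape = f32_weights->get_shape();

    std::vector<ov::float16> dst(ov::shape_size(shape));
    for (size_t i = 0; i < dst.size(); ++i) {
        dst[i] = ov::float16(src[i]);  // narrowing conversion, element by element
    }
    return std::make_shared<ov::op::v0::Constant>(ov::element::f16, shape, dst);
}
```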