
[CPU] I64 native support. #18236

Closed
wants to merge 1 commit

Conversation

nshchego (Contributor)

Details:

  • INT64 support in CPU plugin

Tickets:

  • 100339

@nshchego nshchego requested review from a team as code owners June 26, 2023 11:53
@github-actions bot added labels Jun 26, 2023: category: Core (OpenVINO Core, aka ngraph), category: CPP API (OpenVINO CPP API bindings), category: CPU (OpenVINO CPU plugin), category: IE Tests (OpenVINO Test: plugins and common), category: inference (OpenVINO Runtime library - Inference), category: TEMPLATE (OpenVINO Template plugin), category: transformations (OpenVINO Runtime library - Transformations)
/**
* @brief Enables inference with the INT64 data type in the CPU plugin if it is present in the original model.
*/
DECLARE_CONFIG_KEY(CPU_NATIVE_I64);
Contributor:

should be discussed before merging
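
For context, a hypothetical sketch of how such a key could be passed to the CPU plugin through the legacy InferenceEngine API. The key string "CPU_NATIVE_I64" and the "YES"/"NO" values are assumptions based on the usual DECLARE_CONFIG_KEY conventions, not something confirmed by this PR.

#include <ie_core.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");
    // Assumed usage: ask the CPU plugin to keep I64 tensors in their native precision.
    auto execNetwork = core.LoadNetwork(network, "CPU", {{"CPU_NATIVE_I64", "YES"}});
    return 0;
}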

@maxnick (Contributor) commented Jul 3, 2023:

@antonvor , could you please review?

@github-actions github-actions bot removed the category: CPP API OpenVINO CPP API bindings label Jul 11, 2023
@nshchego force-pushed the cpu/i64/proto branch 3 times, most recently from 30426ff to 98eeff9 on July 11, 2023 17:40
@@ -331,6 +354,9 @@ void Gather::prepareParams() {
} else if (x64::mayiuse(x64::avx2)) {
selectedPD->setImplementationType(jit_avx2);
}
} else {
// TODO: Add tests
Contributor:

Have these tests been added? If yes, we can remove this comment

@@ -29,12 +29,12 @@ bool Reorder::isExecutable() const {
return Node::isExecutable() && !isOptimized;
}

Reorder::Reorder(const std::shared_ptr<ngraph::Node>& op, const GraphContext::CPtr context) :
Reorder::Reorder(const std::shared_ptr<ngraph::Node>& op, const GraphContext::CPtr &context) :
Contributor:

Can we use a common code style?

Suggested change
Reorder::Reorder(const std::shared_ptr<ngraph::Node>& op, const GraphContext::CPtr &context) :
Reorder::Reorder(const std::shared_ptr<ngraph::Node>& op, const GraphContext::CPtr& context) :

Comment on lines +69 to +65
if (axisOp->get_element_type() == ov::element::i64) {
axis = axisOp->get_data_ptr<int64_t>()[0];
} else {
axis = axisOp->cast_vector<int64_t>()[0];
@v-Golubev (Contributor), Jul 10, 2023:

Why do we use different methods here? Can we just always cast the constant to an int64_t vector?

Contributor Author:

The first one is cheaper than calling 'cast_vector'.
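
To illustrate the cost difference (a sketch, not code from the PR): get_data_ptr returns a pointer into the Constant's existing buffer and is only valid when the stored element type already matches, while cast_vector allocates a new vector and converts every element.

#include <cstdint>
#include <openvino/op/constant.hpp>

// Illustration only: read a scalar axis from a Constant, skipping the
// cast_vector allocation when the element type is already i64.
int64_t read_axis(const ov::op::v0::Constant& axis_const) {
    if (axis_const.get_element_type() == ov::element::i64) {
        return axis_const.get_data_ptr<int64_t>()[0];  // direct read from the constant's buffer
    }
    return axis_const.cast_vector<int64_t>()[0];       // copies and converts the data
}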

src/plugins/intel_cpu/src/nodes/subgraph.cpp (outdated, resolved)
IE_ASSERT(to->GetDataType() == memory::data_type::s32);
// IE_ASSERT(to->GetDataType() == memory::data_type::s32);
Contributor:

Should we remove this assert and correct the corresponding comment above?

Comment on lines +1023 to +1070
// TODO: Actually the Result is bool in U8 representation. 0x01 or 0xFF - is there a difference for real models?
// Remove all vpsrld instructions if there is no difference.
Contributor:

Should we check it within this PR?

Comment on lines 1220 to 1267
jit_greater_equal_emitter::jit_greater_equal_emitter(x64::jit_generator *host, x64::cpu_isa_t host_isa, const std::shared_ptr<ov::Node>& node,
Precision exec_prc)
const Precision &exec_prc)
Contributor:

Different argument-passing styles within one method (for host, node, and exec_prc).
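
A consistent version might look roughly like this (sketch only, with the same reference style applied to all parameters):

jit_greater_equal_emitter::jit_greater_equal_emitter(x64::jit_generator* host, x64::cpu_isa_t host_isa,
                                                     const std::shared_ptr<ov::Node>& node,
                                                     const Precision& exec_prc)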

Comment on lines 79 to 81
auto idxShape = (ov::as_type<ov::op::v0::Constant>(op->get_input_node_ptr(TARGET_SHAPE_IDX)))->get_vector<int64_t>();
targetShape.reserve(idxShape.size());
targetShape.assign(idxShape.begin(), idxShape.end());
Contributor:

As I understand it, here we copy the idxShape values to targetShape with a conversion to Dim. Maybe we can just use the cast_vector method to do the same?

Suggested change
auto idxShape = (ov::as_type<ov::op::v0::Constant>(op->get_input_node_ptr(TARGET_SHAPE_IDX)))->get_vector<int64_t>();
targetShape.reserve(idxShape.size());
targetShape.assign(idxShape.begin(), idxShape.end());
targetShape = (ov::as_type<ov::op::v0::Constant>(op->get_input_node_ptr(TARGET_SHAPE_IDX)))->cast_vector<Dim>();

@v-Golubev (Contributor) left a comment:

Second part of the review comments. The emitters part is left unreviewed.

Comment on lines +482 to +485
case element::i64: {
value = ov::as_type_ptr<ov::op::v0::Constant>(n)->cast_vector<int64_t>()[0];
break;
}
Contributor:

If I am not mistaken, value has the int32_t type, and the Constant's int64_t value is converted to int32_t. Maybe we can just instantiate cast_vector with int32_t to avoid the additional cast?
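
A sketch of the proposed change, assuming value is indeed declared as int32_t:

case element::i64: {
    value = ov::as_type_ptr<ov::op::v0::Constant>(n)->cast_vector<int32_t>()[0];
    break;
}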

@@ -109,9 +104,9 @@ void Convert::initSupportedPrimitiveDescriptors() {
supportedPrimitiveDescriptors.emplace_back(config, impl_desc_type::unknown);
} else if (inputShapes.size() == 1 && outputShapes.size() == 1) {
const Shape& insShape = getInputShapeAtPort(0);
auto insPrecision = getOriginalInputPrecisionAtPort(0);
const auto &insPrecision = getOriginalInputPrecisionAtPort(0);
Contributor:

Minor typo:

Suggested change
const auto &insPrecision = getOriginalInputPrecisionAtPort(0);
const auto &inPrecision = getOriginalInputPrecisionAtPort(0);

src/plugins/intel_cpu/src/nodes/cum_sum.cpp (outdated, resolved)
Comment on lines 56 to 57
if (!supportedPrimitiveDescriptors.empty())
return;
Contributor:

Why do we remove this check? It exists in all CPU nodes. I propose to keep it.

IE_THROW() << "Can't create primitive descriptor for NonZero layer with name: " << getName() << " doesn't support "
<< inPrc.name() << " precision on 0 port";
}
auto outPrc = getOriginalOutputPrecisionAtPort(0);
if (!one_of(outPrc, /*Precision::I64,*/ Precision::I32)) {
Contributor:

Should we uncomment I64 precision?

Comment on lines +157 to +161
if (maxBoxPrec == ElementType::i64) {
params.push_back(std::make_shared<ov::opset1::Parameter>(element::Type_t::i64, inputDynamicShapes.back()));
} else {
params.push_back(std::make_shared<ov::opset1::Parameter>(element::Type_t::i32, inputDynamicShapes.back()));
}
Contributor:

Can we just use maxBoxPrec as a parameter?

Suggested change
if (maxBoxPrec == ElementType::i64) {
params.push_back(std::make_shared<ov::opset1::Parameter>(element::Type_t::i64, inputDynamicShapes.back()));
} else {
params.push_back(std::make_shared<ov::opset1::Parameter>(element::Type_t::i32, inputDynamicShapes.back()));
}
params.push_back(std::make_shared<ov::opset1::Parameter>(maxBoxPrec, inputDynamicShapes.back()));

Comment on lines -160 to +169
maxOutBoxesPerClassNode = builder::makeConstant(maxBoxPrec, ngraph::Shape{}, std::vector<int32_t>{maxOutBoxesPerClass});
if (maxBoxPrec == ElementType::i64) {
maxOutBoxesPerClassNode = builder::makeConstant(maxBoxPrec, ov::Shape{}, std::vector<int64_t>{maxOutBoxesPerClass});
} else {
maxOutBoxesPerClassNode = builder::makeConstant(maxBoxPrec, ov::Shape{}, std::vector<int32_t>{maxOutBoxesPerClass});
}
Contributor:

maxOutBoxesPerClass has int32_t type. Do we really need to cast it to int64_t?

template <ov::element::Type_t ET>
bool evaluate(const std::shared_ptr<ov::op::v4::ReduceL1>& op, const ov::HostTensorVector& outputs, const ov::HostTensorVector& inputs) {
using T = typename ov::element_type_traits<ET>::value_type;
std::cout << "evaluate ReduceL1" << std::endl;
Contributor:

Please remove std::cout in this file

@v-Golubev (Contributor) left a comment:

Also, I would like to clarify one point about validation. Should we run accuracy validation (at least on subsets, on a local machine, using the validation scripts) to check that the I64-related changes don't introduce any regressions?

Comment on lines -1796 to -1860
if (!supportedPrimitiveDescriptors.empty())
return;
Contributor:

Why do we remove this check?

Comment on lines 651 to 655
// parallel_for(IH, [&](size_t ih){
for (size_t ih = 0; ih < IH; ih++) {
size_t oh = ih; GET_PTR_NCDH_PLN;
reduce_kernel_process(in_ptr_ncdh, out_ptr_ncdh, IW, 1);
});
// });
Contributor:

Should we return parallel loop here?
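
If the answer is yes, a sketch of the restored loop, reusing the body shown above with the plugin's parallel_for helper:

parallel_for(IH, [&](size_t ih) {
    size_t oh = ih; GET_PTR_NCDH_PLN;
    reduce_kernel_process(in_ptr_ncdh, out_ptr_ncdh, IW, 1);
});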

Comment on lines 1026 to 1038
if (dst_data_size == 8) {
auto src_data = reinterpret_cast<const int64_t *>(proc_ptr);
auto dst_data = reinterpret_cast<int64_t *>(out_ptr);
parallel_for2d(DIM0, stride1, [&](size_t b, size_t j) {
auto src_off = b * stride0 + j * DIM1;
auto dst_off = b * stride0 + j;
for (size_t dim1 = 0; dim1 < DIM1; dim1++) {
dst_data[dst_off] = src_data[src_off];
src_off++;
dst_off += stride1;
}
});
} else if (dst_data_size == 4) {
Contributor:

Can we create a template function for this logic to avoid code duplication?
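
One possible shape for such a helper, sketched under the assumption that the int32_t branch differs only in the element type (the function name and signature here are made up; parallel_for2d is the plugin's existing parallel helper):

#include <cstddef>
#include <cstdint>

template <typename T>
static void transpose_dim01(const uint8_t* proc_ptr, uint8_t* out_ptr,
                            size_t DIM0, size_t DIM1, size_t stride0, size_t stride1) {
    const auto* src_data = reinterpret_cast<const T*>(proc_ptr);
    auto* dst_data = reinterpret_cast<T*>(out_ptr);
    parallel_for2d(DIM0, stride1, [&](size_t b, size_t j) {
        auto src_off = b * stride0 + j * DIM1;
        auto dst_off = b * stride0 + j;
        for (size_t dim1 = 0; dim1 < DIM1; dim1++) {
            dst_data[dst_off] = src_data[src_off];
            src_off++;
            dst_off += stride1;
        }
    });
}

// Hypothetical call sites replacing the dst_data_size == 8 / == 4 branches:
//     transpose_dim01<int64_t>(proc_ptr, out_ptr, DIM0, DIM1, stride0, stride1);
//     transpose_dim01<int32_t>(proc_ptr, out_ptr, DIM0, DIM1, stride0, stride1);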

Comment on lines 1088 to 1108
auto src_data = reinterpret_cast<const int64_t *>(proc_ptr);
auto dst_data = reinterpret_cast<int64_t *>(out_ptr);
parallel_for2d(DIM0, stride1, [&](size_t b, size_t j) {
auto src_off = b * src_stride0 + j * blockLen;
auto dst_off = b * dst_stride0 + j;
for (size_t dim1 = 0; dim1 + blockLen <= DIM1; dim1 += blockLen) {
for (size_t k = 0; k < blockLen; k++) {
dst_data[dst_off] = src_data[src_off];
src_off++;
dst_off += stride1;
}
src_off += (stride1 - 1) * blockLen;
}
size_t tail = DIM1 % blockLen;
for (size_t k = 0; k < tail; k++) {
dst_data[dst_off] = src_data[src_off];
src_off++;
dst_off += stride1;
}
});
} else if (dst_data_size == 4) {
Contributor:

The same here: it would be great to have a template function for these computations as well.

Comment on lines 1646 to 1647
const auto input_prec = getOriginalInputPrecisionAtPort(REDUCE_DATA);
const auto output_prec = getOriginalOutputPrecisionAtPort(0);
Contributor:

Minor:

Suggested change
const auto input_prec = getOriginalInputPrecisionAtPort(REDUCE_DATA);
const auto output_prec = getOriginalOutputPrecisionAtPort(0);
const auto& input_prec = getOriginalInputPrecisionAtPort(REDUCE_DATA);
const auto& output_prec = getOriginalOutputPrecisionAtPort(0);

Comment on lines 719 to 734
// template <x64::cpu_isa_t isa>
// void JitReduceKernel<isa>::reduce_gather(const Vmm& vmm_dst, int64_t offset) {
// switch (jcp.src_el_type) {
// case ov::element::f64:
// case ov::element::i64:
// if (isa == x64::avx512_core) {
// auto ymm_idx = Ymm(v_idx.getIdx());
// auto zmm_src = Zmm(v_src.getIdx());

// kxnorq(k_mask, k_mask, k_mask);
// vgatherdpd(zmm_src | k_mask, ptr[reg_src + offset + ymm_idx]);
// if (jcp.src_el_type == ov::element::f64 && exec_el_type == ov::element::i64) {
// vcvtpd2qq(zmm_src, zmm_src);
// } else if (jcp.src_el_type == ov::element::i64 && exec_el_type == ov::element::f64) {
// vcvtqq2pd(zmm_src, zmm_src);
// }
Contributor:

Why is this code commented out? Should we remove it or uncomment it?

Comment on lines +1133 to +1185
// vcmppd(k_mask, vmm_src, v_zero, _cmp_neq_uq);
// vblendmps(vmm_src | k_mask, v_zero, v_ones);
Contributor:

Commented-out code

Comment on lines 1415 to 1416
// reg_oc_off = getReg64();
// reg_post_ops_data = getReg64();
Contributor:

Commented-out code


this->postamble();

if (isa == x64::avx512_core) { // Only if bf16?
Contributor:

What does the comment mean? Should we check something before the merge?

Comment on lines 1771 to 1778
// TODO
}
uni_vdivpd(vmm_dst, vmm_dst, v_divider);
uni_vroundpd(vmm_dst, vmm_dst, 0x3); // Truncation
if (isa == x64::avx512_core) {
vcvtpd2qq(vmm_dst, vmm_dst);
} else {
// TODO
Contributor:

Please take a look at the TODO sections

@github-actions bot commented Aug 2, 2023:

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Aug 2, 2023
@github-actions bot commented Aug 9, 2023:

This PR was closed because it has been stale for 2 weeks with no activity.

@github-actions github-actions bot closed this Aug 9, 2023
Labels
category: Core (OpenVINO Core, aka ngraph), category: CPU (OpenVINO CPU plugin), category: IE Tests (OpenVINO Test: plugins and common), category: inference (OpenVINO Runtime library - Inference), category: TEMPLATE (OpenVINO Template plugin), category: transformations (OpenVINO Runtime library - Transformations), Stale

6 participants