Initial Backend Engine Integration #773
base: master
Conversation
ie_inputs[i] = nullptr;
ie_inputs[i] = static_pointer_cast<IETensor>(inputs[i]);
input_names[i] = input_name;
Please make IETensor named so that we can track the name together with the tensor.
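A minimal sketch of that change, assuming IETensor keeps deriving from ngraph::runtime::Tensor as in this PR; the set_ie_name/get_ie_name accessors are hypothetical additions, not existing API:

// Hypothetical: carry the name inside IETensor so the parallel input_names
// vector above becomes unnecessary.
class IETensor : public ngraph::runtime::Tensor {
 public:
  // ... existing constructors and Tensor overrides unchanged ...
  void set_ie_name(const std::string& name) { m_ie_name = name; }
  const std::string& get_ie_name() const { return m_ie_name; }

 private:
  std::string m_ie_name;
};

// Caller side in Executable::Call: the name then travels with the tensor.
auto tensor = std::static_pointer_cast<IETensor>(inputs[i]);
tensor->set_ie_name(input_name);
ie_inputs[i] = tensor;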
@@ -167,7 +165,9 @@ bool Executable::Call(const vector<shared_ptr<runtime::Tensor>>& inputs,
   }

   // Prepare input blobs
-  auto func = m_network.getFunction();
+  auto func = m_ie_engine->GetFunc();
+  std::vector<std::shared_ptr<IETensor>> ie_inputs(inputs.size());
Not necessary. You can pass inputs directly to the backend.
But in that case we have to pass nGraph tensors directly and cast them to IETensors inside the engine. Then we would need to move more functionality from the Executable to the backend. We might need further discussion about the related changes.
IETensor is an nGraph tensor.
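For reference, a rough sketch of that direction (the signature is illustrative, not the final interface): Executable::Call would forward the ngraph::runtime::Tensor pointers unchanged, and the downcast would happen inside the engine where the blobs are bound.

void IEBackendEngine::Infer(
    const std::vector<std::shared_ptr<ngraph::runtime::Tensor>>& inputs,
    const std::vector<std::string>& input_names /* , outputs, names, ... */) {
  for (size_t i = 0; i < inputs.size(); ++i) {
    // IETensor is an nGraph tensor, so the cast is an engine-internal detail.
    auto ie_input = std::static_pointer_cast<IETensor>(inputs[i]);
    // ... bind ie_input as the blob for input_names[i] on the infer request ...
  }
}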
m_ie_engine->Infer(ie_inputs, input_names, ie_outputs, output_names,
                   ie_hoisted_params, param_names);
Why does the backend need to know about hoisted parameters? They could just be passed as inputs?
It's possible to merge them into a single vector, but then we could not pass the inputs directly as suggested in the review above. Also, I'm not sure whether this would be safe for batching. It may not be an issue, but we need to verify.
A hoisted parameter is an input to the model. The backend doesn't need to know whether an input was hoisted or not.
For the basic backend this may not matter. The VADM backend will divide the input into multiple batches, but that is not the case for the hoisted parameters. How is their size affected by the actual input batch size? If we pass them as regular inputs to the backend now, the VADM backend will disable batching because it sees multiple inputs even though the actual number of inputs is 1.
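For concreteness, the merge itself would just be a concatenation on the Executable side (variable names follow the call quoted above); the open question is how a batching backend such as VADM would still distinguish the single-copy hoisted parameters.

// Sketch only: append hoisted parameters after the regular inputs so the
// backend sees one inputs vector. Does not address the VADM batching
// concern discussed above.
std::vector<std::shared_ptr<IETensor>> all_inputs(ie_inputs);
std::vector<std::string> all_names(input_names);
all_inputs.insert(all_inputs.end(), ie_hoisted_params.begin(),
                  ie_hoisted_params.end());
all_names.insert(all_names.end(), param_names.begin(), param_names.end());
m_ie_engine->Infer(all_inputs, all_names, ie_outputs, output_names);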
@@ -0,0 +1,74 @@
/*******************************************************************************
Minor nit in terms of naming: I'd prefer to call this Backend. The "IE" prefix is unnecessary since that's the only type of backend we expect to interface with. The files can similarly be renamed to:
backend.{h,cc}
backends/hddl.{h,cc}
backends/myriad.{h,cc}
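Roughly, the split being proposed (method names below are placeholders, not an agreed interface):

// backend.h: default implementation, no "IE" prefix and no separate
// "basic" engine; devices without special needs use this class as-is.
class Backend {
 public:
  virtual ~Backend() = default;
  // Default Infer covers the plain CPU/GPU path.
  virtual void Infer(/* tensors and names as in the current engine */);

 protected:
  // Per-device plugin configuration; empty by default.
  virtual std::map<std::string, std::string> DeviceConfig() const { return {}; }
};

// backends/hddl.{h,cc}   -> class HDDLBackend   : public Backend (custom Infer)
// backends/myriad.{h,cc} -> class MyriadBackend : public Backend (custom config)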
if (m_device == "MYRIAD") {
  // Set MYRIAD configurations
  if (IEUtils::VPUConfigEnabled()) {
    config["MYRIAD_DETECT_NETWORK_BATCH"] = "NO";
  }

  if (IEUtils::VPUFastCompileEnabled()) {
    config["MYRIAD_HW_INJECT_STAGES"] = "NO";
    config["MYRIAD_COPY_OPTIMIZATION"] = "NO";
  }
}
These customizations should be set in the "myriad" backend implementation.
Answered within the comment below since it is related.
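To illustrate the suggestion above, the MYRIAD-specific settings would then live only in the myriad backend; the class and method names below are illustrative, while the configuration keys are the ones from the quoted diff.

// backends/myriad.cc: only this backend knows about the VPU switches.
std::map<std::string, std::string> MyriadBackend::DeviceConfig() const {
  std::map<std::string, std::string> config;
  if (IEUtils::VPUConfigEnabled()) {
    config["MYRIAD_DETECT_NETWORK_BATCH"] = "NO";
  }
  if (IEUtils::VPUFastCompileEnabled()) {
    config["MYRIAD_HW_INJECT_STAGES"] = "NO";
    config["MYRIAD_COPY_OPTIMIZATION"] = "NO";
  }
  return config;
}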
@@ -0,0 +1,80 @@
/*******************************************************************************
"Basic Engine" is unnecessary if you provide a default implementation for the abstract backend? Backends that don't need a custom "Infer" or custom configuration can fall back to the default impl.
Should the basic execution be in Executable or in BackendEngine? We would disagree with the first option. The second option might be possible, and it may also help with the MYRIAD configuration issue above. But we need to think about whether it will impact anything else in the future. Although we may change this design later, I don't think it's in bad shape to be merged now.
"Basic" execution should be in the implementation of the default Backend.
// Returns the NGraph Function from the CNNNetwork
std::shared_ptr<ngraph::Function> GetFunc();

virtual const std::vector<size_t> GetOutputShape(const int i) = 0;
Not needed?
This is a function needed for the VADM backend. We can remove it in this PR, but we will need to bring it back in the next PR.
The output shape can be determined either from the function or the output that was originally passed to the backend?
The VADM backend modifies the batch size of the function, so the output size we get from the function will be wrong. We could get the output size just after creating the function and use it later for allocations. This would add extra functionality to NGraphEncapsulateOp, and I need to test to see how it works.
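One possible shape of that, assuming the shapes are captured from the nGraph function right after it is created and before the VADM backend re-batches it:

// Snapshot output shapes once, before any backend rewrites the batch size.
std::vector<ngraph::Shape> original_output_shapes;
for (const auto& result : func->get_results()) {
  original_output_shapes.push_back(result->get_shape());
}
// Later, output allocation uses original_output_shapes[i] instead of
// querying the (possibly re-batched) function.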
// Returns output batch size based on the input batch size and the device
// FIXME: This may not be needed
virtual size_t GetOutputBatchSize(size_t input_batch_size) const;
Not needed?
This can be removed.
// Enables multi request execution if the execution engine supports it
void EnableMultiReqExecution();
// Disables multi request execution
void DisableMultiReqExecution();
This can be a parameter to the HDDL backend constructor.
It may limit our ability to enable/disable batching dynamically. Or this parameter might be part of caching, but that would require more changes to the existing bridge code.
In which scenarios would we want to enable/disable batching dynamically for a given network executing on a given device? If we expect this interface to be limited to a specific backend, then it shouldn't be a part of the abstract interface.
Then this should be a parameter to the Executable constructor since the backend is created there.
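Sketch of that alternative (parameter and member names are illustrative): the flag is fixed at construction time and flows from the Executable into the backend it creates, instead of being toggled through the Enable/Disable pair.

// Hypothetical: batching behaviour is decided when the Executable is built.
Executable::Executable(std::shared_ptr<ngraph::Function> func,
                       std::string device, bool multi_req_execution)
    : m_device(std::move(device)) {
  // ... build the network, then create the backend for m_device ...
  if (m_device == "HDDL") {
    m_backend = std::make_shared<HDDLBackend>(network, multi_req_execution);
  }
}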
virtual void StartAsyncInference(const int req_id);
virtual void CompleteAsyncInference(const int req_id);
How's one supposed to use these? Infer only uses one infer request.
These are required for asynchronous execution. Currently we need this for VADM. For the other backends, it will be the same as Infer for now.
Yes, I can tell from the name that these are required for asynchronous execution :)
I was asking how one would use this interface, because I don't see a way to create an asynchronous inference request. The implementation for these is broken at the moment, and I'd prefer that we implement it correctly if we're extending the interface.
It does not have any impact on the execution. We kept all executions asynchronous to have a common call for all backends, in case we need it for multiple backends in the future. We can move back to the Infer call for the basic execution for now; it will not make any difference.
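For reference, with the pre-2022 InferenceEngine API a working pair would look roughly like the following, assuming the backend keeps a pool of InferRequest objects indexed by req_id (that pool does not exist in the current code).

// Sketch: each request id maps to its own InferenceEngine::InferRequest.
void Backend::StartAsyncInference(const int req_id) {
  m_infer_requests[req_id].StartAsync();
}

void Backend::CompleteAsyncInference(const int req_id) {
  // Block until the request started above has produced its results.
  m_infer_requests[req_id].Wait(
      InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
}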
This PR includes the new backend engine. It separates the OpenVINO execution from the Executable into an isolated class, IEBackendEngine. This creates an abstraction for integrating other device-based backend engines in the future.