-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Command-Buffers] Implement new command-list enqueue path #1975
Conversation
6c0b66c
to
a27ba98
Compare
24b6579
to
580bec9
Compare
87ca82c
to
b67cd51
Compare
19c80b7
to
629dfb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you stick a link to the DPC++ PR in the description? (as part of that PR we might want to add an update to https://github.com/intel/llvm/blob/sycl/sycl/doc/design/CommandGraph.md#level-zero) to mention this new code path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc change looks fine to me
Adds a new path that uses the zeCommandListImmediateAppendCommandListsExp to submit command-buffers on PVC hardware.
bb8bbe3
to
66c80c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L0 changes lgtm, just a few questions.
@@ -22,6 +22,71 @@ | |||
|
|||
namespace { | |||
|
|||
// Checks whether zeCommandListImmediateAppendCommandListsExp can be used for a | |||
// given Context and Device. | |||
bool checkImmediateAppendSupport(ur_context_handle_t Context, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
longer-term, is there anything stopping us from deprecating the old non-immediate path? I don't like how the two modes are intermingling together. It makes the code more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One consideration is the supported driver versions, we need to make sure this works with the LTS driver. 30898 is relatively new.
So no, we can't just remove it, at least not yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an internal task to look into this in the future. At the moment it's not possible because it's still an experimental feature. It's also not clear if the implementation of zeCommandListImmediateAppendCommandListsExp has been properly tested on older GPU's and we still need to support those for Graphs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree having 2 paths is definitely a maintenance burden, and I would very much like us to only have the new path in future. As has been said, in the short term we are blocked by:
- LTS/public driver being recent enough to support this L0 feature
- Verification on older GPUs for correctness/performance regressions.
PrecondEvents.push_back(CommandBuffer->WaitEvent->ZeEvent); | ||
} | ||
|
||
ZE2UR_CALL(zeCommandListAppendBarrier, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have to be a barrier? Can you use zeCommandListAppendWaitOnEvents
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The L0 documentation is not very clear. Could you clarify what are the advantages of zeCommandListAppendWaitOnEvents over a barrier with a nullptr signal event?
Does zeCommandListAppendWaitOnEvents provide the same guarantees as a barrier? This is, does it guarantee that any commands appended after the wait are not executed before the events are signalled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zeCommandListAppendWaitOnEvents over a barrier with a nullptr signal event?
Barrier is coarse-grain and enforces global memory ordering.
Does zeCommandListAppendWaitOnEvents provide the same guarantees as a barrier? This is, does it guarantee that any commands appended after the wait are not executed before the events are signalled
Not for out-of-order cmdlists AFAIK. That is, commands can be reordered past AppendWaitOnEvents
if there's no explicit event synchronization point between these two operations. But you are using events anyway, so that's why I've asked.
@nrspruit ping, in case I got anything wrong.
@@ -139,8 +139,15 @@ Environment Variables | |||
| UR_L0_DISABLE_USM_ALLOCATOR | Controls the use of the USM allocator. | "0": USM allocator is enabled. | "0" | | |||
| | | Any other value: USM allocator is disabled. | | | |||
+---------------------------------------------+--------------------------------------------------------------+--------------------------------------------------------------+------------------+ | |||
|
|||
| UR_L0_CMD_BUFFER_USE_IMMEDIATE_APPEND_PATH | Controls which command-buffer implementation path is used. | "1": the immediate append path will always be enabled as | Unset | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which path is tested will depend on the CI runner. We should be more explicit about it by testing twice - with and without the "immediate append path" enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, the CI drivers are all older than 30898 and I would assume that that will be the case for the foreseeable future. So it will always use the old path.
I think that, once this extension is not experimental anymore, if we are still using 2 paths in the code, we could add more testing to CI. But for now, I don't think there is much point in doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK our PVC runners have a new enough driver (the newest available rolling is 24.39.31294.21
), but fair, we can expand CI later. Please create an issue to track this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created ticket #2373
Adds a new path that submits command-buffers using zeCommandListImmediateAppendCommandListsExp() instead of zeCommandQueueExecuteCommandLists(). This allows:
Intel/llvm PR: intel/llvm#16096