[Command-Buffers] Implement new command-list enqueue path #1975

Merged
pbalcer merged 2 commits into oneapi-src:main from fabio/immediate_append_exp on Nov 21, 2024

@fabiomestre fabiomestre commented Aug 14, 2024

Adds a new path that submits command-buffers using zeCommandListImmediateAppendCommandListsExp() instead of zeCommandQueueExecuteCommandLists(). This allows:

  • Waiting for command-buffer submission dependencies without having to create a separate command-list.
  • Since the WaitEvent is no longer needed in this path, counter-based events can be used for helper events, which removes the need to create an event-reset command-list.
  • When the command-buffer is not in-order, the new path also moves the event-reset command-list so that it executes after the command-buffer has executed.
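
As a rough illustration of the first bullet, the toy model below counts the submission steps each path needs when a command-buffer has dependencies. It is only a sketch under stated assumptions: `Step`, `submitLegacy` and `submitImmediateAppend` are hypothetical names, no Level Zero call is actually made, and the real adapter code does considerably more than this.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for an event; real code would use ze_event_handle_t from the
// Level Zero headers.
using EventId = int;

// One recorded submission step in this model (hypothetical type).
struct Step {
  std::string What;       // which mechanism the step represents
  uint32_t NumWaitEvents; // wait events handed directly to that step
};

// Legacy path as described in the PR text: dependencies are waited on by a
// separate helper command-list, then the command-buffer is executed with
// zeCommandQueueExecuteCommandLists(), which takes no wait-event list.
std::vector<Step> submitLegacy(const std::vector<EventId> &Deps) {
  std::vector<Step> Steps;
  if (!Deps.empty())
    Steps.push_back(
        Step{"helper wait command-list", static_cast<uint32_t>(Deps.size())});
  Steps.push_back(Step{"zeCommandQueueExecuteCommandLists", 0});
  return Steps;
}

// New path: the dependencies are passed as the wait-event list of
// zeCommandListImmediateAppendCommandListsExp(), so the model needs no
// helper command-list.
std::vector<Step> submitImmediateAppend(const std::vector<EventId> &Deps) {
  std::vector<Step> Steps;
  Steps.push_back(Step{"zeCommandListImmediateAppendCommandListsExp",
                       static_cast<uint32_t>(Deps.size())});
  return Steps;
}
```

With two dependencies, `submitLegacy({1, 2})` yields two steps while `submitImmediateAppend({1, 2})` yields one step carrying both wait events.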

Intel/llvm PR: intel/llvm#16096

@github-actions github-actions bot added the level-zero and command-buffer labels Aug 14, 2024
@fabiomestre fabiomestre force-pushed the fabio/immediate_append_exp branch 3 times, most recently from 6c0b66c to a27ba98 Compare October 14, 2024 18:07
@github-actions github-actions bot added the specification label Nov 14, 2024
@fabiomestre fabiomestre force-pushed the fabio/immediate_append_exp branch 2 times, most recently from 19c80b7 to 629dfb7 Compare November 14, 2024 19:15
@fabiomestre fabiomestre marked this pull request as ready for review November 14, 2024 19:15
@fabiomestre fabiomestre requested review from a team as code owners November 14, 2024 19:15
@fabiomestre fabiomestre changed the title Prototype new command-list submission entrypoint Implement new command-list enqueue path Nov 14, 2024
@fabiomestre fabiomestre changed the title Implement new command-list enqueue path [Command-Buffers] Implement new command-list enqueue path Nov 14, 2024
@EwanC EwanC left a comment

Can you stick a link to the DPC++ PR in the description? (As part of that PR we might want to update https://github.com/intel/llvm/blob/sycl/sycl/doc/design/CommandGraph.md#level-zero to mention this new code path.)

@aarongreig aarongreig left a comment

doc change looks fine to me

Adds a new path that uses zeCommandListImmediateAppendCommandListsExp to submit command-buffers on PVC hardware.
@pbalcer pbalcer self-requested a review November 18, 2024 15:31
@pbalcer pbalcer left a comment

L0 changes lgtm, just a few questions.

@@ -22,6 +22,71 @@

namespace {

// Checks whether zeCommandListImmediateAppendCommandListsExp can be used for a
// given Context and Device.
bool checkImmediateAppendSupport(ur_context_handle_t Context,

Contributor

longer-term, is there anything stopping us from deprecating the old non-immediate path? I don't like how the two modes are intermingling together. It makes the code more complex.

Contributor

@EwanC, @reble,

  • Is @pbalcer right that the old non-immediate path is no longer needed?
  • If so, can we remove the non-immediate path right now?

Contributor

One consideration is the supported driver versions: we need to make sure this works with the LTS driver, and 30898 is relatively new.
So no, we can't just remove it, at least not yet.

Contributor Author

We have an internal task to look into this in the future. At the moment it's not possible because it's still an experimental feature. It's also not clear whether the implementation of zeCommandListImmediateAppendCommandListsExp has been properly tested on older GPUs, and we still need to support those for Graphs.

Contributor

Agree having 2 paths is definitely a maintenance burden, and I would very much like us to only have the new path in future. As has been said, in the short term we are blocked by:

  1. LTS/public driver being recent enough to support this L0 feature
  2. Verification on older GPUs for correctness/performance regressions.

PrecondEvents.push_back(CommandBuffer->WaitEvent->ZeEvent);
}

ZE2UR_CALL(zeCommandListAppendBarrier,

Contributor

Does this have to be a barrier? Can you use zeCommandListAppendWaitOnEvents instead?

Contributor Author

The L0 documentation is not very clear. Could you clarify what the advantages of zeCommandListAppendWaitOnEvents are over a barrier with a nullptr signal event?

Does zeCommandListAppendWaitOnEvents provide the same guarantees as a barrier? That is, does it guarantee that any commands appended after the wait are not executed before the events are signalled?

Contributor

> zeCommandListAppendWaitOnEvents over a barrier with a nullptr signal event?

A barrier is coarse-grained and enforces global memory ordering.

> Does zeCommandListAppendWaitOnEvents provide the same guarantees as a barrier? That is, does it guarantee that any commands appended after the wait are not executed before the events are signalled?

Not for out-of-order cmdlists AFAIK. That is, commands can be reordered past AppendWaitOnEvents if there's no explicit event synchronization point between these two operations. But you are using events anyway, so that's why I've asked.

@nrspruit ping, in case I got anything wrong.

@@ -139,8 +139,15 @@ Environment Variables
| UR_L0_DISABLE_USM_ALLOCATOR | Controls the use of the USM allocator. | "0": USM allocator is enabled. | "0" |
| | | Any other value: USM allocator is disabled. | |
+---------------------------------------------+--------------------------------------------------------------+--------------------------------------------------------------+------------------+

| UR_L0_CMD_BUFFER_USE_IMMEDIATE_APPEND_PATH | Controls which command-buffer implementation path is used. | "1": the immediate append path will always be enabled as | Unset |
@pbalcer pbalcer Nov 19, 2024

Which path is tested will depend on the CI runner. We should be more explicit about it by testing twice - with and without the "immediate append path" enabled.

Contributor Author

At the moment, the CI drivers are all older than 30898 and I would assume that that will be the case for the foreseeable future. So it will always use the old path.

I think that, once this extension is not experimental anymore, if we are still using 2 paths in the code, we could add more testing to CI. But for now, I don't think there is much point in doing so.
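
To make the path selection discussed in this thread concrete, here is a minimal sketch of how such a gate could look. It is not the adapter's actual `checkImmediateAppendSupport` logic: `useImmediateAppendPath` and `MinImmediateAppendDriverBuild` are hypothetical names, the PVC-only default and the handling of values other than "1" are assumptions, and only the 30898 build number and the environment variable name come from the discussion above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Driver build mentioned in the review thread; assumed here to be the minimum
// build that provides the experimental entry point.
constexpr uint32_t MinImmediateAppendDriverBuild = 30898;

// Hypothetical helper. EnvValue models the value of
// UR_L0_CMD_BUFFER_USE_IMMEDIATE_APPEND_PATH (nullptr means "unset").
bool useImmediateAppendPath(const char *EnvValue, uint32_t DriverBuild,
                            bool IsPVC) {
  const bool DriverIsNewEnough = DriverBuild >= MinImmediateAppendDriverBuild;
  if (EnvValue != nullptr) {
    // Assumption: "1" forces the new path only when the driver supports it,
    // and any other explicit value keeps the old path.
    return std::strcmp(EnvValue, "1") == 0 && DriverIsNewEnough;
  }
  // Assumption: when unset, the new path is the default only on PVC with a
  // new-enough driver.
  return IsPVC && DriverIsNewEnough;
}
```

In a real program the first argument would typically come from `std::getenv("UR_L0_CMD_BUFFER_USE_IMMEDIATE_APPEND_PATH")`. Under this model a CI runner with a driver older than 30898 always takes the old path, regardless of the variable.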

@pbalcer pbalcer Nov 21, 2024

AFAIK our PVC runners have a new enough driver (the newest available rolling is 24.39.31294.21), but fair, we can expand CI later. Please create an issue to track this.

Contributor

Created ticket #2373

@pbalcer pbalcer added the ready to merge label Nov 21, 2024
@pbalcer pbalcer merged commit 20e501a into oneapi-src:main Nov 21, 2024
74 of 75 checks passed