[SYCL][Graph] Document new command-list enqueue path #16096

Merged
merged 8 commits into from
Nov 25, 2024
Changes from 4 commits
2 changes: 1 addition & 1 deletion sycl/cmake/modules/FetchUnifiedRuntime.cmake
@@ -116,7 +116,7 @@ if(SYCL_UR_USE_FETCH_CONTENT)
CACHE PATH "Path to external '${name}' adapter source dir" FORCE)
endfunction()

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
set(UNIFIED_RUNTIME_REPO "https://github.com/Bensuo/unified-runtime.git")
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules/UnifiedRuntimeTag.cmake)

set(UMF_BUILD_EXAMPLES OFF CACHE INTERNAL "EXAMPLES")
2 changes: 1 addition & 1 deletion sycl/cmake/modules/UnifiedRuntimeTag.cmake
@@ -4,4 +4,4 @@
# Date: Thu Nov 14 14:38:05 2024 +0100
# Merge pull request #2253 from pbalcer/low-power-events
# add low-power events experimental extension spec
set(UNIFIED_RUNTIME_TAG 3a5b23c8b475712f9107c1d5ab41f27a1465578e)
set(UNIFIED_RUNTIME_TAG 66c80c9c639cf149de0aac911be875f9bc1fcd30)
62 changes: 59 additions & 3 deletions sycl/doc/design/CommandGraph.md
@@ -337,6 +337,62 @@ Backends which are implemented currently are: [Level Zero](#level-zero),

### Level Zero

The command-buffer implementation for the Level Zero adapter has two
implementation paths. The path used depends on the device and the Level Zero
version:

- Immediate Append path - Relies on
[zeCommandListImmediateAppendCommandListsExp](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistimmediateappendcommandlistsexp)
to submit the command-buffer. This function is an experimental extension to the level-zero API.
- Wait event path - Relies on
[zeCommandQueueExecuteCommandLists](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandqueueexecutecommandlists)
to submit the command-buffer work. However, this level-zero function has
limitations and, as such, this path is used only when the immediate append
path is unavailable.
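
As a rough illustration of the selection described above, the following C++
sketch models the choice as a pure function. The `EnqueuePath` enum, the
`DeviceCapabilities` struct and its fields, and `selectEnqueuePath` are
hypothetical names invented for this example. They are not taken from the
adapter, which may query and combine these conditions differently.

```cpp
// Hypothetical model, for illustration only; not the adapter's real types.
enum class EnqueuePath { ImmediateAppend, WaitEvent };

struct DeviceCapabilities {
  // Assumption: the device can use immediate command-lists.
  bool SupportsImmediateCommandLists;
  // Assumption: the installed Level Zero version exposes
  // zeCommandListImmediateAppendCommandListsExp.
  bool HasImmediateAppendExtension;
};

// Prefer the immediate append path and fall back to the wait event path
// only when one of the requirements is missing.
constexpr EnqueuePath selectEnqueuePath(const DeviceCapabilities &Caps) {
  return (Caps.SupportsImmediateCommandLists &&
          Caps.HasImmediateAppendExtension)
             ? EnqueuePath::ImmediateAppend
             : EnqueuePath::WaitEvent;
}
```

Under this model, a device that supports immediate command-lists but lacks the
experimental extension still ends up on the wait event path.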

#### Immediate Append Path Implementation Details

This path is only available when the device supports immediate command-lists
and the [zeCommandListImmediateAppendCommandListsExp](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistimmediateappendcommandlistsexp)
API. This API can wait on a list of event dependencies using the `phWaitEvents`
parameter and can signal a return event when finished using the `hSignalEvent`
parameter. This allows for a cleaner and more efficient implementation than
what can be achieved with the wait-event path
(see [this section](#wait-event-path-implementation-details) for
more details about the wait-event path).

This path uses three command-lists to execute the
command-buffer:

- `ComputeCommandList` - Used to submit command-buffer work that requires
the compute engine.
- `CopyCommandList` - Used to submit command-buffer work that requires the
[copy engine](#copy-engine). This command-list is not created when none of the
nodes require the copy engine.
- `EventResetCommandList` - Used to reset the level-zero events that are
needed for every submission of the command-buffer. This is executed after
the compute and copy command-lists have finished executing. For the first
execution, this command-list is skipped since there is no need to reset events
at this point. When counter-based events are enabled (i.e. the command-buffer
is in-order), this command-list is not created since counter-based events do
not need to be reset.
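
The creation and skipping rules in the list above can be summarized as a small
decision function. This is only a sketch of the rules as stated in this
section: `CommandBufferState`, `SubmissionPlan` and `planSubmission` are
hypothetical names for this example and do not exist in the adapter.

```cpp
// Hypothetical model, for illustration only; not the adapter's real types.
struct CommandBufferState {
  bool AnyNodeUsesCopyEngine;  // at least one node requires the copy engine
  bool UsesCounterBasedEvents; // i.e. the command-buffer is in-order
  bool IsFirstExecution;       // no events have been signaled yet
};

// Which of the three command-lists take part in a given submission.
struct SubmissionPlan {
  bool Compute;
  bool Copy;
  bool EventReset;
};

constexpr SubmissionPlan planSubmission(const CommandBufferState &S) {
  return SubmissionPlan{
      // ComputeCommandList: modeled as always submitted.
      true,
      // CopyCommandList: only created when a node needs the copy engine.
      S.AnyNodeUsesCopyEngine,
      // EventResetCommandList: never created with counter-based events,
      // and skipped on the first execution.
      !S.UsesCounterBasedEvents && !S.IsFirstExecution};
}
```

For example, the second submission of an out-of-order command-buffer with no
copy nodes yields a plan with `Compute` and `EventReset` set and `Copy` unset.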

The following diagram illustrates which commands are executed on
each command-list when the command-buffer is enqueued:
![L0 command-buffer diagram](images/diagram_immediate_append.png)

Additionally,
[zeCommandListImmediateAppendCommandListsExp](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistimmediateappendcommandlistsexp)
requires an extra command-list which is used to submit the other
command-lists. This command-list has a specific engine type
associated with it (i.e. compute or copy engine). Hence, the implementation
needs two of these helper command-lists:

- The `CommandListHelper` command-list is used to submit the
`ComputeCommandList`, the `EventResetCommandList` and profiling queries.
- The `ZeCopyEngineImmediateListHelper` command-list is used to submit the
`CopyCommandList`.
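
The routing of command-lists to helper command-lists can be pictured with the
following sketch. The enums and the `helperFor` and `engineOf` functions are
hypothetical and exist only in this example. The enumerator names mirror the
helper names used above, and the assumption that `CommandListHelper` is tied to
the compute engine is inferred from the description rather than confirmed.

```cpp
// Hypothetical model, for illustration only; not the adapter's real types.
enum class EngineType { Compute, Copy };
enum class CommandListKind { Compute, Copy, EventReset };
enum class HelperList { CommandListHelper, ZeCopyEngineImmediateListHelper };

// Only the copy command-list goes through the copy-engine helper; the
// compute and event reset command-lists share the other helper.
constexpr HelperList helperFor(CommandListKind Kind) {
  return Kind == CommandListKind::Copy
             ? HelperList::ZeCopyEngineImmediateListHelper
             : HelperList::CommandListHelper;
}

// Assumption: each helper is bound to exactly one engine type.
constexpr EngineType engineOf(HelperList Helper) {
  return Helper == HelperList::ZeCopyEngineImmediateListHelper
             ? EngineType::Copy
             : EngineType::Compute;
}
```

Keeping one helper per engine type means a command-buffer without copy nodes
only ever needs `CommandListHelper` in this model.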

#### Wait Event Path Implementation Details

The UR `urCommandBufferEnqueueExp` interface for submitting a command-buffer
takes a list of events to wait on, and returns an event representing the
completion of that specific submission of the command-buffer.
@@ -364,7 +420,7 @@ is made only once (during the command-buffer finalization stage). This allows
the adapter to save time when submitting the command-buffer, by executing only
this command-list (i.e. without enqueuing any commands of the graph workload).

#### Prefix
##### Prefix

The prefix's commands aim to:
1. Handle the list of events to wait on, which is passed by the runtime
Expand Down Expand Up @@ -409,7 +465,7 @@ and another reset command for resetting the signal we use to signal the
completion of the graph workload. This signal is called *SignalEvent* and is
defined in the `ur_exp_command_buffer_handle_t` class.

#### Suffix
##### Suffix

The suffix's commands aim to:
1) Handle the completion of the graph workload and signal a UR return event.
@@ -435,7 +491,7 @@ with extra commands associated with *CB*, and the other after *CB*. These new
command-lists are retrieved from the UR queue, which will likely reuse existing
command-lists and only create a new one in the worst case.

#### Drawbacks
##### Drawbacks

There are three drawbacks of this approach to implementing UR command-buffers for
Level Zero: