-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EC/CUDA: enable memops for executor #691
EC/CUDA: enable memops for executor #691
Conversation
addresses #672 |
8ed25c5
to
b022d11
Compare
If I understand the pr correctly, this pr changes the ec interfaces and breaks ROCm support. Something to keep track of. |
Yes, you are right. I can remove those functions from ROCm component, they are not used anyway, wdyt? |
@Sergei-Lebedev ok, sounds good, ping me when you have that ready and I can run your branch through our internal CI |
if (params->mask & UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES) { | ||
eee->requested_ops = params->task_types; | ||
} else { | ||
/* if no task types provided assume all tasks types required */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "all tasks types"? As far as I understand requested_ops is a essentially a bool indicating if we have requested an op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in next PR I'm going to add different task_types for different collectives e.g. reduce task and copy task for reduce_scatter in TL/CUDA
eee->requested_ops = params->task_types; | ||
} else { | ||
/* if no task types provided assume all tasks types required */ | ||
eee->requested_ops = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES
is used only in ucc_trigger_test
, where we set task_types=0
. This is also the only place where task_types
is assigned some value. Then task_types
is essentially only used at line 34 to assign requested_ops
(to 0).
So, wouldn't it be equivalent to replace the present if/else block by
eee->requested_ops = params->mask & UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES
?
This way we could remove the variable task_types (and then maybe rename UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES
to something like
UCC_EE_EXECUTOR_PARAM_FIELD_REQUESTED_OP
)
equivalently, we could get rid of UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES
and keep only task_types
.
what do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, it's currently just placeholder for next PR, I can remove it until then
@@ -115,6 +97,7 @@ typedef struct ucc_ec_cuda_executor_task_ops { | |||
typedef struct ucc_ec_cuda_executor { | |||
ucc_ee_executor_t super; | |||
ucc_ec_cuda_executor_mode_t mode; | |||
uint64_t requested_ops; | |||
ucc_ec_cuda_executor_task_ops_t ops; | |||
ucc_spinlock_t tasks_lock; | |||
ucc_ec_cuda_executor_state_t state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to this PR, but shouldn't state
need to be allocated with cudaHostAlloc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's allocated with cudaHostAlloc see ucc_ec_cuda_ee_executor_mpool_ops
b022d11
to
df3ef72
Compare
@edgargabriel i removed unused function from EC/ROCM, please test |
@Sergei-Lebedev thank you, I can confirm that things still work and look good. Thanks! |
df3ef72
to
40a1c79
Compare
* EC/CUDA: enable memops for executor * EC/ROCM: remove unused functions
What
Enable wait kernel and memops when executor is used only for stream sync