
EC/CUDA: enable memops for executor #691

Merged (2 commits) on Jan 12, 2023


Sergei-Lebedev
Contributor

What

Enable the wait kernel and memops when the executor is used only for stream synchronization.
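A CPU-only model of the memops idea, assuming "memops" here refers to CUDA stream memory operations such as the driver API's cuStreamWaitValue32 and cuStreamWriteValue32: rather than running a kernel, the stream itself stalls on a wait-value op until the host writes a flag, and a later write-value op signals completion. This is not UCC or CUDA code; model_stream_t, the OP_* kinds, and model_stream_progress are hypothetical names used only to illustrate the ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op kinds modelling what a stream executes in order. */
typedef enum { OP_WAIT_EQ, OP_WRITE, OP_WORK } op_kind_t;

typedef struct {
    op_kind_t kind;
    uint32_t  value; /* value to wait for / to write; unused for OP_WORK */
} stream_op_t;

typedef struct {
    const stream_op_t *ops;
    size_t             n_ops;
    size_t             next;      /* index of the next op to execute */
    volatile uint32_t *flag;      /* stands in for a host-visible sync word */
    int                work_done; /* number of OP_WORK ops executed */
} model_stream_t;

/* Execute ops in order until the stream blocks on an unsatisfied wait
 * or runs out of ops. Returns the number of ops executed in this call. */
static size_t model_stream_progress(model_stream_t *s)
{
    size_t executed = 0;

    while (s->next < s->n_ops) {
        const stream_op_t *op = &s->ops[s->next];

        if (op->kind == OP_WAIT_EQ && *s->flag != op->value) {
            break; /* stream stalls here, like a wait-value memop */
        }
        if (op->kind == OP_WRITE) {
            *s->flag = op->value; /* like a write-value memop */
        } else if (op->kind == OP_WORK) {
            s->work_done++;       /* work queued behind the wait */
        }
        s->next++;
        executed++;
    }
    return executed;
}
```

With ops {wait for 1, work, write 2}, progressing the stream before the host sets the flag executes nothing; once the host writes 1, the wait completes, the queued work runs, and the final write sets the flag to 2.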

@Sergei-Lebedev
Contributor Author

Addresses #672.

@edgargabriel
Contributor

If I understand the PR correctly, this PR changes the EC interfaces and breaks ROCm support. Something to keep track of.

@Sergei-Lebedev
Contributor Author

> If I understand the PR correctly, this PR changes the EC interfaces and breaks ROCm support. Something to keep track of.

Yes, you are right. I can remove those functions from the ROCm component; they are not used anyway. wdyt?

@edgargabriel
Contributor

@Sergei-Lebedev OK, sounds good. Ping me when you have that ready and I can run your branch through our internal CI.

if (params->mask & UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES) {
    eee->requested_ops = params->task_types;
} else {
    /* if no task types provided assume all tasks types required */
Collaborator

@samnordmann, Dec 14, 2022

What do you mean by "all tasks types"? As far as I understand, requested_ops is essentially a bool indicating whether an op has been requested.

Contributor Author

In the next PR I'm going to add different task_types for different collectives, e.g. a reduce task and a copy task for reduce_scatter in TL/CUDA.
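One way per-collective task types could fit the existing requested_ops field is as a bitmask. The sketch below is hypothetical: the SK_* bit values, the params struct, and the helpers are illustrative names, not UCC definitions. It keeps the if/else shape from the diff but, when no task types are given, defaults to "all task types" as a mask rather than the placeholder value 1.

```c
#include <stdint.h>

/* Hypothetical task-type bits and param-field flag; not UCC's values. */
#define SK_TASK_REDUCE            (UINT64_C(1) << 0)
#define SK_TASK_COPY              (UINT64_C(1) << 1)
#define SK_TASK_ALL               (SK_TASK_REDUCE | SK_TASK_COPY)
#define SK_PARAM_FIELD_TASK_TYPES (UINT64_C(1) << 0)

typedef struct {
    uint64_t mask;       /* which optional fields below are valid */
    uint64_t task_types; /* bitmask of SK_TASK_* the caller will post */
} sk_executor_params_t;

/* Same shape as the if/else in the diff: if no task types are given,
 * assume all task types are required. */
static uint64_t sk_requested_ops(const sk_executor_params_t *p)
{
    if (p->mask & SK_PARAM_FIELD_TASK_TYPES) {
        return p->task_types;
    }
    return SK_TASK_ALL;
}

/* Nonzero if every bit in `task` is present in requested_ops. */
static int sk_op_requested(uint64_t requested_ops, uint64_t task)
{
    return (requested_ops & task) == task;
}
```

Under this assumption, reduce_scatter would pass task_types = SK_TASK_REDUCE | SK_TASK_COPY, while a caller that needs the executor only for stream sync would set the field with task_types = 0.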

    eee->requested_ops = params->task_types;
} else {
    /* if no task types provided assume all tasks types required */
    eee->requested_ops = 1;
Collaborator

As far as I understand, UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES is used only in ucc_trigger_test, where we set task_types=0. This is also the only place where task_types is assigned a value. So task_types is essentially only used at line 34 to assign requested_ops (to 0).

So wouldn't it be equivalent to replace the present if/else block with the following?

eee->requested_ops = !(params->mask & UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES);
This way we could remove the variable task_types (and then maybe rename UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES to something like UCC_EE_EXECUTOR_PARAM_FIELD_REQUESTED_OP).

Alternatively, we could get rid of UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES and keep only task_types.

What do you think?
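The equivalence claim can be checked by evaluating both forms over the only two cases that occur under the reviewer's assumption: the field is unset, or it is set with task_types = 0. In the sketch below (EQ_FIELD_TASK_TYPES is a hypothetical bit value standing in for UCC_EE_EXECUTOR_PARAM_FIELD_TASK_TYPES), the single-expression form matches the if/else only when the mask test is negated, because a set field yields requested_ops = 0 and an unset field yields 1.

```c
#include <stdint.h>

#define EQ_FIELD_TASK_TYPES (UINT64_C(1) << 0) /* hypothetical bit value */

typedef struct {
    uint64_t mask;
    uint64_t task_types;
} eq_params_t;

/* The if/else from the diff, including the placeholder value 1. */
static uint64_t requested_ops_ifelse(const eq_params_t *p)
{
    if (p->mask & EQ_FIELD_TASK_TYPES) {
        return p->task_types;
    }
    return 1;
}

/* Mask-only form: valid only if task_types is always 0 when the field
 * is set. The negation maps "field set" to 0 and "field unset" to 1. */
static uint64_t requested_ops_mask_only(const eq_params_t *p)
{
    return !(p->mask & EQ_FIELD_TASK_TYPES);
}
```

Without the `!`, the expression would yield a nonzero value exactly when the if/else yields 0, so the two forms would disagree in both cases.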

Contributor Author

Right, it's currently just a placeholder for the next PR; I can remove it until then.

@@ -115,6 +97,7 @@ typedef struct ucc_ec_cuda_executor_task_ops {
typedef struct ucc_ec_cuda_executor {
    ucc_ee_executor_t               super;
    ucc_ec_cuda_executor_mode_t     mode;
    uint64_t                        requested_ops;
    ucc_ec_cuda_executor_task_ops_t ops;
    ucc_spinlock_t                  tasks_lock;
    ucc_ec_cuda_executor_state_t    state;
Collaborator

Not related to this PR, but shouldn't state be allocated with cudaHostAlloc?

Contributor Author

It is allocated with cudaHostAlloc; see ucc_ec_cuda_ee_executor_mpool_ops.
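The ucc_ec_cuda_executor struct carries both a mode and requested_ops. A minimal sketch of how the two might relate, assuming the rule from the PR description (use the wait kernel or memops only when the executor serves just for stream sync, i.e. requested_ops == 0) plus an assumed preference for memops when the device supports them. The SK_MODE_* names and sk_select_mode are hypothetical; the real ucc_ec_cuda_executor_mode_t values are not shown in this excerpt.

```c
#include <stdint.h>

/* Hypothetical modes; not the real ucc_ec_cuda_executor_mode_t values. */
typedef enum {
    SK_MODE_TASK_KERNEL, /* a kernel that executes posted tasks (assumed) */
    SK_MODE_WAIT_KERNEL, /* a small kernel that only blocks the stream */
    SK_MODE_MEMOPS       /* stream memory ops block the stream, no kernel */
} sk_executor_mode_t;

/* If no task ops were requested, the executor is only used for stream
 * synchronization, so a lightweight mode is enough. Assumption: prefer
 * memops when supported, otherwise fall back to the wait kernel. */
static sk_executor_mode_t sk_select_mode(uint64_t requested_ops,
                                         int memops_supported)
{
    if (requested_ops != 0) {
        return SK_MODE_TASK_KERNEL;
    }
    return memops_supported ? SK_MODE_MEMOPS : SK_MODE_WAIT_KERNEL;
}
```

Keeping the decision in one function means a caller such as ucc_trigger_test, which requests no task types, never pays for a task-executing kernel.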

@Sergei-Lebedev
Contributor Author

@edgargabriel I removed the unused functions from EC/ROCM, please test.

@edgargabriel
Contributor

edgargabriel commented Dec 16, 2022

@Sergei-Lebedev thank you, I can confirm that things still work and look good. Thanks!

Sergei-Lebedev enabled auto-merge (squash) January 12, 2023 07:44
Sergei-Lebedev merged commit 18ffb1a into openucx:master Jan 12, 2023
janjust pushed a commit to janjust/ucc that referenced this pull request Jan 31, 2024
* EC/CUDA: enable memops for executor

* EC/ROCM: remove unused functions
5 participants