Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/efa: Implement the rma interface #10553

Merged
merged 3 commits into from
Nov 23, 2024
Merged

prov/efa: Implement the rma interface #10553

merged 3 commits into from
Nov 23, 2024

Conversation

jiaxiyan
Copy link
Contributor

Rename efa_dgram_rma.c to efa_rma.c and move it to prov/efa/src as a common RMA interface for both rdm and dgram ep type. Implement rdma write and inject.
Support inline rdma write.

@jiaxiyan jiaxiyan requested a review from a team November 18, 2024 18:58
prov/efa/src/efa_base_ep.c Outdated Show resolved Hide resolved
if (err)
return err;

return efa_rma_post_write(base_ep, msg, flags, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

application can call fi_write to the self ep, so you cannot naively pass a false here. You need to check whether the target is self in the call.

*/
static inline ssize_t efa_rma_post_write(struct efa_base_ep *base_ep,
const struct fi_msg_rma *msg,
uint64_t flags, bool self_comm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think self_comm needs to be defined within this function.

}

if (msg->addr == FI_ADDR_NOTAVAIL) {
self_comm = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we used to allow a FI_ADDR_NOTAVAIL to support local read. But per libfabric standard we shouldn't allow a NOTAVAIL addr for a user call. So in that regard I think we should drop the L244-246.

self_comm = true;
} else {
conn = efa_av_addr_to_conn(base_ep->av, msg->addr);
self_comm = !(conn && conn->ep_addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? self_comm should be true when the conn->ep_addr is identical to the local base_ep's addr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to https://github.com/ofiwg/libfabric/blob/main/prov/efa/src/rdm/efa_rdm_pke.c#L561 where
self_comm is true when peer is null and efa_rdm_ep_get_peer returns NULL when av_entry->conn.ep_addr is NULL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I think the pke write function was copied from the pke_read function, which needed to handle the local read path in the rdm code base where the peer is set as NULL.

I think for efa-raw, we don't need to handle this case. We should always require a valid conn

prov/efa/src/efa_rma.c Outdated Show resolved Hide resolved
…m_ep

These fields are not being used by dgram_ep any more.

Signed-off-by: Jessie Yang <[email protected]>
prov/efa/src/efa_msg.c Outdated Show resolved Hide resolved
prov/efa/src/efa_base_ep.h Show resolved Hide resolved
ibv_wr_start(qp->ibv_qp_ex);
base_ep->is_wr_started = true;
}
qp->ibv_qp_ex->wr_id = (uintptr_t)msg->context;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a future change will be like we will enforce FI_CONTEXT, and if flags doesn't have FI_COMPLETION, we will ignore context and make wr_id as NULL. Then in the cq read, we will treat NULL wr_id as the indicator of no-completion. This can be a future PR though.

Copy link
Contributor

@shijin-aws shijin-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the commit message of 0ae84a4. You need to explain why using the flags of util_ep is correct, i.e. in which case FI_COMPLETION should be in the flags?

Need to use the flags of util_ep for CQ, which includes FI_COMPLETION
flag for FI_TRANSMIT and FI_RECV unless FI_SELECTIVE_COMPLETION is set.

Signed-off-by: Jessie Yang <[email protected]>
return err;

EFA_SETUP_IOV(iov, buf, len);
return efa_rma_writev(ep_fid, &iov, &desc, 1, dest_addr, addr, key, context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this earlier ... Can we call efa_rma_post_write directly here.

return err;

EFA_SETUP_IOV(iov, buf, len);
return efa_rma_readv(ep_fid, &iov, &desc, 1, src_addr, addr, key, context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment, can we call efa_rma_post_read directly by setting up the MSG

Rename efa_dgram_rma.c to efa_rma.c and move it to prov/efa/src as
a common RMA interface for both rdm and dgram ep type.
Update that dgram does not support rma.
Implement rdma write and inject. Support inline rdma write.

Signed-off-by: Jessie Yang <[email protected]>
@j-xiong j-xiong merged commit f12f5ea into ofiwg:main Nov 23, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants