-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prov/efa: Implement the rma interface #10553
Conversation
prov/efa/src/efa_rma.c
Outdated
if (err) | ||
return err; | ||
|
||
return efa_rma_post_write(base_ep, msg, flags, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
application can call fi_write to the self ep, so you cannot naively pass a false
here. You need to check whether the target is self in the call.
prov/efa/src/efa_rma.c
Outdated
*/ | ||
static inline ssize_t efa_rma_post_write(struct efa_base_ep *base_ep, | ||
const struct fi_msg_rma *msg, | ||
uint64_t flags, bool self_comm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think self_comm needs to be defined within this function.
02de00e
to
de460ea
Compare
prov/efa/src/efa_rma.c
Outdated
} | ||
|
||
if (msg->addr == FI_ADDR_NOTAVAIL) { | ||
self_comm = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we used to allow a FI_ADDR_NOTAVAIL
to support local read. But per libfabric standard we shouldn't allow a NOTAVAIL addr for a user call. So in that regard I think we should drop the L244-246.
prov/efa/src/efa_rma.c
Outdated
self_comm = true; | ||
} else { | ||
conn = efa_av_addr_to_conn(base_ep->av, msg->addr); | ||
self_comm = !(conn && conn->ep_addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? self_comm
should be true when the conn->ep_addr
is identical to the local base_ep's addr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to https://github.com/ofiwg/libfabric/blob/main/prov/efa/src/rdm/efa_rdm_pke.c#L561 where
self_comm is true when peer is null and efa_rdm_ep_get_peer returns NULL when av_entry->conn.ep_addr is NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I think the pke write function was copied from the pke_read function, which needed to handle the local read path in the rdm code base where the peer is set as NULL.
I think for efa-raw, we don't need to handle this case. We should always require a valid conn
…m_ep These fields are not being used by dgram_ep any more. Signed-off-by: Jessie Yang <[email protected]>
a269444
to
42d918f
Compare
ibv_wr_start(qp->ibv_qp_ex); | ||
base_ep->is_wr_started = true; | ||
} | ||
qp->ibv_qp_ex->wr_id = (uintptr_t)msg->context; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a future change will be like we will enforce FI_CONTEXT
, and if flags doesn't have FI_COMPLETION
, we will ignore context
and make wr_id
as NULL. Then in the cq read, we will treat NULL wr_id as the indicator of no-completion. This can be a future PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the commit message of 0ae84a4. You need to explain why using the flags of util_ep is correct, i.e. in which case FI_COMPLETION
should be in the flags?
Need to use the flags of util_ep for CQ, which includes FI_COMPLETION flag for FI_TRANSMIT and FI_RECV unless FI_SELECTIVE_COMPLETION is set. Signed-off-by: Jessie Yang <[email protected]>
prov/efa/src/efa_rma.c
Outdated
return err; | ||
|
||
EFA_SETUP_IOV(iov, buf, len); | ||
return efa_rma_writev(ep_fid, &iov, &desc, 1, dest_addr, addr, key, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed this earlier ... Can we call efa_rma_post_write directly here.
prov/efa/src/efa_rma.c
Outdated
return err; | ||
|
||
EFA_SETUP_IOV(iov, buf, len); | ||
return efa_rma_readv(ep_fid, &iov, &desc, 1, src_addr, addr, key, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment, can we call efa_rma_post_read directly by setting up the MSG
Rename efa_dgram_rma.c to efa_rma.c and move it to prov/efa/src as a common RMA interface for both rdm and dgram ep type. Update that dgram does not support rma. Implement rdma write and inject. Support inline rdma write. Signed-off-by: Jessie Yang <[email protected]>
Rename efa_dgram_rma.c to efa_rma.c and move it to prov/efa/src as a common RMA interface for both rdm and dgram ep type. Implement rdma write and inject.
Support inline rdma write.