Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[new API] two-phased qb_ipcs_create required #325

Open
jnpkrn opened this issue Sep 26, 2018 · 8 comments
Open

[new API] two-phased qb_ipcs_create required #325

jnpkrn opened this issue Sep 26, 2018 · 8 comments
Labels

Comments

@jnpkrn
Copy link
Contributor

jnpkrn commented Sep 26, 2018

Signatures

/* Creates a listening UNIX socket, akin to qb_ipcs_create */
qb_ipcs_listener_t* qb_ipcs_listener(const char *name);

/* Gets socket descriptor from listener;  to prevent further
   use (the same information is now at two places at minimum),
   deallocates l, so _it must not be reused elsewhere from that
   point on_ */
int qb_ipcs_listener_fd(const qb_ipcs_listener_t *l);

/* Wraps a listening UNIX socket, presumably created with
   qb_ipcs_listener and subsequently extracted with qb_ipcs_listener_fd
   (e.g. the name gets validated), extracting "name" using
   getsockname(2);  _fd must not be reused elsewhere from that point on_ */
qb_ipcs_listener_t* qb_ipcs_listener_from_fd(int fd);

/* Reuses existing UNIX socket wrapped in a listener object;
   qb_ipcs_listener + qb_ipcs_create_from_listener equals qb_ipcs_create;
   to prevent further use, deallocates l, so _it must not be reused
   elsewhere from that point on_ */
qb_ipcs_service_t* qb_ipcs_create_from_listener(qb_ipcs_listener_t *l,
                                                int32_t service_id,
                                                enum qb_ipc_type type,
                                                struct qb_ipcs_service_handlers *handlers);

Allow safe nesting wrt. NULL checking, e.g.,

qb_ipcs_create_from_listener(qb_ipcs_listener_from_fd(qb_ipcs_listener_fd(qb_ipcs_listener("foo"))),
                             42, QB_IPC_SHM, handlers);

The qb_ipcs_listener_t abstraction is meant to allow for (and enforce)
extra checking on libqb side (so that the same internal coherency as with
monolithic qb_ipcs_create is achieved), as that type is only used as
an opaque pointer in the client program.

@jnpkrn jnpkrn added the RFE label Sep 26, 2018
@jnpkrn
Copy link
Contributor Author

jnpkrn commented Sep 26, 2018

Note that this would also (finally, one wants to say) offer client
programs to unset close-on-exec flag for cases it would be counter
productive (what if pre-fork parent establishes the listening socket
only to be picked, with qb_ipcs_listener_from_fd, by an after-fork
child with a brand new process image that was just exec'd?
btw. this looks exactly like a sequence every forking daemon
utilizing libqb's IPC subsystem should exercise unless parent-child
synchronization is in place -- it's highly undesirable for daemon
launcher process to indicate successful startup when the forked off
executive part fail on already occupied IPC channel the moment later!).

I'm currently split whether it should be a QB branded flag that can
be passed to qb_ipcs_listener() for convenience or whether authors
of client programs should be kept on their own with this.

Nonetheless, it must be documented that normally, this socket is
preconfigured with O_NONBLOCK and FD_CLOEXEC flags (respective to
particular fcntl command). And that only FD_CLOEXEC can reasonably
be mangled with, everything else asks for conflicting with intended
state of affairs on libqb side (but then, qb_ipcs_listener_from_fd()
can check or reset some vital flags on its own).

@jfriesse
Copy link
Member

API looks sane, but what is the use case?

Also I was unable to understand following part:

btw. this looks exactly like a sequence every forking daemon
utilizing libqb's IPC subsystem should exercise unless parent-child
synchronization is in place -- it's highly undesirable for daemon
launcher process to indicate successful startup when the forked off
executive part fail on already occupied IPC channel the moment later!).

If it is about daemonization process, then why should daemon do the exec after fork?

I can imagine situation with child processes, but sharing fd is just asking for a big troubles.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Sep 27, 2018 via email

@jfriesse
Copy link
Member

jfriesse commented Oct 1, 2018

On 27/09/18 00:18 -0700, Jan Friesse wrote: If it is about daemonization process, then why should daemon do the exec after fork?
In case it's a hierarchical daemon of daemons (e.g. pacemaker).

So it's not about daemonization.

I can imagine situation with child processes, but sharing fd is just asking for a big troubles.
Why? This is the same mechanism that, e.g., systemd uses to share the notification socket into services accommodating that signaling scheme.

Because current libqb IPC is not prepared for concurrent access.

Could you please explain the intended use case?


-- Jan (Poki)

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Oct 3, 2018 via email

@jfriesse
Copy link
Member

jfriesse commented Oct 3, 2018

Ok, now it makes a little more sense. I have doubts about the solution really solves any real problem (especially when daemon is systemd enabled and notifies its state), but as long as old api is kept, then why not.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Oct 3, 2018

Yes, it indeed does help with real use cases, details forthcoming.
This is not a hypothetical exercise if you think it is.

jnpkrn added a commit to jnpkrn/pacemaker that referenced this issue Apr 17, 2019
[2/4: pacemakerd to trust pre-existing processes via new checks instead]

In pacemakerd in the context of entrusting pre-existing processes,
we now resort to procfs-based solution only in boundary, fouled cases,
and primarily examine the credentials of the processes already
occupying known IPC end-points before adopting them.

The commit applies the new helpers from 1/1 so as to close the both
related sensitive problems, CVE-2018-16877 and CVE-2018-16878, in
a unified manner, this time limited to the main daemon of pacemaker
(pacemakerd).

To be noted that it is clearly not 100% for this purpose for still
allowing for TOCTTOU, but that's what commit (3/3) is meant to solve
for the most part, plus there may be optimizations solving this concern
as a side effect, but that requires an active assistance on the libqb
side (ClusterLabs/libqb#325) since any
improvement on pacemaker side in isolation would be very
cumbersome if generally possible at all, but either way
means a new, soft compatibility encumberment.

As a follow-up to what was put in preceding 1/3 commit, PID of 1 tracked
as child's identification on FreeBSD (or when socket-based activation is
used with systemd) is treated specially, incl. special precaution with
child's PID discovered as 1 elsewhere.

v2: courtesy of Yan Gao of SUSE for early discovery and report for
    what's primarily solved with 4/4 commit, in extension, child
    daemons in the initialization phase coinciding with IPC-feasibility
    based process scan in pacemakerd in a way that those are missed
    (although they are to come up fully just moments later only
    to interfere with naturally spawned ones) are now considered so
    that if any native children later fail for said clash, the
    pre-existing counterpart may get adopted instead of ending up
    with repeated spawn-bury loop ad nauseam without real progress
    (note that PCMK_fail_fast=true could possibly help, but that's
    rather a big hammer not suitable for all the use cases, not
    the ones we try to deal with gracefully here)
jnpkrn added a commit to jnpkrn/pacemaker that referenced this issue Apr 17, 2019
[2/4: pacemakerd to trust pre-existing processes via new checks instead]

In pacemakerd in the context of entrusting pre-existing processes,
we now resort to procfs-based solution only in boundary, fouled cases,
and primarily examine the credentials of the processes already
occupying known IPC end-points before adopting them.

The commit applies the new helpers from 1/1 so as to close the both
related sensitive problems, CVE-2018-16877 and CVE-2018-16878, in
a unified manner, this time limited to the main daemon of pacemaker
(pacemakerd).

To be noted that it is clearly not 100% for this purpose for still
allowing for TOCTTOU, but that's what commit (3/3) is meant to solve
for the most part, plus there may be optimizations solving this concern
as a side effect, but that requires an active assistance on the libqb
side (ClusterLabs/libqb#325) since any
improvement on pacemaker side in isolation would be very
cumbersome if generally possible at all, but either way
means a new, soft compatibility encumberment.

As a follow-up to what was put in preceding 1/3 commit, PID of 1 tracked
as child's identification on FreeBSD (or when socket-based activation is
used with systemd) is treated specially, incl. special precaution with
child's PID discovered as 1 elsewhere.

v2: courtesy of Yan Gao of SUSE for early discovery and report for
    what's primarily solved with 4/4 commit, in extension, child
    daemons in the initialization phase coinciding with IPC-feasibility
    based process scan in pacemakerd in a way that those are missed
    (although they are to come up fully just moments later only
    to interfere with naturally spawned ones) are now considered so
    that if any native children later fail for said clash, the
    pre-existing counterpart may get adopted instead of ending up
    with repeated spawn-bury loop ad nauseam without real progress
    (note that PCMK_fail_fast=true could possibly help, but that's
    rather a big hammer not suitable for all the use cases, not
    the ones we try to deal with gracefully here)
@jnpkrn
Copy link
Contributor Author

jnpkrn commented Jun 18, 2019

Have discovered that ability to force non-abstract sockets where they'd
otherwise get used (#248) -- likely together with pre-existing cases
where that had been already the case -- will break client programs that
run as non-root (making me wonder if that change was ever tested in
cluster stack context).

The main problem is that /var/run is intentionally not world-writable,
meaning that bind(3) triggered with unprivileged process so as to lay
the socket there will effectively fail with EACCES or similar
-- that's across the board of the OSes, all-permissive arrangement
yet to be spotted.

For pacemaker in particular, the solution is as "simple" as adopting
the solution per this request, as envisioned for other reasons already.
The master daemon of pacemaker (running as root, naturally) can then
get such sockets set up, especially for its non-root children, meaning
this particular problem would get solved along.

Any timeframe we can expect this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants