Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[version_1] IPC: server temporary channel priority loss #354

Conversation

jnpkrn
Copy link
Contributor

@jnpkrn jnpkrn commented Jun 11, 2019

1.x backport of #352.

There's some slight reserve for when bigger PID ranges are in use.
The method to yield the limit on prefix string was derived from
practical experience (rather than based on exact calculations).

Signed-off-by: Jan Pokorný <[email protected]>
Using i7-6820HQ CPU yields these results:

Before: ~2:54
After: ~2:26
Speedup: ~16%

The main optimization lies in how run_function_in_new_process helper is
constructed, since now, there's an actual synchronization between the
parent and its child (that needs to be prioritized here, which is
furthermore help with making the parent immediately give up it's
processor possession) after the fork, so that a subsequent sleep is
completely omitted -- at worst (unlikely), additional sleep round(s)
will need to be undertaken as already arranged for (and now, just
400 ms is waited rather than excessive 1 second).

Another slight optimization is likewise in omission of sleep where
the control gets returned to once the waited for process has been
suceesfully examined post-mortem, without worries it's previous
life is still resounding.

Signed-off-by: Jan Pokorný <[email protected]>
Roles specifications are currently not applied and are rather
a preparation for the actual meaningful use to come.

Signed-off-by: Jan Pokorný <[email protected]>
This way, this core part can be easily reused where needed.
Note that "ready_signaller" similarity with run_ipc_server is not
accidental, following commit will justify it.

Signed-off-by: Jan Pokorný <[email protected]>
Compared to the outer world, libqb brings rather unintuitive approach
to priorities within a native event loop (qbloop.h) -- it doesn't do
an exhaustive high-to-low priorities in a batched (clean-the-level)
manner, but rather linearly adds a possibility to pick the handling
task from the higher priority level as opposed to lower priority ones.

This has the advantage of limiting the chances of starvation and
deadlock opportunities in the incorrectly constructed SW, on the other
hand, it means that libqb is not fulfilling the architected intentions
regarding what deserves a priority truthfully, so these priorities are
worth just a hint rather than urgency-based separation.

And consequently, a discovery of these deadlocks etc. is deferred to
the (as Murphy's laws have it) least convenient moment, e.g., when
said native event loop is exchanged for other (this time priority
trully abiding, like GLib) implementation, while retaining the same
basic notion and high-level handling of priorities on libqb
side, in IPC server (service handling) context.

Hence, demonstration of such a degenerate blocking is not trivial,
and we must defer such other event loop implementation.  After this
hassle, we are rewarded with a practical proof said "high-level
handling [...] in IPC server (service handling) context" contains
a bug (which we are going to subsequently fix) -- this is contrasted
with libqb's native loop implementation that works just fine even
prior that fix.

Signed-off-by: Jan Pokorný <[email protected]>
It turns out that while 7f56f58 allowed for less blocking (thus
throughput increasing) initial handling of connections from clients
within the abstract (out-of-libqb managed) event loop, it unfortunately
subscribes itself back to such polling mechanism for UNIX-socket-check
with a default priority, which can be lower than desired (via explicit
qb_ipcs_request_rate_limit() configuration) for particular channel
(amongst attention-competing siblings in the pool, the term here
refers to associated communication, that is, both server and
on-server abstraction for particular clients).  And priority-based
discrepancies are not forgiven in true priority abiding systems
(that is, unlikele with libqb's native event loop harness as detailed
in the previous commit, for which this would be soft-torelated hence
the problem would not be spotted in the first place -- but that's
expliicitly excluded from further discussion).

On top of that, it violates the natural assumption that once (single
threaded, which is imposed by libqb, at least between initial accept()
and after-said-UNIX-socket-check) server accepts the connection, it
shall rather take care of serving it (at least within stated initial
scope of client connection life cycle) rather than be rushing to accept
new ones -- which is exactly what used to happen previously once the
library user set the effectively priority in the abstract poll
above the default one.

It's conceivable, just as with the former case of attention-competing
siblings with higher priority whereby they could _infinitely_ live on
at the expense of starving the client in the initial handling phase
(authentication) despite the library user's as-high-as-siblings
intention (for using the default priority for that unconditionally
instead, which we address here), the dead lock is imminent also in
this latter accept-to-client-authentication-handling case as well
if there's an _unlimited_ fast-paced arrival queue (well, limited
by with number of allowable open descriptors within the system,
but for the Linux built-in maximum of 1M, there may be no practical
difference, at least for time-sensitive applications).

The only hope then is that such dead-locks are rather theoretical,
since a "spontaneous" constant stream of either communication on
unrelated, higher-prio sibling channels, or of new connection arrivals
can as well testify the poor design of the libqb's IPC application.
That being said, unconditional default priority in the isolated
context of initial server-side client authentication is clearly
a bug, but such application shall apply appropriate rate-limiting
measures (exactly on priority basis) to handle unexpected flux
nonetheless.

The fix makes test_ipc_dispatch_*_glib_prio_deadlock_provoke tests pass.

Signed-off-by: Jan Pokorný <[email protected]>
It's misleading towards a random code observer, at least,
hiding the fact that what failed is actually the queing up
of some handling to perform asynchronously in the future,
rather than invoking it synchronously right away.

Signed-off-by: Jan Pokorný <[email protected]>
Make the qbipcs.h module interdependence clear (also shedding light to
some semantic dependencies) as well.

Signed-off-by: Jan Pokorný <[email protected]>
We want to run every and each test we can, without reliance on
transitive deoendencies and environment "invariants".

Signed-off-by: Jan Pokorný <[email protected]>
@jnpkrn jnpkrn force-pushed the version_1-ipc-server-temporary-channel-priority-loss branch from 5f50866 to c5cb0db Compare June 12, 2019 14:47
@jnpkrn
Copy link
Contributor Author

jnpkrn commented Jun 12, 2019

With Fedora build, I've noticed that qb_ipcs_rate_limit needs to be
outside the #ifdef HAVE_GLIB conditional. Will send an update against
master as well.

@jnpkrn jnpkrn merged commit c5cb0db into ClusterLabs:version_1 Jun 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant