[version_1] IPC: server temporary channel priority loss #354

jnpkrn · 2019-06-11T14:26:41Z

1.x backport of #352.

There is a slight reserve for when bigger PID ranges are in use. The limit on the prefix string was derived from practical experience rather than from exact calculations. Signed-off-by: Jan Pokorný <[email protected]>

Using an i7-6820HQ CPU yields these results: Before: ~2:54, After: ~2:26, Speedup: ~16%. The main optimization lies in how the run_function_in_new_process helper is constructed: there is now actual synchronization between the parent and its child after the fork (the child needs to be prioritized here, which furthermore helps with making the parent immediately give up its processor possession), so that a subsequent sleep is completely omitted -- at worst (unlikely), additional sleep round(s) will need to be undertaken as already arranged for (and now, just 400 ms is waited rather than an excessive 1 second). Another slight optimization is likewise the omission of a sleep where control gets returned once the waited-for process has been successfully examined post-mortem, without worries that its previous life is still resounding. Signed-off-by: Jan Pokorný <[email protected]>
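
The parent/child synchronization described above can be sketched roughly as follows. This is a minimal illustration, not the test suite's actual run_function_in_new_process: the names spawn_and_wait_ready, reap and example_child are hypothetical, and a plain pipe is assumed as the readiness channel.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker run in the child; its return value becomes
 * the child's exit status. */
static int example_child(void)
{
    return 3;
}

/* Hypothetical helper (an assumption, not the real test helper):
 * forks, lets the child announce readiness over a pipe, and blocks
 * the parent in read() until then, so no fixed sleep is needed. */
static pid_t spawn_and_wait_ready(int (*fn)(void))
{
    int fds[2];
    pid_t pid;
    char token = 1;
    ssize_t n;

    assert(fn != NULL);
    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: signal readiness first, then do the actual work */
        close(fds[0]);
        if (write(fds[1], &token, 1) != 1)
            _exit(127);
        close(fds[1]);
        _exit(fn());
    }

    /* parent: the blocking read() gives up the CPU until the child
     * has written its readiness token (or has died) */
    close(fds[1]);
    n = read(fds[0], &token, 1);
    close(fds[0]);
    if (n != 1) {
        waitpid(pid, NULL, 0);  /* child never got ready; reap it */
        return -1;
    }
    return pid;
}

/* Reaps the child; returns its exit status, or -1 if it did not
 * terminate normally. */
static int reap(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would do `pid_t p = spawn_and_wait_ready(example_child);` and later `reap(p)`. In a real test the child would presumably signal only after its own setup (e.g. server start) has finished rather than right after the fork.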

Roles specifications are currently not applied and are rather a preparation for the actual meaningful use to come. Signed-off-by: Jan Pokorný <[email protected]>

This way, this core part can be easily reused where needed. Note that the similarity of "ready_signaller" to run_ipc_server is not accidental; the following commit will justify it. Signed-off-by: Jan Pokorný <[email protected]>

Compared to the outer world, libqb brings a rather unintuitive approach to priorities within its native event loop (qbloop.h) -- it doesn't process priorities exhaustively from high to low in a batched (clean-the-level) manner, but rather linearly adds a possibility to pick the handling task from the higher priority level as opposed to lower priority ones. This has the advantage of limiting the chances of starvation and deadlock opportunities in incorrectly constructed SW; on the other hand, it means that libqb is not truthfully fulfilling the architected intentions regarding what deserves a priority, so these priorities are worth just a hint rather than an urgency-based separation. Consequently, the discovery of these deadlocks etc. is deferred to the (as Murphy's laws have it) least convenient moment, e.g., when said native event loop is exchanged for another (this time truly priority-abiding, like GLib) implementation, while retaining the same basic notion and high-level handling of priorities on the libqb side, in the IPC server (service handling) context. Hence, demonstration of such a degenerate blocking is not trivial, and we must defer to such another event loop implementation. After this hassle, we are rewarded with a practical proof that said "high-level handling [...] in IPC server (service handling) context" contains a bug (which we are going to subsequently fix) -- this is contrasted with libqb's native loop implementation, which works just fine even prior to that fix. Signed-off-by: Jan Pokorný <[email protected]>
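
To make the contrast between the two dispatch styles concrete, here is a toy simulation. It is a sketch under loud assumptions: run_strict models a strictly priority-abiding loop (GLib-like), while run_weighted is only a simplified stand-in for the "higher levels get picked more often" idea and does not reproduce libqb's actual algorithm; struct sim and both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { PRIO_LOW, PRIO_MED, PRIO_HIGH, PRIO_COUNT };

/* Toy state: jobs pending per level and how many were dispatched. */
struct sim {
    unsigned pending[PRIO_COUNT];
    unsigned handled[PRIO_COUNT];
};

/* Strict model: every step serves the highest non-empty level only.
 * With high_rearms set, each high-priority handler queues another
 * high-priority job, as a constantly busy sibling channel would. */
static void run_strict(struct sim *s, unsigned steps, int high_rearms)
{
    assert(s != NULL);
    while (steps-- > 0) {
        int p;

        for (p = PRIO_HIGH; p >= PRIO_LOW; p--) {
            if (s->pending[p] > 0)
                break;
        }
        if (p < PRIO_LOW)
            return;  /* nothing left to do */
        s->pending[p]--;
        s->handled[p]++;
        if (p == PRIO_HIGH && high_rearms)
            s->pending[p]++;
    }
}

/* Weighted model (assumption, not libqb's code): each pass visits all
 * levels, granting level p a budget of p + 1 jobs, so lower levels
 * still make progress while higher ones get a bigger share. */
static void run_weighted(struct sim *s, unsigned passes, int high_rearms)
{
    assert(s != NULL);
    while (passes-- > 0) {
        for (int p = PRIO_HIGH; p >= PRIO_LOW; p--) {
            unsigned budget = (unsigned)p + 1;

            while (budget > 0 && s->pending[p] > 0) {
                budget--;
                s->pending[p]--;
                s->handled[p]++;
                if (p == PRIO_HIGH && high_rearms)
                    s->pending[p]++;
            }
        }
    }
}
```

Starting both models with one low-priority job and one self-rearming high-priority job, the strict model never dispatches the low-priority job, whereas the weighted one handles it in its first pass. That is the sense in which a hint-like scheme masks a starvation that a strict loop exposes.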

It turns out that while 7f56f58 allowed for less blocking (thus throughput-increasing) initial handling of connections from clients within the abstract (out-of-libqb managed) event loop, it unfortunately subscribes itself back to such polling mechanism for UNIX-socket-check with a default priority, which can be lower than desired (via explicit qb_ipcs_request_rate_limit() configuration) for a particular channel (amongst attention-competing siblings in the pool; the term here refers to associated communication, that is, both the server and the on-server abstraction for particular clients). And priority-based discrepancies are not forgiven in truly priority-abiding systems (that is, unlike with libqb's native event loop harness as detailed in the previous commit, for which this would be soft-tolerated, hence the problem would not be spotted in the first place -- but that's explicitly excluded from further discussion). On top of that, it violates the natural assumption that once the (single-threaded, which is imposed by libqb, at least between the initial accept() and after said UNIX-socket-check) server accepts the connection, it shall take care of serving it (at least within the stated initial scope of the client connection life cycle) rather than be rushing to accept new ones -- which is exactly what used to happen previously once the library user set the effective priority in the abstract poll above the default one.
It's conceivable that, just as in the former case of attention-competing siblings with higher priority, whereby they could _infinitely_ live on at the expense of starving the client in the initial handling phase (authentication) despite the library user's as-high-as-siblings intention (because the default priority was used for that unconditionally instead, which we address here), a deadlock is imminent in this latter accept-to-client-authentication-handling case as well if there's an _unlimited_ fast-paced arrival queue (well, limited by the number of allowable open descriptors within the system, but for the Linux built-in maximum of 1M, there may be no practical difference, at least for time-sensitive applications). The only hope then is that such deadlocks are rather theoretical, since a "spontaneous" constant stream of either communication on unrelated, higher-prio sibling channels, or of new connection arrivals, can as well testify to the poor design of the libqb IPC application. That being said, unconditional default priority in the isolated context of initial server-side client authentication is clearly a bug, but such an application shall apply appropriate rate-limiting measures (exactly on a priority basis) to handle unexpected flux nonetheless. The fix makes the test_ipc_dispatch_*_glib_prio_deadlock_provoke tests pass. Signed-off-by: Jan Pokorný <[email protected]>
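
The gist of the bug and of the fix can be modelled in a few lines. This is a sketch with stand-in types, not libqb's real structures or code: struct service, the enums and every function name below are hypothetical, and the mapping from rate limit to loop priority is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the loop priority and rate-limit notions; these are
 * NOT libqb's actual enums. */
enum loop_prio { LOOP_LOW, LOOP_MED, LOOP_HIGH };
enum rate_limit { RATE_FAST, RATE_NORMAL, RATE_SLOW };

/* Hypothetical service object remembering its polling priority. */
struct service {
    enum loop_prio poll_priority;
};

static void service_init(struct service *s)
{
    assert(s != NULL);
    s->poll_priority = LOOP_MED;  /* the default priority */
}

/* Models the effect of a rate-limit request: the faster the requested
 * rate, the higher the polling priority (assumed mapping). */
static void service_request_rate_limit(struct service *s, enum rate_limit rl)
{
    assert(s != NULL);
    switch (rl) {
    case RATE_FAST:
        s->poll_priority = LOOP_HIGH;
        break;
    case RATE_SLOW:
        s->poll_priority = LOOP_LOW;
        break;
    case RATE_NORMAL:
    default:
        s->poll_priority = LOOP_MED;
        break;
    }
}

/* Before the fix: the post-accept socket check is queued with the
 * default priority, no matter what the service was configured to. */
static enum loop_prio socket_check_prio_buggy(const struct service *s)
{
    (void)s;
    return LOOP_MED;
}

/* After the fix: the socket check inherits the configured priority,
 * so it competes on par with the service's other channels. */
static enum loop_prio socket_check_prio_fixed(const struct service *s)
{
    assert(s != NULL);
    return s->poll_priority;
}
```

With the service switched to RATE_FAST, the buggy variant still queues the check at LOOP_MED, below the LOOP_HIGH siblings and the listening socket, which is what a strictly priority-abiding loop turns into starvation; the fixed variant yields LOOP_HIGH.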

It's misleading to a random code observer, at least, hiding the fact that what failed is actually the queuing up of some handling to perform asynchronously in the future, rather than invoking it synchronously right away. Signed-off-by: Jan Pokorný <[email protected]>

Make the qbipcs.h module interdependence clear (also shedding light on some semantic dependencies) as well. Signed-off-by: Jan Pokorný <[email protected]>

We want to run each and every test we can, without reliance on transitive dependencies and environment "invariants". Signed-off-by: Jan Pokorný <[email protected]>

jnpkrn · 2019-06-12T14:50:14Z

With Fedora build, I've noticed that qb_ipcs_rate_limit needs to be
outside the #ifdef HAVE_GLIB conditional. Will send an update against
master as well.

jnpkrn added 9 commits June 12, 2019 16:27

tests: ipc: allow for easier tests debugging by discerning PIDs/roles

248010a

Roles specifications are currently not applied and are rather a preparation for the actual meaningful use to come. Signed-off-by: Jan Pokorný <[email protected]>

doc: qbloop.h: document pros/cons of using built-in event loop impl

e2d5be4

Make the qbipcs.h module interdependence clear (also shedding light on some semantic dependencies) as well. Signed-off-by: Jan Pokorný <[email protected]>

CI: travis: add (redundant for now, but...) libglib2.0-dev prerequisite

c5cb0db

We want to run each and every test we can, without reliance on transitive dependencies and environment "invariants". Signed-off-by: Jan Pokorný <[email protected]>

jnpkrn force-pushed the version_1-ipc-server-temporary-channel-priority-loss branch from 5f50866 to c5cb0db June 12, 2019 14:47

jnpkrn merged commit c5cb0db into ClusterLabs:version_1 Jun 14, 2019
