[version_1] IPC: server temporary channel priority loss #354
Merged: jnpkrn merged 9 commits into ClusterLabs:version_1 from jnpkrn:version_1-ipc-server-temporary-channel-priority-loss on Jun 14, 2019
Conversation
There's some slight reserve for when bigger PID ranges are in use. The method to yield the limit on the prefix string was derived from practical experience (rather than based on exact calculations). Signed-off-by: Jan Pokorný <[email protected]>
Using an i7-6820HQ CPU yields these results:
Before: ~2:54
After: ~2:26
Speedup: ~16%

The main optimization lies in how the run_function_in_new_process helper is constructed: there is now actual synchronization between the parent and its child (which needs to be prioritized here, further helped by making the parent immediately give up its processor possession) after the fork, so that a subsequent sleep is completely omitted -- at worst (unlikely), additional sleep round(s) will need to be undertaken as already arranged for (and now, just 400 ms is waited rather than an excessive 1 second).

Another slight optimization is likewise the omission of a sleep at the point where control returns once the waited-for process has been successfully examined post-mortem, without worries that its previous life is still resounding.

Signed-off-by: Jan Pokorný <[email protected]>
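For illustration, here is a minimal sketch of the kind of parent/child synchronization after fork() described above; this is a hypothetical helper (not the actual test code), assuming a plain POSIX pipe as the readiness signal so that no fixed-length sleep is needed:

```c
/* Hypothetical sketch: synchronize parent and child after fork() via a
 * pipe so the parent blocks until the child signals readiness, then
 * yields the CPU to it, instead of sleeping for a fixed interval. */
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static pid_t run_in_new_process(void (*fn)(void))
{
	int pipefd[2];
	pid_t pid;

	if (pipe(pipefd) != 0) {
		return -1;
	}
	pid = fork();
	if (pid == 0) {                         /* child */
		close(pipefd[0]);
		if (write(pipefd[1], "!", 1) != 1) {  /* tell parent we are starting */
			_exit(1);
		}
		close(pipefd[1]);
		fn();
		_exit(0);
	}
	/* parent: wait for the child's readiness byte instead of sleeping */
	close(pipefd[1]);
	if (pid > 0) {
		char c;
		(void) read(pipefd[0], &c, 1);  /* blocks until child writes or dies */
		sched_yield();                  /* give up the processor to the child */
	}
	close(pipefd[0]);
	return pid;
}
```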
Role specifications are currently not applied and are rather a preparation for the actual meaningful use to come. Signed-off-by: Jan Pokorný <[email protected]>
This way, this core part can be easily reused where needed. Note that the "ready_signaller" similarity with run_ipc_server is not accidental; the following commit will justify it. Signed-off-by: Jan Pokorný <[email protected]>
Compared to the outer world, libqb takes a rather unintuitive approach to priorities within its native event loop (qbloop.h) -- it doesn't handle high-to-low priorities exhaustively in a batched (clean-the-level) manner, but rather linearly increases the chance of picking the handling task from a higher priority level as opposed to lower priority ones. This has the advantage of limiting the chances of starvation and deadlock in incorrectly constructed SW; on the other hand, it means that libqb does not truthfully fulfil the architected intentions regarding what deserves priority, so these priorities serve as a mere hint rather than an urgency-based separation.

Consequently, discovery of such deadlocks etc. is deferred to the (as Murphy's laws have it) least convenient moment, e.g., when said native event loop is exchanged for another (this time truly priority-abiding, like GLib) implementation, while retaining the same basic notion and high-level handling of priorities on the libqb side, in the IPC server (service handling) context.

Hence, demonstrating such degenerate blocking is not trivial, and we must resort to such an alternative event loop implementation. After this hassle, we are rewarded with a practical proof that said "high-level handling [...] in IPC server (service handling) context" contains a bug (which we are going to fix subsequently) -- this is contrasted with libqb's native loop implementation, which works just fine even prior to that fix.

Signed-off-by: Jan Pokorný <[email protected]>
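As a hedged illustration (not taken from this PR), the following GLib sketch shows the strict "clean-the-level" behaviour the commit contrasts with libqb's native loop: while a higher-priority source keeps itself ready, a default-priority one is never dispatched, i.e., what was only a hint under libqb's native loop becomes outright starvation under a truly priority-abiding loop:

```c
/* Sketch assuming GLib's GMainLoop: an always-ready HIGH-priority idle
 * source keeps the highest ready level non-empty, so the
 * DEFAULT-priority source below it never runs. */
#include <glib.h>
#include <stdio.h>

static gboolean high_prio_cb(gpointer data)
{
	/* stays ready forever -> higher level is never "cleaned" away */
	return G_SOURCE_CONTINUE;
}

static gboolean default_prio_cb(gpointer data)
{
	/* never reached while high_prio_cb keeps returning CONTINUE */
	puts("default-priority source finally ran");
	return G_SOURCE_REMOVE;
}

int main(void)
{
	GMainLoop *loop = g_main_loop_new(NULL, FALSE);

	g_idle_add_full(G_PRIORITY_HIGH, high_prio_cb, NULL, NULL);
	g_idle_add_full(G_PRIORITY_DEFAULT, default_prio_cb, NULL, NULL);
	g_main_loop_run(loop);  /* spins in high_prio_cb indefinitely */
	g_main_loop_unref(loop);
	return 0;
}
```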
It turns out that while 7f56f58 allowed for less blocking (thus throughput-increasing) initial handling of connections from clients within the abstract (out-of-libqb managed) event loop, it unfortunately subscribes itself back to such a polling mechanism for the UNIX-socket-check with a default priority, which can be lower than desired (via explicit qb_ipcs_request_rate_limit() configuration) for the particular channel (amongst attention-competing siblings in the pool; the term here refers to the associated communication, that is, both the server and the on-server abstraction for particular clients). And priority-based discrepancies are not forgiven in truly priority-abiding systems (that is, unlike with libqb's native event loop harness as detailed in the previous commit, for which this would be soft-tolerated and hence the problem would not be spotted in the first place -- but that's explicitly excluded from further discussion).

On top of that, it violates the natural assumption that once the (single-threaded, which is imposed by libqb, at least between the initial accept() and after said UNIX-socket-check) server accepts the connection, it shall rather take care of serving it (at least within the stated initial scope of the client connection life cycle) than rush to accept new ones -- which is exactly what used to happen previously once the library user set the effective priority in the abstract poll above the default one.

It's conceivable that, just as with the former case of attention-competing siblings with higher priority, whereby they could _infinitely_ live on at the expense of starving the client in the initial handling phase (authentication) despite the library user's as-high-as-siblings intention (because the default priority was used for that unconditionally instead, which we address here), a deadlock is imminent also in this latter accept-to-client-authentication-handling case if there's an _unlimited_ fast-paced arrival queue (well, limited by the number of allowable open descriptors within the system, but for the Linux built-in maximum of 1M, there may be no practical difference, at least for time-sensitive applications).

The only hope then is that such deadlocks are rather theoretical, since a "spontaneous" constant stream of either communication on unrelated, higher-prio sibling channels, or of new connection arrivals would just as well testify to the poor design of the libqb IPC application. That being said, the unconditional default priority in the isolated context of initial server-side client authentication is clearly a bug, but such an application shall apply appropriate rate-limiting measures (exactly on a priority basis) to handle unexpected flux nonetheless.

The fix makes the test_ipc_dispatch_*_glib_prio_deadlock_provoke tests pass.

Signed-off-by: Jan Pokorný <[email protected]>
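For context, here is a minimal sketch of how a library user expresses the per-channel priority intent through the public qbipcs.h API (the handler bodies are placeholders, not the code touched by this PR). The fix is about the server honouring this requested rate limit/priority for the connection's initial UNIX-socket authentication check as well, instead of silently falling back to the default priority:

```c
/* Sketch assuming libqb's public IPC server API (qbipcs.h). */
#include <sys/types.h>
#include <qb/qbipcs.h>

static int32_t conn_accept(qb_ipcs_connection_t *c, uid_t uid, gid_t gid)
{
	return 0;  /* accept everyone in this sketch */
}

static int32_t msg_process(qb_ipcs_connection_t *c, void *data, size_t size)
{
	return 0;  /* no-op request handler */
}

static struct qb_ipcs_service_handlers handlers = {
	.connection_accept = conn_accept,
	.msg_process = msg_process,
};

int main(void)
{
	qb_ipcs_service_t *srv =
	    qb_ipcs_create("example", 0, QB_IPC_SHM, &handlers);

	/* ask for a higher-than-default priority for this channel */
	qb_ipcs_request_rate_limit(srv, QB_IPCS_RATE_FAST);

	/* ... hook up poll handlers (e.g. a GLib-backed dispatch_add via
	 * qb_ipcs_poll_handlers_set()) and run the chosen event loop ... */
	return 0;
}
```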
It's misleading towards a random code observer, at least, hiding the fact that what failed is actually the queuing up of some handling to be performed asynchronously in the future, rather than invoking it synchronously right away. Signed-off-by: Jan Pokorný <[email protected]>
Make the qbipcs.h module interdependence clear (also shedding light on some semantic dependencies) as well. Signed-off-by: Jan Pokorný <[email protected]>
We want to run each and every test we can, without reliance on transitive dependencies and environment "invariants". Signed-off-by: Jan Pokorný <[email protected]>
jnpkrn force-pushed the version_1-ipc-server-temporary-channel-priority-loss branch from 5f50866 to c5cb0db on June 12, 2019 14:47
With Fedora build, I've noticed that …
1.x backport of #352.