IPC: server: avoid temporary channel priority loss, up to deadlock-worth #352
Conversation
In high-load (or high-rate config-change) scenarios, pacemaker-fenced would be unable to provide service when effectively DoS'd with CIB update notifications. Try to reconcile that by elevating the priority of the server's own listening interface in the mainloop; at worst it will fence with a slightly outdated config, which appears less bad than not carrying out the execution at all. Other daemons might be considered as well.
Prerequisites:
- ClusterLabs/libqb#352
How to verify:
- mocked/based -N (see the commit adding that module to the mocked based daemon)
Not sure why kronosnet CI is not passing on FreeBSD 12 (…)
(force-pushed from 58a8d86 to 2e573d9)
It turns out that the bug is effectively unspottable with libqb's na(t)ive […] actually it's too harsh, since it's unstated whether this implementation […] Do not merge yet, the new unit test currently errs on the safe side, marking […]
(force-pushed from 2e573d9 to e9339be)
(force-pushed from e9339be to d90d45f)
Honestly, this was the most painful unit test ever for me. @chrissie-c if you are OK with this, I'd prefer pushing it myself.
(force-pushed from 58f7a27 to cf3d6e0)
In high-load (or high-rate config-change) scenarios, pacemaker-fenced would be unable to provide service when effectively DoS'd with CIB update notifications. Try to reconcile that by elevating the priority of the server's own listening interface in the mainloop; at worst it will fence with a slightly outdated config, which appears less bad than not carrying out the execution at all. Other daemons might be considered as well.
Prerequisites:
- ClusterLabs/libqb#352 (libqb used to contain a bug due to which one particular step in the initial-client-connection-accepting-at-the-server procedure would be carried out with a hard-coded -- and hence possibly lower than competing events' -- priority, which backfires exactly in this case once the pacemaker part is fixed, by elevating the priority of fenced's API end-point so that it won't get consistently overridden by a non-socket-based event source/trigger)
How to verify:
- mocked/based -N (see the commit adding that module to the mocked based daemon)
include/qb/qbloop.h (outdated diff)
 * priorities, these are rather advisory, however. For the true real-world
 * systems with requirements of deterministic responses, you may be better
 * served with these priorities strictly separated and abided, which is the
 * case, e.g., with GLib. On the other hand, for early stages and timing
I don't really have too high an opinion about the libqb loop, but it may be helpful to be a little more specific. What do you mean by "... are rather advisory"? Why doesn't it fulfill "deterministic" guarantees (and maybe add a small note on why GLib does)? Also, are you sure that your advice really holds for all (or at least most) "real-world" systems (do you have any "hard" data)?
The wording there is hard for me to understand; it's not idiomatic modern English. I'll work out a better phrasing and let you know what I come up with.
On 28/05/19 02:03 -0700, Jan Friesse wrote:
> jfriesse commented on this pull request.
> I don't really have too high opinion about libqb loop, but it may be
> helpful to be a little more specific. What do you mean by "... are
> rather advisory"?
It's basically a hint rather than a strict priority ordering to abide
by when deciding the handling order of concurrent events.
AFAIK, for the three priorities available, the ratio of serving
them is 3:2:1 -- it's just a probabilistic method of dealing with
priorities, which is rather atypical, and it can leave a (high-rate)
high-prio batch event stream blocked by slow-to-handle (a problem
of the application, but nevertheless...) low-prio events.
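For illustration, a minimal sketch (the descriptors and callbacks are
made up for the example; only the qbloop.h calls and the QB_LOOP_*
priorities are actual API), registering two sources at different
priorities:

    #include <poll.h>
    #include <unistd.h>
    #include <qb/qbloop.h>

    /* hypothetical dispatch callbacks: each simply drains its descriptor */
    static int32_t hi_dispatch(int32_t fd, int32_t revents, void *data)
    {
        char buf[256];
        (void)revents; (void)data;
        return read(fd, buf, sizeof(buf)) < 0 ? -1 : 0;
    }

    static int32_t lo_dispatch(int32_t fd, int32_t revents, void *data)
    {
        char buf[256];
        (void)revents; (void)data;
        return read(fd, buf, sizeof(buf)) < 0 ? -1 : 0;
    }

    static void run_both(int32_t important_fd, int32_t noisy_fd)
    {
        qb_loop_t *loop = qb_loop_create();

        /* under libqb's native loop the QB_LOOP_LOW source still gets its
         * proportional share of service even while the QB_LOOP_HIGH one is
         * constantly ready -- the priorities act as a hint, not as a strict
         * serve-higher-first ordering */
        qb_loop_poll_add(loop, QB_LOOP_HIGH, important_fd, POLLIN,
                         NULL, hi_dispatch);
        qb_loop_poll_add(loop, QB_LOOP_LOW, noisy_fd, POLLIN,
                         NULL, lo_dispatch);

        qb_loop_run(loop);
    }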
> Why it doesn't fulfill "deterministic" guarantees (and maybe
> a small note why Glib does so?)?
Deterministic ("absolutistic") as opposed to probabilistic
(inherently nondeterministic), see above.
If you want to take me too literally, of course, when there's
only a single event source (in possession of both differently
prioritized channels) for the event handler, it's still deterministic,
since this source has full control of the requests (unless they are
running fully asynchronously), but that's gone when there are
independent sources.
> Also are you sure that your advice is really true for all (or at
> least most of the) "real-world" systems?
I am actually not aware of systems where a probabilistic approach to
priorities gets used (apart from libqb). Hand-written loop
management is also typically based on implicit, strict priorities
(a statically hard-coded order of handling the particular events).
Experience (see the unit test) tells that GLib behaves in what
I refer to as a deterministic manner. That's somewhat more intuitive,
yet more demanding when it comes to correct usage (requiring
starvation/deadlock-avoiding design considerations).
It's basically "process niceness vs. real-time scheduling" analogy.
Libqb favours the former (without declaring that clearly, until
the commit in question), GLib et al. the latter.
Any suggestions on how to word this better while still guiding libqb
users towards correct expectations are welcome.
--
Jan (Poki)
(force-pushed from cf3d6e0 to 5233bea)
Reworded and extended the commentary in […]
(force-pushed from 5233bea to 9aa5165)
My idea for wording is much simpler and to the point: "The priorities passed to the libqb loop functions are advisory and provide no guarantees as to the order the callbacks will be invoked." I don't think we need to say any more than that. Offering comparisons with other libraries just runs the risk of things getting out of date.
@chrissie-c Yep, this short description probably serves as well as a more in-depth description of the current implementation, so ACK.
Such a cross-fire about the comment.
On 29/05/19 01:09 -0700, Jan Friesse wrote:
> On 28/05/19 02:03 -0700, Jan Friesse wrote:
>> I don't really have too high opinion about libqb loop, but it may
>> be helpful to be a little more specific. What do you mean by "...
>> are rather advisory"?
>
> It's basically a hint rather than strict priority ordering abided
> when handling order of the concurrent events is to be decided.
> AFAIK, for the three priorities available, the ratio of serving
> them is 3:2:1 -- it's just a probabilistic method of dealing with
> priorities, which is rather atypical, and makes (high-rate)
> Ok, but is it necessarily bad?
Definitely not. It's explained that it's an easy way out of starvation
problems. It's surprising, though, and I think this fact deserves
to be reflected.
> high-prio batch event stream possibly blocked due to slow to handle
> (problem of the application, but nevertheless...) low prio events.
> Yep. So why not to describe it and let user to choose? Maybe this
> behavior is better fit?
Users have the liberty of choice, there's no doubt about that, and
when they decide to stick with libqb's native implementation, they
should be perfectly aware of the trade-off they make when doing so.
If they are not OK with that, they will be better served elsewhere.
>> Why it doesn't fulfill "deterministic" guarantees (and maybe a
>> small note why Glib does so?)?
>
> Deterministic ("absolutistic") as opposed to probabilistic
> (inherently nondeterministic), see above. If you want to take me
> too literally, of course, when there's the only event source (at
> possessions of both differently prioritized channels) for the
> event handler, it's still deterministic, since this source has
> a full control of the requests (until they are running fully
> asynchronously), but it's gone when there are independent sources.
> Current loop is as deterministic as glib implementation. It's just
> different behavior.
I don't see how you come to this conclusion.
GLib makes a promise to serve higher-priority events before proceeding
with lower-priority ones. The scheduling plan is a function of incoming
events (and their priorities) only. If handling one high-prio event
generates another one, it is deterministically ordered before any
pending events of lower priority.
Libqb doesn't make such a promise; rather it interleaves the priority
levels (assuming there are always more events to handle) per some
kind of proportional system. The scheduling plan is a function of
incoming events and some hidden state that determines which step in
said proportional system is to be followed. If handling one high-prio
event generates another one, it is deterministically ordered within
events of the same priority (FIFO, I hope), but non-deterministically
ordered with respect to the lower-prio ones (which moreover can
generate some other event[s] that can even be handled prior to said
just-queued high-prio one, depending on said hidden state!).
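To illustrate the GLib side of that contrast, a minimal sketch (the
descriptors and callbacks are made up; g_io_add_watch_full() and the
G_PRIORITY_* constants are the real GLib API):

    #include <glib.h>

    /* hypothetical watch callbacks; returning TRUE keeps the watch alive */
    static gboolean hi_ready(GIOChannel *ch, GIOCondition cond, gpointer data)
    {
        gchar buf[256];
        gsize got = 0;
        g_io_channel_read_chars(ch, buf, sizeof(buf), &got, NULL);
        return TRUE;
    }

    static gboolean lo_ready(GIOChannel *ch, GIOCondition cond, gpointer data)
    {
        gchar buf[256];
        gsize got = 0;
        g_io_channel_read_chars(ch, buf, sizeof(buf), &got, NULL);
        return TRUE;
    }

    static void run_both(int important_fd, int noisy_fd)
    {
        GMainLoop *loop = g_main_loop_new(NULL, FALSE);

        /* as long as the G_PRIORITY_HIGH watch keeps being ready, GLib keeps
         * dispatching it; the G_PRIORITY_LOW watch only runs when nothing of
         * higher priority is pending */
        g_io_add_watch_full(g_io_channel_unix_new(important_fd),
                            G_PRIORITY_HIGH, G_IO_IN, hi_ready, NULL, NULL);
        g_io_add_watch_full(g_io_channel_unix_new(noisy_fd),
                            G_PRIORITY_LOW, G_IO_IN, lo_ready, NULL, NULL);

        g_main_loop_run(loop);
    }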
>> Also are you sure that your advice is really true for all (or at
>> least most of the) "real-world" systems?
>
> I am actually not aware of systems where probabilistic approach to
> priorities would get used (apart
> So no hard data
It's a straw man -- will you give me the budget for this en masse,
quantitative research? Or what hard data are you after?
It's actually very simple, you'll achieve the very same effect if you
offload any priority considerations out to the event handling callbacks
themselves, each callback having a static counter i incremented and
modulo-checked per some factor:
- high priority: !(i % 1)
- med priority: !(i % 2)
- low priority: !(i % 3)
and you get the same functional behaviour as what libqb offers
(sketched below).
I am yet to see a fairness balancing like that, it may exist,
but yeah, hard data would be handy.
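For instance, a hypothetical fragment of such callback-side gating
(nothing from libqb, just the counter idea above made concrete):

    /* hypothetical poll callback that fakes "low priority" from the inside:
     * a static counter lets real work happen only on every third invocation,
     * approximating a proportional rather than strict priority scheme */
    static int32_t low_prio_cb(int32_t fd, int32_t revents, void *data)
    {
        static unsigned int i;

        (void)revents; (void)data;
        if (++i % 3) {
            return 0;   /* skip this round; the event stays pending */
        }
        /* ... actually consume and handle whatever is ready on fd ... */
        return 0;
    }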
On the other hand, it's quite common to see something like (pseudo-code):
    while not terminate:
        ret = poll(fds, nfds, 500)
        if ret == -1:
            if errno == EINTR:
                continue
            else:
                return -errno
        elif ret == 0:
            continue
        # strict priorities: scan in priority order and, once anything
        # is handled, start over from the highest priority again
        for pf in prioritized_fds:
            new_round = False
            for f in fds:
                if f.fd == pf.fd and f.events & f.revents:
                    pf.handler()
                    new_round = True
                    break
            if new_round:
                break
> from libqb). Hand-written loop management is also typically based
> on implicit, strict priorities (statically
> Do you have any hard data?
I'd get them for you if only time capacity allowed...
To flip it, do you have any counter-examples?
> hard-coded order of handling the particular events). The experience
> (see the unit test) tells that GLib behaves in what I refer to as
> deterministic manner. That's somewhat more intuitive, yet more
> demanding
> It's easier to describe glib behavior, but I wouldn't call it more
> deterministic
See above.
> when it comes to the correct usage (requiring
> starvation/deadlock-avoiding design considerations). It's basically
> "process niceness vs. real-time scheduling" analogy. Libqb favours
> the former (without
> Kind of right analogy and maybe nice to have it in description.
Ok, let me amend that comment.
> declaring that clearly, until the commit in question), GLib et al.
> the latter. Any suggestions how to word this better while still
> guiding the libqb users towards correct expectations welcome.
> Because chrissie is on it I will not add my description but idea is
> to really describe how current implementation works and what may be
> drawbacks. That's it. No need to describe feeling and/or give advice
> without any real evidence.
It's basically friendly advice to the fellow programmer willing to
use the facilities of libqb -- I am thinking about it in terms of what
I'd make clear to an insight-less me about to use qbloop.h.
I think that's the best way to see it, incl. giving pointers to what
to use when the other, more intuitive (since "priority" is really
established as meaning urgency in the field, since the age of IRQs
if not much older) behaviour is actually requested.
--
Jan (Poki)
(force-pushed from 17be286 to 9606a27)
(force-pushed from 1f409dd to b05828e)
It turns out that while 7f56f58 allowed for less blocking (thus
throughput increasing) initial handling of connections from clients
within the abstract (out-of-libqb managed) event loop, it unfortunately
subscribes itself back to such polling mechanism for UNIX-socket-check
with a default priority, which can be lower than desired (via explicit
qb_ipcs_request_rate_limit() configuration) for particular channel
(amongst attention-competing siblings in the pool; the term here
refers to associated communication, that is, both server and
on-server abstraction for particular clients). And priority-based
discrepancies are not forgiven in true priority-abiding systems
(that is, unlikely with libqb's native event loop harness as detailed
in the previous commit, for which this would be soft-tolerated and
hence the problem would not be spotted in the first place -- but
that's explicitly excluded from further discussion).

On top of that, it violates the natural assumption that once the
(single threaded, which is imposed by libqb, at least between initial
accept() and after-said-UNIX-socket-check) server accepts the
connection, it shall rather take care of serving it (at least within
the stated initial scope of the client connection life cycle) rather
than be rushing to accept new ones -- which is exactly what used to
happen previously once the library user set the effective priority in
the abstract poll above the default one.

It's conceivable, just as with the former case of attention-competing
siblings with higher priority whereby they could _infinitely_ live on
at the expense of starving the client in the initial handling phase
(authentication) despite the library user's as-high-as-siblings
intention (for using the default priority for that unconditionally
instead, which we address here), that the deadlock is imminent also in
this latter accept-to-client-authentication-handling case as well
if there's an _unlimited_ fast-paced arrival queue (well, limited
by the number of allowable open descriptors within the system,
but for the Linux built-in maximum of 1M, there may be no practical
difference, at least for time-sensitive applications).

The only hope then is that such deadlocks are rather theoretical,
since a "spontaneous" constant stream of either communication on
unrelated, higher-prio sibling channels, or of new connection arrivals
can as well testify to the poor design of the libqb IPC application.
That being said, unconditional default priority in the isolated
context of initial server-side client authentication is clearly
a bug, but such an application shall apply appropriate rate-limiting
measures (exactly on a priority basis) to handle unexpected flux
nonetheless.

The fix makes test_ipc_dispatch_*_glib_prio_deadlock_provoke tests pass.

Signed-off-by: Jan Pokorný <[email protected]>
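For context, a sketch of the library-user side only (the service
creation is elided and the helper hypothetical; qb_ipcs_request_rate_limit()
and QB_IPCS_RATE_FAST are the real qbipcs.h API the message refers to,
and the rate setting is what is expected to translate into the channel's
dispatch priority):

    #include <qb/qbipcs.h>

    /* 's' is an already-created qb_ipcs_service_t; QB_IPCS_RATE_FAST is the
     * setting expected to map onto the highest dispatch priority of the
     * underlying poll abstraction -- the bug addressed here was that the
     * initial connection-authentication step ignored this and fell back to
     * the default priority */
    static void raise_channel_priority(qb_ipcs_service_t *s)
    {
        qb_ipcs_request_rate_limit(s, QB_IPCS_RATE_FAST);
    }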
It's misleading towards a random code observer, at least, hiding the fact that what failed is actually the queuing up of some handling to perform asynchronously in the future, rather than invoking it synchronously right away. Signed-off-by: Jan Pokorný <[email protected]>
(force-pushed from b05828e to bec6285)
Fixed, looks good to me now.
Can we have Ken's text in the comment please?
They are not representative as to the tricky nature I'd like to point […] Is the current form a blocker in any place? If so, in which […] Point of arbiters is to judge fairly, not with the hive mind of the clique they happen to be in. I'd appreciate if that was the case. Apparently, the discussed matter was rather interesting for others to learn about (thanks @wferi for your attendance here!), and I think the current form discusses the possible shortcomings in about the right depth to the random comer to libqb's event loop.
> Point of arbiters is to judge fairly, not with the hive mind of the
> clique they happen to be in. I'd appreciate if that was the case.
You're in luck, it is.
When several people agree on a perspective different from yours, you should at least consider the possibility that maybe they're not irrational colluders trying to defeat progress, and instead that just maybe your submission is not accomplishing its intended goal.
> Apparently, the discussed matter was rather interesting for others to
> learn about (thanks @wferi for your attendance here!), and I think the
> current form discusses the possible shortcomings in about the right
> depth to the random comer to the libqb's event loop.
I agree that some elaboration of what the priorities mean and how they interact would be useful. I disagree that you're successfully accomplishing that here, or that such a lengthy discussion belongs in the API documentation.
(force-pushed from bec6285 to d90d45f)
Tried to rehash and re-allocate the commentary to follow the suggestion […]
Make the qbipcs.h module interdependence clear (also shedding light on some semantic dependencies) as well. Signed-off-by: Jan Pokorný <[email protected]>
We want to run each and every test we can, without reliance on transitive dependencies and environment "invariants". Signed-off-by: Jan Pokorný <[email protected]>
(force-pushed from 0ef6302 to b46a574)
(typo fix)
TBH I'm unhappy you pulled this change without my approval. I'm still not happy with the comment in qbloop.h and I really don't see the need for the glib code in the test suite. Yes, there are a lot of good things in the check_ipc.c patch here (for which thank you), but they really should have gone into a separate PR rather than being bundled with this one.
On 12/06/19 08:07 -0700, Chrissie Caulfield wrote:
> TBH I'm unhappy you pulled this change without my approval.
It was OK'd by Ken without further responses.
> I'm still not happy with the comment in qbloop.h and I really don't
> see the need for the glib code in the test suite.
To demonstrate the problem (and the fix thereof) in a reproducible
manner, as is customary on the project and will hopefully be followed
in the future, and also to retain clarity about the problems
encountered in the past and the desired behaviours.
That's why unit tests are important, and if they only softly ask
for a new dependency (without which no such tests could exist at all),
there's hardly any conflict.
> Yes, there are a lot of good things in the check_ipc.c patch
> here (for which thank you), but they really should have gone
> in a separate PR rather than bundled with this one.
That's a matter of causality. Were familiarity with the context as
implicit as you seem to suggest, the problem would not have been
introduced with the respective refactoring in the past (I touched on
this several times in this discussion).
So it forms a logical unit addressing the particular problem space.
That's how patchsets are usually structured, since otherwise it's
easy to postpone something unnecessarily and/or to suffer from open
loops that were once low-hanging fruit for closing.
It's also economical: several minds are concentrated in a particular
direction _once_ rather than repeatedly, with more mental energy burnt.
Of course, there are trade-offs to be made, but this was only a single
commit for which detaching doesn't make much sense, given its
"on-topicness".
--
Poki
Ken is not the libqb maintainer, much as I respect him. I am. While I'm all in favour of good unit tests - of course - the ones we have are already annoyingly fragile, and adding more complexity is not helpful. Especially adding a comparison with a different library is just asking for unexpected failures.
[sorry, my former response was in part based on misinterpreting […] Re complexity: that's why one of the commits introduced better traceability to which […] I see the risk factors; new failures stemming there go onto my shoulders. But was reasonably confident that the cut wasn't that massive, and total […] Frankly, I am constantly on the verge regarding complexity.
I obviously didn't make myself clear. While the initial patch was great, and the general updates to check_ipc.c are welcomed, there are things I do NOT want in libqb and think you've overstepped the mark in committing them without approval.
- The glib code has no place here. It's adding pointless complexity to the test suite for almost no perceivable value - it needs removing.
- The comment in qbloop.h is borderline incomprehensible and needs fixing.
These comments also apply to your patch (also committed without my approval) to the version_1 branch. Thank you.
On 17/06/19 08:32 -0700, Chrissie Caulfield wrote:
> I obviously didn't make myself clear.
> While the initial patch was great, and the general updates to
> check_ipc.c are welcomed, there are things I do NOT want in libqb
> and think you've overstepped the mark in committing them without
> approval.
> - The glib code has no place here. It's adding pointless complexity
>   to the test suite for almost no perceivable value - it needs
>   removing.
It was explained multiple times that requiring such a cut is
undeserved.
Let me explain from another perspective.
Libqb IPC servers clearly require integration with some arbitrary
event loop to which they can be glued with three basic callbacks.
From examples, it's apparent that GLib was one such target for a long
long time -- configure.ac had a check for the presence of that
library (and don't forget, it's all optional, no GLib around is
entirely fine, examples won't get that support compiled in and the
respective unit tests will get skipped, and that's it). Beyond that,
likely to make libqb fully self-contained (see my note in the previous
discussion wrt. knowledge retention), a built-in event loop is also
delivered, with rather different characteristics.
Pulling any other external event loop implementation into the unit
tests could be seen as abrasive, I agree. In the case of GLib, I cannot
say that, and such tests actually clearly demonstrate two things:
1. that there was indeed a bug -- not ever exposable with said
native event loop, so for the purpose of regressions checking,
it makes a tonne of sense to employ said once (ages ago)
accustomed integration library (mind that there can be no such
meaningful test otherwise), and validation of fix thereof
(commits were purposefully ordered to make this transition
from red to green observable!)
2. with a slight modification of the unit test (as described in the
comment), it can be demonstrated clearly that the GLib event loop
works on a stronger basis than the native loop does, so it's
rather trivial to verify the claims newly added to the
documentation (see below), which in turn (again, as mentioned
multiple times) were shown as rather critical, in the sense that
had there been full awareness of that, the original bug wouldn't,
very likely, have been introduced (see defensive coding below)
> - The comment in qbloop.h is borderline incomprehensible and needs
>   fixing.
Excuse me, but what is incomprehensible?
I think we had enough of this straw man.
Ken seemed to comprehend that well in the last iteration.
It's all for the sake of defensive coding.
Do you want the same mistakes to be repeated over, or do you not
follow that defensive coding also means mentoring others to avoid
falling into the traps?
> These comments also apply to your patch (also committed without my
> approval) to the version_1 branch.
I took it all for a closed chapter now, which I thought is
what it truly deserves -- rest in peace. The problem is solved,
reproducibly regression-tested, and the wider space documented
to prevent unexpected surprises to us and others.
Exhumation should be well justified, IMHO.
(To be honest, I also never consented to the removal of the
linker-section-based efficient offloading of the log message
book-keeping, primarily for its ABI-destructing effect, and the
removal wasn't even fully concluded so as to at least retain some
traits of the efficiency gains, which won't be possible if there's
no enforcement of compile-time constantness on some arguments that
used to be enforced like that in the past; see Lars' posts on that
topic on the list -- perhaps something deserving more attention)
Speaking of what requires attention, in pacemaker we are still
looking at being able to use an open-socket-once approach to avoid
race conditions (otherwise still allowing some avoidable chances for
DoS) and perhaps to build a sort of socket-based activation out of
that (any guidance towards having such a mechanism workable with
libqb welcome): #325
--
Jan (Poki)
https://build.opensuse.org/request/show/946348 by user yan_gao + dimstar_suse
- Retry if posix_fallocate is interrupted with EINTR (#453) (gh#ClusterLabs/libqb#451, bsc#1193737, bsc#1193912)
  * bsc#1193737-0001-Retry-if-posix_fallocate-is-interrupted-with-EINTR-4.patch
- IPC: server: avoid temporary channel priority loss, up to deadlock-worth (gh#ClusterLabs/libqb#352, rh#1718773, bsc#1188212)
(forwarded request 946347 from yan_gao)