IPC: server: avoid temporary channel priority loss, up to deadlock-worth #352
Conversation
In high-load (or high-rate config-change) scenarios, pacemaker-fenced would be unable to provide service when effectively DoS'd with CIB update notifications. Try to reconcile that by elevating the priority of the server's own listening interface in the mainloop; at worst it will fence with a slightly outdated config, which appears less bad than not carrying out the execution at all. Other daemons might be considered as well.
Prerequisites:
- ClusterLabs/libqb#352
How to verify:
- mocked/based -N (see the commit adding that module to the mocked based daemon)
Not sure why kronosnet CI is not passing on FreeBSD 12 (…)
(force-pushed from 58a8d86 to 2e573d9)
It turns out that the bug is effectively unspottable with libqb's na(t)ive […] actually it's too harsh, since it's unstated whether this implementation […] Do not merge yet, the new unit test currently errs on the safe side, marking […]
(force-pushed from 2e573d9 to e9339be)
(force-pushed from e9339be to d90d45f)
Honestly, this was the most painful unit test ever for me. @chrissie-c if you are OK with this, I'd prefer pushing it myself.
(force-pushed from 58f7a27 to cf3d6e0)
In high-load (or high-rate config-change) scenarios, pacemaker-fenced would be unable to provide service when effectively DoS'd with CIB update notifications. Try to reconcile that by elevating the priority of the server's own listening interface in the mainloop; at worst it will fence with a slightly outdated config, which appears less bad than not carrying out the execution at all. Other daemons might be considered as well.
Prerequisites:
- ClusterLabs/libqb#352 (libqb used to contain a bug due to which one particular step in the initial-client-connection-accepting-at-the-server procedure would be carried out with a hard-coded -- and hence possibly lower than competing events' -- priority, which backfires exactly in this case once the pacemaker part is fixed, by elevating the priority of fenced's API end-point so that it won't get consistently overridden by a non-socket-based event source/trigger)
How to verify:
- mocked/based -N (see the commit adding that module to the mocked based daemon)
include/qb/qbloop.h (outdated diff)
 * priorities, these are rather advisory, however. For the true real-world
 * systems with requirements of deterministic responses, you may be better
 * served with these priorities strictly separated and abided, which is the
 * case, e.g., with GLib. On the other hand, for early stages and timing
I don't really have too high an opinion about the libqb loop, but it may be helpful to be a little more specific. What do you mean by "... are rather advisory"? Why doesn't it fulfill "deterministic" guarantees (and maybe add a small note on why GLib does)? Also, are you sure that your advice really holds for all (or at least most) "real-world" systems (do you have any "hard" data)?
The wording there is hard for me to understand; it's not idiomatic modern English. I'll work out a better phrasing and let you know what I come up with.
On 28/05/19 02:03 -0700, Jan Friesse wrote:
> jfriesse commented on this pull request.
> I don't really have too high opinion about libqb loop, but it may be
> helpful to be a little more specific. What do you mean by "... are
> rather advisory"?
It's basically a hint rather than a strict priority ordering to abide
by when deciding the handling order of concurrent events.
AFAIK, for the three priorities available, the ratio of serving
them is 3:2:1 -- it's just a probabilistic method of dealing with
priorities, which is rather atypical, and it can leave a (high-rate)
high-prio batch event stream blocked by slow-to-handle (a problem
of the application, but nevertheless...) low-prio events.
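For illustration, a minimal sketch (the descriptors and callbacks are
made up for the example; only the qbloop.h calls and the QB_LOOP_*
priorities are actual API), registering two sources at different
priorities:

    #include <poll.h>
    #include <unistd.h>
    #include <qb/qbloop.h>

    /* hypothetical dispatch callbacks: each simply drains its descriptor */
    static int32_t hi_dispatch(int32_t fd, int32_t revents, void *data)
    {
        char buf[256];
        (void)revents; (void)data;
        return read(fd, buf, sizeof(buf)) < 0 ? -1 : 0;
    }

    static int32_t lo_dispatch(int32_t fd, int32_t revents, void *data)
    {
        char buf[256];
        (void)revents; (void)data;
        return read(fd, buf, sizeof(buf)) < 0 ? -1 : 0;
    }

    static void run_both(int32_t important_fd, int32_t noisy_fd)
    {
        qb_loop_t *loop = qb_loop_create();

        /* under libqb's native loop the QB_LOOP_LOW source still gets its
         * proportional share of service even while the QB_LOOP_HIGH one is
         * constantly ready -- the priorities act as a hint, not as a strict
         * serve-higher-first ordering */
        qb_loop_poll_add(loop, QB_LOOP_HIGH, important_fd, POLLIN,
                         NULL, hi_dispatch);
        qb_loop_poll_add(loop, QB_LOOP_LOW, noisy_fd, POLLIN,
                         NULL, lo_dispatch);

        qb_loop_run(loop);
    }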
> Why it doesn't fulfill "deterministic" guarantees (and maybe
> a small note why Glib does so?)?
Deterministic ("absolutistic") as opposed to probabilistic
(inherently nondeterministic), see above.
If you want to take me too literally, of course, when there's
only a single event source (in possession of both differently
prioritized channels) for the event handler, it's still deterministic,
since this source has full control of the requests (unless they are
running fully asynchronously), but that's gone when there are
independent sources.
> Also are you sure that your advice is really true for all (or at
> least most of the) "real-world" systems?
I am actually not aware of systems where a probabilistic approach to
priorities gets used (apart from libqb). Hand-written loop
management is also typically based on implicit, strict priorities
(a statically hard-coded order of handling the particular events).
Experience (see the unit test) tells that GLib behaves in what
I refer to as a deterministic manner. That's somewhat more intuitive,
yet more demanding when it comes to correct usage (requiring
starvation/deadlock-avoiding design considerations).
It's basically "process niceness vs. real-time scheduling" analogy.
Libqb favours the former (without declaring that clearly, until
the commit in question), GLib et al. the latter.
Any suggestions on how to word this better while still guiding libqb
users towards correct expectations are welcome.
--
Jan (Poki)
(force-pushed from cf3d6e0 to 5233bea)
Reworded and extended the commentary in […]
(force-pushed from 5233bea to 9aa5165)
My idea for wording is much simpler and to the point: "The priorities passed to the libqb loop functions are advisory and provide no guarantees as to the order the callbacks will be invoked." I don't think we need to say any more than that. Offering comparisons with other libraries just runs the risk of things getting out of date.
@chrissie-c Yep, this short description probably serves as well as a more in-depth description of the current implementation, so ACK.
Such a cross-fire about the comment.
On 29/05/19 01:09 -0700, Jan Friesse wrote:
> On 28/05/19 02:03 -0700, Jan Friesse wrote:
>> I don't really have too high opinion about libqb loop, but it may
>> be helpful to be a little more specific. What do you mean by "...
>> are rather advisory"?
>
> It's basically a hint rather than strict priority ordering abided
> when handling order of the concurrent events is to be decided.
> AFAIK, for the three priorities available, the ratio of serving
> them is 3:2:1 -- it's just a probabilistic method of dealing with
> priorities, which is rather atypical, and makes (high-rate)
> Ok, but is it necessarily bad?
Definitely not. It's explained that it's an easy way out of starvation
problems. It's surprising, though, and I think this fact deserves
to be reflected.
> high-prio batch event stream possibly blocked due to slow to handle
> (problem of the application, but nevertheless...) low prio events.
> Yep. So why not to describe it and let user to choose? Maybe this
> behavior is better fit?
Users have the liberty of choice, there's no doubt about that, and
when they decide to stick with libqb's native implementation, they
should be perfectly aware of the trade-off they make when doing so.
If they are not OK with that, they will be better served elsewhere.
>> Why it doesn't fulfill "deterministic" guarantees (and maybe a
>> small note why Glib does so?)?
>
> Deterministic ("absolutistic") as opposed to probabilistic
> (inherently nondeterministic), see above. If you want to take me
> too literally, of course, when there's the only event source (at
> possessions of both differently prioritized channels) for the
> event handler, it's still deterministic, since this source has
> a full control of the requests (until they are running fully
> asynchronously), but it's gone when there are independent sources.
> Current loop is as deterministic as glib implementation. It's just
> different behavior.
I don't see how you come to this conclusion.
GLib makes a promise to serve higher-priority events before proceeding
with lower-priority ones. The scheduling plan is a function of incoming
events (and their priorities) only. If handling one high-prio event
generates another one, it is deterministically ordered before any
pending events of lower priority.
Libqb doesn't make such a promise; rather it interleaves the priority
levels (assuming there are always more events to handle) per some
kind of proportional system. The scheduling plan is a function of
incoming events and some hidden state that determines which step in
said proportional system is to be followed. If handling one high-prio
event generates another one, it is deterministically ordered within
events of the same priority (FIFO, I hope), but non-deterministically
ordered with respect to the lower-prio ones (which moreover can
generate some other event[s] that can even be handled prior to said
just-queued high-prio one, depending on said hidden state!).
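To illustrate the GLib side of that contrast, a minimal sketch (the
descriptors and callbacks are made up; g_io_add_watch_full() and the
G_PRIORITY_* constants are the real GLib API):

    #include <glib.h>

    /* hypothetical watch callbacks; returning TRUE keeps the watch alive */
    static gboolean hi_ready(GIOChannel *ch, GIOCondition cond, gpointer data)
    {
        gchar buf[256];
        gsize got = 0;
        g_io_channel_read_chars(ch, buf, sizeof(buf), &got, NULL);
        return TRUE;
    }

    static gboolean lo_ready(GIOChannel *ch, GIOCondition cond, gpointer data)
    {
        gchar buf[256];
        gsize got = 0;
        g_io_channel_read_chars(ch, buf, sizeof(buf), &got, NULL);
        return TRUE;
    }

    static void run_both(int important_fd, int noisy_fd)
    {
        GMainLoop *loop = g_main_loop_new(NULL, FALSE);

        /* as long as the G_PRIORITY_HIGH watch keeps being ready, GLib keeps
         * dispatching it; the G_PRIORITY_LOW watch only runs when nothing of
         * higher priority is pending */
        g_io_add_watch_full(g_io_channel_unix_new(important_fd),
                            G_PRIORITY_HIGH, G_IO_IN, hi_ready, NULL, NULL);
        g_io_add_watch_full(g_io_channel_unix_new(noisy_fd),
                            G_PRIORITY_LOW, G_IO_IN, lo_ready, NULL, NULL);

        g_main_loop_run(loop);
    }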
>> Also are you sure that your advice is really true for all (or at
>> least most of the) "real-world" systems?
>
> I am actually not aware of systems where probabilistic approach to
> priorities would get used (apart
> So no hard data
It's a straw man -- will you give me the budget for this en masse,
quantitative research? Or what hard data are you after?
It's actually very simple, you'll achieve the very same effect if you
offload any priority considerations out to the event handling callbacks
themselves, each callback having a static counter i incremented and
modulo-checked per some factor:
- high priority: !(i % 1)
- med priority: !(i % 2)
- low priority: !(i % 3)
and you get the same functional behaviour as what libqb offers
(sketched below).
I am yet to see a fairness balancing like that, it may exist,
but yeah, hard data would be handy.
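For instance, a hypothetical fragment of such callback-side gating
(nothing from libqb, just the counter idea above made concrete):

    /* hypothetical poll callback that fakes "low priority" from the inside:
     * a static counter lets real work happen only on every third invocation,
     * approximating a proportional rather than strict priority scheme */
    static int32_t low_prio_cb(int32_t fd, int32_t revents, void *data)
    {
        static unsigned int i;

        (void)revents; (void)data;
        if (++i % 3) {
            return 0;   /* skip this round; the event stays pending */
        }
        /* ... actually consume and handle whatever is ready on fd ... */
        return 0;
    }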
On the other hand, it's quite common to see something like (pseudo-code):
    while not terminate:
        ret = poll(fds, nfds, 500)
        if ret == -1:
            if errno == EINTR:
                continue
            else:
                return -errno
        elif ret == 0:
            continue
        # strict priorities: scan in priority order and, once anything
        # is handled, start over from the highest priority again
        for pf in prioritized_fds:
            new_round = False
            for f in fds:
                if f.fd == pf.fd and f.events & f.revents:
                    pf.handler()
                    new_round = True
                    break
            if new_round:
                break
> from libqb). Hand-written loop management is also typically based
> on implicit, strict priorities (statically
> Do you have any hard data?
I'd get them for you if only time capacity allowed...
To flip it, do you have any counter-examples?
> hard-coded order of handling the particular events). The experience
> (see the unit test) tells that GLib behaves in what I refer to as
> deterministic manner. That's somewhat more intuitive, yet more
> demanding
> It's easier to describe glib behavior, but I wouldn't call it more
> deterministic
See above.
> when it comes to the correct usage (requiring
> starvation/deadlock-avoiding design considerations). It's basically
> "process niceness vs. real-time scheduling" analogy. Libqb favours
> the former (without
> Kind of right analogy and maybe nice to have it in description.
Ok, let me amend that comment.
> declaring that clearly, until the commit in question), GLib et al.
> the latter. Any suggestions how to word this better while still
> guiding the libqb users towards correct expectations welcome.
> Because chrissie is on it I will not add my description but idea is
> to really describe how current implementation works and what may be
> drawbacks. That's it. No need to describe feeling and/or give advice
> without any real evidence.
It's basically friendly advice to the fellow programmer willing to
use the facilities of libqb -- I am thinking about it in terms of what
I'd make clear to an insight-less me about to use qbloop.h.
I think that's the best way to see it, incl. giving pointers to what
to use when the other, more intuitive (since "priority" is really
established as meaning urgency in the field, since the age of IRQs
if not much older) behaviour is actually requested.
--
Jan (Poki)
(force-pushed from 17be286 to 9606a27)
(force-pushed from 1f409dd to b05828e)
It turns out that while 7f56f58 allowed for less blocking (thus
throughput increasing) initial handling of connections from clients
within the abstract (out-of-libqb managed) event loop, it unfortunately
subscribes itself back to such polling mechanism for UNIX-socket-check
with a default priority, which can be lower than desired (via explicit
qb_ipcs_request_rate_limit() configuration) for particular channel
(amongst attention-competing siblings in the pool; the term here
refers to associated communication, that is, both server and
on-server abstraction for particular clients). And priority-based
discrepancies are not forgiven in true priority-abiding systems
(that is, unlikely with libqb's native event loop harness as detailed
in the previous commit, for which this would be soft-tolerated and
hence the problem would not be spotted in the first place -- but
that's explicitly excluded from further discussion).

On top of that, it violates the natural assumption that once the
(single threaded, which is imposed by libqb, at least between initial
accept() and after-said-UNIX-socket-check) server accepts the
connection, it shall rather take care of serving it (at least within
the stated initial scope of the client connection life cycle) rather
than be rushing to accept new ones -- which is exactly what used to
happen previously once the library user set the effective priority in
the abstract poll above the default one.

It's conceivable, just as with the former case of attention-competing
siblings with higher priority whereby they could _infinitely_ live on
at the expense of starving the client in the initial handling phase
(authentication) despite the library user's as-high-as-siblings
intention (for using the default priority for that unconditionally
instead, which we address here), that the deadlock is imminent also in
this latter accept-to-client-authentication-handling case as well
if there's an _unlimited_ fast-paced arrival queue (well, limited
by the number of allowable open descriptors within the system,
but for the Linux built-in maximum of 1M, there may be no practical
difference, at least for time-sensitive applications).

The only hope then is that such deadlocks are rather theoretical,
since a "spontaneous" constant stream of either communication on
unrelated, higher-prio sibling channels, or of new connection arrivals
can as well testify to the poor design of the libqb IPC application.
That being said, unconditional default priority in the isolated
context of initial server-side client authentication is clearly
a bug, but such an application shall apply appropriate rate-limiting
measures (exactly on a priority basis) to handle unexpected flux
nonetheless.

The fix makes test_ipc_dispatch_*_glib_prio_deadlock_provoke tests pass.

Signed-off-by: Jan Pokorný <[email protected]>
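For context, a sketch of the library-user side only (the service
creation is elided and the helper hypothetical; qb_ipcs_request_rate_limit()
and QB_IPCS_RATE_FAST are the real qbipcs.h API the message refers to,
and the rate setting is what is expected to translate into the channel's
dispatch priority):

    #include <qb/qbipcs.h>

    /* 's' is an already-created qb_ipcs_service_t; QB_IPCS_RATE_FAST is the
     * setting expected to map onto the highest dispatch priority of the
     * underlying poll abstraction -- the bug addressed here was that the
     * initial connection-authentication step ignored this and fell back to
     * the default priority */
    static void raise_channel_priority(qb_ipcs_service_t *s)
    {
        qb_ipcs_request_rate_limit(s, QB_IPCS_RATE_FAST);
    }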
It's misleading towards a random code observer, at least, hiding the fact that what failed is actually the queuing up of some handling to perform asynchronously in the future, rather than invoking it synchronously right away. Signed-off-by: Jan Pokorný <[email protected]>
(force-pushed from b05828e to bec6285)
Fixed, looks good to me now.
Can we have Ken's text in the comment please?
They are not representative as to the tricky nature I'd like to point […] Is the current form a blocker in any place? If so, in which […] Point of arbiters is to judge fairly, not with the hive mind of the clique they happen to be in. I'd appreciate if that was the case. Apparently, the discussed matter was rather interesting for others to learn about (thanks @wferi for your attendance here!), and I think the current form discusses the possible shortcomings in about the right depth to the random comer to libqb's event loop.
> Point of arbiters is to judge fairly, not with the hive mind of the
> clique they happen to be in. I'd appreciate if that was the case.
You're in luck, it is.
When several people agree on a perspective different from yours, you should at least consider the possibility that maybe they're not irrational colluders trying to defeat progress, and instead that just maybe your submission is not accomplishing its intended goal.
> Apparently, the discussed matter was rather interesting for others to
> learn about (thanks @wferi for your attendance here!), and I think the
> current form discusses the possible shortcomings in about the right
> depth to the random comer to the libqb's event loop.
I agree that some elaboration of what the priorities mean and how they interact would be useful. I disagree that you're successfully accomplishing that here, or that such a lengthy discussion belongs in the API documentation.
(force-pushed from bec6285 to d90d45f)
Tried to rehash and re-allocate the commentary to follow the suggestion […]
Make the qbipcs.h module interdependence clear (also shedding light on some semantic dependencies) as well. Signed-off-by: Jan Pokorný <[email protected]>
We want to run each and every test we can, without reliance on transitive dependencies and environment "invariants". Signed-off-by: Jan Pokorný <[email protected]>
(force-pushed from 0ef6302 to b46a574)
(typo fix)
TBH I'm unhappy you pulled this change without my approval. I'm still not happy with the comment in qbloop.h and I really don't see the need for the glib code in the test suite. Yes, there are a lot of good things in the check_ipc.c patch here (for which thank you), but they really should have gone into a separate PR rather than being bundled with this one.
On 12/06/19 08:07 -0700, Chrissie Caulfield wrote:
> TBH I'm unhappy you pulled this change without my approval.
It was OK'd by Ken without further responses.
> I'm still not happy with the comment in qbloop.h and I really don't
> see the need for the glib code in the test suite.
To demonstrate the problem (and the fix thereof) in a reproducible
manner, as is customary on the project and will hopefully be followed
in the future, and also to retain clarity about the problems
encountered in the past and the desired behaviours.
That's why unit tests are important, and if they only softly ask
for a new dependency (without which no such tests could exist at all),
there's hardly any conflict.
> Yes, there are a lot of good things in the check_ipc.c patch
> here (for which thank you), but they really should have gone
> in a separate PR rather than bundled with this one.
That's a matter of causality. Were familiarity with the context as
implicit as you seem to suggest, the problem would not have been
introduced with the respective refactoring in the past (I touched on
this several times in this discussion).
So it forms a logical unit addressing the particular problem space.
That's how patchsets are usually structured, since otherwise it's
easy to postpone something unnecessarily and/or to suffer from open
loops that were once low-hanging fruit for closing.
It's also economical: several minds are concentrated in a particular
direction _once_ rather than repeatedly, with more mental energy burnt.
Of course, there are trade-offs to be made, but this was only a single
commit for which detaching doesn't make much sense, given its
"on-topicness".
--
Poki
Ken is not the libqb maintainer, much as I respect him. I am. While I'm all in favour of good unit tests - of course - the ones we have are already annoyingly fragile, and adding more complexity is not helpful. Especially adding a comparison with a different library is just asking for unexpected failures.
[sorry, my former response was in part based on misinterpreting […] Re complexity: that's why one of the commits introduced better traceability to which […] I see the risk factors; new failures stemming there go onto my shoulders. But was reasonably confident that the cut wasn't that massive, and total […] Frankly, I am constantly on the verge regarding complexity.
I obviously didn't make myself clear. While the initial patch was great, and the general updates to check_ipc.c are welcomed, there are things I do NOT want in libqb and think you've overstepped the mark in committing them without approval.
- The glib code has no place here. It's adding pointless complexity to the test suite for almost no perceivable value - it needs removing.
- The comment in qbloop.h is borderline incomprehensible and needs fixing.
These comments also apply to your patch (also committed without my approval) to the version_1 branch. Thank you.
On 17/06/19 08:32 -0700, Chrissie Caulfield wrote:
> I obviously didn't make myself clear.
> While the initial patch was great, and the general updates to
> check_ipc.c are welcomed, there are things I do NOT want in libqb
> and think you've overstepped the mark in committing them without
> approval.
> - The glib code has no place here. It's adding pointless complexity
>   to the test suite for almost no perceivable value - it needs
>   removing.
It was explained multiple times that requiring such a cut is
undeserved.
Let me explain from another perspective.
Libqb IPC servers clearly require integration with some arbitrary
event loop to which they can be glued with three basic callbacks.
From examples, it's apparent that GLib was one such target for a long
long time -- configure.ac had a check for the presence of that
library (and don't forget, it's all optional, no GLib around is
entirely fine, examples won't get that support compiled in and the
respective unit tests will get skipped, and that's it). Beyond that,
likely to make libqb fully self-contained (see my note in the previous
discussion wrt. knowledge retention), a built-in event loop is also
delivered, with rather different characteristics.
Pulling any other external event loop implementation into the unit
tests could be seen as abrasive, I agree. In the case of GLib, I cannot
say that, and such tests actually clearly demonstrate two things:
1. that there was indeed a bug -- not ever exposable with said
native event loop, so for the purpose of regressions checking,
it makes a tonne of sense to employ said once (ages ago)
accustomed integration library (mind that there can be no such
meaningful test otherwise), and validation of fix thereof
(commits were purposefully ordered to make this transition
from red to green observable!)
2. with a slight modification of the unit test (as described in the
comment), it can be demonstrated clearly that the GLib event loop
works on a stronger basis than the native loop does, so it's
rather trivial to verify the claims newly added to the
documentation (see below), which in turn (again, as mentioned
multiple times) were shown as rather critical, in the sense that
had there been full awareness of that, the original bug wouldn't,
very likely, have been introduced (see defensive coding below)
> - The comment in qbloop.h is borderline incomprehensible and needs
>   fixing.
Excuse me, but what is incomprehensible?
I think we had enough of this straw man.
Ken seemed to comprehend that well in the last iteration.
It's all for the sake of defensive coding.
Do you want the same mistakes to be repeated over, or do you not
follow that defensive coding also means mentoring others to avoid
falling into the traps?
> These comments also apply to your patch (also committed without my
> approval) to the version_1 branch.
I took it all for a closed chapter now, which I thought is
what it truly deserves -- rest in peace. The problem is solved,
reproducibly regression-tested, and the wider space documented
to prevent unexpected surprises to us and others.
Exhumation should be well justified, IMHO.
(To be honest, I also never consented to the removal of the
linker-section-based efficient offloading of the log message
book-keeping, primarily for its ABI-destructing effect, and the
removal wasn't even fully concluded so as to at least retain some
traits of the efficiency gains, which won't be possible if there's
no enforcement of compile-time constantness on some arguments that
used to be enforced like that in the past; see Lars' posts on that
topic on the list -- perhaps something deserving more attention)
Speaking of what requires attention, in pacemaker we are still
looking at being able to use an open-socket-once approach to avoid
race conditions (otherwise still allowing some avoidable chances for
DoS) and perhaps to build a sort of socket-based activation out of
that (any guidance towards having such a mechanism workable with
libqb welcome): #325
--
Jan (Poki)
https://build.opensuse.org/request/show/946348 by user yan_gao + dimstar_suse
- Retry if posix_fallocate is interrupted with EINTR (#453) (gh#ClusterLabs/libqb#451, bsc#1193737, bsc#1193912)
  * bsc#1193737-0001-Retry-if-posix_fallocate-is-interrupted-with-EINTR-4.patch
- IPC: server: avoid temporary channel priority loss, up to deadlock-worth (gh#ClusterLabs/libqb#352, rh#1718773, bsc#1188212)
(forwarded request 946347 from yan_gao)