OCF's tunnel vision regarding fully isolated runs of the agents (not the case in practice) #22

jnpkrn · 2019-03-16T14:26:16Z

Observing a recent, as of now unpreventable, configuration error:

made me think about how we can fix these undesirable shortcomings in our
cluster stack.

Borrowing from the conclusion in
ClusterLabs/pcs#197 (comment):

It's more a wider systemic flaw of never genuinely considering
consequences of:

1. running semantically-matching instances of the agent in parallel

2. not preventing some patterns of agents' usage, or conversely,
   not enforcing some constraints to be used unconditionally for
   the configuration to be allowed

To untwist it, we probably need a top-down approach, hence stating
the expectations clearly in the OCF standard, as proposed for 1.

The complicated semantics of mount are exactly one such example where both
aspects shall be covered in the standard expressly:

- possible intertwining of agent instances with different parameter
  sets on the stop operation (and perhaps elsewhere)

- for bind mount points, there could be a way to arrange for
  "last to leave the resource will trigger full-fledged stop", i.e.,
  a concept built over an enforced uniform ordering (stop order the
  exact inverse of start order), so that the bind instance is
  fully inside the lifetime of the other managed mount point it
  happens to delegate further (the bind mount point would always
  have to be stacked under the true mount, borrowing its target path
  as its own source path, never unmounting on stop)
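One possible reading of the "last to leave triggers the full-fledged stop" idea, sketched in Python purely for illustration. The `MountStack` class and the instance names are hypothetical and not part of OCF or of any existing agent; the sketch only assumes that starts are recorded in order and that stops must arrive in the exact inverse order.

```python
class MountStack:
    """Tracks the instances sharing one true mount point.

    Hypothetical helper for illustration only; not an OCF or
    resource-agents API.
    """

    def __init__(self, target):
        self.target = target
        self.users = []       # instance ids in start order, oldest first
        self.mounted = False

    def start(self, instance_id):
        if not self.users:
            # The first instance (the true mount) performs the real mount.
            self.mounted = True
        self.users.append(instance_id)

    def stop(self, instance_id):
        """Return True if this stop performed the full-fledged unmount."""
        # Enforce stop order as the exact inverse of start order.
        if not self.users or self.users[-1] != instance_id:
            raise RuntimeError(
                f"{instance_id}: stop out of order for {self.target}"
            )
        self.users.pop()
        if not self.users:
            # Last to leave: only now is the mount really torn down.
            self.mounted = False
            return True
        # A stacked (bind) instance leaves without unmounting anything.
        return False
```

With a true mount `fs-main` started first and a bind instance `fs-bind` stacked on top, stopping `fs-main` first is rejected, stopping `fs-bind` leaves the mount in place, and only the final stop of `fs-main` unmounts.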

For 1., the standard shall be clear on the precautions agents are
meant to take to ensure general sanity:

- see the proposal in
  ClusterLabs/resource-agents#1304 (comment)
- and also the requirement on the resource manager to explicitly
  avoid parallel executions of instances with the same parameter
  sets (subject to definition)
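A minimal sketch of what such a requirement on the resource manager could amount to, assuming (as one possible definition, not one given by the standard) that two instances "match" when the agent name and the full set of parameter name/value pairs are equal. The names `instance_key` and `run_serialized` are made up for this example.

```python
import threading
from collections import defaultdict

# One lock per (agent, parameter set); the guard protects the table itself.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def instance_key(agent, params):
    """Canonical key: agent name plus sorted parameter pairs.

    Assumption: equality of all parameters defines "the same
    parameter set"; a real definition might compare only a subset.
    """
    return (agent, tuple(sorted(params.items())))


def run_serialized(agent, params, operation):
    """Run operation() while holding the lock for this parameter set."""
    key = instance_key(agent, params)
    with _locks_guard:
        lock = _locks[key]
    with lock:
        # Instances with a different key are not blocked by this lock.
        return operation()
```

Instances with differing parameters still run concurrently here; only executions that map to the same key are forced to run one after another.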

For 2., the metadata-level way of expressing "combinability" (stackability)
of the agents shall be devised. Prior art in rgmanager can be a useful
source of inspiration.
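To make the idea more concrete, here is a rough sketch of a configuration-time check that such metadata could drive, using the bind mount case from above. Everything in it is an assumption for illustration: the `MountResource` fields, the `validate` function, and the representation of ordering constraints as `(first, then)` name pairs are hypothetical and do not reflect rgmanager's or any existing tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MountResource:
    """Hypothetical, simplified view of a configured mount resource."""
    name: str
    source: str
    target: str
    bind: bool = False


def validate(resources, order):
    """Return a list of error strings; an empty list means acceptable.

    `order` is a set of (first, then) resource-name pairs standing in
    for the ordering constraints present in the configuration.
    """
    # Map each true mount's target path to the resource managing it.
    by_target = {r.target: r for r in resources if not r.bind}
    errors = []
    for r in resources:
        if not r.bind:
            continue
        parent = by_target.get(r.source)
        if parent is None:
            errors.append(
                f"{r.name}: source {r.source} is not the target "
                "of a managed true mount"
            )
        elif (parent.name, r.name) not in order:
            errors.append(f"{r.name}: must be ordered after {parent.name}")
    return errors
```

A configuration where a bind mount is not stacked under a managed true mount, or lacks the ordering constraint, would be refused up front instead of failing at stop time.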


jnpkrn · 2019-03-16T14:41:53Z

Note that systemd-type resources have the combinability/stackability
problem inherently resolved (After=, Conflicts=, etc.).

Initscripts are, from today's perspective, increasingly inadequate, but
for them it's at least a well-known fact that they are only good for an
isolated run (complex relationships are better expressed directly
within a single initscript), and concern 1. doesn't apply to them,
since they are inherently singletons within a system (unlike
template unit files, if those are to get any sort of native support).

jnpkrn · 2019-04-02T16:46:14Z

See also the idea of a stackable-1 profile, and of profiles overall,
which could accommodate such an opt-in extension very gracefully and
in a unified manner. Consequently, this would stand as a main motivator
for the framework of profiles on top of a non-optional bare-bones OCF
core standard.

jnpkrn mentioned this issue Mar 16, 2019

RA: Filesystem: bind mount point unmounting postcondition check failure (for being member of 2+ mounts in misordered chaining scenario) has the power to unintendedly(!) kill all processes ClusterLabs/resource-agents#1304

Open
