Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not calling Subscription methods in onComplete #481

Open
olotenko opened this issue Feb 20, 2020 · 74 comments
Open

Not calling Subscription methods in onComplete #481

olotenko opened this issue Feb 20, 2020 · 74 comments

Comments

@olotenko
Copy link

https://github.com/reactive-streams/reactive-streams-jvm/blob/master/tck/src/main/java/org/reactivestreams/tck/SubscriberBlackboxVerification.java#L127 - the test and the requirement are not enforceable.

The requirement is unnecessary for Publisher s whose Subscription is meant to be as good as cancelled at the time onComplete is called (reads: before Publisher calls onComplete) - it is required to behave as no-op. The detection of whether onComplete is in progress at the time a Subscription method is called, is not possible.

@akarnokd
Copy link
Contributor

Do you doubt the TCK can detect this case or do you want to detect this case and then, well, not really do much about it?

@olotenko
Copy link
Author

I doubt this rule adds value to Subscriber correctness. I doubt this rule is enforceable in real life.

If a Publisher fails when a Subscription method is called during onComplete, you need a rule (I believe the requirement for Subscription to be as good as cancelled is good enough) and the test that is going to show the problem with the Publisher.

@akarnokd
Copy link
Contributor

Yes, upon a call to onComplete or onError, the Subscription should be considered cancelled and any future request or cancel calls should be no-ops. So at best, you'd be wasting a cancel/request call and in RxJava, such calls have no effect observable effect.

Knowing non Rx-style implementations, I'm almost certain this rule has been added to prevent the overhead and delayed side-effects caused by calling request/cancel to affect state/performance.

What's the real worry about rule §2.3 is that, in principle, it prevents retrying the same Publisher by triggering subscribe from within onError. We have techniques to avoid infinite recursion so we side-step this limitation in the spec. Besides, with PublisherVerification, such onError-subscribe calls happen above the TCK and thus it can never really know. For the SubscriberVerification both the TCK and your implementation would have set up a very specific scenario, which, afaik, is currently not even implemented.

@viktorklang
Copy link
Contributor

It's a Subscriber check. So it is attempting to check that a Subscriber isn't interacting with the Subscription in a place where it shouldn't. Given how many implementations which run this TCK, I'm not sure what the problem seems to be?

@olotenko
Copy link
Author

olotenko commented Feb 20, 2020

In a concurrent world it is not possible to know whether Publisher entered onComplete / onError of its immediate Subscriber.

Efficiency argument is a good argument for not calling cancel a lot, but not a good argument for requiring it is not called. In a concurrent world it is inevitable that you either sometimes call request / cancel during onComplete, or miss calling request / cancel when you should have called them.

@viktorklang
Copy link
Contributor

@olotenko

In a concurrent world it is inevitable that you either sometimes call request / cancel during onComplete, or miss calling request / cancel when you should have called them.

That would be a bug.

@akarnokd
Copy link
Contributor

A Publisher talks to its Subscriber directly via calling onXXX methods thus it can know at least there was a cancel call while onComplete was executing.

You can try heuristics to inspect the call stack in cancel for synchronous cancellations. However, there are legitimate operators that would call cancel from an onComplete. For example, a Subscriber 1 to Publisher 1 acting as a gate for Subscriber 2 to Publisher 2 and Subscriber 1 onComplete, calls Subscription 2 cancel. A stack analysis would still find an onComplete and a cancel on a trace.

The spec doesn't specify sanctions and many constructs automatically defend against such "rogue" calls so I don't see any issues for creating dependable implementations.

@viktorklang
Copy link
Contributor

2.3 is about the Subscriber.

Subscriber.onComplete() and Subscriber.onError(Throwable t) MUST NOT call any methods on the Subscription or the Publisher.

While processing the onComplete or onError signal, the subscription or the Publisher should not be touched by the Subscriber.

@olotenko
Copy link
Author

@viktorklang a bug in what?

Subscription.cancel is thread-safe, and does not require any further coordination with anything. This means it can happen during onComplete / onError. This means a conforming Publisher should be ok with this. This means there is no special need for Subscriber to watch out whether it is calling cancel "during" onComplete / onError. In other words, if a Publisher is known to be able to deal with concurrent calls to cancel, what's the value in attempting to preclude synchronous calls specifically?

@olotenko
Copy link
Author

Yes, that's the rule I am talking about. It makes no sense in the presence of the other requirements.

@viktorklang
Copy link
Contributor

@olotenko The spec aims to create well-behaved integrations, and the Subscriber adhering to this rule means that it is less likely to cause when connecting to other implementations.

I still fail to see what issue this is causing for you. Can you elaborate?

@olotenko
Copy link
Author

Basically, the 2.3 rule of the spec requires mutual exclusion between Subscription method invocations and onError / onComplete.

@viktorklang
Copy link
Contributor

@olotenko I still don't see what your issue is. Upon processing an onError/onComplete signal the Subscriber should not touch the subscription or Publisher, which is fine, since it has already signalled an unrecoverable failure or end-of-stream. How is this creating issues for you?

@olotenko
Copy link
Author

olotenko commented Feb 25, 2020

Let me rephrase. "Mutual exclusion" above is a literal requirement to have a non-reentrant lock around onComplete, onError, request and cancel. You cannot require this for an efficient system with no control of how the request or cancel are timed.

If you see that concurrent cases are necessarily allowed (and all implementations that I've seen do not do this locking business, if you want a "proof by consensus"), how can the synchronous invocation of cancel or request during onComplete be even problematic?

For these reasons rule 2.3 has zero meaning. So referring to it anywhere, especially in TCK, is not proving anything.

How is this creating issues for you?

No, the question should be posed the other way. What problems does it solve?

@viktorklang
Copy link
Contributor

@olotenko To call cancel and request after receiving a onError or onComplete indicates a bug, since that is a nonsensical operation, is it not?

@olotenko
Copy link
Author

@viktorklang I am trying to get across a very simple argument: the Subscriber cannot always tell if onComplete has been called or not.

As simple as that.

Rule 2.3 justification part says "during". This is handwavy. Let me make it concrete. It means "onComplete enter happens-before request enter, and request enter happens-before onComplete exit is forbidden".

All I am saying that you can throw away any statement following the first happens-before. Any clause about what the Subscriber should do, cannot be enforced, because the Subscriber does not know whether the Publisher already entered onComplete or not.

If this is too hard to see, imagine the Publisher is suspended after reaching the first instruction of onComplete. Subscriber cannot know this. The concurrent part of Subscriber emits request. This is not in violation of the uses of Subscription - they are meant to be concurrent. But the spec will flag this as violating 2.3 - because it is "during onComplete".

Now, I can assume that maybe there is something useful in restricting program order specifically? A conforming Publisher supports concurrent invocations of request the timing of which with respect to onComplete is unknown and not controllable, like above. Why would the invocation of request in program order be any special? Is the Publisher failing if I do call it not concurrently? Is the Subscriber failing in its duty to.... do what exactly? Ruin the total order of on* signals? Cause errors in the Publisher where there is none? Duplicate signals? These are the things that matter. I don't see which of these are broken.

cancel and request are meant to be no-op upon reaching onComplete/onError. This means it is safe to call them. This means it is better to call them than not, when you need to be sure.

No, I do not agree that the invocation of cancel during onComplete is a bug. The simplest possible code can have common termination routine used to cancel, process onError and onComplete. I see no reason to forbid this setup. I can see even several examples of real implementations where the simplicity of having a common routine beats the complexity of machinery necessary to reliably pass rule 2.3 - plus remember that it is not really detecting any fault with temporal logic.

Ok, the simplest thing is a Processor - that is, it is both a Subscriber and a Publisher. The Publisher part may fail - and immediately call cancel. Or the Publisher part may decide that more data is needed - and immediately call request. Blindly following 2.3 you need to introduce massive locking to ensure that the Publisher part is mutually exclusive with on* signals, and this actually needs doing with the upstream's view of when on* is considered invoked.

2.3 simply is not saving anyone from any real fault, but is definitely an impediment to efficient implementations of Processor.

@viktorklang
Copy link
Contributor

@olotenko It would seem like you are conflating the Publisher issuing the onComplete signal and the Subscriber receiving the onComplete signal.

cancel and request are meant to be no-op upon reaching onComplete/onError. This means it is safe to call them. This means it is better to call them than not, when you need to be sure.

This does not make any sense to me, when the Subscriber receives either onError or onComplete, it has no reason to interact with the subscription—the communication channel is already "closed".

No, I do not agree that the invocation of cancel during onComplete is a bug. The simplest possible code can have common termination routine used to cancel, process onError and onComplete. I see no reason to forbid this setup. I can see even several examples of real implementations where the simplicity of having a common routine beats the complexity of machinery necessary to reliably pass rule 2.3 - plus remember that it is not really detecting any fault with temporal logic.

Do you have a concrete example?

Ok, the simplest thing is a Processor - that is, it is both a Subscriber and a Publisher. The Publisher part may fail - and immediately call cancel.

It would seem like you are conflating the upstream Subscription and the downstream Subscription in that case?

Blindly following 2.3 you need to introduce massive locking to ensure that the Publisher part is mutually exclusive with on* signals, and this actually needs doing with the upstream's view of when on* is considered invoked.

No. Processing of a signal in the Subscriber does not need any locking, especially for the case where the Subscriber is synchronous (runs on the calling thread). Can you illustrate your interpretation of this problem with some code?

@viktorklang
Copy link
Contributor

@olotenko Initially when we started formalizing the spec we were able to run exchanges between Publishers and Subscribers on a typical dev laptop at ~200mops, are you having requirements exceeding that number?

@olotenko
Copy link
Author

I am not conflating anything.

I have explained the problem in terms of temporal logic. You do understand happens-before, right? This is as close as it is going to get to disclosing internal code.

@viktorklang
Copy link
Contributor

@olotenko The following looks to me to indicate a misunderstanding/conflation between signal issuing and signal processing:

All I am saying that you can throw away any statement following the first happens-before. Any clause about what the Subscriber should do, cannot be enforced, because the Subscriber does not know whether the Publisher already entered onComplete or not.

@olotenko
Copy link
Author

@viktorklang

signal issuing and signal processing

This is a statement without temporal logic. How are you going to reason about "during", "before" and "after", if there is no temporal logic? I've translated the "during" into the temporal logic statements that make sense to me in the context of how the TCK test is implemented.

@viktorklang
Copy link
Contributor

viktorklang commented Feb 26, 2020 via email

@olotenko
Copy link
Author

:) and all you can measure in the test is that request exit happened-before onComplete exit. You cannot prove anything about the timing of when it entered request.

You can neither specify this, nor test reliably.

@viktorklang
Copy link
Contributor

viktorklang commented Feb 26, 2020 via email

@olotenko
Copy link
Author

olotenko commented Feb 26, 2020

Calling request after observing on* - I agree! But that's not what I disagree with. :)

I disagree with the spec requiring an impossible condition that is not the same as observing the on*, and I disagree with the TCK test that does not even test that.

Also, I posit that there is no statement about the required order of on* and request in temporal logic that can be tested. Because there's a gap between "observing on* has been called" and calling request. You can only require that if the caller of request observed that it is after on* has been invoked, it shouldn't invoke request. But this is not enforceable by observing side effects.

@akarnokd
Copy link
Contributor

akarnokd commented Feb 26, 2020

I've just run into a false positive with rule 2.3.

I'm implementing the Eclipse Microprofile Reactive Streams Operator coupled. It has the requirement that if the outlet Publisher completes, it should reach over to the inlet and cancel its Subscription. This means that the stacktrace of who invoked cancel will contain onComplete, even though it didn't came from the inlet Subscriber.onComplete.

Also there is another rule for when the outlet fails with onError. Guess what's going to be in the stacktrace of cancel now!

Since the MPRS uses the Reactive Streams TCK internally, I can't just disable these rule tests.

@olotenko
Copy link
Author

@akarnokd bingo! This is the sort of case I had in mind.

Anything that is Processor<Publisher<A>,A>-like, is under threat of false positives of this kind. I did not want to bring up specific cases, because I don't want the test to be fixed for specific cases. It just is not possible to catch the condition that actually makes sense.

@viktorklang
Copy link
Contributor

@akarnokd Sounds more like an issue with the test than the spec tho? I'm all for making the test case better—if you have any ideas on how to improve it, you know the drill :)

@akarnokd
Copy link
Contributor

@viktorklang Sure, I'll look into the test implementation.

@olotenko
Copy link
Author

:) yes, just make it harder to reproduce

@OlegDokuka
Copy link
Member

OlegDokuka commented Mar 1, 2020

Reading the whole thread I hardly seeing any problems with the rule discussed. The problem discussed are unrelated to the rule.

I guess @viktorklang clearly stated that Subscriber must not call subscription .cancel or .request from onComplete or onError.

It means if a Subscriber receives an onError call - it does not have to use
subscription with-in that call and in the future. (of course, I can see other parties calling "disposual" on the Subscriber simultaniosly while complete|error occurse, but it does not violate the rule 2.3 (@viktorklang correct me if i'm wrong).
Agree, from the Publisher perspective, everything should be as @DougLea said, Subscriber should stop emitting any signals using it subscription eventually, and that statement is totaly true. Once a Publisher emmited a terminal signal, the Subscriber may do another operation on the subscription same time. So it may happen, the subscriber may emit "request|cancel" the same time as onComplete|Error is occuring which is racing and that is fine (rule says zero about that from my point of view).

It also means if Subscriber receives onComplete call - it does not have to use subscription as well (same as above).

For many cases, if racing occurse:

For example, ZipOperator, in which inner subscribers expose subscription to an outer one, the simplest InnerSubscribers can do is replacing existing subscription with NoOpSubscription. In such a case, if a Subscriber is going to be called after onError|Complete appeared (and yes, ZipOutter has no clue about that), the call appeared will no effect on original subscription and will end-up calling NoOpSubscription.

On the other hand, it does not violate what is written in spec, when both onComplete and Subscription cancel|request are happening simultaniously (a.k.a racing). It means that it is fine that subscriber called subscription.cancel before it got observed onComplete|Error


Summing up what I have seen in this thread. There is a mess between understanding the rule which simply says "do not call subscription please in error|complete methods cuz it may cause more pain for publisher | subscription)" and including to that rule extra meaning which is "if onComplete|Error happened-befor subscription is tried to be called from some other places, then you MUST ensure it will not be called".

Hope I got it correctly and feel free to correct me @viktorklang @akarnokd @DougLea

Regards,
Oleh

@rkuhn
Copy link
Member

rkuhn commented Mar 1, 2020

Thanks @DougLea and @OlegDokuka, now I finally understood what this thread is about! As @OlegDokuka says, the rule itself does not demand any of the behavior discussed between @olotenko and @viktorklang, linear logic is not required for its understanding: it is simply the statement that the onError/onComplete handler itself shall not call the request/cancel methods. This is explicit in the wording: I could see the confusion above arising if it said “The Subscriber MUST NOT call … ”.

With this understanding, I think the intent clarification for this rule is misleading, as race conditions are not the target of this rule — races between request/onError for example are genuine and permitted.

@DougLea
Copy link
Contributor

DougLea commented Mar 1, 2020

So if we replace spec rule with my suggested:

Upon onComplete and/or onError, a subscriber must eventually stop calling any Publisher or Subscription methods.

We should replace intent with:

Sessions must eventually terminate. Among the ways to help ensure this, subscribers should not call Publisher or Subscription methods from within onComplete or onError methods.

Yes?

@OlegDokuka
Copy link
Member

OlegDokuka commented Mar 1, 2020

I think it should be a separate rule for Publisher if there is a need to clarify that Publisher after emitting a terminal signal should eventually observe no signals anymore.

I don't think that rule 2.3 has to be rewritten for the Subscriber's side since there are no challenges on doing getAndSet operations in order to replace existing Subscription with no-ops one (see MonoProcessor example). It means once replacement happened, any further interactions with the Subscriber instance will have no-op on the Publisher

I would like to say even more. Sometimes the internal ecosystem of every implementation may violate specification and does it in many cases. The point is that once there is integration with an external implementation then it should follow specification and provides all the guarantees (see StrictSubscriber)

@viktorklang
Copy link
Contributor

@DougLea @olotenko @rkuhn @OlegDokuka

It's important to remember 3.1. Exposing Subscriptions in potentially concurrent situation makes it really difficult to reason about what the source of truth is: If one "owner" of a Subscription calls cancel() all while another calls request—what is expected?

As for 2.3, a wording like "A Subscriber MUST eventually discard its Subscription once onError or onComplete has been received." would be absolutely fine in my book, but as @DougLea stated, does not help with TCK validation thereof.

One complicating factor of proposing this verbiage is that we cannot guarantee that it will not break any implementations of Publisher / Subscriber which is relying on the fact that the Subscription should not be touched onError/onComplete or thereafter. https://github.com/reactive-streams/reactive-streams-jvm/tree/v1.0.3#2.3

Sure, we could make an educated guess, but to be honest, changing the spec for "easier implementation" while taking the cost of potentially breaking others' TCK compliant code seems like a bad tradeoff in my book.

@olotenko
Copy link
Author

olotenko commented Mar 2, 2020

@viktorklang I don't see how replacing "MUST NOT call" with "eventually should stop calling" can be a breaking change.

If one "owner" of a Subscription calls cancel() all while another calls request—what is expected?

I have no problem reasoning about this. My mental model permits concurrent request and cancel, and allows the delivery of all the requested items (cancel is observed eventually) (except when requesting an infinite number of items or Long.MAX_VALUE - in this case delivering all the requested items is not allowed) as well as fewer than requested items (request is observed eventually). There is no requirement to observe request and cancel in a specific order, the protocol does not require signalling anything if fewer than requested items were delivered, and the Subscriber will never know when the Publisher decides it will not produce any more items (the other issue I raised several months ago; I am ok not to bring it up, but you see now it is related to knowing when the stream is closed :)).

All of these combined mean that the Subscriber should be prepared to receive all the items it requested, even if it knows cancel has been issued.

A slightly related view: in TCP/IP request is advertising how much of the window is free; that is, acknowledging the last consumed sequence number, so the sender then knows that window size space is available from that sequence number. You can do a bit more with request in reactive flow, but the reasoning method is similar: it only advertises how much the Subscriber is prepared to consume, it says nothing about how much the Subscriber needs.

This means if the Subscriber may leave something unfinished, some resources pinned, if it issues request with cancel concurrently, the Subscriber has to synchronize its view of the stream - so that the necessary cleanup occurs, and it is prepared to discard all items delivered after the cleanup without signalling errors. I can't see how any specific wording can help enforce the cleanup does occur. (This concurrent fight would have been reduced to nothing, if that other issue I raised were taken seriously - ie if the Publisher were required to signal termination when it observes cancel)

But I'd like to underscore that these problems do not depend on synchronization between cancel and request - the behaviour depends entirely on whether all requested items are delivered, or not. That is, the behaviour is the same even if cancel is fired after the last request, but before the last requested item is delivered.

@OlegDokuka et al.

  1. The spec as stated is unenforceable because of concurrent cases
  2. The test as implemented is the first test in the world that prohibits the use of linearizeable functions synchronously
  3. (Slightly related - the assertion that "closing a closed stream is a logical mistake" is actually not true. A more accurate description is "tautological")

@olotenko
Copy link
Author

olotenko commented Mar 2, 2020

@DougLea I think any wording that replaces "must" with a weaker verb is fine.

@rkuhn
Copy link
Member

rkuhn commented Mar 2, 2020

  1. The spec as stated is unenforceable because of concurrent cases

This is a red herring: the spec only says that Subscription.cancel MUST NOT be on the stack while the Subscriber.onError pertaining to that same Subscription is also on the stack. This clearly is enforceable. I still have not seen a motivation or example for why this rule should be changed. What I have seen is a lot of enlarging the scope beyond what is written and then showing that that leads to undesirable consequences.

I think any wording that replaces "must" with a weaker verb is fine.

This, as a general statement, is not true: we currently have the rule that people MUST NOT kill other people. Weakening this into SHOULD NOT is definitely a breaking change.

Now whether a weakening would break existing Publishers in this particular case is a much more difficult question (it would not break Akka Streams, for example, but that is due to its use of an overly safe implementation primitive).

@viktorklang
Copy link
Contributor

@rkuhn I appreciate your input on this, I think you're hitting the core of the matter—a Publisher cannot know whether a Subscriber is sync or async, so perhaps the rule should be clarified to specifically refer to the sending part of onComplete/onError rather than the all-encompassing observation of aforementioned signals?

@olotenko
Copy link
Author

olotenko commented Mar 2, 2020

@rkuhn

The spec only says...

Well, I am looking at the spec and it is not saying that. It is not saying anything about the stack, and the motivation plugs in a few red herrings about cycles and race conditions, and ambiguities about what "during" means. Plus I really really really fail to see what problem such a requirement solves. Why wouldn't a Publisher for whom such a behaviour might be a problem, plug in a test in cancel that if it is called "during" on*, do not enter the cycle. Just set a variable canceled to true. Now test it before entering the "cycle". Done.

Now whether a weakening would break existing Publishers in this particular case is a much more difficult question

No, not difficult at all. It may be hard to find this in this long thread, but I think it's been discussed there are provisions in the spec already that require the Publisher to handle these cases. In particular, since you agree the concurrent cancel or request should be handled by the Publisher, I can't imagine why weakening the MUST NOT call cancel to SHOULD NOT call cancel or ADVISORY to not call cancel would matter.

@OlegDokuka
Copy link
Member

@viktorklang I guess it is an extra rule to Publisher side that it should stop receiving any signals eventually. Do you have anything else in mind?

@viktorklang
Copy link
Contributor

@OlegDokuka The problem is that such a rule is generally unverifiable.

@olotenko
Copy link
Author

olotenko commented Mar 3, 2020

@DougLea I think that wording is very good. It made me think of the flip side of the same problem. I could not find the "guaranteed progress" easily. For example, as follows:

Guaranteed Progress

Subscription MUST eventually make progress to enable reaching a terminal state. This means the Subscriber MUST eventually call request or cancel after the last requested item has been delivered.

(Imagine a Subscriber received all the data it wanted, and because it received everything, it doesn't request any more - Publisher will be stuck, pinning resources. Something like a JSON parser requested two bytes, sees "{}", and doesn't want to parse any more - it has a valid object; the remainder of the content remains unconsumed (although possibly could result in a syntax error, if it's not just whitespace). The guaranteed progress requires Subscriber to drain Publisher, or cancel)

I am fully conscious of the fact that this is non-enforceable, and is subject to the Halting Problem. However, the intent is to make the designers of the Subscriber think under what circumstances the request is called after the last item has been delivered, whether there are ways to test that that condition is reachable, and whether they maybe need to consider setting up a timeout.

This is a flip side of:

Guaranteed Quiescence

Upon onComplete and/or onError, a Subscriber MUST eventually stop calling any Publisher or Subscription methods.

(your wording)

@rkuhn
Copy link
Member

rkuhn commented Mar 3, 2020

Well, I am looking at the spec and it is not saying that. It is not saying anything about the stack, and the motivation plugs in a few red herrings about cycles and race conditions, and ambiguities about what "during" means.

We all agree that the intent clarification for this rule needs to change because it is misleading, so let’s stop discussing this point. The remaining discussion is about the rule itself:

[§2.3] Subscriber.onComplete() and Subscriber.onError(Throwable t) MUST NOT call any methods on the Subscription or the Publisher.

This references two function symbols and then says that these “must not call X” — this can only be interpreted to apply to the implementation of said symbols in the respective Subscriber. To “call” methods is a well-defined notion in Java that does not require further definition. A reasonable extension is that methods called from these two symbols must respect the same constraint, otherwise the rule would be trivially uneffective. Thus, I have used normal logic (nothing linear) to prove my point. Statements to the contrary must include a proof that my derivation is flawed.

Now, what you propose is to shift the responsibility of avoiding cycles from Subscriber to Publisher: currently rule 2.3 mandates that the Subscriber must break a possible cycle, while you explicitly propose the opposite:

… plug in a test in cancel that if it is called "during" on*, do not enter the cycle.

It is obvious and irrefutable that this is a breaking change to the spec, so no matter how desirable it is, we cannot do it without changing the major version number. The only remaining question is whether this change is worth such a cost.

@olotenko
Copy link
Author

olotenko commented Mar 3, 2020

@rkuhn you have not demonstrated how a conformant Publisher (ie the one that supports concurrent cancel) can be broken by calling cancel synchronously.

I have given an example of how a non-conformant Publisher can deal with it. The burden of proof is not on me, because first one needs to demonstrate it is an issue for some conformant Publisher.

I get a feeling no one really knows what that rule saves from. I have seen claims of "races", "cycles", "logical errors", and some unknown problems with existing Publisher s.

Here's my reasoning of why the change is not breaking conforming Publisher s:

Premises:

  1. concurrent request / cancel can occur.
  2. Publisher is required to call onComplete / onError once and only once.
  3. Publisher is required to call on* serially

So the conformant Publisher can determine during request or cancel whether there is a thread that has committed to deliver an on* (2 - once and only once, if these are terminating signals), and establish a total order (3) with those invocations, if necessary to perform further invocations (say, if the invocation that has been committed is a onNext). From (2) it means that even after establishing the total order they should ensure any on* are not signalled after onComplete or onError.

I think it is up to the supporters of the idea that synchronous invocations of cancel or request are damaging, to show how such a conformant Publisher can fail to establish the existence of a thread that has committed to deliver an onComplete / onError signal.

@rkuhn
Copy link
Member

rkuhn commented Mar 3, 2020

Now we’re talking: while the responsibility for cycle avoidance would change, you are proving that cycles cannot actually occur, so we were arguing about the empty set.

It may be that the rules are overprotective because in some earlier iteration (before 1.0) your current proof didn’t work; or it may be that there is another issue that this rule guards against that I cannot recall right now. @viktorklang does this ring a bell? The only other point I can find is that we were worried that the Subscription’s resources have already been released when entering onError, but that is defined to not be a problem by §3.6 and §3.7 which require request/cancel to be NOPs after the Subscription has been considered cancelled.

If this holds, then we could indeed remove rule §2.3 without any effect on the spec.

Archeology:

@olotenko
Copy link
Author

olotenko commented Mar 3, 2020

@rkuhn I can think of a design that almost fits the current spec, where a thread executing cancel may fail to detect onComplete, when calling cancel synchronously, but the current spec has several provisions that disallow such a behaviour of cancel, because of the requirements it must fulfil for the concurrent uses.

@viktorklang
Copy link
Contributor

@olotenko @rkuhn If some other thread rather than the thread which receives the onComplete, calls cancel() is should in theory not be a problem for the Publisher, as it would be indistinguishable from a delayed onComplete reception—hence the problem would be confined to the Subscriber implementation (i.e. the Subscriber implementor is responsible for making sure that itself is internally sound).

@olotenko
Copy link
Author

olotenko commented Mar 3, 2020

the Subscriber implementor is responsible for making sure that itself is internally sound

Yes! (That's the starting point of my reasoning about separation of concerns :))

@viktorklang
Copy link
Contributor

@olotenko But that does not address the Publisher issue w.r.t. reentrancy.

@rkuhn
Copy link
Member

rkuhn commented Mar 3, 2020

Ah, you’re saying that with the current spec the Publisher may rely on rule 2.3 to not handle calls to cancel/request while onError/onComplete are ongoing? That would be a gray zone already, given that at the point of signal emission these Subscription methods need to equate to NOP. (Please let me know if I understood you correctly.)

@DougLea
Copy link
Contributor

DougLea commented Mar 4, 2020

I don't have anything new to say about this, so will just restate:: Specs (especially of protocols) normally include few if any "MUST NOT" clauses, which is one reason most lack progress guarantees. But we'd like to at least guarantee termination after onError/onComplete. The current rule 2.3 singles out one case that may otherwise prevent termination. Rephrasing a but more generally would be more accurately reflect intent, at the cost of no longer being able to test for this one particular case that may (or may not) cause non-termination. I think this is an OK tradeoff.

@olotenko
Copy link
Author

Example of the implementation that violates this part of the spec as it stands: helidon-io/helidon@1673ddf#diff-0f519ea307142812e3a65e47f248e24cR178

@OlegDokuka
Copy link
Member

@olotenko FYI. It is not if you put return instead of continue. if you see upstreamDone is true then requesting upstream makes zero sense (since it is done).

I do not fully understand why continue was put after you drained the queue and upstreams is done, but once you return there, you will not be violating anything

@olotenko
Copy link
Author

@OlegDokuka Both assumptions are not correct. We don't need to discuss what's wrong with these suggestions, nor any other issues with that code. I only point out yet another reasonable implementation that passes the TCK only in letter.

@OlegDokuka
Copy link
Member

OlegDokuka commented Mar 21, 2020

I guessed if you point to the implementation which violates something, then there should be reasons why it works like that. From what I saw there are bags in the implementation and only because of that the implementation violates the spec.

Also, I do not fully understand 'passes the TCK only in letter'. It either passes or not. Yup some rules are not testable, but this is another challenge, so better to contribute there rather than pushing spec changes because of the bags in the implementation.

Please correct me if I missed something important which should change my mind.

Cheers

@olotenko
Copy link
Author

No. I am questioning the rationality of the existence of the rule. So you can't use the argument that the code can be fixed to comply.

First show me what Publisher gets broken, if that rule is removed. From what problem is that rule saving me? I've asked this in several guises, but I don't recall a straight answer to neither part.

'passes the TCK only in letter'

Refers to the inability of the TCK to test the condition. Yes, some rules are not testable. This is one of them, as was my premise all along.

@OlegDokuka
Copy link
Member

OlegDokuka commented Mar 21, 2020

I can imagine a pooled resource that can be shared between several subscribers. If the subscriber is not going to follow the instruction, and continue calling subscription, then we can get an incorrect state or data from an unrelated source.

That is more as a unique case, and I tend to agree that most probably nothing is going to be broken, but as of me, this is a natural recommendation that MUST be followed.

If folks stop going to follow that, we can have more issues rather than profit.

And yes, what profit is going to be brought by removing that rule?

@olotenko
Copy link
Author

I agree with a general advice to stop calling subscription methods eventually.

what profit

Obviously, stop introducing unnecessary complications into perfectly good implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants