-
-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cancellation may not always immediately happen at checkpoints #2973
Comments
For the record, not all I think the waiting itself will be cancellable, so it's not a problem of putting a cancel point right after waiting either. Maybe my impression is wrong, but I think this is a case where things seem fine. Specifically, I think It might be worth clarifying in the docs that a "checkpoint" is specifically something that yields back into trio (AFAIK), so like, just making an async function doesn't necessarily mean it has one. |
Oops, I did only mean trio calls, that was just a slip sorry (and when I repeated the claim later I did say
I don't have an application with this behaviour that I want to fix particularly. It's just I think that the docs should clarify that this could happen, or we should consider that we never want it to happen and then carefully check/fix all Trio functions (with a cancellation check after the wait/schedule/whatever is relevant). |
The reality is that only This is why the docs are not explicit about why the docs are not explicit about when it checks... it varies! And it should be kept an internal detail unless a specific need arises so that we retain flexibility for refactoring. Now what I would like to see when documenting cancellation is some distinction between cases where seeing |
@richardsheridan Let me get a few things out of the way before getting to the main point:
OK, to the main point:
My original example already makes it clear that not all Trio APIs have the behaviour I describe. The question is actually: could all Trio APIs have that behaviour? Would we want it? I don't think that is so clear. Operations that have (potential) side effects throughout their duration could have a rule: always raise It would also be very easy to add this rule for operations that wait for a condition and then atomically mutate at the last moment, like I discussed in my earlier comment. This includes reading from a memory channel, as in my original example. I believe that reading from a socket also falls into this category in both select and IOCP implementations, but I don't believe it is guaranteed in principle (the original IOCP implementation did an async read). The only API where I think this rule would be troublesome is where you wait to be able to perform a mutation, and then do it, but the mutation is not quite atomic, but perhaps involves a callback that will happen in just a moment (never waits indefinitely). Currently, you could give the strong cancellation guarantee (never has side effects if
This sounds like a very general statement, but it's actually not: the only aspect currently not documented is the last-moment check we're talking about now. As I've already said, a check at the start is always required (although it's allowed to have side-effects, as you mentioned for At the very least, I think the general docs should be updated to note that checkpoints may return without raising
Stealing from C++'s terminology of "strong exception guarantee" vs "weak exception guarantee", I would call these cases strong cancellation vs weak cancellation. Agreed they should definitely be documented. In fact, I think they probably normally are, but no doubt there are gaps. Certainly, I don't see anything about this for |
Here's the current state of affairs: import trio
async def no_internal_checkpoints():
pass
@trio.run
async def loops_forever():
with trio.fail_after(0.1): # does not fire, due to lack of checkpoints!
while True:
await no_internal_checkpoints() ...which is why https://github.com/python-trio/flake8-async provides the We can't introduce an implicit checkpoint on the way in to an Implicitly checkpointing on I'd be happy to see a PR for some more docs on this topic, but I don't think there's anything else we can do. |
That is unrelated to this issue. Your point is that an arbitrary async function might not include a checkpoint. That is true. But this issue is exclusively about async functions in the However, they are only guaranteed to be checkpoints on the "way in". That means they may complete successfully even if they are in a scope that is cancelled when they finish. My point is that:
|
I don't think we can guarantee checkpoint on exit - only make it somewhat more likely: async def some_fn(url):
response = await do_some_io()
# what we if get cancelled some time after this point?
x = process_result(response)
return x
async def some_fn_with_checkpoint_on_exit(url):
response = await do_some_io()
x = process_result(response)
await trio.lowlevel.checkpoint()
# adding a checkpoint just moves that problem to here.
return x and I really don't think that's worth the code complexity when it's still not guaranteed. If you have a suggested clarifying change to the docs I would be very happy to get that merged though!
Strictly speaking the invariant is that they either checkpoint, or raise an exception. You can therefore livelock with e.g.: import contextlib, trio
@trio.run
async def livelock():
# can't cancel because the loop body raises (and catches) an exception without checkpointing
with trio.move_on_after(0.1):
while True:
with contextlib.suppress(TypeError):
await trio.sleep("1") # raises an exception before checkpointing and while I don't like this code, it doesn't violate any of Trio's invariants. |
@Zac-HD Are you saying that having But your snippet isn't what I was asking for anyway: it would change the semantics, e.g. for a memory channel it would already have popped the item off and then silently discard it. Maybe I'm taking you too literally. But my original point is that it seemed like you could avoid changing semantics on more cases than you might expect (e.g. socket reads are actually wait for data to become available + do sync read, so you could put your final cancel check in between). Anyway, I've come to accept that there probably places where you really couldn't preserve semantics with my request (perhaps IOCP or SSL as I said earlier). So I've updated the original issue to request this behaviour only on particular cases where it can be guaranteed, and to update the documentation on the other cases. (I get you point about cancellation not being checked if there's an exception; I mentioned that in an earlier comment and I don't think it affects my point.) |
Updated issue text (after thinking about it a bit)
Summary
Any awaitable function in Trio (i.e.,
await trio.foo(...)
) is guaranteed to be a checkpoint. That means, aside from being a scheduling point (which is not relevant to this issue), that it is a cancellation point: i.e., if the function is called from within a cancellation scope that is cancelled (subject to shielding rules) then it will raisetrio.Cancelled
. Of course, a function that is waiting (in a sleep, for I/O data, etc.) can also be interrupted by a cancellation (with a few well-documented exceptions).But what if the cancellation scope becomes cancelled just as the function is finishing? Then, surprisingly (to me), it might not raise
trio.Cancelled
. Here's a simple example:This will print the message. That's because
await ev.wait()
simply returns, despite being in a cancelled scope at that point.What's more, if you swap the last two lines of
main()
(ev.set()
andn.cancel_scope.cancel()
) then the message will not be printed, becauseawait ev.wait()
raises atrio.Cancelled
exception instead. I would expect that the behaviour depends on the overall state at that moment (the event is in a set state and in a cancelled context, so one or the other deterministically wins). Instead, it depends on the order things happened, even between checkpoints. I can see why that is likely to have happened internally, but it still seems against the spirit of Trio.Why I care
Here's an almost-concrete example of why this is a problem, using my little aioresult library:
Here's the risk:
produce_result()
anduse_result()
are both cancelled but,await rc.wait_done()
returns without exception (because whenproduce_result()
finished withtrio.Cancelled
that set the event insideaioresult.ResultCapture
). That means thatrc.result()
raises anaioresult.TaskFailedException
(because the routine it wraps raised an exception). Which means that the simple cancellation get converted into a full-fat exception propagated out of the nursery!Now, I don't think this can happen because the event gets cancelled just before it's set in this case. But it seems like it's got a fragile dependency on undocumented behaviour here.
The request
I think there should be changes relating to this:
await trio.foo()
is in progress then it is possible for the routine to return without raisingtrio.Cancelled
.Event.wait()
andawait trio.MemoryReceiveChannel.receive()
, and probably alsotrio.lowlevel.ParkingLot
. I'm not so interested in the lower-level synchronization primitives (CapacityLimiter
,Lock
,Semaphore
andCondition
) but I suppose it applies to those too.I've not put my money where my mouth is by submitting a pull request!
Implementation
One problem with this idea is that I'm not sure how it could be efficiently implemented.
Ideally, late in the body of these functions (after waiting for their condition to be true) we would re-check for cancellation but without doing a schedule point because we have just been scheduled. That means that we want a sync-coloured cancellation check, but that was suggested 5 years ago in #961 and is not done yet.- I now think this would be best implemented by optionally allowingtrio.lowlevel.wait_task_reschedule()
to still call the abort function, and therefore end up raisingtrio.Cancelled
, even iftrio.lowlevel.reschedule()
has been called. I plan to add a comment with further details but that's the key idea. In the case oftrio.Event
, aside from passing a parameter to enable that behaviour, no other code changes whatsoever would be needed to change ("fix") its behaviour. For other synchronisation primitives (e.g., memory channels), it's a different matter. (To be continued...)Original issue text
Show original issue text
Short summary
My reading of the documentation is that a
await ...
should always end in aCancelled
being raised if it's in an enclosing cancel scope that is cancelled, but that is not actually always the case in practice. This is perhaps a design decision – should that rule always be enforced, or is it allowed for a cancellation request to take a bit of scheduler delay to implement. If the decision has already been made (or we make that decision now) then this may just end up being a documentation issue (or maybe I'm just reading too much into it), but even if so then it could do with clarifying.Actual behaviour
In reality, it is possible for a task to wake from an
await
without exception even if it is in a scope that has been explicitly cancelled.This gist gives a complete example.
It involves running two tasks in a nursery:
send_nowait()
) then immediately cancels the cancel scope for the nursery.receive()
on the channel in a loop.When I run it, the receive task gets one value from the memory channel before getting the cancellation exception.
Documentation
The documentation says that, at each checkpoint, Trio checks whether the task is cancelled:
Neither is really explicit about when within the checkpoint the cancellation state is checked. However, my subjective reading of them (especially the first one) is that is that they strongly imply that it is impossible for
await trio.foo()
to finish with anything other than atrio.Cancelled
if it is in a cancelled scope (barring usual shielding rules) when it completes.Most trio operations are conveniently in two parts: one that waits for the operation to become possible (e.g., socket read available, memory channel non-empty) and then one that synchronously performs that operation. So it certainly seems possible for the cancellation check to happen at the end of the wait step, in addition to before and (effectively) during.
The scheduling point would also happen alongside the wait, even if the condition is already true (and remains true). So, even if there is no actual wait, then it's possible that a cancellation would come into effect during the course of the
await
call, and the position of the cancellation check(s) still matters.Side note about asyncio
Just for curiosity, I tried analogous code with asyncio, using an
asyncio.TaskGroup
in place of a nursery and simulating a cancel by raising an exception from one of the tasks. It appears thatasyncio.Queue.get()
is not a scheduling point, because a receive loop will happily pop 100s of values off of it in a loop after the task that send them has long finished with an exception (but a small perturbation to the code will stop cause it to be cancelled before getting any).The text was updated successfully, but these errors were encountered: