Replies: 10 comments 28 replies
-
Just to offer the "idiomatic" CE3 solution to this problem:

```scala
import cats.effect._
import java.io._
import scala.concurrent.duration._

object Hello extends IOApp {
  override def run(args: List[String]): IO[ExitCode] = {
    val task = IO.bracketFull { _ =>
      // acquire: open the writer
      IO(
        new BufferedWriter(
          new OutputStreamWriter(
            new FileOutputStream(new File("/tmp/hello")),
            "utf-8"
          )))
    } { writer =>
      // use: a blocking loop; cancelation interrupts the underlying thread
      IO.interruptible(false) {
        var line = 0
        while (true) {
          println(s"Writing line $line ...")
          writer.write("Hello, world!\n")
          writer.flush()
          line += 1
          Thread.sleep(1000)
        }
        ExitCode.Success
      }
    } { (writer, outcome) =>
      // release: runs on completion, error and cancelation alike
      IO {
        println(s"Ending with: $outcome")
        writer.close()
        ()
      }
    }
    // Race condition here...
    task.timeout(10.seconds)
  }
}
```
-
I mentioned on Discord that ...
To add extra food for thought ... what is a `timeout`? Well, we're executing asynchronous operations that could be running on another thread, or on another node on the network, etc. Asynchronous programs can fail outside the current thread, silently. A network node can disappear, a thread can get killed, a piece of code can die with an exception that never gets logged, etc. The only way to reliably notice malfunctions, the only way to notice non-termination actually, is by measuring time and triggering a concurrent operation that makes us understand that an error happened, as we may want to do something about it, like (forcefully) closing the underlying connection, to free it up for other clients.

Asynchronous tasks involve client-server or producer-consumer communications. The client can notice that the server is down via timeouts. The server can notice that the client is stuck via timeouts. Etc. And here's the problem ... given a malfunctioning actor, if you have to wait on its confirmation, that entirely defeats the purpose of the timeout.

If you actually need back-pressuring on those connections being returned to the pool (or something), that can always be done with a …
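For illustration, here is one way such opt-in back-pressure could be expressed (my sketch, not part of the original comment): the finalizer completes a `Deferred`, and only callers that actually care about the connection being back in the pool wait on it, while cancelation itself stays fire-and-forget. The `Connection` type and `acquireConn` are hypothetical.

```scala
import cats.effect.{Deferred, IO}

trait Connection { def close(): Unit }   // hypothetical
def acquireConn: IO[Connection] = ???    // hypothetical pool acquire

// Returns the running task plus an opt-in "the connection is back" signal.
def withConnection[A](use: Connection => IO[A]): IO[(IO[A], IO[Unit])] =
  for {
    released <- Deferred[IO, Unit]
    conn     <- acquireConn
  } yield {
    val task = use(conn).guarantee(IO(conn.close()) *> released.complete(()).void)
    (task, released.get) // await this only if you need the back-pressure
  }
```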
-
(2) here's a related problem ... there's no way to "observe" cancelation from the running task:

```scala
import cats.effect.IO
import java.util.concurrent.atomic.AtomicBoolean

IO(new AtomicBoolean(true)).bracket { isActive =>
  IO {
    while (isActive.get) {
      println("Still active")
      Thread.sleep(1000)
    }
    println("OK, OK, stopping...")
  }
} { isActive =>
  IO(isActive.set(false))
}
```

This makes …
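As a side note (my sketch, not from the comment above): the one CE3 mechanism that does let blocking code like this react to cancelation is thread interruption, via `IO.interruptible(true)` (later also exposed as `IO.interruptibleMany`), rather than a flag flipped by the release action:

```scala
import cats.effect.IO

val loop: IO[Unit] =
  IO.interruptible(true) { // cancelation repeatedly interrupts this thread
    while (!Thread.currentThread().isInterrupted) {
      println("Still active")
      try Thread.sleep(1000)
      catch { case _: InterruptedException => Thread.currentThread().interrupt() }
    }
    println("OK, OK, stopping...")
  }
```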
-
This issue can be summarised once again as the perennial and unsolvable dilemma between two opposing notions of safety: deadlock safety and resource safety. Your argument is: if I'm cancelling something, and it's faulty/not cooperating, I will deadlock. The problem is that without back pressure, all nested brackets are broken, and fixing them requires changing the definition of the bracketed tasks (which you might not own) and adding waiting code that is often beyond the reach of even maintainers, whereas with this model most code is deadlock safe (because it can be canceled with …
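A sketch of the nested-bracket point (my illustration; `Client` and `mkClient` are hypothetical): the inner finalizer still uses the outer resource, so the outer release has to wait for it, which is exactly what back-pressured, reverse-order release guarantees.

```scala
import cats.effect.{IO, Resource}

trait Client {                          // hypothetical
  def closeSession(id: Int): IO[Unit]
  def shutdown: IO[Unit]
}
def mkClient: IO[Client] = ???          // hypothetical

val nested: Resource[IO, Int] =
  for {
    client  <- Resource.make(mkClient)(_.shutdown)
    session <- Resource.make(IO.pure(42))(id => client.closeSession(id))
  } yield session
// If cancelation released these without waiting, `client.shutdown` could run
// while `closeSession` is still using the client.
```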
-
First off, thanks for bringing this up, because it's a fascinating topic that deserves more attention. There's a lot to unpack here! What you're getting at is exactly what @SystemFw noted: backpressure, and more generally resource leaks.

First off, it's important to underscore the fact that cancelation cannot be a must signal, it can only ever be a hint. This isn't just a limitation of the JVM, it's a limitation of resource-safe code in general. The only way to ensure that resources are deterministically acquired and released is to create critical sections in which cancelation is deferred. Now, it is best practice to ensure that such critical sections utilize …
Pretending to respect the cancelation immediately would really just be a lie. We haven't canceled anything; the …

The final nail in the coffin, from my perspective, is the fact that users can always emulate this "fire and forget" model of cancelation by using …
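For concreteness, a sketch of that emulation (my illustration, presumably built on `start`; the helper name is made up): request cancelation on a background fiber, raise the timeout immediately, and never wait for the loser's finalizers.

```scala
import cats.effect.IO
import scala.concurrent.duration._
import java.util.concurrent.TimeoutException

// Hypothetical helper: a timeout that only *requests* cancelation.
def timeoutAndForget[A](fa: IO[A], d: FiniteDuration): IO[A] =
  fa.start.flatMap { fiber =>
    IO.race(fiber.joinWithNever, IO.sleep(d)).flatMap {
      case Left(a)  => IO.pure(a)
      case Right(_) => fiber.cancel.start *> IO.raiseError(new TimeoutException(d.toString))
    }
  }
```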
IMO this sounds like a good enhancement for …
-
@djspiewak @SystemFw @oleg-py going to write a general answer, to not repeat the same answer 🙂

Regarding that … Resource safety is absolutely meaningless if you don't see that there's misbehavior. And you can't see that there's misbehavior without that … In real life that's awful — and we might be able to fix Cats-Effect 2's …

That's also probably because on cancelation we should probably not respect the reverse order of acquisition. On cancelation the important thing is for resources to be closed as fast as possible, to prevent leaks. People ended up expecting that due to issues with the API of `bracket` …

```scala
// Proposed alternative signature: the cancelation path is kept synchronous
def bracket[R, A](acquire: IO[R])(use: R => IO[A])(
  complete: R => IO[Unit],
  cancel: R => SyncIO[Unit]
)
```

If you have to do "one last thing" on cancelation, then that's not cancelation. That's just normal user input that can be treated differently.

That Haskell's base model is "cancelation does not back-pressure" is IMO yet another confirmation that our model in CE3 might be wrong. Popularity isn't proof enough, but it's the strongest signal we have. Reactive Streams, RxJava, Project Reactor and others don't back-pressure cancelation. And I don't think Haskell's unsafety comes from that IMO, but from its concept of async exceptions and from believing that …

Your argument is that we can implement a … Here's why the order of acquisition shouldn't matter much on cancelation, this being another instance of a leak:

```scala
import cats.effect.{IO, Resource}
import scala.concurrent.duration._

// Faulty release ;-)
val never = IO.interruptible(false) {
  while (!Thread.currentThread().isInterrupted) {}
}

val res = for {
  r1 <- Resource(IO { (???, IO.unit) })
  r2 <- Resource(IO { (???, never) })
} yield (r1, r2)

res.use { case (r1, r2) => IO.never }
  .timeout(10.seconds)
```

Note how: …

And I don't think it's feasible to have people implement their own … In other words, this isn't really a choice.

Something that bothered me ever since we modified Cats Effect's protocol to back-pressured cancellation (although CE 2 was IMO far better behaved in this regard, even if it had some confusing corner cases, because at least some operations like …
-
OK, I have thoughts — since yesterday. My proposal is basically this ...
So how about a configuration like this:

```scala
import scala.concurrent.duration.FiniteDuration

sealed trait OnCancelBehavior

object OnCancelBehavior {
  case class Backpressure(timeout: Option[FiniteDuration])
    extends OnCancelBehavior
  case object FireAndForget
    extends OnCancelBehavior

  val default = Backpressure(None)
}
```

We can then pass this configuration where needed, like in overloads for `timeout` and `Resource#use`:

```scala
def timeout(d: FiniteDuration, c: OnCancelBehavior): IO[A]
// ...
abstract class Resource[F[_], R] {
  def use[B](c: OnCancelBehavior)(f: R => F[B]): F[B] = ???
}
```

Something like this anyway. At least let's make it configurable.
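To make the intent concrete, usage of such a (purely hypothetical) API might look like this, where `task`, `resource` and `doStuff` are placeholders:

```scala
// Fire-and-forget: give up immediately and let finalizers run in the background.
task.timeout(10.seconds, OnCancelBehavior.FireAndForget)

// Back-pressure on finalizers, but for at most 5 seconds.
resource.use(OnCancelBehavior.Backpressure(Some(5.seconds)))(r => doStuff(r))
```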
-
An anecdote from a confused user, summarizing from gitter: …
-
The rationale for …
-
I think it's fair to say that I disagree with Erik Meijer on quite a few things, including array covariance. :-)

The problem here stems from the fact that cancelation, in idiomatic Cats Effect applications, is something that happens very frequently, under entirely normal circumstances, and then is often followed by a continuation which performs further operations. If cancelation were quite rare, this wouldn't really matter all that much since there would be limited (or no) possibility of runaway resource leaks. That is certainly the case in the reactive streams ecosystem, but not here.

Even outside of resource leaks, it isn't at all difficult to describe common scenarios where backpressuring cancelation produces the desired semantic, and non-backpressured cancelation results in unintuitive or even nondeterministic results. For example, imagine a …

All of this circles back to a very fundamental idea in the Cats Effect and Fs2 ecosystems: backpressure should always be the default. The reasoning here is relatively simple: resources are finite, therefore we should not design frameworks and applications which assume that resources are infinite. This is equivalent to saying that queues should always be bounded by default, and unbounded queueing should be opt-in rather than opt-out. Every unbounded queue is either an assertion that some external limiter is in place (a detail which most people overlook!) or that your memory is infinitely large (which seems unlikely).

Even aside from the first-principles argument, there are several practical scenarios which make default-backpressure an inescapably correct semantic. Load shedding comes to mind as a major one. When everything in an ecosystem preserves backpressure, then graceful load shedding on overloaded services happens effectively for free, without having to take any explicit action. This is an absolutely critical property, since the definition of resilience in any high-scale system is not the ability to successfully drink from a firehose of unbounded size, but rather to automatically self-heal regardless of the flow magnitude. Without load shedding, services will inevitably respond to unexpected traffic volumes by severely degrading, often permanently, and often taking down other systems at the same time.

This brings us around to the Achilles heel of asynchronous applications: backpressure "gaps". The generally accepted wisdom around asynchronous programming is that, when it works, it works extremely well, but when it breaks the results are proportionally much more catastrophic. The mere fact that a single process can quite easily accept enormous connection volumes at a time hints at just how catastrophic things can get if you don't balance things correctly. When running a marathon, tripping in front of the full group will cause a massive chain reaction of problems if those behind you don't react immediately and correctly. The asynchronous analog of this is when some component of an asynchronous system starts slowing down. If you don't have backpressure at every link along the chain back to the producer, the resulting flow will build up at the first point at which unbounded resources were assumed, much like water bursting out of a broken pipe under high pressure. Thus, removing backpressure from a link in the chain should be a choice that developers make very intentionally, with great care, and with full understanding of the implications.
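As a small illustration of the "bounded by default" point (my sketch, using `cats.effect.std.Queue`): the bounded constructor makes producers wait when the queue is full, while the unbounded one silently assumes infinite memory.

```scala
import cats.effect.IO
import cats.effect.std.Queue

// Producers calling `offer` suspend once 1024 elements are already enqueued.
val bounded: IO[Queue[IO, Int]] = Queue.bounded[IO, Int](1024)

// `offer` never suspends; any limit on growth is the caller's problem.
val unbounded: IO[Queue[IO, Int]] = Queue.unbounded[IO, Int]
```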
Since cancelation is expected as part of normal and frequent operations in any application, it too must exert backpressure as its default mode of operation, otherwise all of the guarantees of the system fall apart. Anyone who prefers non-backpressuring cancelation can simply append …
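One way to opt out per-resource (my sketch; `Conn`, `acquire` and `release` are hypothetical): run the finalizer on a background fiber so callers never wait on it.

```scala
import cats.effect.{IO, Resource}

trait Conn                                // hypothetical
def acquire: IO[Conn] = ???               // hypothetical
def release(c: Conn): IO[Unit] = ???      // hypothetical

// The finalizer is started and immediately "forgotten": neither cancelation
// nor normal completion waits for `release` to finish.
val fireAndForget: Resource[IO, Conn] =
  Resource.make(acquire)(c => release(c).start.void)
```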
-
I may revive an old topic that was already discussed. Sorry about that, better late than never 🙂
Take this snippet of code, which continuously writes lines of text into a file (a resource), and which we attempt to kill via a `timeout` ...

In Cats Effect 2.x this program behaves as expected (to me anyway) — it triggers a `TimeoutException` after 10 seconds and exits. It may also log an `IOException` via the thread's default error handler (because that Writer ends up forcefully closed).

In Cats Effect 3.x this program is stuck forever. It can't be killed by `SIGHUP`/`SIGTERM` either, so `IOApp` too is problematic — it definitely defies common sense, being very unlike normal app behavior (because you can interrupt a `Main` with a `while (true)` in it, by default, in most programming environments I've seen).

And I fear this is by design, a design (possibly imported from ZIO) with which I deeply disagree. If this behavior is by design, it completely misses the main purpose of cancellation.

Cancellation is NOT user input. We have other ways for modeling user input. If we want to detect the client's intent to exit, we could share a `Deferred` (aka a fancy `Future`), or some sort of stream. Cancellation is forceful shutdown in case of a race condition. Cancellation is not a SHOULD, but a MUST. It doesn't matter if we now have an `IO.blocking` that's well-behaved. Being able to do a forceful shutdown of resources should not rely on good client behavior.
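A sketch of that "other ways for modeling user input" idea (my illustration; `doWork` is hypothetical): a graceful-exit signal is just a value the worker races against, completely separate from cancelation.

```scala
import cats.effect.{Deferred, IO}

def doWork: IO[Unit] = ???  // hypothetical unit of work

// The worker finishes on its own terms once the stop signal is completed.
def worker(stop: Deferred[IO, Unit]): IO[Unit] = {
  def loop: IO[Nothing] = doWork >> loop
  IO.race(stop.get, loop).void
}

// Elsewhere: `stop.complete(())` expresses the user's intent to exit.
```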
In the old Cats Effect 2.x the problem was that the resource could have been closed while in use, concurrently, leading to an `IOException` and possibly corrupted data. But that was better than this — one does expect corrupted data in case of cancellation, that being normal. Whereas this behavior here can create a zombie process that can't be killed without a `SIGKILL`, which is not normal, defeating common sense.

In terms of network protocols, getting confirmation from the client that a "connection" is no longer used or closed is a pretty bad idea, because the client might misbehave. In case of a race condition (such as a `timeout`) a `cancel()` is not an operation that you want to back-pressure, before or after invoking it.

The perennial example is the TCP protocol. Google is filled with articles on detecting "closed TCP connections" or "broken TCP connections". Detecting liveness is its own can of worms, but even closing a connection is pretty challenging, as it goes via a complex closing handshake. Note how the protocol goes through states like `FIN_WAIT_1`, `TIME_WAIT`, `FIN_WAIT_2`, `CLOSE_WAIT`. These states are reflected at the kernel level. They are timed out by the kernel at some point, because either the client or the server can completely stop responding.

Even with that timeout, because it's a generous one, with huge traffic you can still end up with so many open file handles that you exhaust the ones available, with the kernel refusing to allocate more, temporarily, until the timeouts are triggered. You can then, of course, modify the OS's settings, e.g. you can increase the available file handles (since they are cheap), you can decrease the timeouts, etc.
The question is ... if a client is misbehaving, should the app freeze because of it? If it's a long-running server we're talking about, the answer is almost always no — the right answer is for the client's connection to get killed. And it is true that sometimes we can't stop a client easily. For example inside a JVM process this can't be easily killed:
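Presumably something along these lines (my sketch, not the original snippet): a plain busy-loop that never checks the interrupt flag, which nothing short of killing the JVM can stop.

```scala
import cats.effect.IO

val unkillable: IO[Unit] =
  IO.blocking {
    // Ignores thread interruption entirely, so cancelation can only abandon it.
    while (true) {}
  }
```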
And that's fine. `IO.race` and `IO.timeout` should still work on it though. Yes, this too creates a leak, which may end up crashing the process, but that's a better outcome than a frozen process that can't be restarted by a supervisor 😉

Also, `IOApp` should forcefully kill the app when repeatedly getting `SIGHUP`/`SIGTERM` signals. That you can't specify such a behavior via normal mechanisms (other than giving up and manually triggering a `System.exit` at some point) indicates we have a problem.

Speaking of, it always seemed wrong that `bracket`'s `release` triggers an `IO[Unit]`, which can be asynchronous in case of cancellation. A release in case of cancellation should always be synchronous. If you have to block on anything to do a release in case of cancellation, that's just wrong.

I always disliked `bracket`: it feels like an improvement over the status quo, but it invalidates the reasoning we have with `Monad`, and I'm getting the feeling that it is a bad gift that keeps on giving 🤷♂️

Sorry for my strong opinions, I just had an emotional reaction 😅