Reasons for modules #569
Functions

I've written "Functions (units of functionality)" in my notes. I'll have "units of <noun>" written down for each of the rungs on the abstraction ladder, but I realize that those easily become just empty words, so let me try to back it up: you put code inside a function, and that code forms a unit. In the early days of assembly code, the whole code section was a single thing, and there was no strong syntactic boundary separating a function from its outside. It all ran on convention. When you went through the motions of jumping back to what called you, that's when the function ended. It was an easy thing to jump into the function at different starting points — somewhat weakening (what we now think about as) the integrity of the function boundary. I recall vividly that in Knuth's TAoCP books, the MIX machine uses as its calling convention a kind of self-modifying code which sets the return address of the caller on the jump instruction at the end of the function. (Yes, really!) MIX is an old standard, and that particular design decision didn't stand the test of time — indeed, the newer MMIX architecture uses a more traditional stack-based approach. More on this under "First-class functions", below. Functions provide a rudimentary separation of concerns. Many of the programs in "BASIC Computer Games" just ran on and did what they did with GOTOs. Maybe there was the occasional GOSUB and RETURN; I'll need to go back and check. If there was, it was still not fundamentally more "bundled up" into individual functional units than assembly code. In the paper The Use of Sub-Routines in Programmes, David Wheeler begins with this sentence: "A sub-routine may perhaps be described as a self-contained part of a programme, which is capable of being used in different programmes." Already from the start, the motivation was a kind of modularity. The separation of concerns that functions provide takes many forms, each helped by the pure idea of a function:
Why does lexical scoping (also known as "static scoping") fit so well with functions? From the angelic perspective, it's because this scoping discipline provides a form of stability, tying variable scopes to "blocks" in the code — that is, in the written program text. It doesn't get much clearer than that: your variable lives until the end of the block that declares it. From the demonic perspective, dynamic scoping is what a naive implementation hands you for free. Or rather, dynamic scoping is the natural thing for an interpreter to adopt. The interpreter follows the control flow of the program itself; if it carries around a shallow ("one level deep") set of name bindings, then dynamic scoping is pretty much inevitable. Lexical scoping is the perspective taken by the compiler, which follows the program text rather than the control flow. It took from the late 50s to the 80s for the Lisp world to make the transition from one to the other; indeed, they started with interpreters and only gradually became keen on compilation. Functions provide a guarantee which will be cast in a sharper light when it is broken later. It doesn't have a name in the literature that I know of, so let's call it the dynamic embedding guarantee. It has two parts: (1) there is only ever a single thread of control, and (2) a called function eventually hands that thread back to its caller, exactly once; until it does, control stays within the callee's dynamic extent, so calls complete in LIFO order.
Think of it as a "spark" running around executing your program. (This is the entity that debugging/stepping through code makes manifest.) Calling a function means transferring that spark. Because the dynamic extent of the function's execution (the interval on the timeline when the function was active) is strictly contained in that of the caller's execution, the whole program run forms a perfectly nested structure, a tree of calls. It is at this end of the spectrum that functions are perfectly oblivious, not "aware" at all of the process that's running them. Looking ahead towards the other end of the spectrum: actors are the total opposite — not only are they aware of the process running them, they've taken it over to the extent that they are the process that's running them. Function calls compose well. This is why we can talk of call trees: a function can take on the role of both callee and caller. I'll save the discussion about re-entrant functions till the "First-class functions" section. Same goes for when functions compose in two other ways. Here I'll just note that it's interesting that the angelic nature of functions demands a lot of them, and functions, it seems, are up to the task. There is a slight asymmetry between parameters and return values: in almost all languages, you pass 0 or more of the former, but receive exactly one of the latter. The introduction of tuples means there's a way to reconcile this apparent conflict — if what you're passing in is always a tuple, it can be both a single thing (to the implementation) and 0 or more things (to the user). It's a really nice idea, but I can't recall seeing it implemented perfectly in any language in practice. The same trick allows a language to emulate multiple return values — for some reason, I see this implemented well in many languages (C++, Perl, Raku, Bel). |
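To make the tuple trick concrete, here's a minimal sketch in Python (not one of the languages listed above, and the function name is mine): the callee returns one value, a tuple, which the caller is free to treat as several.

```python
def min_max(numbers):
    """Return the smallest and largest element."""
    return min(numbers), max(numbers)    # one value: the tuple (min, max)

pair = min_max([3, 1, 4, 1, 5])          # a single thing to the implementation ...
low, high = min_max([3, 1, 4, 1, 5])     # ... and "two return values" to the user
print(pair, low, high)                   # (1, 5) 1 5
```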
Coroutines

Known under many names: fibers, green threads. (The term "green threads" comes from the fact that some part of the Java implementation was called "Green" at some point.) I've written "units of cooperative multitasking" in my notes. The operative word is "cooperative" — if one agent refuses to pass on the baton, the multitasking doesn't work. Recall the dynamic embedding guarantee. We now weaken this in the smallest way possible: there is still only a single thread of control, but there is now a way to transfer it between active functions. ("Active" here simply means "currently being called", that is, we're "in" that function's body; there's an activation frame for it.) This breaks the LIFO assumption of call completion: functions now get the ability to "pause" execution and hand the baton to some other active function. In a sense, it also breaks the assumption of "single thread of control" — but it depends what you mean, actually. It does break it if you mean "we have one and only one call stack". It doesn't break it if you mean "at any given moment in time, exactly one function is executing". (Real threads break that, but not green threads.) Interestingly, this affords a greater separation of some concerns, even though it surely feels as if we're making things more mixed-up. The reasons for this I cannot verbalize well, but it has something to do with separating concerns along producer/consumer lines. Coroutines enable stream processing. A producer can generate data in individual sequence items, and the consumer can accept those sooner than if it had to wait for the whole sequence to be computed. The total "delay" involved is the same if we don't have real threads, but we might see partial results sooner. Unix pipelines are an excellent example of this. (Except that they can sometimes use more than cooperative multitasking.) Russ Cox has a good article on the original reason for CSP (communicating sequential processes) being separation of concerns. That is, just as functions help us separate some concerns, coroutines help us separate others. A demonic implementation concern that gets mentioned here is a "spaghetti stack" (or "cactus stack"). This is just the idea that the stack is no longer a singly linked list of call frames, but a tree of them. The leaves of this tree correspond to the active-or-paused functions. Since this is our first encounter with suspending the execution of a function, we also need to bring forth what (demonically) needs to be saved for the function to be able to resume later: (1) the point in the function body where execution paused (a saved instruction pointer of sorts), and (2) the values of its local variables (the activation frame).
This is enough as far as "information that needs to be saved" goes, but there's one additional detail that matters from an angelic perspective: assume that the keyword for "cede control to caller" is yield; then the value handed to yield travels out to the caller, and whatever the caller sends back in becomes the value of the yield expression when the coroutine resumes. Python implements exactly this model. I think it was Erik Meijer who liked to point out the bidirectional aspect of yield. Coroutines anticipate CSP (Communicating Sequential Processes) — more on which later — in the sense that you can implement the latter with the former. Or maybe better said: coroutines look like a line or a singly linked list (like Unix pipelines), but CSP looks more like a directed graph. |
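A minimal sketch of that bidirectionality using Python's generators (the model referred to above); the helper name is mine:

```python
def running_total():
    total = 0
    while True:
        # yield hands the baton back to the caller (passing `total` out);
        # on resumption, it evaluates to whatever the caller sent in.
        amount = yield total
        total += amount

gen = running_total()
next(gen)           # prime the coroutine: run it up to its first yield
print(gen.send(5))  # 5
print(gen.send(7))  # 12
```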
First-class functions

Taking a step in an orthogonal direction: given how absolutely awesome functions are, is there any way they could be made more awesome? According to Christopher Strachey, yes: we can make functions first-class, that is, reified values in the same sense booleans and numbers are values. "First-class" is not a complicated concept. It means we're allowed to do these things: pass a function as an argument to another function, return a function as a return value, store a function in a variable or a data structure, and create new functions at runtime (anonymously, if we like).
For some reason, Computer Science rewards those who make concepts first-class citizens. My best guess as to why: Denotational Semantics already talks about "valuations" of program fragments: turning a bit of syntax into an actual value. By making those values first-class, we are allowing the program to hold and refer to those valuations directly. This tends to bring more direct power into the language. I've written elsewhere about the drawbacks of making things first-class. The Lisp family of languages ended up being the ones who struggled with these questions. With the comfort of hindsight, it was simply a transition from dynamic scoping to lexical scoping... but in the coincidences of actual history, they were called "downward funargs" (function values passed as arguments) and "upwards funargs" (function values returned as return values). (John Shutt remarks on the fact that "funargs" implies that the "arguments"/downwards case somehow takes priority, which is indeed how the Lisp world thought of it.) Take the "downwards funarg" case first. This is what happens when you give an anonymous function to something like map: the function value travels down into the call and gets used while its defining scope is still alive on the stack, so even implementations without proper closures mostly get away with it. Now then, "upwards funargs". These were, historically, the real problem. Lexical scoping says it should just work: the function value returned should still have access to the outer function's variables. But from the perspective of dynamic scoping, and from the demonic perspective of "returning from a function means de-allocating its activation frame", upwards funargs are a real problem. Upwards funargs "escape" from the function that defined them. This is the first push from the world of stack allocation to the world of heap allocation. The historical answer to all this is closures: the joining of a function with the (lexical) environment in which it was created. The environment is what gives the closure access to the necessary outer variables. In the history of Computer Science, there is no greater story than that of FP and OOP setting off in two different directions and yet reaching the same goal. From the point of view of closures, the story is how closures accidentally take on all the power of objects. The koan about Qc Na recounts this. From a demonic perspective, the act of returning a closure from its active function means the closure has to save the active function's environment, thus turning into something the shape of an object. To me personally, JavaScript was the language that demonstrated this, since its first-class functions are faithful enough... but historically, Scheme was the language in which these realizations were made, and embodied. Next up is continuations. I would say that the closures⇄objects analogy gets even stronger in the form of continuations⇄actors. But all in due time. |
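A minimal Python sketch of an upwards funarg, the case that forces the closure (and, demonically, the move from the stack to the heap); the names are mine:

```python
def make_counter():
    count = 0            # lives in make_counter's activation ...
    def increment():
        nonlocal count   # ... and the closure keeps reaching it
        count += 1
        return count
    return increment     # the upwards funarg: it escapes its birthplace

tick = make_counter()
print(tick(), tick(), tick())  # 1 2 3 -- the environment outlived the call that created it
```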
Continuations

I have nothing in my notes about continuations. Fortunately, I've spent the last week implementing them, again, in Bel. I needed them for my "fastfuncs" mechanism, which is a cheaty way to make the Bel interpreter faster without having a full-fledged compiler. (I had falsely believed the fastfuncs could make calls of their own freely, but this ends up interacting badly with other continuation-based Bel functionality (such as exceptions), and so in the end, the fastfuncs need to implement continuations as well.) Continuations are what you get if you start from regular second-class functions, and then add both coroutine functionality and first-class-ness: like coroutines, they can suspend a computation at an arbitrary point and pick it up again later; and like first-class values, the suspended "rest of the computation" is something you can store, pass around, and invoke, even more than once.
Holey manoley! I've never seen anyone describe it like that in the literature, but I believe it's true: continuations are simply first-class coroutines. Categorically, they are the pushout of coroutines and first-class functions, taking plain functions as their common base. A corollary of this is that if your language has first-class continuations, you don't separately need it to have coroutines; they are but a special case. My mental model is this: functions abandon their role as the fundamental unit of computation, and hand it to a smaller entity: what typically is called a "basic block", characterized by the fact that jumps/state transitions can only happen between basic blocks, never within them. If it gives you warmth and comfort, you can think of these basic blocks as "mini-functions". They have the additional restriction that they do not contain internal calls — a call counts as a jump, and so needs to happen at the end of a basic block. There is a one-to-one correspondence between the "control-flow graphs" people used to draw in the 70s, and basic blocks/"mini-functions". Continuations turn the call tree into a directed graph. In doing so, they also erase any distinction between "calling a function/passing a parameter" and "returning from a function/passing a return value" — these are both just instances of invoking a continuation. This seems to be a difference that stems from turning coroutines first-class: by mixing "control" and "values", we destroy the idea of "caller". Continuations are useful as a basis for advanced language features: coroutines, exceptions, finally semantics, backtracking semantics... The implementation can still choose to keep continuations hidden and "second-class" in userspace, if it does not wish to confer such awesome power to the language implemented. Coming from the other direction, even a language which itself does not expose first-class continuations could use a programming style which imitates them (using first-class functions). This is known as "Continuation-Passing Style" (CPS). The clearest example of this that I know about is Node.js with its callback parameters, circa 2009. It's now 12 years later, and that callback style has largely given way to promises and async/await. Since I've spent the past week writing (Perl) code in continuation-passing style, I feel I'm somewhat of an authority on this style, more than I ever was when I wrote Node.js code in that style. You're writing your basic blocks as functions, which either return normally, or invoke another basic block. All you need to make sure is that all the basic blocks you need from the current one are lexically in scope. You can do that like this:
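As a rough sketch of the idea (in Python rather than the Perl I was actually writing): each local function is a basic block, mutually visible through lexical scope; since Python lacks Tail Call Optimization, a small trampoline invokes whatever block comes next instead of letting the blocks call each other directly.

```python
def countdown(n, done):
    # Each inner function is a "basic block": straight-line code that ends by
    # handing back a thunk for its successor (another block, or `done`).
    def loop_test(i):
        return (lambda: loop_body(i)) if i > 0 else done
    def loop_body(i):
        print(i)
        return lambda: loop_test(i - 1)  # the backward jump, with state threaded as a parameter
    return lambda: loop_test(n)

def trampoline(thunk):
    # No Tail Call Optimization in Python, so blocks return their successor
    # rather than calling it; this driver loop keeps the real call stack flat.
    while thunk is not None:
        thunk = thunk()

trampoline(countdown(3, done=lambda: print("lift-off!")))  # 3 2 1 lift-off!
```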
This covers the hardest case, that of looping and forward jumps. The cases of sequencing and choice are easier, and I won't cover them. Altogether, we can express things in such a CPS style, using calls for anything that previously required dedicated control-flow syntax. Pro tip: if you try this at home, you should really do it in a language that implements Tail Call Optimization. The reason is that, since continuations don't believe there's a call stack at all, they will be sorely disappointed if there is one in the implementing language, and it overflows. Amusingly, the way this resolves itself in Node.js (and in my Bel fastfuncs) is that the stack gets reset regularly by async calls. (Very Late Edit: And, whatever you do, don't mutate your local variables! (Something like a loop counter should instead be threaded through the blocks as a parameter.)) It is perhaps also worthwhile to mention here that, as part of the development of Scheme, Steele and Sussman found out (by implementing it) that continuations and actors are equivalent... under certain other reasonable assumptions. More precisely, passing a value to a continuation and passing a message to an actor can be seen as equivalent in power. Perhaps the biggest difference in practice is that actors are fundamentally isolated on the process level (and therefore asynchronous), whereas continuations are agnostic in this dimension: they allow it, but don't require it. To Steele and Sussman, that didn't matter much; to Hewitt, it did. I feel I could write a lot more about continuations. I also have literally dozens of papers queued up about continuations, most of which I have not even skimmed. It's clear to me that they are powerful and fundamental, and that I still do not understand them well enough. The removal of the basic block's power to call other functions is significant. Here is the first time we're saying "you can do many things, but you cannot do this". Continuations do not work properly if we also allow normal function calls. It is by limiting them that we make them work as they should. This act of limitation is the start of removing control from functions, and making it a global/distributed concern rather than a local one. |
This is as far as I got tonight. Looking back at my progress, I think I would need two more evenings to finish up. I'll try to rise to that challenge. |
That's a weird list of languages with tuple support. I'm not sure what those share in common wrt multiple return values? |
Aye, I fear I did not do that bit justice. Not sure it's tuples that I'm after. Let me unpack (pun totally intended) my given list of language examples in search of some unifying trend:
Python doesn't have destructuring as such, although it does have some mitigating ways to assign from an iterator into a tuple of lvalues. Which only leaves Java among the more popular languages. Java has absolutely nothing in this area of design space, which seems strange until you realize that Java was put on this earth to make the von Neumann bottleneck as narrow and uncomfortable as possible. |
Common Lisp and Scheme share that multiple return values are a mechanism of their own, separate from ordinary data structures like lists. This is similar to how this happens in Lua, which is the only language with different semantics for "return values" than just unpacking (that lots of languages have): In Lua, a multiple value is collapsed in several positions, but it "auto-splats" when used as the last argument of a function or inside a table constructor, and is blocked by parentheses.

```lua
function f()
  return 2, 3
end

function g(...) end

local x = f()     -- Cannot splat: assigns x = 2
local v = { f() } -- Can splat: assigns v = {2, 3}
g(f())            -- No extra parentheses: calls g(2, 3)
g((f()))          -- Parenthesized: calls g(2)
g(1, f())         -- Last arg: calls g(1, 2, 3)
g(f(), 1)         -- Not last arg: calls g(2, 1)
```
|
This isn't even my main beef with the feature (although that sounds like a pretty horrible default). My main beef is that it feels like their goal was to deliberately create a second-class value in the language just for passing many values... OK in general maybe, but not in the context of a language whose central feature is to build rich tree structures out of a universal structure type. It's like the
hnn 😞
eww 😱
One thing that is really really hard in language design, for some reason, is to make parentheses in expressions keep meaning "just grouping, nothing else". |
Or most programming languages, really. I'm just describing it for the sake of completeness, but I certainly won't say I like it at all.
It'd be better, yes. Well, the language with the worst example of this that comes to mind right away is C++ (if a function is |
I haven't forgotten about this thread, but getting back into it is going to require bigger chunks of time — realistically, the next time I'll have those will be around Spring Festival. In the meantime, I just found From Procedures, Objects, Actors, Components, Services, to Agents – A Comparative Analysis of the History and Evolution of Programming Abstractions. It comes without a strong recommendation — I have only skimmed it very lightly, not read it in detail — but the title itself was similar enough to the thrust of this issue that I wanted to mention it. Might as well mention Sans-Papiers as First-Class Citizens, which I will need to pick apart and review in detail at some point as well. |
On Abstraction by Eric Scrivner:
This reminds me of two things. One is the point about premature abstraction that sometimes pops up when discussing abstraction with @jnthn — the hard part isn't creating an abstraction, the hard part is the mental work necessary to arrive at the right/useful abstraction. In the limit, refactoring makes us brave in constantly re-evaluating and re-creating abstractions as needed. The other point is that Bel's predicate-based type system seems to fit the description about "compression guided by resemblance and analogy". Unlike in OOP, there's not really a reified "thing" corresponding to a class or type. Just a predicate that answers whether a given value belongs to the type or not. |
I just realized one thing. Here's a sentence from cppreference:
So, a "function object" (which I've heard people call a "functor") is an object that acts like a function, in that you can use the This is approaching first-class functions from the other direction, adding "it's a function" to an object instead of (as is traditional) adding "it's a first-class value" to a function. It's first-class functions from an OO perspective. It clicked for me when reading this SO answer:
This is a very C/C++ perspective, where "regular functions" mean "lower-order functions" by default, and these don't contain state. (Um. Modulo static local variables.) Of course, I realized all this on some rough level when writing the original comment about first-class functions:
Despite this, there's something concrete and satisfying about seeing the path actually walked in an objects-and-classes language, adding a call operator to an otherwise ordinary class. JavaScript doesn't have operator overloading, but the JavaScript MOP consists only of hooks like getters, setters, and proxies, none of which lets you make an arbitrary non-function object callable. |
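For the flavor of it in one place, a small Python sketch (Python's __call__ plays the role of C++'s operator(), and the names are mine): the same statefulness, approached once from the object side and once from the closure side.

```python
class Accumulator:
    """An object that acts like a function: state plus a call operator."""
    def __init__(self):
        self.total = 0
    def __call__(self, amount):
        self.total += amount
        return self.total

def make_accumulator():
    """A function that acts like an object: state captured in a closure."""
    total = 0
    def accumulate(amount):
        nonlocal total
        total += amount
        return total
    return accumulate

acc_obj, acc_fn = Accumulator(), make_accumulator()
print(acc_obj(10), acc_fn(10))  # 10 10
print(acc_obj(5), acc_fn(5))    # 15 15 -- same behavior, reached from two directions
```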
I don't know if it helps in any way, but "jumps/state transitions can only happen between basic blocks, never within them" — or, equivalently, "basic blocks are stretches of sequentially executed instructions" — is a kind of quotient construction; roughly a way to decrease resolution, making the notion of "instruction" bigger and more inclusive, and the notion of "(conditional) jump" more universal. Something related happens when looking for strongly connected components in a directed graph; the result is lower-resolution, but a part of the structure remains.
All of this is true, but (as I come back to re-read this) applies equally well to coroutines. I think the CPS transform has already happened, conceptually, at least, the moment we introduce the ability to suspend and resume a function. |
It bothers me a little bit that we need to talk about a removal of power here; continuations in general feel incredibly enabling, in that they literally enable other features like generators, exceptions, nondeterminism, and other wonderful types of control flow. Note that in Continuation-Passing Style it is the style itself which imposes a restriction, and with basic blocks it is our definition of what a basic block is that imposes the restriction. It doesn't come from continuations themselves, which are highly unconstrained and non-constraining. It's also important that the "stack paradigm" naturally suggested by regular LIFO function calls is also very restrictive, and by introducing continuations, we're transitioning to a "heap paradigm". The non-obviousness and pain of doing so was the essence of the funarg problem, especially the upwards half where the stack is insufficient. The stack paradigm is great, not just because it's faster in practice but because it ties an important kind of RAII thinking into the structure of the code itself. One of the tricky parts of good compilation is to get enough static knowledge of data flow inside of a function to determine that nothing continuation-shaped escapes, and things can therefore be stack-allocated and therefore faster. The nature of such determinations means that we can do this less often than we'd like (and even when we can, it takes a bit of work). |
I'm curious if you get anything useful out of this. (And/or a reddit discussion about that article.) |
Yes! My immediate reaction is that "substructural type systems are cool and useful" — vide Rust, and Perceus, and, I dunno, probably a handful of other things I'm forgetting. But I think I'll leave it at that. There's a sense in which the whole substructural thing falls under the heading "Types", which I almost didn't even include in the abstraction ladder/table of contents/topic plan for this issue. I have an irrational fear of type systems, equal and opposite to Bob Harper's love of them. I mean, clearly the regimentation they impose ends up helping us see further. But on whose authority except its own does the type checker reject my program? I'm not hating on type checkers, I just don't think we should be slaves under them. I have more to say/write about well-foundedness, and the distinction between what Harper calls T++ and PCF++ (which mirrors what Hofstadter calls BLooP and FLooP) — it seems to me that ownership-tracking is to memory what T++/BLooP is to termination/totality/decidability. Possibly there's even a direct connection — but even if there isn't, the topics feel a little bit related. |
Relevant to the way continuations generalize regular functions and the stack paradigm (from a 2012 blog post by John Shutt about guarded continuations):
I called (1) above the "dynamic embedding guarantee". I don't think I spelled out that continuations obliterate the dynamic embedding guarantee. Also note how (1) is about (returning) "at least once", while (2) is about "at most once". Continuations that do respect (2) are sometimes called "one-shot continuations", I think. |
A quote I just ran into, which corresponds to where I was planning to make this issue end up:
|
Parts of this story gain a sharper edge by more carefully separating the pure/applicative/expression world from the monadic/heap-memory/sequential world, like PFPL does, which I'm currently reading with great appreciation. On page 307 (of the paper book, second edition), a bit after the midpoint of the book, things culminate in this factorial procedure declaration:
Given that what we're expressing here could be written in Raku as a short one-liner, the amount of ceremony stands out.
Assignables only exist in the monadic part. Variables are typically associated with the pure part. I wrote this earlier:
The entire point of Harper/PFPL's separation is that side effects are locked into the monadic half of the language, letting the pure part keep many of its algebraic/equational properties. This is nothing new, of course; Haskell does exactly this with monads, and Koka does exactly this with effects. All the talk about side effects anticipates the later talk about process isolation and actors. It's by making the side effects and the heap memory explicit (like PFPL does) that we can later isolate those things, and hide/encapsulate them inside something. |
In recent years I've been thinking of 6model as "actor model ready". Do you get what I mean? Am I deluded? Have I mentioned / are you aware of Pony and ORCA? |
I think so, but to the extent I do, I'm not sure I agree. I'll quickly add that this might be because you have pieces of the 6model puzzle that I lack. In order to answer in more detail, I need to anticipate a few points from the future of this thread, specifically the last two topics in the projected list of "module-like things": CSP-style processes, and actors.
But as a baseline, the story must start with the status quo of shared memory, threads, and locks. I'll take all those mechanisms as given — there's oodles of literature on it — but I will mainly point out here that locks don't compose and threads don't scale. This book, specifically, spends the early parts showing how a perfectly well-designed class for single-threaded use can break under many-threaded use — it's humbling — the problem is a bit like "monsters from the 9th dimension" in that it manifests orthogonally to the correct code you wrote — and then spends the later parts advocating safer/higher-order building blocks, like atomic data, built-in concurrent data structures, and architectures built on the producer-consumer metaphor. (Further confirmed with BG's answer here of also covering "fork-join, parallel decomposition, and the new parallel bulk data operations".) Communicating Sequential Processes takes the initial interesting step that assignment (to shared memory) is not a primitive operation, communication is. Specifically, this is a two-party action, so every such communication is also a synchronization point, and establishes a happened-before relation à la Lamport. This simple shift in perspective makes things scalable, and the focus now shifts to the (quite possibly dynamic) structure of communication channels between processes. Shared mutable memory is no longer a problem, because there's no global memory to share; you can only share along channels. Actors similarly take a new primitive operation: (asynchronous) messaging. Actors are by construction so well-protected from each other that not only is the "shared mutable memory" problem completely defined away — but by themselves, actors don't seem to be able to synchronize anything between themselves. I'm still learning about this, but one solution seems to be synchronizers, which could maybe be seen as a common region in which synchronization on shared resources between actors can happen. These two models are clearly related. In fact, we can encode either model using the building blocks of the other. Which means we can zoom out to a sufficiently high-altitude perspective where they both look the same, and present a common solution next to threads/locks: while threads and locks start out with a fundamentally un-synchronized resource (shared mutable memory) and then try unsuccessfully to apply massive amounts of cleverness and discipline to add synchronization in just-enough places to restore correctness, CSP and actors start out with a fundamentally synchronized resource, and then spend the rest of their lives happily ever after. (Although actors also need to add synchronizers as an extra component.) Threads and locks are still popular despite being fundamentally unworkable, because people naturally tend to think of a solution as a concrete feature. CSP and actors start out by removing and restricting; more of a "negative feature". There's a quote about Dijkstra somewhere about how his main insight about concurrency is that it's not about adding something (such as "threading"), it's about removing a restriction/assumption (such as "the order of the statements (alone) determines the sequencing of the operations at runtime"). The story doesn't end there, either. There's something really interesting going on with what Rust and Koka are doing, basically exploiting the fact that "has a unique owner" is a necessary and sufficient condition for mutability.
This is like carving out a "safe space" from the shared mutable memory, and the fact that something shaped a bit like a type system can do that is... interesting. You don't need to write any synchronization at all for data that provably has a single owner. Tying everything back to whether 6model is "actor model ready" — it would need to be in terms of some of the above primitives, I think. CSP/channels, or asynchronous messaging with provably no sharing, or (à la Rust) just provably no sharing. I'm not sure if 6model is more or less ready for any of those than other object systems. |
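As a rough illustration of that shift, a Python sketch where the only thing shared is a channel; a true CSP channel is an unbuffered rendezvous, which queue.Queue only approximates.

```python
import threading
import queue

def producer(chan):
    for item in range(5):
        chan.put(item)   # communicating is the primitive operation ...
    chan.put(None)       # ... and doubles as a synchronization point

def consumer(chan):
    while (item := chan.get()) is not None:
        print("got", item)

chan = queue.Queue(maxsize=1)  # the only thing the two threads share
threading.Thread(target=producer, args=(chan,)).start()
consumer(chan)
```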
@raiph I don't think so, on both counts. Skimming the linked page, it sounds amazing, but it's also extremely short on specifics. I would be a thousand times less doubtful of the claims (sound + complete + concurrent) if there was a link to a paper containing algorithms and a number of proofs. Several questions arise, such as:
MoarVM's garbage collector is pretty cool, but it's far from this design along several dimensions: it's tracing/generational, not reference-counting; and it's based around threads and condvars, not actors and a mysterious lack of a need for synchronization. |
Ah; the quote I thought about is from this blog post:
|
A further thrust at the question of 6model's actor-model-readiness. Here's a quote by Robin Milner from his Turing award lecture:
I think this pinpoints something important: when Hewitt says "everything is an actor", he means something concrete/specific about his imagined system. 6model can reasonably claim that "everything is an object", and back that up with evidence in terms of abstraction/encapsulation boundaries. For it to take the step to claiming that "everything is an actor", it would need to witness that in terms of process boundaries and inviolable message-passing mechanisms, as the foundation of the whole model. (But when it's put like that, I'm not sure that is, or should be, the goal of 6model.) If what we mean is "some things can be actors" (rather than the whole system being actor-based), then I believe this was conclusively asserted with the release of jnthn's oo-actors module in 2014. 😄 |
Here's another reference for future reading and incorporation into the themes of this thread: The Purely Functional Software Deployment Model (Eelco Dolstra's PhD thesis). It's the basis for the Nix package manager, whose main selling point is that identical inputs give identical outputs — that is, builds are reproducible. As a relevant aside, it would appear to me that golang is following a convergent evolution here, independently discovering the value of reproducible builds [1] [2]. I hope to get back to both Nix and golang/vgo when I write about "Packages", listed in the OP as one of the tour destinations. The thrust of my argument will be just this, that as an abstraction, packages are not stable/usable unless they guarantee reproducible builds. It's a wonder we get anything done in mainstream software engineering, absent that guarantee. |
The abstraction ladder, identified in the OP, also spans another axis: that between values and resources. Computer science tends to talk about values, but software engineering has an understandable focus on resources. What's a resource? A database handle is a resource, a file handle is a resource — or they stand in for/represent resources, maybe. It depends how you categorize things. Threads and green threads are resources. Memory (including too-big-to-copy values in memory) is a resource. Input/output on a port is a resource (or the port itself is a resource?). Communication bandwidth is a resource. Network access is a resource. The abstract concept of compute (as in, computational throughput) is a resource. If I plugged a quantum co-processor from 2043 into my machine, it could constitute a resource. I can give these specific examples, and I feel there is something that they have in common, but I can't clearly delineate that something. Values can avoid being about "the real world" in some sense, but resources are anchored in the real, physical world, and inherit limitations/properties from it. There are probably physicists who love that connection, whereby physics kind of bleeds into computer science. It feels like there would be some straightforward connection to effects (and coeffects). Are effects simply about acting on a resource? The resource is the noun, the effect is the verb? At its most pure, lambda calculus can be all about values. At their most expansive, CSP and actor systems can be all about resources. The abstraction ladder seems to straddle the values-resources axis. I wish I understood better how these things are connected. |
I just stumbled over the userver framework, which looks nicely designed and quite flexible/powerful. This is what met me in the first tutorial I decided to click on:
In other words:
Later edit: It's all good and well to be sarcastic, but when coming back and reading the above, I didn't want that to be my main point/final word. Specifically, the authors of the userver framework are operating well within the expectations given to them. This (concurrent access) is just that big of a challenge — I think it's fair to say none of us knows the final answer — people of the Rust and Koka communities can see a bit further, sure, but in the end we're all still hunting for solutions, trying them out, and learning for the next iteration. |
I want to call attention to this polemic post whose main message is this:
I realized that I agree with this particular sentence because it's qualified, but then disagreed with the message of the rest of the post because I don't believe the qualification ("...your problem admits certain concessions...") applies in a general-purpose language. In the post above about first-class functions, we showed that the special subtype of first-class functions that people historically called "upwards funargs" have a tendency to escape their location of birth, thereby bringing their lexical context with them in a closure:
Of course I should have defined a static embedding guarantee that this breaks along the way: roughly, that things only ever refer to their enclosing lexical scopes, and never escape their location of birth.
This is true (a) if there's only ever one global environment and not many small lexical ones, or (b) if functions are not first-class/mobile. I know that's a little bit abstract, so let's break the guarantee three times: In Bel:
In Perl 5:
In Alma (untested, but I have no reason to think it wouldn't work; #577):
Most modern mainstream languages have "given in" and introduced a mechanism that allows this kind of freedom of movement, whether via escaping first-class functions, or via heap objects being allowed to reference other heap objects. This is where the abstract graph of objects and references in memory turns from a tree into a graph, specifically one with cycles. That's also the point where reference counting becomes problematic, and garbage collection turns into a more robust solution. The qualification in the post ("Provided your problem admits certain concessions...") is no more nor less than the static embedding guarantee: things are not allowed to escape their location of birth, or mutually reference each other, directly or transitively. (Wait, what? Where did the "mutually reference" condition come from, weren't we talking about functions and lexical scopes? It comes from the Y combinator, which can be seen as a device that transforms firstclasshood of functions into function self-reference. Similarly, any "tying the knot" technique transforms freedom of heap reference (during object construction) into a cyclical object graph.) The HN discussion seems pretty balanced. The top comment right now is saying that this is a never-ending and unwinnable debate. The top answer to it is reporting spectacular success with (mostly) reference counting. The abstraction ladder also spans a spectrum between the specific (like reference counting) and the general (like garbage collection). Maybe the real lesson is that there is no one-size-fits-all point on that spectrum. |
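CPython happens to make the trade-off concrete: reference counting is its primary strategy, and it carries a tracing cycle collector precisely because cycles defeat the counts. A minimal sketch:

```python
import gc

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a   # the object graph now contains a cycle
del a, b                  # reference counts alone never reach zero here ...
print(gc.collect() > 0)   # ... a tracing pass finds and reclaims the cycle: True
```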
I'm getting ahead of myself here, but not by a whole lot. I realized on the way to work today that there's a great transition between the idea of functions and the idea of modules. Briefly, it's this: thanks to lexical scoping, functions own much of their data, and it's perfectly hidden from the world. (We'll sweep complexities around stack/heap allocation, first-class closures, and escaping under the rug right now. Just assume they are appropriately taken care of.) Modules are primarily that, a "perfectly hidden" guarantee for some data, but wrapping a group of functions. Keywords like export are there to poke deliberate holes in that boundary. Various related concepts make this effect clearer. Separate compilation for optimizing the data parts that are private to the module. Types for communicating (statically) across module boundaries. Contract-based programming for refining ensure/provides relationships even further, and separation logic to reason about what memory is owned by what module. That's it. Modules are, in this sense, a kind of "jumbo function" — the "jumbo" concept borrowed from Levy — they are the answer to the question "what is the biggest programming language feature we can imagine with the features we appreciate in functions?". |
Found this on Wikipedia:
This is true. Still, I wonder what it would be like to live in an alternative time line, where concurrency was assumed from the start, and all data structures were concurrent (thread-safe, etc.), otherwise they were considered broken. As it is, we all kind of start with the unsafe ones because they are easy to think about and usually much more within reach (grabbing a plain array or hash table is always just one keystroke away). |
While lurking around on nlab a while ago, I found this:
I want to say several things at once. (Heh.) First off, I think that this contrasting formulation was genuinely new to me — where previously I had been fully aware of partial functions and (separately) multi-valued functions, I had considered them completely separate models/problems, and had not drawn any parallels between them. I had seen them show up (separately) in many places in both PL theory and programming practice, but I hadn't considered them part of the same "continuum", as it were. Secondly, both "partial function" and "multi-valued function" are instances of the red herring principle. To spell it out, a function maps each element of the domain to exactly one element of the codomain. This "exactly one" is the characteristic feature that makes a certain small class of relations functions. But the "partial" in "partial function" changes "exactly one" to "zero or one", or "at most one", or "lone". A partial function is not a type of function, it's a widening of the concept of function (that is, all functions are (unintuitively) partial functions); it's a loosening of the characteristic constraint. Exactly analogously, the "multi-valued" in "multi-valued function" changes "exactly one" to "any number of", "zero or more", or "set of". A multi-valued function is not a type of function, it's a widening of the concept of function (that is, all functions are (unintuitively) multi-valued functions, as was pointed out above); the "exactly one" in the original constraint gets re-imagined as a singleton set, which is a special case of all possible subsets of the codomain. Thirdly, I now think I see how these two conceptual models (partial functions and multi-valued functions) keep popping up as language features. Just two examples:
Note there's no backtracking logic at all. Instead, the wrapper shoulders all of that. Somewhere else among the Alma issues I wrote about "changing the meaning of expressions/evaluation". It feels to me that in both these examples, that's what's going on here. There's a wrapper, enabling/encapsulating either partiality or multi-valuedness/nondeterminism, and then there are individual features inside which lean on that wrapper. I think I haven't seen it this clearly before, partly because the wrappers are not always very explicit. In JavaScript, there's like an invisible wrapper around the whole expression, or at least out to the nearest coalescing operator. Fourthly, even though the partial case and the multi-valued case are "opposites", in the sense that they mostly care about different sides of "exactly one", it's also true that the multi-valued case subsumes/includes the partial case. However, I think that's theoretically interesting, but not always called out as relevant in a practical design. |
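A small Python sketch of the two wrappers (the helper names are mine): partiality as a None-short-circuiting pipeline, multi-valuedness as zero-or-more results that get flat-mapped.

```python
# Partiality: a wrapper that short-circuits on "no value at all",
# in the spirit of optional chaining.
def chain(value, *steps):
    for step in steps:
        if value is None:
            return None
        value = step(value)
    return value

# Multi-valuedness: a "function" returning zero or more results;
# composing such functions is a flat-map over all the possibilities.
def proper_divisors(n):
    return [d for d in range(2, n) if n % d == 0]

print(chain({"a": {"b": 42}}, lambda d: d.get("a"), lambda d: d.get("c")))  # None
print([(n, d) for n in (12, 13) for d in proper_divisors(n)])  # 13 contributes nothing
```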
@raiph Hat tip to Deny Capabilities for Safe, Fast Actors. I wasn't aware of this paper before, even though it's from 2015. There's a talk video, where the connection is made with Pony (which it isn't in the paper, from what I can see). (Edit: This page on the pony website also mentions a blog post:)
The idea of deny capabilities (and specifically, saying what aliased references are not allowed to do) seems both non-obvious and "obvious in retrospect" — as in, I think some variant of this might well be the way forward for the ever-vexing challenge of sharing mutable state while also preventing data races "by construction". |
I stand by this definition, and I don't wish to change it. But (as mortals embedded in a single time line), our perspective on running nondeterministic algorithms is interesting. This is where people, like Floyd, jump directly to backtracking.
So, people read "nondeterministic choice" in the algorithm, and their brain immediately codes up the corresponding Floyd-style backtracking search. This is not exactly wrong, but it's a bit of a false equivalence. Instead, imagine this:
This program makes 8 iterations, doesn't backtrack anywhere, and always prints one of the 92 solutions to the 8-queens puzzle. Its only minor flaw is that (as far as we know) it doesn't run on a universal Turing machine. The secret is in the implementation of the nondeterministic choice operator. Or, you know, lest someone accuse you of not thinking 4th-dimensionally: imagine all possible runs of the program as a directed graph of states and transitions. (It's actually not that big.) Yet another pons asinorum explanation: that graph, taken as a whole rather than any single path through it, is what nondeterminism is about. I've heard this particular style of it called angelic, as opposed to demonic, nondeterminism.
Just like you can't embed a Klein bottle into 3-space, you can't embed an angelic-nondeterministic program into determinism-space. I believe this statement is ultimately equivalent to "P ≠ NP". |
...of course, this true nondeterminism, with a magical choice operator, isn't something we can actually run directly; it has to be simulated. Namely, the backtracking approach is the canonical way to compensate for not having a magical choice operator. |
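To make that compensation concrete, a Python sketch (the choose-like primitive is hypothetical and only simulated here): the program text makes one choice per row, while generator-based backtracking quietly plays the role of the missing oracle.

```python
def queens(n=8, placed=()):
    # One nondeterministic choice per row; the for-loop plus `yield from`
    # is the machinery standing in for a magical choose(range(n)).
    if len(placed) == n:
        yield placed
        return
    row = len(placed)
    for col in range(n):
        if all(c != col and abs(c - col) != row - r
               for r, c in enumerate(placed)):
            yield from queens(n, placed + (col,))

print(next(queens()))  # one of the 92 solutions, e.g. (0, 4, 7, 5, 2, 6, 1, 3)
```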
Although whether your program is in the presence of an angel or a demon/adversary is, of course, domain-dependent. I just came upon the quote from this paper again:
In the case of Whether you are playing against someone who is helping or hindering you depends on what phenomenon you are studying, I guess. In the case of nondeterministic search, the 92 solutions are what we are interested in, so it makes sense to have an oracle push you in that direction. In many other situations where a real world imposes itself whether we like it or not, it makes sense to model this real world as either non-aligned with our interests, or even actively against them. |
From You Don't Know JS: Async & Performance:
This distinction between "async" and "parallel" meshes well with what, in my head, I had as the distinction between "concurrent" and "parallel" (but I'll think of it as "async vs parallel" from now on): namely, there are two fundamental kinds of multitasking: (1) interleaving several tasks on a single worker by switching between them (async), and (2) literally running several tasks at the same instant on several workers (parallel).
This distinction also puts into perspective the difference between "green thread" and "thread"; the former is always handled by a scheduler, the latter has the opportunity to run on a separate core, for example. |
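A tiny Python sketch of the first kind: one event-loop thread, two tasks, and interleaving happens only at the explicit await points.

```python
import asyncio

async def task(name):
    for i in range(2):
        print(name, i)          # always on the single event-loop thread
        await asyncio.sleep(0)  # the cooperative "hand over the baton" point

async def main():
    await asyncio.gather(task("a"), task("b"))  # async, not parallel

asyncio.run(main())   # prints: a 0, b 0, a 1, b 1
```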
Modules

You know, it's funny. A module, at its barest definition, is a "managed namespace" — literally a set of statically known key/value bindings. But that also means that it's the "step up" that we're looking for, from individual functions (or other types of "behavior", including continuations) to groups of them. Why would you want to group functions together? I think quite possibly it's a human thing. The computer doesn't much care — the code all gets bundled together in the end, and damn the abstraction boundaries. But when I code up (say) a red-black tree, then there are functions that belong to the red-black tree abstractions, and functions that don't. From that point of view, modules are bounded contexts. No-one's forcing me to; I want to. I like boundaries. The perfect storm that was ES3 could do modules, but it did it with a function (acting as a "module constructor" and as a privacy boundary) and a returned object (acting as the "managed namespace" of public exports). I'm not saying that's the only way to do it; but seeing it done with as little as functions and objects makes you think, doesn't it? I think the one who got forever associated with this idea was David Parnas, with his 1971 paper "On the criteria to be used in decomposing systems into modules". It's still a good text, even though it shows its age. tl;dr: Don't modularize according to the "flow chart" (control flow and/or data flow) of your project — modularize according to separation of concerns and "information hiding" (that is, the separation of non-public implementation details from public API details). The ulterior motive when grouping my red-black tree functions together into a module isn't just that I like to group things that I think thematically go together. It's that they share a common representation of the red-black tree that they are operating on. Any change to that representation, or assumptions surrounding it, would affect those functions within the module boundary together, and no functions outside of the boundary. Things like types and interfaces, contracts and invariants, further help to make explicit those assumptions — which can serve both the module author inside of the boundary, and module users outside of it. I still think something like GOSPEL should be made part of the "you have to be at least this tall to ride" criterion of publishing a module. What especially thrills me about it is that it does for static analysis and interaction what classes do for API/language — it creates a new level of abstraction on which to be absolutely precise. |
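The ES3-style trick described above, sketched in Python for flavor (the names are mine): a constructor function is the privacy boundary, and the returned object is the managed namespace of public exports.

```python
from types import SimpleNamespace

def make_stack_module():
    items = []                           # the hidden representation
    def push(x):
        items.append(x)
    def pop():
        return items.pop()
    def size():
        return len(items)
    return SimpleNamespace(push=push, pop=pop, size=size)  # the public exports

stack = make_stack_module()
stack.push("a"); stack.push("b")
print(stack.pop(), stack.size())  # b 1 -- the list itself is unreachable from outside
```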
Case in point: this blog post by Chris Wellons, Emacs 26 Brings Generators and Threads, spends most of its expository text about the new Emacs threads pointing out that they don't work:
As of this writing, we're up to Emacs 29.1, so the situation might for all I know have improved. But the bigger question is why this kind of thing happens with thread implementations — Emacs is just an example here. My main point is — nobody sets out to write a broken thread implementation! And still it happens all over the place. On the way back to lunch, I got a mental image of a Turing Machine with one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work on the single tape. That's threads. It's a fundamentally unworkable idea. The solutions, while obvious, are more complicated, or have more moving parts, or require more maintenance.
|
As a further (damning) data point on this, check out the opening paragraph of this recent blog article by Rachel by the Bay:
"If you do threads, you'd best not write to shared memory." Wise advice; what's stupid is that we created the possibility in the first place. |
I'm watching Kevlin Henney's 2017 talk Procedural Programming: It's Back? It Never Went Away. He mentions the book Software Architecture in Practice which outlines an architecture style called "Main program and subroutine" (in other words, procedural programming), and which has this quote:
(Emphasis mine, not Henney's or the authors'.) Henney says as an aside about the "single thread of control": "yeah, that's gonna come back and bite us later". It doesn't mention the children handing control back (in LIFO order) to their parents. Which, fair enough, that's not always the case, I guess. There's such a thing as tail calls. Other than that, it's quite stunningly similar to what I described as the "dynamic embedding guarantee" above. |
Whoa. And then later, towards the end of that same talk, Henney quotes a paper by David Gelernter and Nicholas Carriero named Coordination Languages and their Significance:
That is, lambda calculus in the small, pi calculus (or actors) in the large. I'd call that a pithy restatement of the point of this whole thread. (Edit: Two further good, clarifying quotes from the same paper.)
That is, mutability + sharing are bad when too close together; but tease them apart into computation and (separately) coordination, and they become manageable. (Another edit: I just feel I need to quote verbatim from the conclusions section, which again aligns with the goals of this whole thread.)
|
On the topic of modules, one of the most significant bits in the design of a module system is whether to allow cyclic imports. This even hit Alma's very own module issue, #53, and is in some sense still unresolved.
As you can probably tell, I'm currently leaning towards allowing cycles. In a sense, allowing cycles between modules is tantamount to defining modules via mutual recursion, letrec-style. |
A while after writing the above, I took the time to understand what nlab means by span, and how that ties into the understanding (or category-theoretical representation/implementation, perhaps) of partial functions and multivalued functions. It's actually quite simple. No, really. Start with a normal function, f: A → B; drawn as a category-theory diagram, this looks like two objects A and B, with an arrow f going between them. Now "split" this diagram into A ← D → B; the new object D in the middle (for "domain") is identical with A for the time being, and so the "→" arrow is the same f as before. The "←" arrow is an identity, since A and D are equal; being an identity, this arrow is both surjective and injective by definition. We have turned the thing into a span, but we've also effectively done nothing. Now we're set up to implement partial functions and multivalued functions: for a partial function, we let the "←" arrow stop being surjective (while staying injective), so that D picks out just the part of A on which the function is defined; for a multivalued function, we let the "←" arrow stop being injective, so that several elements of D can sit over the same element of A, each sent by the "→" arrow to its own value in B.
Sweet. I particularly like how we create a new backwards arrow from nothing, and it is then "tuned" to successfully explain the otherwise-baffling partial and multivalued functions. Maybe a good way to say it is that the "→" arrow corresponds to a normal function, while the new factored-out "←" arrow corresponds to a kind of "multiplicity" knob we can tweak. |
Allow me to throw in here, without further comment, the first paragraph of the abstract of the paper When Concurrency Matters: Behaviour-Oriented Concurrency which I just found:
|
Inserting a small reminder here to myself, to later expand this comment into a small discussion about progress indicators, which are a nice illustration of... something. I guess independent processes (UI and task) with bidirectional communication. As part of researching this, I also found a concept by Donald Norman called Gulf of Execution, a kind of reification of the lack of insight the user has of what the system is currently doing. That feeling when you hit a submit button and... nothing. Is the page submitting my form data now, or did my click accidentally miss the button? There's no feedback, so there's no way to know. There's also the labor illusion, which arises from the UI and the task being two disconnected activities. The psychological feeling of seeing the progress bar move (either in real percentage or some indeterminate animation) is broken the moment we understand that the task will not progress any more, but the UI claims it will, or does. A bad or unreliable "remaining time" estimate is like a watered-down version of this. |
Hi, and happy 2024, I've been disciplined about remaining in pure lurk mode until I'm confident of contributing with a healthy signal-to-not-even-a-tangent ratio, so I will not mention stuff such as concurrency, actors / independent processes with communication, in case I accidentally mention other stuff like what relationship there is, if any, between things like the busy beaver game, unbounded indeterminacy, and quantum indeterminacy. Instead I am convinced you are exactly the right person, and this is exactly the right thread to be in, to ask that, if you have time this year, please consider reading and doing justice to what is being discussed in this article that was published yesterday: scheme modules vs whole-program compilation: fight, with commentary like this:
|
Happy 2024, @raiph.
Interesting. I will read. And yes, that does feel relevant here to this thread, maybe even to #302. Will get back to you once I've read the article. |
I'm not the first one to rage against threads, of course. How about, for example, this slide deck Why Threads Are A Bad Idea by John Ousterhout from nineteen ninety five? (Also summarized here.) Slides are two a page, so on page 3/slide 6, after some contextualization of threads, you finally find out why they are a bad idea:
"...one tape and N (uncoordinated) read/write heads, all competing and sterically clashing to do their work..."
"Use locks, they said. Your threads will be fine if you just use locks, they said." How utterly unfair! Not only do locks, as their primary job, cancel out the concurrency that was the entire point in the first place; there are also ample opportunities to simply hold them wrong and introduce new and painful ways to shoot yourself in the foot. As I wrote in an earlier comment: It doesn't have to be this way. Threads and locks are just poor building blocks. Take somehing utterly comparable, Kahn networks, and see them rush away towards the horizon, accelerating all the while, leaving threads and locks in the dust. Kahn networks scale. Kahn himself is highly aware of this, as seen in his final words in the article:
This (well-earned) pride in how well Kahn networks compose still haunts me. How can we make sure to do more of that, instead of investing in the castle built in a swamp that is locks and threads? |
The inimitable Stephen Wolfram writes about something very similar which he calls multicomputation. The rough idea being this: when we deal with deterministic algorithms, running them is an unambiguous, stepwise activity. But when we deal with a nondeterministic algorithm, the interesting thing is the entire global transition graph — since, almost by definition, most of that graph constitutes the road not taken in any particular run. Put a bit more simply or poetically, deterministic computation is narrow, but nondeterministic "multicomputation" is wide. There are things going on which involve choices of paths. Things such as winning a chess endgame, or angelic nondeterminism, take as their starting point that we can make informed choices now based on the desired result we want later. |
I find I keep thinking about concurrency. I just stumbled over Wikipedia's article about it, whose first two paragraphs are surprisingly cogent; I will quote them verbatim (stripped of links and footnotes, though):
Several takeaways, in some order:
Pure expressions are tree-like, and the data dependencies are more like the "partial order" mentioned in the Wikipedia article. (That is, there are usually many possible ways to topologically sort and execute the individual subexpressions; as long as everything's pure, the result comes out the same.) This expresses a kind of concurrency, a decomposition of the problem where "happens-before" relationships do not appear exactly everywhere. But in monads and strictly sequential computation, they do — the sequencing itself imposes a happens-before relationship between every pair of steps.
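Here's a small sketch of that "any topological order gives the same answer" point (Python, my own framing; the expression and the node names are made up):

```python
import random

# (a + b) * (c + d): the two additions are unordered relative to each other.
nodes = {
    "a": (lambda env: 2, []),
    "b": (lambda env: 3, []),
    "c": (lambda env: 4, []),
    "d": (lambda env: 5, []),
    "ab": (lambda env: env["a"] + env["b"], ["a", "b"]),
    "cd": (lambda env: env["c"] + env["d"], ["c", "d"]),
    "result": (lambda env: env["ab"] * env["cd"], ["ab", "cd"]),
}

def random_topological_order(nodes):
    done, order = set(), []
    while len(order) < len(nodes):
        ready = [n for n, (_, deps) in nodes.items()
                 if n not in done and all(d in done for d in deps)]
        pick = random.choice(ready)   # any ready node will do; that's the freedom
        done.add(pick)
        order.append(pick)
    return order

for _ in range(3):
    order = random_topological_order(nodes)
    env = {}
    for name in order:
        fn, _ = nodes[name]
        env[name] = fn(env)
    print(order, "->", env["result"])   # the order varies; the result is always 45
```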
While a succinct description can't be expected to get into subtleties, the final clause of this:
is glossing over the heart of why I say concurrency is part of the problem domain: how do we define an outcome as "not affected"? That's very much domain-specific. When I used to teach on these topics, I had an example involving a data structure where we first applied a single lock to all of it, and then switched to fine-grained locking to get better throughput on updates, and then I sprang the question: is this still correct? It was a trick question: the problem specification wasn't actually clear either way on whether the partially applied update that we'd now made observable was problematic or not. |
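A minimal sketch of that kind of trick question (Python; an assumed two-account example standing in for the classroom data structure, not the original one):

```python
import threading

balances = {"x": 100, "y": 100}
coarse = threading.Lock()
fine = {"x": threading.Lock(), "y": threading.Lock()}

def transfer_coarse(amount):
    with coarse:                       # whole structure locked: the update is atomic
        balances["x"] -= amount
        balances["y"] += amount

def transfer_fine(amount):
    with fine["x"]:                    # better throughput: other keys stay available...
        balances["x"] -= amount
    # ...but a reader that runs right here sees the money "in flight".
    with fine["y"]:
        balances["y"] += amount

def read_total_fine():
    with fine["x"], fine["y"]:         # even taking *both* fine locks doesn't help:
        return balances["x"] + balances["y"]   # this can still observe 200 - amount

# With the coarse lock (and readers taking it too), "total == 200" is an invariant.
# With fine-grained locks, whether the observable intermediate total is a bug is a
# question for the specification, not for the code.
```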
Paraphrasing you in order to understand it better myself, here's what I got: Yesterday I happened to read about trace monoids (aka dependency graphs, aka history monoids), which are like regular strings except that some pairs of adjacent characters are "loose", commuting with each other freely in zero or more substrings of the string. (Other parts are "fixed", and work like regular substrings.) We define equivalence on traces so that two traces with the same fixed sections, and loose sections which can be permuted into each other, are considered equal. Seen differently, we "quotient away" all the commuting that can go on, and consider each big chunk of inter-commutable substrings as a single individual thing (and the fixed parts a boring special case) — comparison then becomes a matter of directly comparing such things. The choice is in which things get to commute. ("Things"? They're characters when the traces are presented as strings, but more realistically we're talking about observable events. Hm, event structures seem a slightly distinct thing from trace monoids.) There's a spectrum of sorts between "nothing commutes" (and our trace degenerates to a string) and "everything commutes" (and our trace degenerates to a multiset).
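A little sketch (Python, my own; the independence relation is made up) of how "quotienting away the commuting" can be made operational: compute a canonical representative, namely the lexicographically least word obtainable by commuting independent letters, and compare those. Two traces are equal exactly when their canonical forms are.

```python
def trace_normal_form(word, independent):
    """Lexicographically least representative of the trace of `word`.
    `independent` is a symmetric set of letter pairs that commute."""
    letters = list(word)
    out = []
    while letters:
        movable = []
        for i, x in enumerate(letters):
            if x in letters[:i]:
                continue  # a letter can't jump over an earlier copy of itself
            # x can be brought to the front iff everything before it commutes with it
            if all((x, y) in independent for y in letters[:i]):
                movable.append((x, i))
        x, i = min(movable)   # the smallest front-movable letter comes next
        out.append(x)
        del letters[i]
    return "".join(out)

# 'a' and 'c' commute; 'b' commutes with nothing.
I = {("a", "c"), ("c", "a")}

print(trace_normal_form("bca", I))   # bac -- the loose 'c'/'a' pair gets sorted
print(trace_normal_form("bac", I))   # bac -- same trace, same canonical form
print(trace_normal_form("abc", I))   # abc -- a different trace: 'a'/'b' are fixed
```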
By switching to fine-grained locking, some new pairs of events get to commute in the trace monoid. Is this still correct? It depends on whether we were fine with those events commuting. Were we fine with that? That's a question of specification, and most projects do not have a formal specification process (at least not to that level of detail), and do not ask that question before it arises in practice. "We don't quite know what we want" is the norm, or rather, formal specifications are considered to be for NASA and Boeing, but not for most of the rest of us. Certainly not for a random project choosing between a coarse-grained lock and fine-grained locking. |
This may or may not have been obvious when I wrote the above, but now I think the above is about trying to invert a function.
Because the red herring principle applies in the latter two cases, what we're saying is that inverting a function does not, in general, give you back a function.

Edit: Applying this perspective to parsing is surprisingly fruitful. You fix a grammar, and take your parser to be a (partial, multivalued) inverse of the grammar's string-generating function.
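A toy sketch of that reading (Python, my own illustration; the grammar is made up): the "forward" function flattens a derivation into a string, and parsing is brute-force inversion, partial where no derivation exists and multivalued where the grammar is ambiguous.

```python
from itertools import product

# Grammar: E -> "x" | E "+" E   -- derivations encoded as nested tuples.
def flatten(derivation):
    """The 'forward' function: derivation -> string."""
    if derivation == "x":
        return "x"
    left, right = derivation
    return flatten(left) + "+" + flatten(right)

def derivations(depth):
    """Enumerate all derivations up to a nesting depth."""
    if depth == 0:
        return ["x"]
    smaller = derivations(depth - 1)
    return ["x"] + [(l, r) for l, r in product(smaller, smaller)]

def parse(s, depth=3):
    """Invert `flatten` by brute force: partial and multivalued."""
    return [d for d in derivations(depth) if flatten(d) == s]

print(parse("x+x"))      # one derivation: ('x', 'x')
print(parse("x+x+x"))    # two derivations -- the ambiguity of E -> E "+" E
print(parse("x+"))       # none -- the inverse is partial here
```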
|
Coming back to Pony and ORCA:
I just found this grimoire of memory management techniques, and Pony+ORCA are number 11 on the list:
This is reassuring (especially as it's written by someone who seems to know what he's talking about). I will try to find out more about how, specifically, Pony achieves this. |
I found another quote by Dijkstra, which talks about parallelism, nondeterminism, and concurrency. (It's in an EWD which criticizes Ada, then called "GREEN".)
Two takeaways:
|
Again, on parallelism versus concurrency; I just found this very nice summary by Guy Steele, in his talk "How to Think about Parallel Programming: Not", at timestamp 29:45:
Interesting. I take two things from this:
|
Dropping this blog post, Ruby methods are colorless, into this ongoing issue, because it feels like it provides additional value in the discussion. I like the nesting relation Fiber < Thread < Ractor < Process; it feels clean, somehow. Hoping I can circle back and delve a bit more into the text of that post sometime later. |
Picking up the partial function/multivalued function trail a little bit (which is next door, conceptually, to failure semantics and nondeterminism): I found the first half of the video Converting Non-Deterministic Finite Automata to Deterministic Finite Automata quite helpful in gently providing a conceptual understanding of what's happening in automata:
The second half of the video basically goes through the powerset construction and how to carry it out in practice.
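For my own future reference, here's a compact sketch of the powerset construction itself (Python; the example NFA, recognizing strings over {a, b} that end in "ab", is my own and not from the video): every DFA state is the set of NFA states the machine could currently be in, which is exactly the trick of tracking all the nondeterministic possibilities at once.

```python
from collections import deque

def nfa_to_dfa(nfa_delta, start, accepting, alphabet):
    """nfa_delta: dict mapping (state, symbol) -> set of states."""
    start_set = frozenset([start])
    dfa_delta = {}
    todo = deque([start_set])
    seen = {start_set}
    while todo:
        current = todo.popleft()
        for symbol in alphabet:
            # Union of everywhere the NFA could go from any state in `current`.
            target = frozenset(s2 for s in current
                               for s2 in nfa_delta.get((s, symbol), set()))
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_delta, start_set, dfa_accepting

# NFA for "strings over {a, b} ending in ab": guess when the final "ab" starts.
delta = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
dfa_delta, dfa_start, dfa_accepting = nfa_to_dfa(delta, "q0", {"q2"}, "ab")
print(len({S for (S, _) in dfa_delta}), "reachable DFA states")   # 3
print([sorted(S) for S in dfa_accepting])                          # [['q0', 'q2']]
```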
I plan to, when time permits, do a deeper dive into Finite Automata and Their Decision Problems; with all the confusion out there about nondeterminism, this seems to be the only way to ground things in a definitive truth. |
Issue #302, that enormous discussion that hovers around fexprs and Kernel but sometimes veers out into the unknown and asks really deep and important questions about computation itself (and how it's managed to manifest itself in our timeline's history of programming languages), contains the following paragraph (by me, replying to @raiph):
I'm on vacation, so I thought I would tackle this "tower" of abstractions, describing them one by one. I will try to do all of them justice, extracting both the angelic "essence", or perfect form, of each feature — what remains, in a sense, when you don't have to be all that concerned with the inconveniences of actual implementation — and the demonic "substance", the imperfect form of each feature — what inevitably happens when you try to trap it in a machine.
(As a concrete example: the angelic essence of functions is something like the isolation guarantees and re-use that they give you, whereas the demonic substance is what leads to stack overflows or references outliving their stack-allocated data.)
As much as possible, I will also refer to real actual languages out there, especially when they make for pure exemplars of the feature.
I will try to get through all this in a single evening, aiming for coherence of thought rather than comprehensiveness. I'm not 100% sure what I will get out of this, except that I'm hoping to have a better overview afterwards of these different rungs on the abstraction ladder:
There is still a clear connection with Alma: during the design of Alma I've sometimes encountered parts of the challenge of introducing these things into a language. (Maybe most keenly felt with modules and types, but also to some extent with coroutines.) I'm not saying that this issue will provide any insights that I wish I'd had earlier, but... it's something to aim towards, I guess. Or if not insights, then at least a much-needed vantage point.
This issue is called "reasons for modules", which is not a particularly good name. But it's how I've been thinking about it — modules sit somewhere towards the middle of the ladder of abstractions, and the word "module" means different things to different people, but they are the prototypical thing which helps bring structure and separation to a program. In some vague sense, all the features on the abstraction ladder are modules, of one kind or another.