-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
design: describe the dice bytecode interpreter
Updates #1.
- Loading branch information
Showing
1 changed file
with
336 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,336 @@ | ||
# Proposal: Design of a bytecode interpreter for Go | ||
|
||
Author: Sebastien Binet | ||
|
||
Last updated: 2016-08-26 | ||
|
||
Discussion at https://github.com/go-interpreter/proposal/issue/1. | ||
|
||
## Abstract | ||
|
||
We propose to design and implement a bytecode interpreter for Go, | ||
which will be the foundation for a Go [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop). | ||
|
||
## Background | ||
|
||
It is common in science or exploratory work to iterate on a piece of code | ||
to solve a given problem. | ||
Having an interactive conversation with your program, _via_ an interactive | ||
prompt (aka a REPL), greatly speeds up such exploratory work: one can easily | ||
iterate on various algorithms, modifying the state of your program and data, | ||
and write new types and functions to _e.g._ plot the new state of your data. | ||
|
||
A side benefit of such an interpreter is the ability to embed it inside | ||
a Go application and provide both scriptability and extensibility. | ||
Designing such an API is outside the perimeter of this proposal. | ||
|
||
There are currently already partial solutions or whole implementations | ||
of a Go REPL on the market but none of those meets the following requirements: | ||
|
||
- easy `go get` installation | ||
- implement the whole Go language | ||
- be a real REPL, not just an "on-the-fly re-compilation + re-run the whole snippet" approach | ||
- JIT-able | ||
- performant | ||
|
||
## Proposal | ||
|
||
We propose to break the complicated issue of bringing a complete interpreter | ||
for Go (interactivity, whole-program interpretation, runtime, native functions, | ||
external functions, JITing, parsing source code, ...) into small pieces. | ||
|
||
The current proposal only deals with describing the bytecode interpreter | ||
(its overall design and its components), the opcodes and instructions which | ||
can be found in a bytecode stream and how these bytecodes can be interpreted and | ||
acted upon by the interpreter. | ||
|
||
There are many ways to implement an interpreter and as many options | ||
for the interpretation process: | ||
|
||
1. directly interpret from the source code | ||
2. interpret the source code after it has been transformed into an AST | ||
3. compile statements into bytecode instructions that are then executed | ||
|
||
We propose to go with option 3). | ||
Option 1) doesn't lend itself to optimizations nor very efficient execution. | ||
Option 2) is somewhat better: there are ways to programmatically manipulate | ||
and transform an AST. | ||
But with option 3) we should be able to reuse the whole corpus of optimizations | ||
coming from the new SSA backend of the official `gc` Go compiler. | ||
As explained in Rob Pike's talk at GopherCon-2016: ["The Design of the Go Assembler"](https://talks.golang.org/2016/asm.slide), | ||
the `cmd/internal/obj` package can be seen as a rather portable assembly language. | ||
This paves the way for considering it as a portable intermediate representation | ||
(IR) of Go code. | ||
|
||
The proposal is thus to use this conduit as the general infrastructure to | ||
generate the opcodes and bytecode for the new Go VM. | ||
The concrete _modus_ _operandi_ for leveraging `cmd/internal/obj` and | ||
the whole `gc` compiler infrastructure might still need to be properly fleshed | ||
out, but here are the current options: | ||
|
||
- create a proper `GOARCH` architecture directly under `cmd/internal` like | ||
the other `GOARCH=amd64`, `GOARCH=s390x`, etc... architectures and aim for | ||
Go 1.8, (we would need to declare our plans [here](https://groups.google.com/forum/#!topic/golang-dev/098vr4999Tk)) | ||
- vendor `cmd/compiler` at a given Go version (_e.g._ 1.7) and work off it, | ||
aiming for integration at a later date (if at all possible), | ||
- ??? | ||
|
||
### Instructions, opcodes and bytecode format | ||
|
||
We propose to reuse the opcodes and bytecode format as described in the [Dis VM](http://www.vitanuova.com/inferno/papers/dis.pdf) | ||
specification paper. | ||
The `Dis` VM was able to execute [Limbo](https://en.wikipedia.org/wiki/Limbo_%28programming_language%29) | ||
code. | ||
`Limbo` and `Go` share a common lineage and present similar features | ||
(channels, `select`, garbage collector, packages) so many (if not all) of | ||
the opcodes our VM will need are already present and the instruction set has | ||
been formally described. | ||
The on-disk object file format and overall organization has also been specified | ||
in the above paper. | ||
|
||
We intend to follow the general spirit of the specifications of the `Dis` VM | ||
and condense it inside a package named `dice`. | ||
The implementation of `dice` should be done from first principles, | ||
without looking at the `Dis` source code | ||
This is to ensure that `dice` can be licensed under `BSD-3`. | ||
|
||
The various `opcode`s are listed here: | ||
|
||
``` | ||
00 nop 20 headb 40 mulw 60 blew 80 shrl | ||
01 alt 21 headw 41 mulf 61 bgtw 81 bnel | ||
02 nbalt 22 headp 42 divb 62 bgew 82 bltl | ||
03 goto 23 headf 43 divw 63 beqf 83 blel | ||
04 call 24 headm 44 divf 64 bnef 84 bgtl | ||
05 frame 25 headmp 45 modw 65 bltf 85 bgel | ||
06 spawn 26 tail 46 modb 66 blef 86 beql | ||
07 runt 27 lea 47 andb 67 bgtf 87 cvtlf | ||
08 load 28 indx 48 andw 68 bgef 88 cvtfl | ||
09 mcall 29 movp 49 orb 69 beqc 89 cvtlw | ||
0A mspawn 2A movm 4A orw 6A bnec 8A cvtwl | ||
0B mframe 2B movmp 4B xorb 6B bltc 8B cvtlc | ||
0C ret 2C movb 4C xorw 6C blec 8C cvtcl | ||
0D jmp 2D movw 4D shlb 6D bgtc 8D headl | ||
0E case 2E movf 4E shlw 6E bgec 8E consl | ||
0F exit 2F cvtbw 4F shrb 6F slicea 8F newcl | ||
10 new 30 cvtwb 50 shrw 70 slicela 90 casec | ||
11 newa 31 cvtfw 51 insc 71 slicec 91 indl | ||
12 newcb 32 cvtwf 52 indc 72 indw 92 movpc | ||
13 newcw 33 cvtca 53 addc 73 indf 93 tcmp | ||
14 newcf 34 cvtac 54 lenc 74 indb 94 mnewz | ||
15 newcp 35 cvtwc 55 lena 75 negf 95 cvtrf | ||
16 newcm 36 cvtcw 56 lenl 76 movl 96 cvtfr | ||
17 newcmp 37 cvtfc 57 beqb 77 addl 97 cvtws | ||
18 send 38 cvtcf 58 bneb 78 subl 98 cvtsw | ||
19 recv 39 addb 59 bltb 79 divl 99 lsrw | ||
1A consb 3A addw 5A bleb 7A modl 9A lsrl | ||
1B consw 3B addf 5B bgtb 7B mull 9B eclr | ||
1C consp 3C subb 5C bgeb 7C andl 9C newz | ||
1D consf 3D subw 5D beqw 7D orl 9D newaz | ||
1E consm 3E subf 5E bnew 7E xorl | ||
1F consmp 3F mulb 5F bltw 7F shll | ||
``` | ||
|
||
We reserve the right to rename some of these `opcode`s to better reflect | ||
the naming conventions of our source language, Go. | ||
|
||
### Virtual Machine | ||
|
||
Once a Go package, command or code snippet has been compiled to our `dice` bytecode, | ||
that bytecode needs to be somehow executed. | ||
This job is performed by the `dice.VM` virtual machine: | ||
|
||
```go | ||
package dice | ||
|
||
type VM struct { | ||
frame *frame | ||
globals []reflect.Value | ||
} | ||
|
||
type frame struct { | ||
vm *VM | ||
caller *frame | ||
locals []reflect.Value | ||
pc int // program counter | ||
code []instruction | ||
} | ||
|
||
type instruction struct { | ||
opcode byte | ||
amode byte // address mode | ||
addrs uint64 // operands (src1, src2, dst) | ||
} | ||
|
||
func (vm *VM) run() { | ||
run(vm.frame) | ||
} | ||
|
||
func run(fr *frame) { | ||
for { | ||
code: | ||
for _, code := range fr.code { | ||
switch exec(fr, code) { | ||
case cfReturn: | ||
return | ||
case cfNext: | ||
// fetching next instruction | ||
case cfJump: | ||
break code | ||
} | ||
} | ||
} | ||
} | ||
|
||
func exec(fr *frame, code instruction) cfKind { | ||
switch code.opcode { | ||
case opADDF: | ||
// dst = src1 + src2 | ||
fr.pc++ | ||
case opCALL: | ||
run(&frame{caller:fr, pc:0, code: from(src)}) | ||
case opRET: | ||
// fetch result if any | ||
return cfReturn | ||
case opGO: | ||
go func() { | ||
run(&frame{caller:fr}) | ||
}() | ||
// etc... | ||
} | ||
} | ||
``` | ||
|
||
At this moment, the proposal is to be able to byte compile this simple Go package: | ||
|
||
```go | ||
package main | ||
|
||
func add(i, j int) int { | ||
return i+j | ||
} | ||
|
||
func main() {} | ||
``` | ||
|
||
and in a later stage, be able to run `add(40, 2)`. | ||
|
||
## Rationale | ||
|
||
Why do we implement yet another Go interpreter and a REPL? | ||
Aren't there already enough of them? | ||
|
||
Here is a list of alternatives: | ||
|
||
- [llgoi](https://github.com/llvm-mirror/llgo/blob/master/cmd/llgoi/llgoi.go) is a JIT-enabled interpreter built on top of `LLVM` and `llgo`. | ||
The first issue with `llgoi` is the somewhat painfull installation process. | ||
This pain point should be resorbed with time (and also by providing [snap based](https://groups.google.com/forum/#!msg/llgo-dev/ny8MgDlNkng/8kEvgzfuCQAJ) | ||
isntallations of `llgoi`. | ||
But the main issue is that `llgo` development is behind that of the reference | ||
implementation of `Go`: `gc`. | ||
Also, the pace of development of `LLVM` itself (very fast) and the version skew | ||
that may result on users' machines *might* set the scene for difficult user | ||
support and debugging sessions. | ||
|
||
- [ssainterp](https://github.com/go-interpreter/ssainterp) and [ssadump -run](https://godoc.org/golang.org/x/tools/cmd/ssadump) | ||
are based on the SSA suite developped at [golang.org/x/tools/go/ssa](https://godoc.org/golang.org/x/tools/go/ssa). | ||
They are able to parse and interpret a vast majority of valid Go code, | ||
but lack an interactive interpreter mode. | ||
`ssadump` code is also clearly stated as *NOT* meant to be used as a | ||
production-grade interpreter for Go but merely as an adjunct for testing | ||
the SSA construction algorithm. | ||
|
||
- [igo](https://github.com/sbinet/igo) and [go-eval](https://github.com/sbinet/go-eval) | ||
are projects salvaged from the pre `Go-1` era. | ||
`go-eval` does not lend itself easily to compilation optimizations and lacks | ||
support for `imports`, `goroutines`, type creation, ... | ||
|
||
- [gore](https://github.com/motemen/gore) supports the whole Go language but | ||
does not (completely cleanly) preserve state or side effects between | ||
2 interactive commands: `gore` recompiles on-the-fly your Go snippets and | ||
re-executes them. | ||
|
||
It seems necessary to implement some kind of a virtual machine to be able | ||
to provide an efficient and truly interactive interpreter for Go. | ||
|
||
The same question can be also raised about reimplementing a whole new VM. | ||
Couldn't we have somehow reused an already existing VM? | ||
`Python`, `Lua`, `JVM` and `Dis` come to mind. | ||
`Dis` is LGPL and thus not easily integrable in the usual Go ecosystem. | ||
`Python` and `Lua` have more permissive licenses, but their reference | ||
implementation are written in `C`, bringing either performance issues on the | ||
table (`cgo`) or throwing `go-get`-ability out of the window. | ||
There are however `Go` implementations (partial or complete) of these VMs: | ||
|
||
- https://github.com/Shopify/go-lua/blob/master/vm.go | ||
- https://github.com/flowlo/gothon/blob/master/frame.go | ||
|
||
The following issue at this point is the adequacy of their respective VM | ||
instructions sets with the Go language. | ||
|
||
Finally, why do we use the `Dis` VM instructions set, instead of a more recent | ||
or more in vogue set, such as [LLVM bitcode](http://llvm.org/docs/BitCodeFormat.html) | ||
and its associated [LLVM assembly](http://llvm.org/docs/LangRef.html), or the | ||
nascent [`wasm` bytecode](https://webassembly.github.io/) format? | ||
|
||
The `LLVM` solution suffers (to a lesser extent) from the same issues than the `llgoi` approach. | ||
We should note though there exists a pure-Go project to interact with the `LLVM` `IR`: | ||
[llir/llvm](https://github.com/llir/llvm). | ||
This project is still a work in progress at this time of writing (August 2016). | ||
|
||
`wasm` is probably a very strong and sensible option, and poised to take over | ||
the whole web industry. | ||
Unfortunately, there is only a work in progress `C/C++` project at the moment (August 2016), | ||
so it is probably a bit early to write code to target it. | ||
However, `wasm` is definitely a backend to monitor: `gopherjs`, a project transpiling | ||
Go code into `JavaScript` will probably target it at some point. | ||
|
||
## Compatibility - Open issues | ||
|
||
There are a few interesting issues when interpreting Go code in an interactive | ||
fashion. | ||
|
||
1. Should we allow mid-way imports of packages ? | ||
``` | ||
go> slice := []string{"HELLO", "GO"} | ||
go> import "strings" | ||
go> println(strings.ToLower(slice[0])) | ||
``` | ||
|
||
What if `slice` was instead named `strings`? | ||
Should we allow shadowing of variables by package identifiers? | ||
Should we instead re-shadow the package identifier with the variable | ||
identifier? | ||
The latter seems like the more idiomatic Go behaviour, or at least the | ||
behaviour a gopher would expect if she were to write the program in | ||
a compiled environment (_i.e.:_ with `goimports` putting the `import` | ||
statement at the top) | ||
|
||
2. Support for `cgo` and `import "C"` ? | ||
3. Support for packages with assembly ? (from the `stdlib` or otherwise) | ||
4. Calls to `syscalls` ? Should they be somehow recognized and performed | ||
on a dedicated `goroutine`? What should `os.Exit` do? and how? | ||
5. How to efficiently implement iteration over maps? | ||
6. How to implement `unsafe`? Should we? | ||
7. How to implement the definition of new types? | ||
Package `reflect` has some support for this (`StructOf`, `ArrayOf`, ...) but | ||
it currently has no support for defining new interface types nor any new | ||
named types. | ||
8. In an interactive interpreter, how do we define methods for a named type? | ||
When, and how, do we tell the interpreter that the method set of a given | ||
named type is done? | ||
9. What is the most efficient way to write the `opcode` dispatch loop? | ||
A huge switch? ([go-lua](https://engineering.shopify.com/79963844-announcing-go-lua) | ||
reported issues with huge switches and migrated to a jump table.) | ||
|
||
## Implementation | ||
|
||
1. `dice.{VM,frame,instruction}` implementation leading to the execution | ||
of already decoded instructions, | ||
2. implementation of the bytecode stream decoder, | ||
3. implementation of the bytecode encoder, | ||
4. implementation of the interactive prompt of the REPL (with limitations), | ||
5. implementation of dynamically importing packages at the REPL level. | ||
This probably needs either a working `buildmode=plugin` from the `go` tool, | ||
or a complete handling of dynamically loading bytecode object files. | ||
|