Here is a compiler producing bytecode for a simplified implementation of the OCaml Virtual Machine, called Mini-ZAM.
This project is part of the Advanced Compilation course of my professional training at Sorbonne University (2022 2023).
I made the choice of OCaml ecosystem for the compiler implementation :
- OCaml V4.13.1
- Menhir 20211230 tool for syntax analysis
- Use OPAM 2.1.1 Ocaml PAckage Manager for Menhir installation
From version 1.1,
I added a typer to the compiler, it launches just before the compilation process.
Typing is a fundamental component of the programming language specification,
called the type system.
So, I took advantage of this project to start learning typing concepts.
In summary, on the one hand there is strong typing.
We must give the type of each variable, each function argument, etc. It is very rigorous, safe, but very constraining.
On the other hand, there is weak typing, allowing more freedom, but which can cause big problems
if programmers do not apply strict coding rules.
See more details.
Fortunately, there is a middle way, probably the most advanced, used in functional languages:
polymorphic types and inference. Expressions will define dynamically the variables types,
or the possible types until a decision is made as late as possible.
It is very powerful because it allows to define (for example) multi-types (polymorphic)
functions while being type-safe.
To do this, there are classic algorithms, not easy to implement, such as the inference algorithm W.
See more details with Hindley–Milner type system.
In conclusion, typing is a complex dimension of programming languages, very powerful. Fun fact, in functional programming, there is an expression that says, "type-directed programming"!
Good news, to go further, here is a Lambda calculus typer
from the "Typing and Static Analysis" course of my professional training at Sorbonne University (2022).
The implementation is more advanced, therefore allows to continue in the core concepts understanding.
To compile the project source code, run the default Makefile command
make
To execute the project (bytecode generation for the Mini-ZAM), run this commands.
In the following example, tests/t0.txt file is compiled to tests/build/bc_t0.txt bytecode text file.
To get an idea of the result, see the bytecode sample file tests/build/demo-bytecode.txt
S=t0.txt make compile
With no_typing option, compilation is made without typing prior checks.
S=t0.txt make compile_no_typing
With debug option, AST is displayed in the standard output after typing prior checks.
S=t0.txt make compile_debug
Then, you need a Mini-ZAM to interpret the generated bytecode.
Here is a short introduction to abstract interpretation, which is an advanced static program analysis technique. For example Abstract interpretation allows to detect possible division by 0, or memory overflows.
This implementation (a simple adaptation of a lesson on this pedagogical compiler), represents the very beginning of a complete research field in computer science. It allows to have a first idea and to see how it can be written.
I only consider integer values on 3 different analyses:
- Concrete analysis:
S=tai1-0.txt make abs_inter_concrete
- Constants analysis:
S=tai1-0.txt make abs_inter_constant
- Intervals analysis:
S=tai1-0.txt make abs_inter_interval