# NimVM - A stack-based bytecode virtual machine written in Nim
A basic programming language written in Nim
## Project structure
The project is split into several directories and submodules:
- `build.py` -> The build script (TODO, not pushed yet)
- `docs` -> Contains markdown files with the various specifications for NimVM (bytecode, grammar, etc)
- `src` -> Contains source files
- `src/backend` -> Contains the backend of the language such as the parser, compiler and optimizer
- `src/meta` -> Contains meta-structures used during compilation and parsing
- `src/frontend` -> Contains the runtime environment of NimVM
- `src/frontend/types` -> Contains the type system
- `src/memory` -> Contains NimVM's allocator and memory manager
## Language design
NimVM is a generic stack-based bytecode VM implementation: source files are compiled into a
virtual instruction set, and every operation in that set is implemented by a virtual machine. NimVM
uses a three-pass compiler in which the input is first tokenized and parsed into an AST, then optimized, and
finally translated to bytecode.
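
To make the idea of a stack-based bytecode VM concrete, here is a minimal sketch in Python. It is not NimVM's implementation (which is written in Nim); the opcode names `CONSTANT`, `ADD`, `MULTIPLY` and `RETURN` and the `run` function are assumptions made up for this illustration.

```python
# Illustrative stack-based VM; the opcode names are hypothetical, not NimVM's.
from enum import Enum, auto


class Op(Enum):
    CONSTANT = auto()  # push the operand onto the stack
    ADD = auto()       # pop two values, push their sum
    MULTIPLY = auto()  # pop two values, push their product
    RETURN = auto()    # stop and yield the value on top of the stack


def run(code):
    """Execute a list of (opcode, operand) pairs and return the result."""
    stack = []
    for op, operand in code:
        if op is Op.CONSTANT:
            stack.append(operand)
        elif op is Op.ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op is Op.MULTIPLY:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op is Op.RETURN:
            return stack.pop()
    raise RuntimeError("bytecode ended without RETURN")


# (1 + 2) * 4 expressed as bytecode
program = [
    (Op.CONSTANT, 1),
    (Op.CONSTANT, 2),
    (Op.ADD, None),
    (Op.CONSTANT, 4),
    (Op.MULTIPLY, None),
    (Op.RETURN, None),
]
print(run(program))  # 12
```

Every instruction only pushes to or pops from a single operand stack, which is what keeps this style of VM simple to implement.
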
The compilation toolchain has been designed as follows:
- First, the input is tokenized. This step breaks the source input down into a sequence of tokens that are easier for
the next step to process. The lexer (or tokenizer) detects basic syntax errors such as unterminated
string literals and the use of unknown tokens (for example UTF-8 runes)
- Then, the tokens are fed into a parser. The parser recursively traverses the list of tokens coming from the lexer
and builds a higher-level structure called an Abstract Syntax Tree, or AST for short. It also catches the remaining
static or syntax errors, such as illegal statement usage (for example `return` outside a function), malformed expressions
and declarations, and more
- After the AST has been built, it goes through the optimizer. This step is optional, but it is enabled by default.
As the name suggests, it performs a few optimizations, namely:
  - constant folding (meaning `1 + 2` is replaced with `3` instead of producing two constant opcodes and one addition opcode)
  - global name resolution. This is possible because NimVM's syntax only allows globals to be defined in a way that
    is statically inferable, so "name error" exceptions can be caught before any code is even run
  - warnings for things like unreachable code after return statements (optional)
- Once the optimizer is done, the compiler takes the AST and compiles it to bytecode, which is later interpreted
by our virtual machine implementation
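
The steps above can be sketched end to end for a toy language of integer arithmetic. This is a hedged illustration in Python, not NimVM's actual code: the function names (`tokenize`, `parse`, `fold`, `compile_ast`), the tuple-based AST and the opcode names are all assumptions, and only constant folding is shown for the optimizer.

```python
# Illustrative only: every name below is hypothetical, not NimVM's actual API.
import operator
import re

TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[-+*()]")
FOLDERS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
OPCODES = {"+": "ADD", "-": "SUBTRACT", "*": "MULTIPLY"}


def tokenize(source):
    """Lexer: split the source into tokens, rejecting unknown characters."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def parse(tokens):
    """Recursive-descent parser that builds a nested-tuple AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def primary():
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        token = advance()
        if token.isdigit():
            return ("num", int(token))
        if token.isidentifier():
            return ("name", token)
        if token == "(":
            node = expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        node = primary()
        while peek() == "*":
            node = ("binary", advance(), node, primary())
        return node

    def expression():
        node = term()
        while peek() in ("+", "-"):
            node = ("binary", advance(), node, term())
        return node

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def fold(node):
    """Optimizer: replace operations on two constants with their result."""
    if node[0] != "binary":
        return node
    _, op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "num" and right[0] == "num":
        return ("num", FOLDERS[op](left[1], right[1]))
    return ("binary", op, left, right)


def compile_ast(node, code=None):
    """Compiler: emit (opcode, operand) pairs in post-order."""
    code = [] if code is None else code
    if node[0] == "num":
        code.append(("CONSTANT", node[1]))
    elif node[0] == "name":
        code.append(("LOAD_NAME", node[1]))
    else:
        _, op, left, right = node
        compile_ast(left, code)
        compile_ast(right, code)
        code.append((OPCODES[op], None))
    return code


print(compile_ast(fold(parse(tokenize("1 + 2")))))
print(compile_ast(fold(parse(tokenize("x + 1 * 2")))))
```

With folding enabled, `1 + 2` compiles to a single constant instruction, while `x + 1 * 2` keeps the addition because the value of `x` is not known at compile time.
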
## Language syntax
NimVM uses a syntax mostly inspired by C and Java, although some influences come from Python as well.
## Credits
NimVM was inspired by Bob Nystrom's amazing [Crafting Interpreters](https://craftinginterpreters.com) book