
# NimVM
A basic programming language written in Nim

## Project structure

The project is split into several directories and submodules to ease human inspection:
- `README.md` -> This file here (lol)
- `docs` -> Contains Markdown files with the various specifications for NimVM (bytecode, grammar, etc.)
    - `docs/bytecode.md` -> Lays out the bytecode specification for NimVM as well as serialization guidelines
    - `docs/grammar.md` -> Formal grammar specification in EBNF syntax
- `src` -> Contains source files
    - `src/main.nim` -> This is the main executable for NimVM (REPL, run files, etc.), currently not in this repo
    - `src/backend` -> Contains the backend of the language (lexer, parser and compiler)
        - `src/backend/meta` -> Contains meta-structures that are used during parsing and compilation
        - `src/backend/lexer.nim` -> Contains the tokenizer
        - `src/backend/parser.nim` -> Contains the parser
        - `src/backend/compiler.nim` -> Contains the compiler
    - `src/frontend` -> Contains the language's frontend (runtime environment and type system)
        - `src/frontend/types` -> Contains the implementation of the type system
        - `src/frontend/vm.nim` -> Contains the virtual machine (stack-based)
    - `src/util` -> Contains generic utilities used throughout the project
        - `src/util/bytecode` -> Contains the bytecode serializer/deserializer
            - `src/util/bytecode/serializer.nim` -> Contains the bytecode serializer
            - `src/util/bytecode/deserializer.nim` -> Contains the bytecode deserializer
            - `src/util/bytecode/objects.nim` -> Contains object wrappers for bytecode opcodes
        - `src/util/debug.nim` -> Contains the debugger

## Language design

NimVM is a generic stack-based bytecode VM implementation, meaning that source files are compiled into a
virtual instruction set (bytecode) whose operations are all implemented by a virtual machine. NimVM
uses a triple-pass compiler where the input is first tokenized, then parsed into an AST, and finally optimized
before being translated to bytecode.
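Here is a minimal sketch in Python that makes the "stack-based bytecode VM" idea concrete. Python is used only for brevity; NimVM itself is written in Nim. The opcodes `LOAD_CONST`, `ADD`, `SUB`, `MUL` and `RETURN`, and the `(opcode, argument)` tuple encoding, are hypothetical. They are not NimVM's real instruction set, which is specified in `docs/bytecode.md`. Each instruction pops its operands off a single stack and pushes its result back.

```python
from enum import Enum, auto


class Op(Enum):
    """Hypothetical opcodes for illustration only (not NimVM's real set)."""
    LOAD_CONST = auto()  # push constants[arg] onto the stack
    ADD = auto()         # pop b, pop a, push a + b
    SUB = auto()         # pop b, pop a, push a - b
    MUL = auto()         # pop b, pop a, push a * b
    RETURN = auto()      # pop the top of the stack and return it


def run(code, constants):
    """Execute a list of (opcode, argument) pairs on an operand stack."""
    stack = []
    for op, arg in code:
        if op is Op.LOAD_CONST:
            stack.append(constants[arg])
        elif op in (Op.ADD, Op.SUB, Op.MUL):
            b = stack.pop()  # right operand was pushed last
            a = stack.pop()
            if op is Op.ADD:
                stack.append(a + b)
            elif op is Op.SUB:
                stack.append(a - b)
            else:
                stack.append(a * b)
        elif op is Op.RETURN:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    raise RuntimeError("program ended without RETURN")


# (2 + 3) * 4, compiled by hand into stack code
constants = [2, 3, 4]
program = [
    (Op.LOAD_CONST, 0),
    (Op.LOAD_CONST, 1),
    (Op.ADD, None),
    (Op.LOAD_CONST, 2),
    (Op.MUL, None),
    (Op.RETURN, None),
]
print(run(program, constants))  # 20
```

Every operand lives on the stack, so instructions need no register operands and the encoding stays compact. The trade-off is that more instructions are executed than on a register machine.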

The compilation toolchain has been designed as follows:
- First, the input is tokenized. This process breaks the source input down into a sequence of easier-to-process
  tokens for the next step. The lexer (or tokenizer) detects basic syntax errors like unterminated
  string literals and multi-line comments, as well as invalid usage of unknown tokens (for example UTF-8 runes).
- Then, the tokens are fed into a parser. The parser recursively traverses the list of tokens coming from the lexer
  and builds a higher-level structure called an Abstract Syntax Tree (AST). It also catches the remaining
  static and syntax errors, such as illegal statement usage (for example `return` outside a function), malformed
  expressions and declarations, and much more.
- After the AST has been built, it goes through the optimizer. As the name suggests, this step performs a few
  optimizations, namely:
  - constant folding (meaning `1 + 2` is replaced with `3` instead of producing 2 constant opcodes and 1 addition opcode)
  - global name resolution. This is possible because NimVM's syntax only allows globals to be defined in a
    statically inferable way, so "name error" exceptions can be caught before any code is even run
  - warnings for things like unreachable code after `return` statements (optional)

  The optimizer also detects attempts to modify the value of a constant or of a `let` variable at compile time.
- Once the optimizer is done, the compiler takes the AST and compiles it to bytecode, which is later interpreted
  by our virtual machine implementation.
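The tokenization step described above can be sketched as a single left-to-right scan. This is an illustrative Python sketch, not NimVM's `src/backend/lexer.nim`. Several details are assumptions made for the example: the `/* ... */` comment syntax, double-quoted strings, the operator set, and the choice to reject every non-ASCII character.

```python
from dataclasses import dataclass

# Assumed surface syntax for this sketch: /* ... */ comments, "..." strings,
# ASCII-only source text and the single-character operators listed below.
SYMBOLS = set("+-*/()=;{},<>!")


class LexingError(Exception):
    pass


@dataclass
class Token:
    kind: str  # "number", "identifier", "string" or "symbol"
    text: str
    line: int


def is_digit(ch):
    return ch.isascii() and ch.isdigit()


def is_ident_char(ch):
    return ch.isascii() and (ch.isalnum() or ch == "_")


def tokenize(source):
    tokens = []
    i, line = 0, 1
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line += 1
            i += 1
        elif ch in " \t\r":
            i += 1
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                raise LexingError(f"line {line}: unterminated multi-line comment")
            line += source.count("\n", i, end)  # keep line numbers accurate
            i = end + 2
        elif ch == '"':
            end = source.find('"', i + 1)
            if end == -1:
                raise LexingError(f"line {line}: unterminated string literal")
            tokens.append(Token("string", source[i + 1:end], line))
            line += source.count("\n", i, end)
            i = end + 1
        elif is_digit(ch):
            start = i
            while i < len(source) and is_digit(source[i]):
                i += 1
            tokens.append(Token("number", source[start:i], line))
        elif is_ident_char(ch):  # digits were already handled above
            start = i
            while i < len(source) and is_ident_char(source[i]):
                i += 1
            tokens.append(Token("identifier", source[start:i], line))
        elif ch in SYMBOLS:
            tokens.append(Token("symbol", ch, line))
            i += 1
        else:
            # e.g. a non-ASCII UTF-8 rune such as "π"
            raise LexingError(f"line {line}: unexpected character {ch!r}")
    return tokens


print([t.text for t in tokenize('x = 1 + 2; /* note */ "hi"')])
# ['x', '=', '1', '+', '2', ';', 'hi']
```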
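A recursive-descent parser, with one function per grammar rule, is a common way to implement the parsing step. The Python sketch below uses a tiny hypothetical grammar, not the one in `docs/grammar.md`. It supports `fun NAME { ... }` blocks, `return` statements and arithmetic expressions, and represents nodes as plain tuples. It also shows how a parser can reject `return` outside a function by tracking how many function bodies it is currently inside.

```python
import re


class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser for a tiny hypothetical grammar:

    program    := statement*
    statement  := "fun" NAME "{" statement* "}" | "return" expression ";" | expression ";"
    expression := term (("+" | "-") term)*
    term       := factor (("*" | "/") factor)*
    factor     := NUMBER | NAME | "(" expression ")"
    """

    def __init__(self, source):
        # Numbers, identifiers, or any other single non-space character
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", source)
        self.pos = 0
        self.function_depth = 0  # > 0 while parsing a function body

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise ParseError("unexpected end of input")
        self.pos += 1
        return token

    def expect(self, text):
        token = self.advance()
        if token != text:
            raise ParseError(f"expected {text!r}, got {token!r}")

    def parse_program(self):
        statements = []
        while self.peek() is not None:
            statements.append(self.statement())
        return ("program", statements)

    def statement(self):
        if self.peek() == "fun":
            self.advance()
            name = self.advance()
            if not (name[0].isalpha() or name[0] == "_"):
                raise ParseError(f"invalid function name {name!r}")
            self.expect("{")
            self.function_depth += 1
            body = []
            while self.peek() != "}":
                body.append(self.statement())
            self.advance()  # consume "}"
            self.function_depth -= 1
            return ("fun", name, body)
        if self.peek() == "return":
            if self.function_depth == 0:
                raise ParseError("'return' outside of a function")
            self.advance()
            value = self.expression()
            self.expect(";")
            return ("return", value)
        value = self.expression()
        self.expect(";")
        return ("expr", value)

    def expression(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = ("binary", op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = ("binary", op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token.isdecimal():
            return ("number", int(token))
        if token == "(":
            node = self.expression()
            self.expect(")")
            return node
        if token[0].isalpha() or token[0] == "_":
            return ("name", token)
        raise ParseError(f"unexpected token {token!r}")
```

Operator precedence falls out of the call structure: `term` binds `*` and `/` more tightly than `expression` binds `+` and `-`.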
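Constant folding, as described in the optimizer step, can be sketched as a bottom-up rewrite of the AST. The tuple-based node shapes below are hypothetical; they are not NimVM's `src/backend/meta` structures. Division is deliberately left unfolded in this sketch, so an expression like `1 / 0` still fails at runtime rather than during compilation.

```python
import operator

# Hypothetical AST nodes: ("number", value), ("name", identifier) and
# ("binary", op, left, right). Division is intentionally not folded here.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return node with every constant sub-expression replaced by its value."""
    if node[0] != "binary":
        return node
    _, op, left, right = node
    left, right = fold(left), fold(right)  # fold children first (bottom-up)
    if left[0] == "number" and right[0] == "number" and op in FOLDABLE:
        return ("number", FOLDABLE[op](left[1], right[1]))
    return ("binary", op, left, right)


print(fold(("binary", "+", ("number", 1), ("number", 2))))  # ('number', 3)
expr = ("binary", "*", ("name", "x"), ("binary", "+", ("number", 1), ("number", 2)))
print(fold(expr))  # ('binary', '*', ('name', 'x'), ('number', 3))
```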
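Globals are declared in a statically inferable way, so undefined names can be found with a single pass over the top-level statements. This Python sketch assumes hypothetical `("var", name, expr)` and `("expr", expr)` statement tuples. It also assumes that a global must be declared before it is referenced; NimVM's actual scoping rules may differ.

```python
def names_used(expr):
    """Yield every identifier referenced by an expression."""
    if expr[0] == "name":
        yield expr[1]
    elif expr[0] == "binary":
        yield from names_used(expr[2])
        yield from names_used(expr[3])


def resolve_globals(program):
    """Report references to globals that are not declared before use.

    Hypothetical statement shapes: ("var", name, expr) declares a global and
    ("expr", expr) evaluates an expression. Expressions are ("number", v),
    ("name", identifier) or ("binary", op, left, right).
    """
    declared = set()
    errors = []
    for stmt in program:
        expr = stmt[2] if stmt[0] == "var" else stmt[1]
        for name in names_used(expr):
            if name not in declared:
                errors.append(f"name error: {name!r} is not defined")
        if stmt[0] == "var":
            declared.add(stmt[1])  # visible only to later statements
    return errors


program = [
    ("var", "x", ("number", 1)),
    ("expr", ("binary", "+", ("name", "x"), ("name", "y"))),
]
print(resolve_globals(program))  # ["name error: 'y' is not defined"]
```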
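The optional unreachable-code warning amounts to noticing statements that follow a `return` in the same block. In this hypothetical Python sketch, a function body is a list of statement tuples and `("if", cond, [...])` marks a conditional block. Statements after an `if` whose body returns are not flagged, because the condition may be false at runtime.

```python
def unreachable_warnings(body, where="<function>"):
    """Return warnings for statements that follow a return in the same block.

    Hypothetical statement shapes: ("return", expr), ("expr", expr) and
    ("if", condition, [statements]).
    """
    warnings = []
    for index, stmt in enumerate(body):
        if stmt[0] == "if":
            # Check the conditional body on its own; code after the if is not
            # flagged, since the condition may be false at runtime
            warnings.extend(unreachable_warnings(stmt[2], where))
        elif stmt[0] == "return":
            dead = len(body) - index - 1
            if dead:
                warnings.append(
                    f"warning: {dead} unreachable statement(s) after 'return' in {where}"
                )
            break  # everything after this return is dead; stop scanning
    return warnings


body = [
    ("expr", ("name", "x")),
    ("return", ("number", 1)),
    ("expr", ("name", "y")),
    ("expr", ("name", "z")),
]
print(unreachable_warnings(body, "f"))
# ["warning: 2 unreachable statement(s) after 'return' in f"]
```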
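Detecting writes to constants and `let` variables only requires remembering how each name was declared. The statement shapes are hypothetical: `("const", ...)`, `("let", ...)`, `("var", ...)` and `("assign", name, expr)`. This simplified Python sketch also uses a single flat scope.

```python
IMMUTABLE = {"const", "let"}


def check_assignments(program):
    """Return an error for every assignment to a const or let binding.

    Hypothetical statement shapes: ("const" | "let" | "var", name, expr)
    declares a name and ("assign", name, expr) reassigns it.
    """
    declared_as = {}  # name -> declaration keyword, in a single flat scope
    errors = []
    for stmt in program:
        if stmt[0] in ("const", "let", "var"):
            declared_as[stmt[1]] = stmt[0]
        elif stmt[0] == "assign":
            kind = declared_as.get(stmt[1])
            if kind in IMMUTABLE:
                errors.append(f"cannot assign to {kind} {stmt[1]!r}")
    return errors


program = [
    ("let", "x", ("number", 1)),
    ("var", "y", ("number", 2)),
    ("assign", "y", ("number", 3)),  # fine: y is a var
    ("assign", "x", ("number", 4)),  # error: x is a let
]
print(check_assignments(program))  # ["cannot assign to let 'x'"]
```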
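The final step, turning an expression tree into stack bytecode, is a post-order traversal: compile the operands, then emit the operator. The opcode names and the separate constant table in this Python sketch are hypothetical and are not taken from `docs/bytecode.md`.

```python
# Hypothetical AST nodes: ("number", value) and ("binary", op, left, right)
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


class Compiler:
    def __init__(self):
        self.code = []       # emitted (opcode, argument) pairs
        self.constants = []  # constant table indexed by LOAD_CONST

    def compile(self, node):
        """Emit instructions that leave the value of node on top of the stack."""
        kind = node[0]
        if kind == "number":
            self.constants.append(node[1])
            self.code.append(("LOAD_CONST", len(self.constants) - 1))
        elif kind == "binary":
            _, op, left, right = node
            self.compile(left)   # push the left operand
            self.compile(right)  # push the right operand
            self.code.append((OPCODES[op], None))  # pop both, push the result
        else:
            raise ValueError(f"cannot compile node kind {kind!r}")


compiler = Compiler()
compiler.compile(("binary", "*", ("binary", "+", ("number", 2), ("number", 3)), ("number", 4)))
print(compiler.code)
# [('LOAD_CONST', 0), ('LOAD_CONST', 1), ('ADD', None), ('LOAD_CONST', 2), ('MUL', None)]
print(compiler.constants)  # [2, 3, 4]
```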