Initial work on documentation

This commit is contained in:
Mattia Giambirtone 2022-05-23 23:08:00 +02:00
parent 396f40d3d6
commit 4230222639
5 changed files with 292 additions and 3 deletions


@ -1,6 +1,9 @@
# The peon programming language
Peon is a simple, functional, async-first programming language with a focus on correctness and speed
Peon is a simple, functional, async-first programming language with a focus on correctness and speed.
[Go to the Manual](docs/manual.md)
## Project structure
@ -42,6 +45,34 @@ Also, peon will feature [structured concurrency](https://vorpus.org/blog/notes-o
callback hell). Since, unlike Lox, peon isn't a toy language, there are obviously plans to implement creature comforts
like an import system, exception handling, a package manager, etc.
### TODO List
In no particular order, here's a list of stuff that's done/to do (might be incomplete/out of date):
Toolchain:
- Tokenizer (with dynamic symbol table) [x]
- Parser (with support for custom operators, even builtins) [x]
- Compiler [ ] (Work in Progress)
- VM [ ] (Work in Progress)
- Bytecode (de-)serializer [x]
- Static code debugger [x]
- Runtime debugger/inspection tool [ ]
Type system:
- Custom types [ ]
- Intrinsics [x]
- Generics [ ] (Work in Progress)
- Function calls [ ] (Work in Progress)
Misc:
- Pragmas [ ] (Work in Progress)
- Attribute resolution [ ]
- ... More?
## The name
The name for peon comes from my and [Productive2's](https://git.nocturn9x.space/prod2) genius and is a result of shortening


@ -1 +0,0 @@
# TODO


@ -1 +1,72 @@
# TODO
# Peon - Bytecode Specification
This document describes peon's bytecode, as well as how it is (de-)serialized to/from files and
other file-like objects.
## Code Structure
A peon program is compiled into a tightly packed sequence of bytes that contain all the necessary information
the VM needs to execute said program. There is no dependence between the frontend and the backend outside of the
bytecode format (which is implemented in a separate serializer module) to allow for maximum modularity.
A peon bytecode dump contains:
- Constants
- The bytecode itself
- Debugging information
- File and version metadata
## Encoding
### Header
A peon bytecode file starts with the header, which is structured as follows:
- The literal string `PEON_BYTECODE`
- A 3-byte version number (the major, minor and patch versions of the compiler that generated the file as per the SemVer versioning standard)
- The branch name of the repository the compiler was built from, prepended with its length as a 1 byte integer
- The full commit hash (a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built (particularly useful in development builds)
- An 8-byte UNIX timestamp (with Epoch 0 starting at 1/1/1970 12:00 AM) representing the exact date and time of when the file was generated
- A 32-byte, hex-encoded SHA256 hash of the source file's content, used to track file changes
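The field order above can be sketched as a small round trip in Python. This is only an illustration, not the actual serializer: the helper names `build_header` and `parse_header` are hypothetical, the timestamp is assumed to be big-endian, and the source hash is assumed to be stored as the raw 32-byte SHA256 digest (a hex-encoded digest would take 64 characters, so the real layout may differ).

```python
import hashlib
import struct

MAGIC = b"PEON_BYTECODE"


def build_header(version, branch, commit_hash, timestamp, source):
    """Packs the header fields in the order listed above (hypothetical helper)."""
    major, minor, patch = version
    branch_bytes = branch.encode("ascii")
    if len(branch_bytes) > 255:
        raise ValueError("branch name does not fit a 1-byte length prefix")
    if len(commit_hash) != 40:
        raise ValueError("expected a 40-character hex commit hash")
    return b"".join([
        MAGIC,
        bytes([major, minor, patch]),        # 3-byte version number
        bytes([len(branch_bytes)]),          # 1-byte branch name length
        branch_bytes,
        commit_hash.encode("ascii"),         # 40-byte hex string
        struct.pack(">Q", timestamp),        # 8 bytes, byte order is an assumption
        hashlib.sha256(source).digest(),     # 32 raw bytes, also an assumption
    ])


def parse_header(data):
    """Reads the fields back; returns (fields, offset of the first byte after the header)."""
    if not data.startswith(MAGIC):
        raise ValueError("not a peon bytecode file")
    pos = len(MAGIC)
    version = tuple(data[pos:pos + 3])
    pos += 3
    branch_len = data[pos]
    pos += 1
    branch = data[pos:pos + branch_len].decode("ascii")
    pos += branch_len
    commit_hash = data[pos:pos + 40].decode("ascii")
    pos += 40
    (timestamp,) = struct.unpack(">Q", data[pos:pos + 8])
    pos += 8
    source_hash = data[pos:pos + 32]
    pos += 32
    fields = {
        "version": version,
        "branch": branch,
        "commit": commit_hash,
        "timestamp": timestamp,
        "source_hash": source_hash,
    }
    return fields, pos
```

Because the branch name is the only variable-length field, the parser has to read its length prefix before it can locate anything that follows it.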
### Line data section
The line data section associates each instruction in the code section 1:1 with a line number in the original
source file, which makes debugging easier. The mapping is stored using run-length encoding. The section's
size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
in this section can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
below:
```
[...]
## lines maps bytecode instructions to line numbers using Run
## Length Encoding. Instructions are encoded in groups whose structure
## follows the following schema:
## - The first integer represents the line number
## - The second integer represents the count of whatever comes after it
## (let's call it c)
## - After c, a sequence of c integers follows
##
## A visual representation may be easier to understand: [1, 2, 3, 4]
## This is to be interpreted as "there are 2 instructions at line 1 whose values
## are 3 and 4"
## This is more efficient than using the naive approach, which would encode
## the same line number multiple times and waste considerable amounts of space.
[...]
```
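As a rough illustration of the schema quoted above, the following Python sketch expands the run-length encoded groups back into one `(value, line)` pair per entry. The function name `decode_lines` is hypothetical, and the input is assumed to be already available as a flat list of integers (how those integers are laid out as bytes is not covered here).

```python
def decode_lines(lines):
    """Expands run-length encoded line data into (value, line) pairs."""
    pairs = []
    i = 0
    while i < len(lines):
        if i + 2 > len(lines):
            raise ValueError("truncated group header")
        line, count = lines[i], lines[i + 1]
        group = lines[i + 2:i + 2 + count]
        if len(group) != count:
            raise ValueError("truncated line data")
        # Every value in the group belongs to the same source line
        pairs.extend((value, line) for value in group)
        i += 2 + count
    return pairs
```

With the example from the quoted comment, `decode_lines([1, 2, 3, 4])` yields the values 3 and 4, both attributed to line 1.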
### Constant section
The constant section contains all the read-only values that the code will need at runtime, such as hardcoded
variable initializers or constant expressions. It is similar to the `.rodata` section of Assembly files, although
the implementation is different. Constants are encoded as a linear sequence of bytes with no type information about
them whatsoever: it is the code that, at runtime, loads each constant (whose type is determined at compile time) onto
the stack accordingly. For example, a 32 bit integer constant would be encoded as a sequence of 4 bytes, which would
then be loaded by the appropriate `LoadInt32` instruction at runtime. The section's size is fixed and is encoded at
the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant section may be empty, although in
real-world scenarios that is unlikely.
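To make the "no type information" point concrete, here is a minimal Python sketch of how an untyped constant blob might be read. The function `load_int32` is a hypothetical stand-in for what a `LoadInt32` instruction would do; it assumes the operand is a byte offset into the section and that integers are signed and big-endian, none of which is stated by this document.

```python
import struct


def load_int32(constants, index):
    """Reads 4 bytes at `index` and interprets them as a signed 32-bit integer."""
    if index < 0 or index + 4 > len(constants):
        raise IndexError("constant out of bounds")
    (value,) = struct.unpack_from(">i", constants, index)
    return value


# Two constants packed back to back, with nothing in between to say what they are
constants = struct.pack(">i", 42) + struct.pack(">i", -7)
first = load_int32(constants, 0)
second = load_int32(constants, 4)
```

The blob itself cannot tell a 32 bit integer apart from any other 4 bytes: only the instruction that loads it knows how to interpret them.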
### Code section
The code section contains the linear sequence of bytecode instructions of a peon program. It is to be read directly
and without modifications. The section's size is fixed and is encoded at the beginning as a sequence of 3 bytes
(i.e. a single 24 bit integer).
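The size-prefixed layout shared by the sections can be sketched in Python as follows. The helpers `write_section` and `read_section` are hypothetical, and big-endian byte order for the size prefix is an assumption; only the prefix widths (4 bytes for line data and constants, 3 bytes for code) come from the text above.

```python
def write_section(payload, prefix_size):
    """Prepends the payload's length as a fixed-width integer (assumed big-endian)."""
    return len(payload).to_bytes(prefix_size, "big") + payload


def read_section(data, offset, prefix_size):
    """Returns (payload, offset of the first byte after the section)."""
    start = offset + prefix_size
    if start > len(data):
        raise ValueError("truncated size prefix")
    size = int.from_bytes(data[offset:start], "big")
    end = start + size
    if end > len(data):
        raise ValueError("truncated section")
    return data[start:end], end


# A code section uses a 3-byte prefix, so it can hold at most 2**24 - 1 bytes
code = write_section(b"\x01\x02\x03", 3)
payload, end = read_section(code, 0, 3)
```

Returning the end offset lets a reader walk consecutive sections without knowing their sizes in advance.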

docs/manual.md Normal file

@ -0,0 +1,188 @@
# Peon - Manual
Peon is a functional, statically typed, garbage-collected, C-like programming language with
a focus on speed and correctness, but whose main feature is the ability to natively
perform highly efficient parallel I/O operations by implementing the [structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/)
paradigm.
__Note__: Peon is currently a WIP (Work In Progress), and much of the content of this manual is purely theoretical as
of now. If you want to help make this into a reality, feel free to contribute!
## Table of contents
- [Manual](#peon---manual)
- [Design Goals](#design-goals)
- [Examples](#peon-by-example)
- [Grammar](grammar.md)
- [Bytecode](bytecode.md)
## Design Goals
While peon is inspired by Bob Nystrom's [book](https://craftinginterpreters.com), where he describes a simple toy language
named Lox, the aspiration for it is to become a programming language that could actually be used in the real world. For that
to happen, we need:
- Exceptions (`try/except/finally`)
- An import system (with namespaces, like Python)
- Multithreading support (with a global VM lock when GC'ing)
- Built-in collections (list, tuple, set, etc.)
- Coroutines (w/ structured concurrency)
- Generators
- Generics
- C/Nim FFI
- A package manager
Peon ~~steals~~ borrows many ideas from Python and Nim (the latter being the language peon itself is written in).
## Peon by Example
Here follow a few examples of peon code to make it clear what the end product should look like.
### Variable declarations
```
var x = 5; # Inferred type is int64
var y = 3'u16; # Type is specified as uint16
x = 6; # Works: type matches
x = 3.0; # Cannot assign float64 to x
var x = 3.14; # Cannot re-declare x
```
__Note__: Peon supports [name stropping](https://en.wikipedia.org/wiki/Stropping_(syntax)), meaning
that almost any ASCII sequence of characters can be used as an identifier, including language
keywords, but stropped names need to be enclosed by matching pairs of backticks (`` ` ``).
### Functions
```
fn fib(n: int): int {
    if (n < 3) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}
fib(30);
```
### Type declarations
```
type Foo = object { # Can also be "ref object" for reference types (managed automatically)
    fieldOne*: int # Asterisk means the field is public outside the current module
    fieldTwo*: int
}
```
### Operator overloading
```
operator `+`(a, b: Foo): Foo {
    return Foo(fieldOne: a.fieldOne + b.fieldOne, fieldTwo: a.fieldTwo + b.fieldTwo);
}
Foo(fieldOne: 1, fieldTwo: 3) + Foo(fieldOne: 2, fieldTwo: 3); # Foo(fieldOne: 3, fieldTwo: 6)
```
__Note__: Custom operators (e.g. `foo`) can also be defined! The backticks around the plus sign serve to mark it
as an identifier instead of a symbol (which is a requirement for function names, since operators are basically
functions). In fact, even the built-in peon operators are implemented partially in peon (well, their forward
declarations are) and they are then specialized in the compiler to emit a single bytecode instruction.
### Function calls
```
foo(1, 2 + 3, 3.14, bar(baz));
```
__Note__: Operators can be called as functions too. Just wrap their name in backticks, like so:
```
`+`(1, 2)
```
__Note__: Code like `a.b()` is desugared to `b(a)` if there exists a function `b` whose
signature is compatible with the value of `a` (assuming `a` doesn't have a `b` field, in
which case the attribute resolution takes precedence).
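The resolution order described in this note can be mimicked with a short Python sketch. It is only an analogy: peon would resolve calls statically by signature, while this hypothetical `resolve_call` helper does the lookup at runtime, and the `functions` dictionary stands in for the functions visible in the current scope.

```python
def resolve_call(obj, name, functions, *args):
    """Resolves `obj.name(args)`: a field wins, otherwise fall back to `name(obj, args)`."""
    if hasattr(obj, name):
        # Attribute resolution takes precedence
        return getattr(obj, name)(*args)
    if name in functions:
        # Desugar a.b(args) into b(a, args)
        return functions[name](obj, *args)
    raise AttributeError(f"no field or function named {name!r}")


class Foo:
    def __init__(self, value):
        self.value = value


functions = {"double": lambda foo: foo.value * 2}

plain = Foo(21)
shadowed = Foo(21)
shadowed.double = lambda: "field"  # a field named like the free function
```

Here `resolve_call(plain, "double", functions)` falls back to the free function, while the same call on `shadowed` picks the field instead.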
### Generic declarations
```
fn genericSum[T](a, b: T): T { # Note: "a, b: T" means that both a and b are of type T
    return a + b;
}
# This allows for a single implementation to be
# re-used multiple times without any code duplication!
genericSum(1, 2);
genericSum(3.14, 0.1);
genericSum(1'u8, 250'u8);
```
#### Multiple generics
```
fn genericSth[T, K](a: T, b: K) { # Note: no return type == void function!
    # code...
}
genericSth(1, 3.0);
```
__Note__: The `*` modifier to make a name visible outside the current module must be put
__before__ generics declarations, so only `fn foo*[T](a: T) {}` is the correct syntax.
### Forward declarations
```
fn someF: int; # Semicolon, no body!
someF(); # This works!
fn someF: int {
    return 42;
}
```
### Generators
```
generator count(n: int): int {
    while (n > 0) {
        yield n;
        n -= 1;
    }
}
foreach (n: count(10)) {
    print(n);
}
```
### Coroutines
```
import concur;
import http;
coroutine req(url: string): string {
    return (await http.AsyncClient().get(url)).content;
}
coroutine main(urls: list[string]) {
    var pool = concur.pool(); # Creates a task pool: like a nursery in njsmith's article
    for (var i = 0; i < urls.len(); i += 1) {
        pool.spawn(req, urls[i]);
    }
    # The pool has internal machinery that makes the parent
    # task wait until all children exit! When this function
    # returns, ALL child tasks will have exited somehow
}
concur.run(main, newList[string]("https://google.com", "https://debian.org"));
```

BIN
tests.pbc Normal file

Binary file not shown.