Initial work on documentation
parent
396f40d3d6
commit
4230222639
33
README.md
|
@ -1,6 +1,9 @@
|
|||
# The peon programming language
|
||||
|
||||
Peon is a simple, functional, async-first programming language with a focus on correctness and speed
|
||||
Peon is a simple, functional, async-first programming language with a focus on correctness and speed.
|
||||
|
||||
[Go to the Manual](docs/manual.md)
|
||||
|
||||
|
||||
## Project structure
|
||||
|
||||
|
@ -42,6 +45,34 @@ Also, peon will feature [structured concurrency](https://vorpus.org/blog/notes-o
|
|||
callback hell). Since, unlike Lox, peon isn't a toy language, there are obviously plans to implement creature comforts
|
||||
like an import system, exception handling, a package manager, etc.
|
||||
|
||||
|
||||
### TODO List
|
||||
|
||||
In no particular order, here's a list of stuff that's done/to do (might be incomplete/out of date):
|
||||
|
||||
Toolchain:
|
||||
|
||||
- Tokenizer (with dynamic symbol table) [x]
|
||||
- Parser (with support for custom operators, even builtins) [x]
|
||||
- Compiler [ ] (Work in Progress)
|
||||
- VM [ ] (Work in Progress)
|
||||
- Bytecode (de-)serializer [x]
|
||||
- Static code debugger [x]
|
||||
- Runtime debugger/inspection tool [ ]
|
||||
|
||||
Type system:
|
||||
|
||||
- Custom types [ ]
|
||||
- Intrinsics [x]
|
||||
- Generics [ ] (Work in Progress)
|
||||
- Function calls [ ] (Work in Progress)
|
||||
|
||||
Misc:
|
||||
|
||||
- Pragmas [ ] (Work in Progress)
|
||||
- Attribute resolution [ ]
|
||||
- ... More?
|
||||
|
||||
## The name
|
||||
|
||||
The name for peon comes from my and [Productive2's](https://git.nocturn9x.space/prod2) genius and is a result of shortening
|
||||
|
|
|
@ -1 +0,0 @@
|
|||
# TODO
|
|
@ -1 +1,72 @@
|
|||
# TODO
|
||||
# Peon - Bytecode Specification
|
||||
|
||||
This document describes peon's bytecode, as well as how it is (de-)serialized to/from files and
|
||||
other file-like objects.
|
||||
|
||||
## Code Structure
|
||||
|
||||
A peon program is compiled into a tightly packed sequence of bytes that contains all the information
the VM needs to execute said program. There is no dependence between the frontend and the backend outside of the
bytecode format (which is implemented in a separate serializer module) to allow for maximum modularity.
|
||||
|
||||
A peon bytecode dump contains:
|
||||
|
||||
- Constants
|
||||
- The bytecode itself
|
||||
- Debugging information
|
||||
- File and version metadata
|
||||
|
||||
## Encoding
|
||||
|
||||
### Header
|
||||
|
||||
A peon bytecode file starts with the header, which is structured as follows:
|
||||
|
||||
- The literal string `PEON_BYTECODE`
|
||||
- A 3-byte version number (the major, minor and patch versions of the compiler that generated the file as per the SemVer versioning standard)
|
||||
- The branch name of the repository the compiler was built from, prefixed with its length as a 1-byte integer
- The full commit hash (a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built (particularly useful in development builds)
- An 8-byte UNIX timestamp (counted from the Epoch, 1970-01-01 00:00 UTC) representing the exact date and time at which the file was generated
|
||||
- A 32-byte, hex-encoded SHA256 hash of the source file's content, used to track file changes
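
The field order above can be sketched in Python as a pair of helpers. This is only an illustration, not peon's serializer: the names `build_header` and `parse_header` are hypothetical, multi-byte integers are assumed to be big-endian, and the source hash is assumed to be stored as the 32 raw digest bytes (a hex-encoded SHA256 digest would take 64 characters).

```python
import hashlib
import struct

MAGIC = b"PEON_BYTECODE"


def build_header(version, branch, commit, source, timestamp):
    """Assemble a header using the field order listed above (hypothetical helper)."""
    branch_bytes = branch.encode("ascii")
    if len(branch_bytes) > 255:
        raise ValueError("branch name does not fit a 1-byte length prefix")
    if len(commit) != 40:
        raise ValueError("commit hash must be 40 hex characters")
    return b"".join([
        MAGIC,
        bytes(version),                   # major, minor, patch: one byte each
        bytes([len(branch_bytes)]),       # 1-byte length prefix
        branch_bytes,
        commit.encode("ascii"),           # 40 hex characters
        struct.pack(">Q", timestamp),     # assumption: big-endian, unsigned
        hashlib.sha256(source).digest(),  # assumption: 32 raw digest bytes
    ])


def parse_header(data):
    """Read the fields back; returns (fields, offset of the first byte after the header)."""
    if not data.startswith(MAGIC):
        raise ValueError("not a peon bytecode file")
    pos = len(MAGIC)
    version = tuple(data[pos:pos + 3])
    pos += 3
    branch_len = data[pos]
    pos += 1
    branch = data[pos:pos + branch_len].decode("ascii")
    pos += branch_len
    commit = data[pos:pos + 40].decode("ascii")
    pos += 40
    (timestamp,) = struct.unpack(">Q", data[pos:pos + 8])
    pos += 8
    source_hash = data[pos:pos + 32]
    pos += 32
    fields = {
        "version": version,
        "branch": branch,
        "commit": commit,
        "timestamp": timestamp,
        "source_hash": source_hash,
    }
    return fields, pos
```

Because the branch name is the only variable-length field, the header size is always 97 bytes plus the length of the branch name under these assumptions.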
|
||||
|
||||
### Line data section
|
||||
|
||||
The line data section maps each instruction in the code section 1:1 to a line number in the original source
file, which makes debugging easier. The mapping is stored using run-length encoding. The section's
size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32-bit integer). The data
in this section can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
below:
|
||||
```
[...]
## lines maps bytecode instructions to line numbers using Run
## Length Encoding. Instructions are encoded in groups whose structure
## follows the following schema:
## - The first integer represents the line number
## - The second integer represents the count of whatever comes after it
##  (let's call it c)
## - After c, a sequence of c integers follows
##
## A visual representation may be easier to understand: [1, 2, 3, 4]
## This is to be interpreted as "there are 2 instructions at line 1 whose values
## are 3 and 4"
## This is more efficient than using the naive approach, which would encode
## the same line number multiple times and waste considerable amounts of space.
[...]
```
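
As a rough illustration of the schema quoted above, the groups could be expanded like this in Python. The function name `decode_lines` is hypothetical, and the sketch works on a list of already-decoded integers, since the width of each integer in the file is not specified here.

```python
def decode_lines(data):
    """Expand [line, count, v1, ..., v_count, ...] groups into {line: [values]}."""
    result = {}
    i = 0
    while i < len(data):
        if i + 1 >= len(data):
            raise ValueError("truncated group: missing count")
        line, count = data[i], data[i + 1]
        values = data[i + 2:i + 2 + count]
        if len(values) != count:
            raise ValueError("truncated group: missing values")
        # The same line may appear in more than one group
        result.setdefault(line, []).extend(values)
        i += 2 + count
    return result


# The example from the quoted comment: 2 instructions at line 1, valued 3 and 4
print(decode_lines([1, 2, 3, 4]))  # {1: [3, 4]}
```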
|
||||
|
||||
### Constant section
|
||||
|
||||
The constant section contains all the read-only values that the code will need at runtime, such as hardcoded
|
||||
variable initializers or constant expressions. It is similar to the `.rodata` section of Assembly files, although
|
||||
the implementation is different. Constants are encoded as a linear sequence of bytes with no type information about
|
||||
them whatsoever: it is the code that, at runtime, loads each constant (whose type is determined at compile time) onto
|
||||
the stack accordingly. For example, a 32 bit integer constant would be encoded as a sequence of 4 bytes, which would
|
||||
then be loaded by the appropriate `LoadInt32` instruction at runtime. The section's size is fixed and is encoded at
|
||||
the beginning as a sequence of 4 bytes (i.e. a single 32-bit integer). The constant section may be empty, although in
real-world scenarios it's unlikely that it would be.
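
To make the "no type information" point concrete, here is a small Python sketch of an untyped constant section and of what a `LoadInt32`-style read might look like. The helper names, the big-endian byte order, the signedness, and the idea that the instruction's operand is a byte offset into the section are all assumptions made for illustration.

```python
import struct


def pack_constants(values):
    """Encode int32 constants back to back, prefixed by the 4-byte section size."""
    body = b"".join(struct.pack(">i", v) for v in values)
    return len(body).to_bytes(4, "big") + body


def load_int32(section, offset):
    """Read 4 bytes at `offset` as an int32; the bytes themselves carry no type tag."""
    size = int.from_bytes(section[:4], "big")
    if offset < 0 or offset + 4 > size:
        raise IndexError("constant offset out of range")
    (value,) = struct.unpack_from(">i", section, 4 + offset)
    return value


section = pack_constants([5, -1])
print(load_int32(section, 0), load_int32(section, 4))  # 5 -1
```

Nothing in the section says that these 8 bytes are two integers: that knowledge lives entirely in the instruction that reads them.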
|
||||
|
||||
### Code section
|
||||
|
||||
The code section contains the linear sequence of bytecode instructions of a peon program. It is to be read directly
|
||||
and without modifications. The section's size is fixed and is encoded at the beginning as a sequence of 3 bytes
|
||||
(i.e. a single 24 bit integer).
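
A 3-byte prefix has no native integer type in most languages, so it has to be assembled by hand. The Python sketch below shows one way to do it, assuming big-endian byte order; the helper names are hypothetical. Note that a 24-bit size caps the code section at 16,777,215 bytes.

```python
def write_code_section(code):
    """Prefix the raw instruction bytes with their length as a 24-bit integer."""
    if len(code) >= 1 << 24:
        raise ValueError("code section does not fit a 3-byte size prefix")
    return len(code).to_bytes(3, "big") + code


def read_code_section(data, offset=0):
    """Return (code bytes, offset just past the section), without altering the code."""
    size = int.from_bytes(data[offset:offset + 3], "big")
    start = offset + 3
    code = data[start:start + size]
    if len(code) != size:
        raise ValueError("truncated code section")
    return code, start + size
```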
|
|
@ -0,0 +1,188 @@
|
|||
# Peon - Manual
|
||||
|
||||
Peon is a functional, statically typed, garbage-collected, C-like programming language with
|
||||
a focus on speed and correctness, but whose main feature is the ability to natively
|
||||
perform highly efficient parallel I/O operations by implementing the [structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/)
|
||||
paradigm.
|
||||
|
||||
__Note__: Peon is currently a WIP (Work In Progress), and much of the content of this manual is purely theoretical as
|
||||
of now. If you want to help make this into a reality, feel free to contribute!
|
||||
|
||||
|
||||
## Table of contents
|
||||
|
||||
- [Manual](#peon---manual)
|
||||
- [Design Goals](#design-goals)
|
||||
- [Examples](#peon-by-example)
|
||||
- [Grammar](grammar.md)
|
||||
- [Bytecode](bytecode.md)
|
||||
|
||||
## Design Goals
|
||||
|
||||
While peon is inspired by Bob Nystrom's [book](https://craftinginterpreters.com), where he describes a simple toy language
|
||||
named Lox, the aspiration for it is to become a programming language that could actually be used in the real world. For that
|
||||
to happen, we need:
|
||||
|
||||
- Exceptions (`try/except/finally`)
|
||||
- An import system (with namespaces, like Python)
|
||||
- Multithreading support (with a global VM lock when GC'ing)
|
||||
- Built-in collections (list, tuple, set, etc.)
|
||||
- Coroutines (w/ structured concurrency)
|
||||
- Generators
|
||||
- Generics
|
||||
- C/Nim FFI
|
||||
- A package manager
|
||||
|
||||
Peon ~~steals~~ borrows many ideas from Python and Nim (the latter being the language peon itself is written in).
|
||||
|
||||
## Peon by Example
|
||||
|
||||
Here are a few examples of peon code that show what the end product should look like:
|
||||
|
||||
### Variable declarations
|
||||
|
||||
```
var x = 5;      # Inferred type is int64
var y = 3'u16;  # Type is specified as uint16
x = 6;          # Works: type matches
x = 3.0;        # Cannot assign float64 to x
var x = 3.14;   # Cannot re-declare x
```
|
||||
|
||||
__Note__: Peon supports [name stropping](https://en.wikipedia.org/wiki/Stropping_(syntax)), meaning
|
||||
that almost any ASCII sequence of characters can be used as an identifier, including language
|
||||
keywords, but stropped names need to be enclosed by matching pairs of backticks (`\``).
|
||||
|
||||
### Functions
|
||||
|
||||
```
fn fib(n: int): int {
    if (n < 2) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}

fib(30);
```
|
||||
|
||||
### Type declarations
|
||||
|
||||
```
type Foo = object {  # Can also be "ref object" for reference types (managed automatically)
    fieldOne*: int   # Asterisk means the field is public outside the current module
    fieldTwo*: int
}
```
|
||||
|
||||
### Operator overloading
|
||||
|
||||
```
operator `+`(a, b: Foo): Foo {
    return Foo(fieldOne: a.fieldOne + b.fieldOne, fieldTwo: a.fieldTwo + b.fieldTwo);
}

Foo(fieldOne: 1, fieldTwo: 3) + Foo(fieldOne: 2, fieldTwo: 3);  # Foo(fieldOne: 3, fieldTwo: 6)
```
|
||||
|
||||
__Note__: Custom operators (e.g. `foo`) can also be defined! The backticks around the plus sign serve to mark it
|
||||
as an identifier instead of a symbol (which is a requirement for function names, since operators are basically
|
||||
functions). In fact, even the built-in peon operators are implemented partially in peon (well, their forward
|
||||
declarations are) and they are then specialized in the compiler to emit a single bytecode instruction.
|
||||
|
||||
### Function calls
|
||||
|
||||
```
foo(1, 2 + 3, 3.14, bar(baz));
```
|
||||
|
||||
__Note__: Operators can be called as functions too. Just wrap their name in backticks, like so:
|
||||
```
`+`(1, 2)
```
|
||||
|
||||
__Note__: Code such as `a.b()` is desugared to `b(a)` if there exists a function `b` whose
signature is compatible with the value of `a` (assuming `a` doesn't have a `b` field, in
which case attribute resolution takes precedence).
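
The precedence rule in the note above can be modelled with a few lines of Python. This is a deliberately simplified sketch, not the compiler's algorithm: `resolve_member_call` is a hypothetical name, and "compatible signature" is reduced to "the function's first parameter has exactly the receiver's type".

```python
def resolve_member_call(receiver_type, name, fields, functions):
    """Decide how `a.b()` is resolved for a receiver `a` of type `receiver_type`.

    fields:    maps a type name to the set of its field names
    functions: maps a function name to the type of its first parameter
    """
    # A field named `b` wins: attribute resolution takes precedence
    if name in fields.get(receiver_type, set()):
        return "attribute"
    # Otherwise fall back to the function call form b(a)
    if functions.get(name) == receiver_type:
        return "call"
    raise LookupError(f"no field or function {name!r} for type {receiver_type!r}")


fields = {"Foo": {"fieldOne", "fieldTwo"}}
functions = {"double": "Foo", "fieldOne": "Foo"}
print(resolve_member_call("Foo", "double", fields, functions))    # call
print(resolve_member_call("Foo", "fieldOne", fields, functions))  # attribute
```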
|
||||
|
||||
|
||||
### Generic declarations
|
||||
|
||||
```
fn genericSum[T](a, b: T): T {  # Note: "a, b: T" means that both a and b are of type T
    return a + b;
}

# This allows for a single implementation to be
# re-used multiple times without any code duplication!
genericSum(1, 2);
genericSum(3.14, 0.1);
genericSum(1'u8, 250'u8);
```
|
||||
|
||||
#### Multiple generics
|
||||
|
||||
```
fn genericSth[T, K](a: T, b: K) {  # Note: no return type == void function!
    # code...
}

genericSth(1, 3.0);
```
|
||||
|
||||
__Note__: The `*` modifier to make a name visible outside the current module must be put
|
||||
__before__ generics declarations, so only `fn foo*[T](a: T) {}` is the correct syntax.
|
||||
|
||||
### Forward declarations
|
||||
|
||||
```
fn someF: int;  # Semicolon, no body!

someF();  # This works!

fn someF: int {
    return 42;
}
```
|
||||
|
||||
### Generators
|
||||
|
||||
```
generator count(n: int): int {
    while (n > 0) {
        yield n;
        n -= 1;
    }
}

foreach (n: count(10)) {
    print(n);
}
```
|
||||
|
||||
|
||||
### Coroutines
|
||||
|
||||
```
import concur;
import http;


coroutine req(url: string): string {
    return (await http.AsyncClient().get(url)).content;
}


coroutine main(urls: list[string]) {
    var pool = concur.pool();  # Creates a task pool: like a nursery in njsmith's article
    for (var i = 0; i < urls.len(); i += 1) {
        pool.spawn(req, urls[i]);
    }
    # The pool has internal machinery that makes the parent
    # task wait until all children exit! When this function
    # returns, ALL child tasks will have exited somehow
}


concur.run(main, newList[string]("https://google.com", "https://debian.org"));
```
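
Since peon's `concur` module does not exist yet, the "parent waits for all children" guarantee can be tried out today with a toy Python analogue built on `asyncio`. The `Pool` class below is a hypothetical stand-in, not a port of peon's design: it only waits for its children when the `async with` block exits and does not cancel siblings on failure, which a real nursery would. The `req` coroutine fakes the network call so the example runs offline.

```python
import asyncio


class Pool:
    """Toy task pool: leaving the `async with` block waits for every child."""

    def __init__(self):
        self._tasks = []

    async def __aenter__(self):
        return self

    def spawn(self, coro_fn, *args):
        # Start the child right away and remember it so it can be awaited later
        task = asyncio.create_task(coro_fn(*args))
        self._tasks.append(task)
        return task

    async def __aexit__(self, exc_type, exc, tb):
        # Wait for all children, even if some of them raised
        await asyncio.gather(*self._tasks, return_exceptions=True)
        return False


async def req(url):
    await asyncio.sleep(0)  # stand-in for real network I/O
    return "content of " + url


async def main(urls):
    async with Pool() as pool:
        tasks = [pool.spawn(req, url) for url in urls]
    # Every child task has finished by the time this line runs
    return [task.result() for task in tasks]


results = asyncio.run(main(["https://google.com", "https://debian.org"]))
print(results)
```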
|