Added info about CFI section and made minor changes to README

Mattia Giambirtone 2022-05-25 11:36:12 +02:00
parent 48d1c3fc8c
commit be4c2500ac
3 changed files with 58 additions and 31 deletions

@ -52,25 +52,25 @@ In no particular order, here's a list of stuff that's done/to do (might be incom
Toolchain:
- Tokenizer (with dynamic symbol table) [x]
- Parser (with support for custom operators, even builtins) [x]
- Compiler [ ] (Work in Progress)
- VM [ ] (Work in Progress)
- Bytecode (de-)serializer [x]
- Static code debugger [x]
- Runtime debugger/inspection tool [ ]
- Tokenizer (with dynamic symbol table) -> Done
- Parser (with support for custom operators, even builtins) -> Done
- Compiler -> Being written
- VM -> Being written
- Bytecode (de-)serializer -> Done
- Static code debugger -> Done
- Runtime debugger/inspection tool -> TODO
Type system:
- Custom types [ ]
- Intrinsics [x]
- Generics [ ] (Work in Progress)
- Function calls [ ] (Work in Progress)
- Custom types -> TODO
- Intrinsics -> Done
- Generics -> TODO
- Function calls -> WIP
Misc:
- Pragmas [ ] (Work in Progress)
- Attribute resolution [ ]
- Pragmas -> TODO
- Attribute resolution -> TODO
- ... More?
## The name

@ -16,25 +16,28 @@ A peon bytecode dump contains:
- Debugging information
- File and version metadata
## Encoding
### Header
## File Headers
A peon bytecode file starts with the header, which is structured as follows:
- The literal string `PEON_BYTECODE`
- A 3-byte version number (the major, minor and patch versions of the compiler that generated the file as per the SemVer versioning standard)
- A 3-byte version number (the major, minor and patch version numbers of the compiler that generated the file)
- The branch name of the repository the compiler was built from, prepended with its length as a 1 byte integer
- The full commit hash (encoded as a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built from (particularly useful in development builds)
- The commit hash (a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built (particularly useful in development builds)
- An 8-byte UNIX timestamp (with Epoch 0 starting at 1/1/1970 12:00 AM) representing the exact date and time of when the file was generated
- A 32-byte, hex-encoded SHA256 hash of the source file's content, used to track file changes
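The header layout listed above can be sketched as a small parser. This is a minimal illustration in Python, not the project's Nim implementation: the names `Header` and `parse_header` are hypothetical, and the byte order of multi-byte integers, the ASCII encoding of the branch name, and the interpretation of the timestamp as UTC seconds are assumptions, since the text does not state them. The hash field is read as 32 bytes, as the list states, and is left undecoded.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

MAGIC = b"PEON_BYTECODE"  # literal marker that opens every bytecode file


@dataclass
class Header:
    version: tuple        # (major, minor, patch)
    branch: str
    commit: str           # 40 hex characters
    timestamp: datetime
    source_hash: bytes    # 32 bytes, kept as-is


def parse_header(data: bytes, byteorder: str = "big"):
    """Return (Header, offset of the first byte after the header).

    byteorder is an assumption: the format description does not say
    how multi-byte integers are laid out.
    """
    if not data.startswith(MAGIC):
        raise ValueError("missing PEON_BYTECODE marker")
    pos = len(MAGIC)

    version = tuple(data[pos:pos + 3])  # one byte each: major, minor, patch
    pos += 3

    branch_len = data[pos]              # 1-byte length prefix
    pos += 1
    branch = data[pos:pos + branch_len].decode("ascii")  # encoding assumed
    pos += branch_len

    commit = data[pos:pos + 40].decode("ascii")
    pos += 40

    seconds = int.from_bytes(data[pos:pos + 8], byteorder)
    pos += 8

    source_hash = data[pos:pos + 32]
    pos += 32

    if pos > len(data):
        raise ValueError("truncated header")

    timestamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return Header(version, branch, commit, timestamp, source_hash), pos
```

The returned offset marks where the segments that follow the header would begin.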
### Line data section
## Debug information
The line data section contains information about each instruction in the code section and associates them
The following segments contain extra information and metadata about the compiled bytecode to aid debugging, but they may be missing
in release builds.
### Line data segment
The line data segment contains information about each instruction in the code segment and associates them
1:1 with a line number in the original source file, using run-length encoding, for easier debugging. The segment's
size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
in this section can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
in this segment can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
below:
```
[...]
@ -54,19 +57,43 @@ below:
[...]
```
### Constant section
### CFI segment
The constant section contains all the read-only values that the code will need at runtime, such as hardcoded
The CFI segment (where CFI stands for **C**all **F**rame **I**nformation) contains details about each function in
the original file. The segment's size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer).
The data
in this segment can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L41), which is quoted
below:
```
[...]
## cfi represents Call Frame Information and encodes the following information:
## - Function name
## - Stack bottom
## - Argument count
## The encoding for CFI data is the following:
## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
## - Then, the frame's stack bottom is encoded as a 3 byte integer
## - After the frame's stack bottom follows the argument count as a 1 byte integer
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
## its size as a 2-byte integer
[...]
```
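The CFI layout quoted above can be sketched as a decoder. This is an illustrative Python sketch rather than the project's Nim code: `CFIEntry` and `decode_cfi` are hypothetical names, the byte order is an assumption, and so are two details the comment leaves open, namely that entries are packed back to back and that a function without a name is stored with a name size of zero. The function expects the segment body only, with the 4-byte size prefix already stripped.

```python
from typing import List, NamedTuple


class CFIEntry(NamedTuple):
    start: int         # bytecode position where the function begins
    end: int           # bytecode position where the function ends
    stack_bottom: int  # the frame's stack bottom
    arg_count: int
    name: str          # empty if the function has no name (assumption)


def decode_cfi(segment: bytes, byteorder: str = "big") -> List[CFIEntry]:
    """Decode every entry in a CFI segment body.

    byteorder is an assumption; the quoted comment only gives field widths.
    """
    entries = []
    pos = 0
    while pos < len(segment):
        # Fixed part: 3 + 3 + 3 + 1 + 2 = 12 bytes
        if len(segment) - pos < 12:
            raise ValueError("truncated CFI entry")
        start = int.from_bytes(segment[pos:pos + 3], byteorder)
        end = int.from_bytes(segment[pos + 3:pos + 6], byteorder)
        stack_bottom = int.from_bytes(segment[pos + 6:pos + 9], byteorder)
        arg_count = segment[pos + 9]
        name_len = int.from_bytes(segment[pos + 10:pos + 12], byteorder)
        pos += 12

        if len(segment) - pos < name_len:
            raise ValueError("truncated function name")
        name = segment[pos:pos + name_len].decode("ascii")
        pos += name_len

        entries.append(CFIEntry(start, end, stack_bottom, arg_count, name))
    return entries
```

A debugger could then map an instruction offset to its enclosing function by finding the entry whose `start` and `end` bracket that offset.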
## Constant segment
The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded
variable initializers or constant expressions. It is similar to the `.rodata` section of Assembly files, although
the implementation is different. Constants are encoded as a linear sequence of bytes with no type information about
them whatsoever: it is the code that, at runtime, loads each constant (whose type is determined at compile time) onto
the stack accordingly. For example, a 32 bit integer constant would be encoded as a sequence of 4 bytes, which would
then be loaded by the appropriate `LoadInt32` instruction at runtime. The section's size is fixed and is encoded at
the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant section may be empty, although in
real-world scenarios it's unlikely that it would.
then be loaded by the appropriate `LoadInt32` instruction at runtime. The segment's size is fixed and is encoded at
the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant segment may be empty, although in
real-world scenarios it likely won't be.
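As a rough illustration of the "no type information" point, the following Python sketch shows what a load of a 32 bit integer constant amounts to: reading 4 raw bytes at an offset that the compiled code already knows. The function name `load_int32` merely mirrors the `LoadInt32` instruction mentioned above and is hypothetical, and the byte order and signedness are assumptions, since neither is stated here.

```python
def load_int32(constants: bytes, offset: int,
               byteorder: str = "big", signed: bool = True) -> int:
    """Interpret 4 bytes of the constant segment body as one integer.

    The segment stores no type tag, so the caller decides that these
    4 bytes are an int32. byteorder and signed are assumptions.
    """
    raw = constants[offset:offset + 4]
    if len(raw) != 4:
        raise ValueError("constant segment too short for a 32 bit integer")
    return int.from_bytes(raw, byteorder, signed=signed)
```

The same 4 bytes could just as well be read as a different 4-byte type by another instruction, which is why the type has to be fixed at compile time.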
### Code section
## Code segment
The code section contains the linear sequence of bytecode instructions of a peon program. It is to be read directly
and without modifications. The section's size is fixed and is encoded at the beginning as a sequence of 3 bytes
The code segment contains the linear sequence of bytecode instructions of a peon program. It is to be read directly
and without modifications. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes
(i.e. a single 24 bit integer).
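Every segment described in this file starts with its own size, so a reader can slice them out with one helper. The sketch below is a hedged Python illustration: `read_segment` is a hypothetical name, the byte order is an assumption, and it also assumes that the size prefix counts only the bytes that follow it, not the prefix itself.

```python
def read_segment(data: bytes, pos: int, size_bytes: int,
                 byteorder: str = "big"):
    """Read one size-prefixed segment starting at pos.

    Returns (segment body, offset just past the segment).
    size_bytes is the width of the size prefix: 4 for the line data,
    CFI and constant segments, 3 for the code segment.
    """
    prefix = data[pos:pos + size_bytes]
    if len(prefix) != size_bytes:
        raise ValueError("truncated size prefix")
    size = int.from_bytes(prefix, byteorder)  # byte order is an assumption

    start = pos + size_bytes
    end = start + size
    if end > len(data):
        raise ValueError("segment extends past end of file")
    return data[start:end], end
```

Chaining calls, each one starting at the offset returned by the previous one, walks the file segment by segment; the code segment is the only one read with `size_bytes=3`.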

@ -43,8 +43,8 @@ type
## - Stack bottom
## - Argument count
## The encoding for CFI data is the following:
## - First, the position into the bytecode where the function begins is encoded
## - Second, the position into the bytecode where the function ends is encoded
## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
## - Then, the frame's stack bottom is encoded as a 3 byte integer
## - After the frame's stack bottom follows the argument count as a 1 byte integer
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with