Added info about CFI section and made minor changes to README
parent
48d1c3fc8c
commit
be4c2500ac
26
README.md
|
@ -52,25 +52,25 @@ In no particular order, here's a list of stuff that's done/to do (might be incom
|
||||||
|
|
||||||
Toolchain:
|
Toolchain:
|
||||||
|
|
||||||
- Tokenizer (with dynamic symbol table) [x]
|
- Tokenizer (with dynamic symbol table) -> Done
|
||||||
- Parser (with support for custom operators, even builtins) [x]
|
- Parser (with support for custom operators, even builtins) -> Done
|
||||||
- Compiler [ ] (Work in Progress)
|
- Compiler -> Being written
|
||||||
- VM [ ] (Work in Progress)
|
- VM -> Being written
|
||||||
- Bytecode (de-)serializer [x]
|
- Bytecode (de-)serializer -> Done
|
||||||
- Static code debugger [x]
|
- Static code debugger -> Done
|
||||||
- Runtime debugger/inspection tool [ ]
|
- Runtime debugger/inspection tool -> TODO
|
||||||
|
|
||||||
Type system:
|
Type system:
|
||||||
|
|
||||||
- Custom types [ ]
|
- Custom types -> TODO
|
||||||
- Intrinsics [x]
|
- Intrinsics -> Done
|
||||||
- Generics [ ] (Work in Progress)
|
- Generics -> TODO
|
||||||
- Function calls [ ] (Work in Progress)
|
- Function calls -> WIP
|
||||||
|
|
||||||
Misc:
|
Misc:
|
||||||
|
|
||||||
- Pragmas [ ] (Work in Progress)
|
- Pragmas -> TODO
|
||||||
- Attribute resolution [ ]
|
- Attribute resolution -> TODO
|
||||||
- ... More?
|
- ... More?
|
||||||
|
|
||||||
## The name
|
## The name
|
||||||
|
|
|
@ -16,25 +16,28 @@ A peon bytecode dump contains:
|
||||||
- Debugging information
|
- Debugging information
|
||||||
- File and version metadata
|
- File and version metadata
|
||||||
|
|
||||||
## Encoding
|
## File Headers
|
||||||
|
|
||||||
### Header
|
|
||||||
|
|
||||||
A peon bytecode file starts with the header, which is structured as follows:
|
A peon bytecode file starts with the header, which is structured as follows:
|
||||||
|
|
||||||
- The literal string `PEON_BYTECODE`
|
- The literal string `PEON_BYTECODE`
|
||||||
- A 3-byte version number (the major, minor and patch versions of the compiler that generated the file as per the SemVer versioning standard)
|
- A 3-byte version number (the major, minor and patch version numbers of the compiler that generated the file)
|
||||||
- The branch name of the repository the compiler was built from, prepended with its length as a 1 byte integer
|
- The branch name of the repository the compiler was built from, prepended with its length as a 1 byte integer
|
||||||
- The full commit hash (encoded as a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built from (particularly useful in development builds)
|
- The commit hash (a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built (particularly useful in development builds)
|
||||||
- An 8-byte UNIX timestamp (with Epoch 0 starting at 1/1/1970 12:00 AM) representing the exact date and time of when the file was generated
|
- An 8-byte UNIX timestamp (with Epoch 0 starting at 1/1/1970 12:00 AM) representing the exact date and time of when the file was generated
|
||||||
- A 32-byte, hex-encoded SHA256 hash of the source file's content, used to track file changes
|
- A 32-byte, hex-encoded SHA256 hash of the source file's content, used to track file changes
|
||||||
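The header layout listed above can be read back with a short routine. The following is only a sketch in Python, not the project's own (Nim) implementation. It assumes ASCII text fields and a big-endian timestamp, neither of which the document states, and it returns the 32-byte source hash untouched. The names `parse_header` and `MAGIC` are hypothetical.

```python
MAGIC = b"PEON_BYTECODE"  # literal string that opens every bytecode file


def parse_header(data: bytes) -> dict:
    """Parse the file header fields in the order the document lists them."""
    pos = 0

    def take(count: int) -> bytes:
        # Consume exactly `count` bytes or fail loudly on truncated input.
        nonlocal pos
        chunk = data[pos:pos + count]
        if len(chunk) != count:
            raise ValueError("truncated header")
        pos += count
        return chunk

    if take(len(MAGIC)) != MAGIC:
        raise ValueError("not a peon bytecode file")
    major, minor, patch = take(3)            # one byte per version component
    branch_len = take(1)[0]                  # 1-byte length prefix
    branch = take(branch_len).decode("ascii")
    commit = take(40).decode("ascii")        # 40-byte hex-encoded commit hash
    # Assumption: big-endian; the document does not state the byte order.
    timestamp = int.from_bytes(take(8), "big")
    source_hash = take(32)                   # kept as raw bytes, not interpreted
    return {
        "version": (major, minor, patch),
        "branch": branch,
        "commit": commit,
        "timestamp": timestamp,
        "source_hash": source_hash,
        "size": pos,                         # number of header bytes consumed
    }
```

The returned `size` tells the caller where the first segment after the header would start.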
|
|
||||||
### Line data section
|
## Debug information
|
||||||
|
|
||||||
The line data section contains information about each instruction in the code section and associates them
|
The following segments contain extra information and metadata about the compiled bytecode to aid debugging, but they may be missing
|
||||||
|
in release builds.
|
||||||
|
|
||||||
|
### Line data segment
|
||||||
|
|
||||||
|
The line data segment contains information about each instruction in the code segment and associates them
|
||||||
1:1 with a line number in the original source file for easier debugging using run-length encoding. The section's
|
1:1 with a line number in the original source file for easier debugging using run-length encoding. The section's
|
||||||
size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
|
size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
|
||||||
in this section can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
|
in this segment can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
|
||||||
below:
|
below:
|
||||||
```
|
```
|
||||||
[...]
|
[...]
|
||||||
|
@ -54,19 +57,43 @@ below:
|
||||||
[...]
|
[...]
|
||||||
```
|
```
|
||||||
|
|
||||||
### Constant section
|
### CFI segment
|
||||||
|
|
||||||
The constant section contains all the read-only values that the code will need at runtime, such as hardcoded
|
The CFI segment (where CFI stands for **C**all **F**rame **I**nformation) contains details about each function in
|
||||||
|
the original file. The segment's size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer).
|
||||||
|
The data
|
||||||
|
in this segment can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L41), which is quoted
|
||||||
|
below:
|
||||||
|
|
||||||
|
```
|
||||||
|
[...]
|
||||||
|
## cfi represents Call Frame Information and encodes the following information:
|
||||||
|
## - Function name
|
||||||
|
## - Stack bottom
|
||||||
|
## - Argument count
|
||||||
|
## The encoding for CFI data is the following:
|
||||||
|
## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
|
||||||
|
## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
|
||||||
|
## - Then, the frame's stack bottom is encoded as a 3 byte integer
|
||||||
|
## - After the frame's stack bottom follows the argument count as a 1 byte integer
|
||||||
|
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
|
||||||
|
## its size as a 2-byte integer
|
||||||
|
[...]
|
||||||
|
```
|
||||||
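The CFI encoding quoted above maps to a small decoder. This is a sketch in Python under stated assumptions, not the code in `bytecode.nim`. The byte order is assumed to be big-endian (the quoted comment does not say), and `CFIEntry` and `decode_cfi_entry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

BYTE_ORDER = "big"  # assumption: the quoted comment does not state the byte order


@dataclass
class CFIEntry:
    start: int         # bytecode position where the function begins
    end: int           # bytecode position where the function ends
    stack_bottom: int  # the frame's stack bottom
    arg_count: int     # number of arguments
    name: str          # function name, empty when omitted


def decode_cfi_entry(data: bytes, offset: int = 0) -> Tuple[CFIEntry, int]:
    """Decode one CFI entry at `offset`; return it with the next offset."""

    def read(size: int) -> bytes:
        # Consume exactly `size` bytes or fail on truncated input.
        nonlocal offset
        chunk = data[offset:offset + size]
        if len(chunk) != size:
            raise ValueError("truncated CFI entry")
        offset += size
        return chunk

    start = int.from_bytes(read(3), BYTE_ORDER)         # 3-byte integer
    end = int.from_bytes(read(3), BYTE_ORDER)           # 3-byte integer
    stack_bottom = int.from_bytes(read(3), BYTE_ORDER)  # 3-byte integer
    arg_count = read(1)[0]                              # 1-byte integer
    name_len = int.from_bytes(read(2), BYTE_ORDER)      # 2-byte size prefix
    name = read(name_len).decode("ascii")
    return CFIEntry(start, end, stack_bottom, arg_count, name), offset
```

Calling `decode_cfi_entry` repeatedly with the returned offset walks the entries one after another until the segment's declared size is used up.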
|
|
||||||
|
## Constant segment
|
||||||
|
|
||||||
|
The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded
|
||||||
variable initializers or constant expressions. It is similar to the `.rodata` section of Assembly files, although
|
variable initializers or constant expressions. It is similar to the `.rodata` section of Assembly files, although
|
||||||
the implementation is different. Constants are encoded as a linear sequence of bytes with no type information about
|
the implementation is different. Constants are encoded as a linear sequence of bytes with no type information about
|
||||||
them whatsoever: it is the code that, at runtime, loads each constant (whose type is determined at compile time) onto
|
them whatsoever: it is the code that, at runtime, loads each constant (whose type is determined at compile time) onto
|
||||||
the stack accordingly. For example, a 32 bit integer constant would be encoded as a sequence of 4 bytes, which would
|
the stack accordingly. For example, a 32 bit integer constant would be encoded as a sequence of 4 bytes, which would
|
||||||
then be loaded by the appropriate `LoadInt32` instruction at runtime. The section's size is fixed and is encoded at
|
then be loaded by the appropriate `LoadInt32` instruction at runtime. The segment's size is fixed and is encoded at
|
||||||
the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant section may be empty, although in
|
the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant segment may be empty, although in
|
||||||
real-world scenarios it's unlikely that it would.
|
real-world scenarios it likely won't be.
|
||||||
|
|
||||||
### Code section
|
## Code segment
|
||||||
|
|
||||||
The code section contains the linear sequence of bytecode instructions of a peon program. It is to be read directly
|
The code segment contains the linear sequence of bytecode instructions of a peon program. It is to be read directly
|
||||||
and without modifications. The section's size is fixed and is encoded at the beginning as a sequence of 3 bytes
|
and without modifications. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes
|
||||||
(i.e. a single 24 bit integer).
|
(i.e. a single 24 bit integer).
|
|
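Both the constant segment (4-byte size prefix) and the code segment (3-byte size prefix) use the same length-prefixed layout, so one helper can read either. This is a minimal sketch in Python. It assumes a big-endian size prefix, which the document does not specify, and `read_segment` is a hypothetical name.

```python
from typing import Tuple


def read_segment(data: bytes, offset: int, prefix_size: int) -> Tuple[bytes, int]:
    """Read one size-prefixed segment; return its payload and the next offset."""
    prefix = data[offset:offset + prefix_size]
    if len(prefix) != prefix_size:
        raise ValueError("truncated size prefix")
    # Assumption: big-endian; the document does not state the byte order.
    size = int.from_bytes(prefix, "big")
    start = offset + prefix_size
    payload = data[start:start + size]
    if len(payload) != size:
        raise ValueError("segment is shorter than its declared size")
    return payload, start + size
```

With this helper, a constant segment would be read with `prefix_size=4` and a code segment with `prefix_size=3`. An empty constant segment comes back as an empty payload.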
@ -43,8 +43,8 @@ type
|
||||||
## - Stack bottom
|
## - Stack bottom
|
||||||
## - Argument count
|
## - Argument count
|
||||||
## The encoding for CFI data is the following:
|
## The encoding for CFI data is the following:
|
||||||
## - First, the position into the bytecode where the function begins is encoded
|
## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
|
||||||
## - Second, the position into the bytecode where the function ends is encoded
|
## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
|
||||||
## - Then, the frame's stack bottom is encoded as a 3 byte integer
|
## - Then, the frame's stack bottom is encoded as a 3 byte integer
|
||||||
## - After the frame's stack bottom follows the argument count as a 1 byte integer
|
## - After the frame's stack bottom follows the argument count as a 1 byte integer
|
||||||
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
|
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
|
||||||
|
|