Added info about CFI section and made minor changes to README
parent 48d1c3fc8c
commit be4c2500ac

README.md (26 lines changed)
@@ -52,25 +52,25 @@ In no particular order, here's a list of stuff that's done/to do (might be incom
 Toolchain:
 
-- Tokenizer (with dynamic symbol table) [x]
-- Parser (with support for custom operators, even builtins) [x]
-- Compiler [ ] (Work in Progress)
-- VM [ ] (Work in Progress)
-- Bytecode (de-)serializer [x]
-- Static code debugger [x]
-- Runtime debugger/inspection tool [ ]
+- Tokenizer (with dynamic symbol table) -> Done
+- Parser (with support for custom operators, even builtins) -> Done
+- Compiler [ ] -> Being written
+- VM [ ] -> Being written
+- Bytecode (de-)serializer -> Done
+- Static code debugger [x] -> Done
+- Runtime debugger/inspection tool -> TODO
 
 Type system:
 
-- Custom types [ ]
-- Intrinsics [x]
-- Generics [ ] (Work in Progress)
-- Function calls [ ] (Work in Progress)
+- Custom types -> TODO
+- Intrinsics -> Done
+- Generics -> TODO
+- Function calls -> WIP
 
 Misc:
 
-- Pragmas [ ] (Work in Progress)
-- Attribute resolution [ ]
+- Pragmas -> TODO
+- Attribute resolution -> TODO
 - ... More?
 
 ## The name
 
@@ -16,25 +16,28 @@ A peon bytecode dump contains:
 - Debugging information
 - File and version metadata
 
 ## Encoding
 
-### Header
+## File Headers
 
 A peon bytecode file starts with the header, which is structured as follows:
 
 - The literal string `PEON_BYTECODE`
-- A 3-byte version number (the major, minor and patch versions of the compiler that generated the file as per the SemVer versioning standard)
+- A 3-byte version number (the major, minor and patch version numbers of the compiler that generated the file)
 - The branch name of the repository the compiler was built from, prepended with its length as a 1 byte integer
-- The full commit hash (encoded as a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built from (particularly useful in development builds)
+- The commit hash (encoded as a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built from (particularly useful in development builds)
 - An 8-byte UNIX timestamp (with Epoch 0 starting at 1/1/1970 12:00 AM) representing the exact date and time of when the file was generated
 - A 32-byte, hex-encoded SHA256 hash of the source file's content, used to track file changes
 
-### Line data section
+## Debug information
 
-The line data section contains information about each instruction in the code section and associates them
+The following segments contain extra information and metadata about the compiled bytecode to aid debugging, but they may be missing
+in release builds.
+
+### Line data segment
+
+The line data segment contains information about each instruction in the code segment and associates them
 1:1 with a line number in the original source file for easier debugging using run-length encoding. The section's
 size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
-in this section can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
+in this segment can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
 below:
 ```
 [...]
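The header fields listed in the hunk above come in a fixed order and are mostly fixed-width, so they can be read with a simple cursor. What follows is a minimal Python sketch, not peon's own deserializer (that lives in the Nim sources and is not part of this diff). The sketch makes several assumptions. The 8-byte timestamp is read as big-endian, because the text does not state a byte order. Each version component is read as one byte. The branch name and commit hash are decoded as ASCII. The SHA256 field is read as 32 raw bytes, because that is the size the text gives. A hex-encoded SHA256 digest would actually take 64 bytes, so this field is worth checking against the serializer. `MAGIC`, `PeonHeader` and `parse_header` are hypothetical names.

```python
from dataclasses import dataclass

MAGIC = b"PEON_BYTECODE"  # literal string that opens every file (13 bytes)


@dataclass
class PeonHeader:  # hypothetical container, not a peon type
    version: tuple[int, int, int]  # (major, minor, patch), one byte each (assumed)
    branch: str                    # branch name, read after its 1-byte length
    commit: str                    # 40-character hex-encoded commit hash
    timestamp: int                 # UNIX timestamp, assumed big-endian
    source_hash: bytes             # assumed: 32 raw SHA256 bytes


def parse_header(data: bytes) -> tuple[PeonHeader, int]:
    """Parse the header; return it plus the offset of the first byte after it."""
    pos = 0

    def take(n: int) -> bytes:
        # Advance the cursor by n bytes, refusing to read past the buffer.
        nonlocal pos
        if pos + n > len(data):
            raise ValueError(f"truncated header: need {n} bytes at offset {pos}")
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    if take(len(MAGIC)) != MAGIC:
        raise ValueError("missing PEON_BYTECODE magic string")
    major, minor, patch = take(3)              # iterating bytes yields ints
    branch_len = take(1)[0]                    # 1-byte length prefix
    branch = take(branch_len).decode("ascii")
    commit = take(40).decode("ascii")
    timestamp = int.from_bytes(take(8), "big")
    source_hash = take(32)
    header = PeonHeader((major, minor, patch), branch, commit, timestamp, source_hash)
    return header, pos
```

For a buffer built from those fields in order with a 6-character branch name, `parse_header` returns the populated header and offset 103 (13 + 3 + 1 + 6 + 40 + 8 + 32). That offset is where the next segment would begin.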
@@ -54,19 +57,43 @@ below:
 [...]
 ```
 
-### Constant section
+### CFI segment
 
-The constant section contains all the read-only values that the code will need at runtime, such as hardcoded
+The CFI segment (where CFI stands for **C**all **F**rame **I**nformation), contains details about each function in
+the original file. The segment's size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer).
+The data
+in this segment can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L41), which is quoted
+below:
+
+```
+[...]
+## cfi represents Call Frame Information and encodes the following information:
+## - Function name
+## - Stack bottom
+## - Argument count
+## The encoding for CFI data is the following:
+## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
+## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
+## - Then, the frame's stack bottom is encoded as a 3 byte integer
+## - After the frame's stack bottom follows the argument count as a 1 byte integer
+## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
+## its size as a 2-byte integer
+[...]
+```
+
+## Constant segment
+
+The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded
 variable initializers or constant expressions. It is similar to the `.rodata` section of Assembly files, although
 the implementation is different. Constants are encoded as a linear sequence of bytes with no type information about
 them whatsoever: it is the code that, at runtime, loads each constant (whose type is determined at compile time) onto
 the stack accordingly. For example, a 32 bit integer constant would be encoded as a sequence of 4 bytes, which would
-then be loaded by the appropriate `LoadInt32` instruction at runtime. The section's size is fixed and is encoded at
-the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant section may be empty, although in
-real-world scenarios it's unlikely that it would.
+then be loaded by the appropriate `LoadInt32` instruction at runtime. The segment's size is fixed and is encoded at
+the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant segment may be empty, although in
+real-world scenarios likely won't.
 
-### Code section
+## Code segment
 
-The code section contains the linear sequence of bytecode instructions of a peon program. It is to be read directly
-and without modifications. The section's size is fixed and is encoded at the beginning as a sequence of 3 bytes
+The code segment contains the linear sequence of bytecode instructions of a peon program. It is to be read directly
+and without modifications. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes
 (i.e. a single 24 bit integer).
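The constant and code segments described in this hunk have the same shape. Each is a size prefix followed by raw bytes with no type tags. The prefix is 4 bytes for constants and 3 bytes for code. Below is a minimal Python sketch that pulls one such segment out of a buffer and reads a 32-bit integer constant from it, the way a `LoadInt32` instruction would at runtime. The sketch assumes the following:
- byte order is big-endian;
- the size prefix counts only the payload bytes;
- integer constants are signed;
- the constant's byte offset is already known (how instructions encode that offset is not covered in this diff).

`read_segment` and `load_int32` are hypothetical helpers.

```python
def read_segment(data: bytes, pos: int, size_width: int) -> tuple[bytes, int]:
    """Read a segment prefixed by a big-endian size field of `size_width` bytes.

    Returns the payload and the offset just past it. Assumes the size counts
    payload bytes only, not the prefix itself.
    """
    if pos + size_width > len(data):
        raise ValueError("truncated size prefix")
    size = int.from_bytes(data[pos:pos + size_width], "big")
    start = pos + size_width
    end = start + size
    if end > len(data):
        raise ValueError(f"segment claims {size} bytes but only {len(data) - start} remain")
    return data[start:end], end


def load_int32(constants: bytes, offset: int) -> int:
    """Decode a 4-byte constant, assumed to be a big-endian signed integer."""
    if offset < 0 or offset + 4 > len(constants):
        raise IndexError("constant out of range")
    return int.from_bytes(constants[offset:offset + 4], "big", signed=True)
```

Consider a synthetic buffer that holds an 8-byte constant segment followed by a 3-byte code segment. `read_segment(blob, 0, 4)` returns the constants and offset 12. `read_segment(blob, 12, 3)` then returns the code bytes. An empty constant segment, which the text allows, comes back as `b""`.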
@@ -43,8 +43,8 @@ type
 ## - Stack bottom
 ## - Argument count
 ## The encoding for CFI data is the following:
-## - First, the position into the bytecode where the function begins is encoded
-## - Second, the position into the bytecode where the function ends is encoded
+## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
+## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
 ## - Then, the frame's stack bottom is encoded as a 3 byte integer
 ## - After the frame's stack bottom follows the argument count as a 1 byte integer
 ## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
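The CFI encoding spelled out in this doc comment gives each function a 12-byte fixed part (3 + 3 + 3 + 1 + 2) followed by an optional ASCII name. Below is a minimal round-trip sketch in Python. It assumes big-endian integers. It also assumes that entries are stored back to back inside the segment, after the segment's 4-byte size prefix. This diff states neither the byte order nor the layout of entries within the segment. `CFIEntry`, `encode_cfi` and `decode_cfi_segment` are hypothetical names, not peon APIs.

```python
from dataclasses import dataclass


@dataclass
class CFIEntry:  # hypothetical mirror of the fields in the doc comment
    start: int         # bytecode position where the function begins (3 bytes)
    end: int           # bytecode position where the function ends (3 bytes)
    stack_bottom: int  # frame's stack bottom (3 bytes)
    argc: int          # argument count (1 byte)
    name: str = ""     # optional ASCII name, prefixed by a 2-byte length


def encode_cfi(entry: CFIEntry) -> bytes:
    """Serialize one entry; integers are assumed to be big-endian."""
    name = entry.name.encode("ascii")
    return (entry.start.to_bytes(3, "big")
            + entry.end.to_bytes(3, "big")
            + entry.stack_bottom.to_bytes(3, "big")
            + entry.argc.to_bytes(1, "big")
            + len(name).to_bytes(2, "big")
            + name)


def decode_cfi_segment(payload: bytes) -> list[CFIEntry]:
    """Decode back-to-back entries from a CFI payload (size prefix already stripped)."""
    entries, pos = [], 0
    while pos < len(payload):
        if pos + 12 > len(payload):
            raise ValueError(f"truncated CFI entry at offset {pos}")
        start = int.from_bytes(payload[pos:pos + 3], "big")
        end = int.from_bytes(payload[pos + 3:pos + 6], "big")
        stack_bottom = int.from_bytes(payload[pos + 6:pos + 9], "big")
        argc = payload[pos + 9]
        name_len = int.from_bytes(payload[pos + 10:pos + 12], "big")
        pos += 12
        if pos + name_len > len(payload):
            raise ValueError(f"truncated CFI name at offset {pos}")
        name = payload[pos:pos + name_len].decode("ascii")
        pos += name_len
        entries.append(CFIEntry(start, end, stack_bottom, argc, name))
    return entries


if __name__ == "__main__":
    functions = [CFIEntry(0, 25, 0, 2, "fib"), CFIEntry(26, 40, 3, 0)]
    payload = b"".join(encode_cfi(f) for f in functions)
    segment = len(payload).to_bytes(4, "big") + payload  # 4-byte size prefix
    assert decode_cfi_segment(segment[4:]) == functions
```

`int.to_bytes` raises `OverflowError` when a value does not fit. As a result, the encoder rejects offsets or stack bottoms above the 3-byte limit of 16,777,215 and argument counts above 255, instead of silently truncating them.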