Added info about CFI section and made minor changes to README

2022-05-25 11:36:12 +02:00 · 2022-05-25 11:36:12 +02:00 · be4c2500ac
parent 48d1c3fc8c
commit be4c2500ac
3 changed files with 58 additions and 31 deletions
--- a/README.md
+++ b/README.md
@@ -52,25 +52,25 @@ In no particular order, here's a list of stuff that's done/to do (might be incom

 Toolchain:

-  - Tokenizer (with dynamic symbol table) [x]
-  - Parser (with support for custom operators, even builtins) [x]
-  - Compiler [ ]  (Work in Progress)
-  - VM [ ] (Work in Progress)
-  - Bytecode (de-)serializer [x]
-  - Static code debugger [x]
-  - Runtime debugger/inspection tool [ ]
+  - Tokenizer (with dynamic symbol table) -> Done
+  - Parser (with support for custom operators, even builtins) -> Done
+  - Compiler -> Being written
+  - VM -> Being written
+  - Bytecode (de-)serializer -> Done
+  - Static code debugger -> Done
+  - Runtime debugger/inspection tool -> TODO

 Type system:

-  - Custom types [ ]
-  - Intrinsics [x]
-  - Generics [ ] (Work in Progress)
-  - Function calls [ ] (Work in Progress)
+  - Custom types -> TODO
+  - Intrinsics -> Done
+  - Generics -> TODO
+  - Function calls -> WIP

 Misc:

-  - Pragmas [ ] (Work in Progress)
-  - Attribute resolution [ ]
+  - Pragmas -> TODO
+  - Attribute resolution -> TODO
  - ... More?

 ## The name
--- a/docs/bytecode.md
+++ b/docs/bytecode.md
@@ -16,25 +16,28 @@ A peon bytecode dump contains:
 - Debugging information
 - File and version metadata 

-## Encoding
-
-### Header
+## File Headers

 A peon bytecode file starts with the header, which is structured as follows:

 - The literal string `PEON_BYTECODE`
- A 3-byte version number (the major, minor and patch versions of the compiler that generated the file as per the SemVer versioning standard)
+- A 3-byte version number (the major, minor and patch version numbers of the compiler that generated the file)
 - The branch name of the repository the compiler was built from, prepended with its length as a 1 byte integer
- The full commit hash (encoded as a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built from (particularly useful in development builds)
+- The hash of the commit in the aforementioned branch from which the compiler was built, encoded as a 40-byte hex string (particularly useful in development builds)
 - An 8-byte UNIX timestamp (with Epoch 0 starting at 1/1/1970 12:00 AM) representing the exact date and time of when the file was generated
 - A 32-byte, hex-encoded SHA256 hash of the source file's content, used to track file changes
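
To make the header layout above concrete, here is a minimal reader sketch. It is written in Python purely for illustration and is not peon's own (Nim) implementation. The names `parse_header`, `Header` and `BYTE_ORDER` are hypothetical. Assumptions not stated in the document: multi-byte integers are big-endian, the branch name is UTF-8, and the source hash occupies exactly 32 bytes as written (kept here as raw bytes).

```python
from typing import NamedTuple, Tuple

MAGIC = b"PEON_BYTECODE"
BYTE_ORDER = "big"  # ASSUMPTION: the document does not state the byte order


class Header(NamedTuple):
    version: Tuple[int, ...]  # (major, minor, patch)
    branch: str
    commit: str               # 40-character hex string
    timestamp: int            # UNIX timestamp
    source_hash: bytes        # 32 bytes, taken as-is (ASSUMPTION)


def parse_header(data: bytes) -> Tuple[Header, int]:
    """Parse the header and return it with the offset of the first byte after it."""
    pos = 0

    def take(n: int) -> bytes:
        # Consume exactly n bytes or fail loudly on a truncated file.
        nonlocal pos
        if pos + n > len(data):
            raise ValueError("truncated header")
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    if take(len(MAGIC)) != MAGIC:
        raise ValueError("missing PEON_BYTECODE marker")
    version = tuple(take(3))                    # one byte per version component
    branch = take(take(1)[0]).decode("utf-8")   # 1-byte length prefix, then the name
    commit = take(40).decode("ascii")
    timestamp = int.from_bytes(take(8), BYTE_ORDER)
    source_hash = take(32)
    return Header(version, branch, commit, timestamp, source_hash), pos
```

The returned offset is where the next part of the file would begin, so a caller can continue reading from there.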

-### Line data section
+## Debug information

-The line data section contains information about each instruction in the code section and associates them
+The following segments contain extra information and metadata about the compiled bytecode to aid debugging, but they may be missing
+in release builds.
+
+### Line data segment
+
+The line data segment contains information about each instruction in the code segment and associates them
 1:1 with a line number in the original source file for easier debugging using run-length encoding. The section's 
 size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
-in this section can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
+in this segment can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L28), which is quoted
 below:
 ```
 [...]
@@ -54,19 +57,43 @@ below:
 [...]
 ```

-### Constant section
+### CFI segment

-The constant section contains all the read-only values that the code will need at runtime, such as hardcoded
+The CFI segment (where CFI stands for **C**all **F**rame **I**nformation) contains details about each function in
+the original file. The segment's size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer).
+The data in this segment can be decoded as explained in [this file](../src/frontend/meta/bytecode.nim#L41), which is quoted
+below:
+
+```
+[...]
+## cfi represents Call Frame Information and encodes the following information:
+## - Function name
+## - Stack bottom
+## - Argument count
+## The encoding for CFI data is the following:
+## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
+## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
+## - Then, the frame's stack bottom is encoded as a 3 byte integer
+## - After the frame's stack bottom follows the argument count as a 1 byte integer
+## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
+##   its size as a 2-byte integer
+[...]
+```
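
The CFI encoding quoted above can be sketched as a small round-trip encoder and decoder. This is an illustrative Python sketch, not the project's Nim code. The names `CFIEntry`, `encode_cfi_segment` and `decode_cfi_segment` are hypothetical. Assumptions not stated in the document: integers are big-endian, the 4-byte segment size counts the bytes that follow it, entries are stored back to back, and an absent (optional) name is encoded with a length of zero.

```python
from typing import List, NamedTuple, Tuple

BYTE_ORDER = "big"  # ASSUMPTION: byte order is not specified in the document


class CFIEntry(NamedTuple):
    start: int         # bytecode position where the function begins (3 bytes)
    end: int           # bytecode position where the function ends (3 bytes)
    stack_bottom: int  # frame's stack bottom (3 bytes)
    arg_count: int     # argument count (1 byte)
    name: str          # ASCII name, empty if absent (2-byte length prefix)


def encode_cfi_segment(entries: List[CFIEntry]) -> bytes:
    body = bytearray()
    for e in entries:
        name = e.name.encode("ascii")
        body += e.start.to_bytes(3, BYTE_ORDER)
        body += e.end.to_bytes(3, BYTE_ORDER)
        body += e.stack_bottom.to_bytes(3, BYTE_ORDER)
        body += e.arg_count.to_bytes(1, BYTE_ORDER)
        body += len(name).to_bytes(2, BYTE_ORDER) + name
    # The segment starts with its own size as a 4-byte integer.
    return len(body).to_bytes(4, BYTE_ORDER) + bytes(body)


def decode_cfi_segment(data: bytes, pos: int = 0) -> Tuple[List[CFIEntry], int]:
    """Decode the segment starting at pos; return the entries and the end offset."""
    if pos + 4 > len(data):
        raise ValueError("truncated CFI size field")
    size = int.from_bytes(data[pos:pos + 4], BYTE_ORDER)
    pos += 4
    stop = pos + size
    if stop > len(data):
        raise ValueError("CFI segment is truncated")
    entries = []
    while pos < stop:
        if pos + 12 > stop:  # 3 + 3 + 3 + 1 + 2 bytes of fixed-width fields
            raise ValueError("truncated CFI entry")
        start = int.from_bytes(data[pos:pos + 3], BYTE_ORDER)
        end = int.from_bytes(data[pos + 3:pos + 6], BYTE_ORDER)
        stack_bottom = int.from_bytes(data[pos + 6:pos + 9], BYTE_ORDER)
        arg_count = data[pos + 9]
        name_len = int.from_bytes(data[pos + 10:pos + 12], BYTE_ORDER)
        pos += 12
        if pos + name_len > stop:
            raise ValueError("CFI name overruns the segment")
        name = data[pos:pos + name_len].decode("ascii")
        pos += name_len
        entries.append(CFIEntry(start, end, stack_bottom, arg_count, name))
    return entries, pos
```

Each entry costs 12 bytes plus the length of its name, which is why the decoder checks the fixed-width part before reading the name.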
+
+## Constant segment
+
+The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded
 variable initializers or constant expressions. It is similar to the `.rodata` section of Assembly files, although
 the implementation is different. Constants are encoded as a linear sequence of bytes with no type information about
 them whatsoever: it is the code that, at runtime, loads each constant (whose type is determined at compile time) onto 
 the stack accordingly. For example, a 32 bit integer constant would be encoded as a sequence of 4 bytes, which would
-then be loaded by the appropriate `LoadInt32` instruction at runtime. The section's size is fixed and is encoded at
-the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant section may be empty, although in
-real-world scenarios it's unlikely that it would.
+then be loaded by the appropriate `LoadInt32` instruction at runtime. The segment's size is fixed and is encoded at
+the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant segment may be empty, although in
+real-world scenarios it likely won't be.
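
As a rough illustration of the untyped constant encoding described above, the sketch below reads a 32-bit integer from a constant pool the way a `LoadInt32`-style instruction might. It is written in Python for illustration only. The function name `load_int32` is hypothetical, and the byte order, the signedness, and the use of a byte offset as the operand are all assumptions, since the document specifies none of them.

```python
import struct


def load_int32(constants: bytes, offset: int) -> int:
    """Read 4 bytes at offset and interpret them as a signed 32-bit integer.

    The pool carries no type tags: the caller (here standing in for the
    instruction chosen at compile time) decides how the bytes are interpreted.
    ASSUMPTION: big-endian, signed.
    """
    if offset < 0 or offset + 4 > len(constants):
        raise IndexError("constant offset out of range")
    (value,) = struct.unpack_from(">i", constants, offset)
    return value


# A pool holding two constants back to back, with no type information between them.
pool = struct.pack(">i", 42) + struct.pack(">i", -7)
first = load_int32(pool, 0)
second = load_int32(pool, 4)
```

The same 4 bytes would yield a different value if a different load instruction were applied to them, which is the trade-off of keeping the pool free of type information.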

-### Code section
+## Code segment

-The code section contains the linear sequence of bytecode instructions of a peon program. It is to be read directly 
-and without modifications. The section's size is fixed and is encoded at the beginning as a sequence of 3 bytes 
+The code segment contains the linear sequence of bytecode instructions of a peon program. It is to be read directly 
+and without modifications. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes 
 (i.e. a single 24 bit integer).
--- a/src/frontend/meta/bytecode.nim
+++ b/src/frontend/meta/bytecode.nim
@@ -43,8 +43,8 @@ type
        ## - Stack bottom
        ## - Argument count
        ## The encoding for CFI data is the following:
-        ## - First, the position into the bytecode where the function begins is encoded
-        ## - Second, the position into the bytecode where the function ends is encoded
+        ## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
+        ## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
        ## - Then, the frame's stack bottom is encoded as a 3 byte integer
        ## - After the frame's stack bottom follows the argument count as a 1 byte integer
        ## - Lastly, the function's name (optional) is encoded in ASCII, prepended with