Began deserializing the constants table and the code section of bytecode files. Minor fixes to debugger and bytecode.nim

Nocturn9x 2021-12-17 17:54:22 +01:00
parent 195045e4f2
commit ed13304809
8 changed files with 164 additions and 42 deletions

@ -30,7 +30,7 @@ __Note__: The conventions about number literals described in the document laying
## Compile-time type specifiers
To distinguish the different kinds of values that JAPL can represent at compile time, type specifiers are prepended to a given series of bytes to tell the deserializer what kind of object that specific sequence should deserialize into. It is important that each compile-time object specifies the size of its value in bytes (referred to as "size specifier" from now on, without quotes), after the type specifier. The following sections about object representation assume the appropriate type and size specifiers have been used and will therefore omit them to avoid repetition. Some types (such as singletons) do not need a size specifier as they're only one byte long: these cases are an exception rather than the rule and are explicitly marked as such in this document.
To distinguish the different kinds of values that JAPL can represent at compile time, type specifiers are prepended to a given series of bytes to tell the deserializer what kind of object that specific sequence should deserialize into. It is important that each compile-time object specifies the size of its value in bytes using a 3-byte (aka 24 bit) integer (referred to as "size specifier" from now on, without quotes), after the type specifier. The following sections about object representation assume the appropriate type and size specifiers have been used and will therefore omit them to avoid repetition. Some types (such as singletons) do not need a size specifier as they're only one byte long: these cases are an exception rather than the rule and are explicitly marked as such in this document.
Below is a list of all type specifiers:
@ -39,15 +39,16 @@ Below a list of all type specifiers:
- `0xF` -> nil*
- `0xA` -> nan*
- `0xB` -> inf*
- `0x01` -> Number
- `0x02` -> String
- `0x03` -> List literal (A heterogeneous dynamic array)
- `0x04` -> Set literal (A heterogeneous and unordered dynamic array without duplicates. Mirrors the mathematical definition of a set)
- `0x05` -> Dictionary literal (An associative array, also known as mapping)
- `0x06` -> Tuple literal (A heterogeneous, static array)
- `0x07` -> Function declaration
- `0x08` -> Class declaration
- `0x09` -> Variable declaration. Note that constants are replaced during compilation with their corresponding literal value, therefore they are represented as literals in the constants section and are not compiled as variable declarations.
- `0x0` -> Identifier
- `0x1` -> Number
- `0x2` -> String
- `0x3` -> List literal (A heterogeneous dynamic array)
- `0x4` -> Set literal (A heterogeneous and unordered dynamic array without duplicates. Mirrors the mathematical definition of a set)
- `0x5` -> Dictionary literal (An associative array, also known as mapping)
- `0x6` -> Tuple literal (A heterogeneous, static array)
- `0x7` -> Function declaration
- `0x8` -> Class declaration
- `0x9` -> Variable declaration. Note that constants are replaced during compilation with their corresponding literal value, therefore they are represented as literals in the constants section and are not compiled as variable declarations.
- `0x10` -> Lambda declarations (aka anonymous functions)
@ -57,7 +58,7 @@ __Note__: The types whose name is followed by an asterisk require no size specif
### Numbers
For simplicity, numbers in object files are serialized as strings of decimal digits and optionally a dot followed by 1 or more decimal digits (for floats). The number `2.718`, for example, would just be serialized as the string `"2.718"` (without quotes). JAPL supports scientific notation such as `2e3`, but numbers in this form are collapsed to their decimal representation before being written to a file, therefore `2e3` becomes `2000.0`. Other number representations such as hexadecimal, binary and octal are also converted to base 10 during compilation.
For simplicity, numbers in object files are serialized as strings of decimal digits and optionally a dot followed by 1 or more decimal digits (for floats). The number `2.718`, for example, would just be serialized as the string `"2.718"` (without quotes). JAPL supports scientific notation such as `2e3`, but numbers in this form are collapsed to their decimal representation before being written to a file, therefore `2e3` becomes `2000.0`. Other number representations such as hexadecimal, binary and octal are also converted to base 10 during compilation (usually during the optimization process).
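
As a rough illustration of how a number constant could be laid out (type specifier, 3-byte size specifier, then the decimal digits), here is a minimal sketch. The helper names `encodeSize` and `serializeNumber` are hypothetical and do not come from the JAPL sources, and the least-significant-byte-first order of the size specifier is an assumption, since the byte order is not stated in this document.

```nim
# Hypothetical helpers, not part of the JAPL code base.
# Assumption: the size specifier is stored least significant byte first.
proc encodeSize(n: int): array[3, byte] =
  ## Packs n into a 3-byte (24 bit) size specifier
  doAssert n >= 0 and n < (1 shl 24), "size does not fit in 24 bits"
  result = [byte(n and 0xFF), byte((n shr 8) and 0xFF), byte((n shr 16) and 0xFF)]

proc serializeNumber(lexeme: string): seq[byte] =
  ## Emits the type specifier 0x1, the size specifier and the decimal digits
  result.add(0x1'u8)
  result.add(encodeSize(lexeme.len))
  for c in lexeme:
    result.add(byte(c))

when isMainModule:
  echo serializeNumber("2.718")
```

Under that byte-order assumption, `2.718` would occupy nine bytes: `01 05 00 00 32 2E 37 31 38`.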
### Strings
@ -92,17 +93,18 @@ An object file starts with the headers, namely:
- A string representing the branch name of the git repo from which JAPL was compiled, prepended with its size represented as a single 8-bit unsigned integer. Due to this encoding the branch name can't be longer than 255 characters, which is a length deemed appropriate for this purpose
- A 40-byte hexadecimal string, pinpointing the version of the compiler down to the exact commit hash in the JAPL repository, particularly useful when testing development versions
- An 8 byte (64 bit) UNIX timestamp (starting from the Unix Epoch of January 1st 1970 at 00:00), representing the date and time when the file was created
- A 32 bytes SHA256 checksum of the source file's contents, used to track file changes
- A 32 byte SHA256 checksum of the source file's contents, used to track file changes
### Constant section
This section of the file follows the headers and is meant to store all constants needed upon startup by the JAPL virtual machine. For example, the code `var x = 1;` would have the number one as a constant. Constants are just an ordered sequence of compile-time types as described in the sections above.
This section of the file follows the headers and is meant to store all constants needed upon startup by the JAPL virtual machine. For example, the code `var x = 1;` would have the number one as a constant. Constants are just an ordered sequence of compile-time types as described in the sections above. The constant section's end is marked with
the byte `0x59`.
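
One consequence of the end marker is worth spelling out: `0x59` is also the ASCII code for `Y`, so a reader can only treat it as the end of the section when it appears where a type specifier is expected, never inside a payload. The sketch below walks a constant table under that rule. It is not the JAPL deserializer: `decodeSize` and `constantsLength` are hypothetical names, only identifiers, numbers and the one-byte singletons are handled, and the little-endian size specifier is an assumption.

```nim
# Hypothetical sketch, not part of the JAPL code base.
# Assumption: size specifiers are stored least significant byte first.
proc decodeSize(stream: seq[byte], at: int): int =
  ## Reads a 3-byte size specifier starting at index `at`
  int(stream[at]) or (int(stream[at + 1]) shl 8) or (int(stream[at + 2]) shl 16)

proc constantsLength(stream: seq[byte]): int =
  ## Returns how many bytes the constant section occupies, end marker included
  var pos = 0
  while true:
    if pos >= stream.len:
      raise newException(ValueError, "missing 0x59 end marker")
    case stream[pos]
    of 0x59:
      # Only checked at a type specifier position, so payload bytes
      # that happen to equal 0x59 are never mistaken for the marker
      return pos + 1
    of 0x0, 0x1:
      # Identifier or number: type specifier, size specifier, payload
      if pos + 4 > stream.len:
        raise newException(ValueError, "truncated size specifier")
      pos += 4 + decodeSize(stream, pos + 1)
    of 0xA, 0xB, 0xC, 0xD, 0xF:
      # Singletons carry no size specifier and no payload
      inc pos
    else:
      raise newException(ValueError, "type specifier not handled by this sketch")
```

Because every sized entry is skipped in one step, the payload is never inspected for the marker byte.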
### Code section
After the headers and the constant section follows the code section, which stores the actual bytecode instructions the compiler has emitted. They're encoded as a linear sequence of bytes.
After the headers and the constant section follows the code section, which stores the actual bytecode instructions the compiler has emitted. They're encoded as a linear sequence of bytes. The code section's size is fixed and is encoded as a 3-byte (24 bit) integer right after the constant section's end marker, limiting the maximum size of the code section in a bytecode file to 16777215 bytes.
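
To make the framing concrete, the following sketch reads a code section that starts right after the constant section's end marker. The name `readCodeSection` is hypothetical and the least-significant-byte-first size encoding is an assumption. It returns both the code and the number of bytes consumed, so the caller can keep advancing through the file.

```nim
# Hypothetical sketch, not part of the JAPL code base.
# Assumption: the size specifier is stored least significant byte first.
proc readCodeSection(stream: seq[byte]): tuple[code: seq[byte], consumed: int] =
  ## Reads the 3-byte size specifier, then exactly that many bytes of bytecode
  if stream.len < 3:
    raise newException(ValueError, "truncated size specifier")
  let size = int(stream[0]) or (int(stream[1]) shl 8) or (int(stream[2]) shl 16)
  if stream.len < 3 + size:
    raise newException(ValueError, "truncated code section")
  result.code = stream[3 ..< 3 + size]
  # The size specifier itself counts towards the consumed bytes
  result.consumed = 3 + size
```

Checking the declared size against the remaining input before slicing turns a truncated file into an explicit error instead of an out-of-bounds access.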
### Modules
When compiling source files, one bytecode file is produced per source file. These bytecode dumps are stored inside `~/.cache` under *nix systems and `C:\Windows\Temp` under Windows systems. Since JAPL allows explicit visibility specifiers that alter the way namespaces are built at runtime (and, partially, resolved at compile-time) by selectively
(not) exporting symbols to the outside world, these directives need to be specified in the bytecode file
(not) exporting symbols to the outside world, these directives need to be specified in the bytecode file (TODO).

@ -225,7 +225,7 @@ proc match(self: Lexer, what: string): bool =
proc createToken(self: Lexer, tokenType: TokenType) =
## Creates a token object and adds it to the token
## list
var tok: Token
var tok: Token = new(Token)
tok.kind = tokenType
tok.lexeme = self.source[self.start..<self.current]
tok.line = self.line

@ -611,6 +611,8 @@ proc newClassDecl*(name: ASTNode, body: ASTNode,
proc `$`*(self: ASTNode): string =
if self == nil:
return "nil"
case self.kind:
of intExpr, floatExpr, hexExpr, binExpr, octExpr, strExpr, trueExpr, falseExpr, nanExpr, nilExpr, infExpr:
if self.kind in {trueExpr, falseExpr, nanExpr, nilExpr, infExpr}:

@ -192,7 +192,7 @@ const argumentDoubleInstructions* = {PopN, }
# Jump instructions jump at relative or absolute bytecode offsets
const jumpInstructions* = {JumpIfFalse, JumpIfFalsePop, JumpForwards, JumpBackwards,
LongJumpIfFalse, LongJumpIfFalsePop, LongJumpForwards,
LongJumpBackwards}
LongJumpBackwards, JumpIfTrue, LongJumpIfTrue}
# Collection instructions push a built-in collection type onto the stack
const collectionInstructions* = {BuildList, BuildDict, BuildSet, BuildTuple}

@ -70,7 +70,7 @@ type
EndOfFile
Token* = object
Token* = ref object
## A token object
kind*: TokenType
lexeme*: string
@ -78,4 +78,8 @@ type
pos*: tuple[start, stop: int]
proc `$`*(self: Token): string = &"Token(kind={self.kind}, lexeme={$(self.lexeme).escape()}, line={self.line}, pos=({self.pos.start}, {self.pos.stop}))"
proc `$`*(self: Token): string =
if self != nil:
result = &"Token(kind={self.kind}, lexeme={$(self.lexeme).escape()}, line={self.line}, pos=({self.pos.start}, {self.pos.stop}))"
else:
result = "nil"

@ -14,8 +14,9 @@
import meta/ast
import meta/errors
import meta/bytecode
import meta/token
import ../config
import ../util/multibyte
import strformat
import strutils
@ -49,11 +50,13 @@ proc `$`*(self: Serialized): string =
proc error(self: Serializer, message: string) =
## Raises a formatted SerializationError exception
raise newException(SerializationError, &"A fatal error occurred while serializing '{self.filename}' -> {message}")
raise newException(SerializationError, &"A fatal error occurred while (de)serializing '{self.filename}' -> {message}")
proc initSerializer*(): Serializer =
proc initSerializer*(self: Serializer = nil): Serializer =
new(result)
if self != nil:
result = self
result.file = ""
result.filename = ""
result.chunk = nil
@ -84,6 +87,10 @@ proc bytesToInt(self: Serializer, input: array[8, byte]): int =
copyMem(result.addr, input.unsafeAddr, sizeof(int))
proc bytesToInt(self: Serializer, input: array[3, byte]): int =
copyMem(result.addr, input.unsafeAddr, sizeof(byte) * 3)
proc extend[T](s: var seq[T], a: openarray[T]) =
## Extends s with the elements of a
for e in a:
@ -105,12 +112,13 @@ proc writeHeaders(self: Serializer, stream: var seq[byte], file: string) =
stream.extend(self.toBytes(computeSHA256(file)))
proc writeConstants(self: Serializer, chunk: Chunk, stream: var seq[byte]) =
for constant in chunk.consts:
proc writeConstants(self: Serializer, stream: var seq[byte]) =
## Writes the constants table in-place into the given stream
for constant in self.chunk.consts:
case constant.kind:
of intExpr, floatExpr:
stream.add(0x1)
stream.add(byte(len(constant.token.lexeme)))
stream.extend(len(constant.token.lexeme).toTriple())
stream.extend(self.toBytes(constant.token.lexeme))
of strExpr:
stream.add(0x2)
@ -128,12 +136,11 @@ proc writeConstants(self: Serializer, chunk: Chunk, stream: var seq[byte]) =
else:
strip = 2
stream.add(0x0)
stream.add(byte(len(constant.token.lexeme) - offset)) # Removes the quotes from the length count as they're not written
stream.extend((len(constant.token.lexeme) - offset).toTriple()) # Removes the quotes from the length count as they're not written
stream.add(self.toBytes(constant.token.lexeme[offset..^2]))
of identExpr:
stream.add(0x2)
stream.add(0x0)
stream.add(byte(len(constant.token.lexeme)))
stream.extend(len(constant.token.lexeme).toTriple())
stream.add(self.toBytes(constant.token.lexeme))
of trueExpr:
stream.add(0xC)
@ -147,12 +154,104 @@ proc writeConstants(self: Serializer, chunk: Chunk, stream: var seq[byte]) =
stream.add(0xB)
else:
self.error(&"unknown constant kind in chunk table ({constant.kind})")
stream.add(0x59) # End marker
proc writeCode(self: Serializer, chunk: Chunk, stream: var seq[byte]) =
proc readConstants(self: Serializer, stream: seq[byte]): int =
## Reads the constant table from the given stream and
## adds each constant to the chunk object (note: most compile-time
## information such as the original token objects and line info is lost when
## serializing the data, so those fields are set to nil or some default
## value). Returns the number of bytes that were processed in the stream
var stream = stream
var count: int = 0
while true:
case stream[0]:
of 0x59:
inc(count)
break
of 0x2:
stream = stream[1..^1]
let size = self.bytesToInt([stream[0], stream[1], stream[2]])
stream = stream[3..^1]
var s = newStrExpr(Token(lexeme: ""))
case stream[0]:
of 0x0:
discard
of 0x1:
s.token.lexeme.add("b")
of 0x2:
s.token.lexeme.add("f")
else:
self.error(&"unknown string modifier in chunk table (0x{stream[0].toHex()})")
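stream = stream[1..^1]
s.token.lexeme.add("\"")
s.token.lexeme.add(self.bytesToString(stream[0..<size])) # Decode the raw bytes instead of joining their numeric values
s.token.lexeme.add("\"")
discard self.chunk.addConstant(s) # Store the reconstructed string in the constants table
stream = stream[size..^1] # Skip past the string's bytes
inc(count, size + 5)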
of 0x1:
stream = stream[1..^1]
inc(count)
let size = self.bytesToInt([stream[0], stream[1], stream[2]])
stream = stream[3..^1]
inc(count, 3)
var tok: Token = new(Token)
tok.lexeme = self.bytesToString(stream[0..<size])
if "." in tok.lexeme:
tok.kind = Float
self.chunk.consts.add(newFloatExpr(tok))
else:
tok.kind = Integer
self.chunk.consts.add(newIntExpr(tok))
stream = stream[size..^1]
inc(count, size)
of 0x0:
stream = stream[1..^1]
let size = self.bytesToInt([stream[0], stream[1], stream[2]])
stream = stream[3..^1]
discard self.chunk.addConstant(newIdentExpr(Token(lexeme: self.bytesToString(stream[0..<size]))))
stream = stream[size..^1] # Skip past the identifier's bytes
inc(count, size + 4)
of 0xC:
discard self.chunk.addConstant(newTrueExpr(nil))
stream = stream[1..^1]
inc(count)
of 0xD:
discard self.chunk.addConstant(newFalseExpr(nil))
stream = stream[1..^1]
inc(count)
of 0xF:
discard self.chunk.addConstant(newNilExpr(nil))
stream = stream[1..^1]
inc(count)
of 0xA:
discard self.chunk.addConstant(newNaNExpr(nil))
stream = stream[1..^1]
inc(count)
of 0xB:
discard self.chunk.addConstant(newInfExpr(nil))
stream = stream[1..^1]
inc(count)
else:
self.error(&"unknown constant kind in chunk table (0x{stream[0].toHex()})")
result = count
proc writeCode(self: Serializer, stream: var seq[byte]) =
## Writes the bytecode from the given chunk to the given source
## stream
stream.extend(chunk.code)
stream.extend(self.chunk.code.len.toTriple())
stream.extend(self.chunk.code)
proc readCode(self: Serializer, stream: seq[byte]): int =
## Reads the bytecode from a given stream and writes
## it into the given chunk
let size = [stream[0], stream[1], stream[2]].fromTriple()
var stream = stream[3..^1]
for i in countup(0, int(size) - 1):
self.chunk.code.add(stream[i])
assert len(self.chunk.code) == int(size)
return int(size) + 3 # Bytes consumed: the 3-byte size specifier plus the code itself
proc dumpBytes*(self: Serializer, chunk: Chunk, file, filename: string): seq[byte] =
@ -162,15 +261,17 @@ proc dumpBytes*(self: Serializer, chunk: Chunk, file, filename: string): seq[byt
self.filename = filename
self.chunk = chunk
self.writeHeaders(result, self.file)
self.writeConstants(chunk, result)
self.writeCode(chunk, result)
self.writeConstants(result)
self.writeCode(result)
proc loadBytes*(self: Serializer, stream: seq[byte]): Serialized =
## Loads the result from dumpBytes into a Serialized object
## for use in the VM or for inspection
discard self.initSerializer()
new(result)
result.chunk = newChunk()
self.chunk = result.chunk
var stream = stream
try:
if stream[0..<len(BYTECODE_MARKER)] != self.toBytes(BYTECODE_MARKER):
@ -187,10 +288,14 @@ proc loadBytes*(self: Serializer, stream: seq[byte]): Serialized =
result.compileDate = self.bytesToInt([stream[0], stream[1], stream[2], stream[3], stream[4], stream[5], stream[6], stream[7]])
stream = stream[8..^1]
result.fileHash = self.bytesToString(stream[0..<32]).toHex().toLowerAscii()
result.chunk = newChunk()
stream = stream[32..^1]
stream = stream[self.readConstants(stream)..^1]
stream = stream[self.readCode(stream)..^1]
except IndexDefect:
self.error("truncated bytecode file")
except AssertionDefect:
self.error("corrupted bytecode file")

@ -106,15 +106,24 @@ proc main() =
serialized = serializer.loadBytes(serializedRaw)
echo "Deserialization step:"
echo &"\t\t- File hash: {serialized.fileHash} (matches: {computeSHA256(source).toHex().toLowerAscii() == serialized.fileHash})"
echo &"\t\t- JAPL version: {serialized.japlVer.major}.{serialized.japlVer.minor}.{serialized.japlVer.patch} (commit {serialized.commitHash[0..8]} on branch {serialized.japlBranch})"
stdout.write("\t\t")
echo &"\t- File hash: {serialized.fileHash} (matches: {computeSHA256(source).toHex().toLowerAscii() == serialized.fileHash})"
echo &"\t- JAPL version: {serialized.japlVer.major}.{serialized.japlVer.minor}.{serialized.japlVer.patch} (commit {serialized.commitHash[0..8]} on branch {serialized.japlBranch})"
stdout.write("\t")
echo &"""- Compilation date & time: {fromUnix(serialized.compileDate).format("d/M/yyyy HH:mm:ss")}"""
stdout.write(&"\t- Reconstructed constants table: [")
for i, e in serialized.chunk.consts:
stdout.write(e)
if i < len(serialized.chunk.consts) - 1:
stdout.write(", ")
stdout.write("]\n")
stdout.write(&"\t- Reconstructed bytecode: [")
for i, e in serialized.chunk.code:
stdout.write($e)
if i < len(serialized.chunk.code) - 1:
stdout.write(", ")
stdout.write("]\n")
except:
raise
echo &"A Nim runtime exception occurred: {getCurrentExceptionMsg()}"
continue
when isMainModule:

@ -109,9 +109,9 @@ proc jumpInstruction(instruction: OpCode, chunk: Chunk, offset: int): int =
## Debugs jumps
var jump: int
case instruction:
of JumpIfFalse, JumpIfFalsePop, JumpForwards, JumpBackwards:
of JumpIfFalse, JumpIfTrue, JumpIfFalsePop, JumpForwards, JumpBackwards:
jump = [chunk.code[offset + 1], chunk.code[offset + 2]].fromDouble().int()
of LongJumpIfFalse, LongJumpIfFalsePop, LongJumpForwards, LongJumpBackwards:
of LongJumpIfFalse, LongJumpIfTrue, LongJumpIfFalsePop, LongJumpForwards, LongJumpBackwards:
jump = [chunk.code[offset + 1], chunk.code[offset + 2], chunk.code[offset + 3]].fromTriple().int()
else:
discard # Unreachable