Compare commits

2 Commits

10 changed files with 239 additions and 109 deletions

View File

@ -1,7 +1,8 @@
# Peon - Bytecode Specification # Peon - Bytecode Specification
This document aims to document peon's bytecode as well as how it is (de-)serialized to/from files and This document aims to document peon's bytecode as well as how it is (de-)serialized to/from files and
other file-like objects. other file-like objects. Note that the segments in a bytecode dump appear in the order they are listed
in this document.
## Code Structure ## Code Structure
@ -9,12 +10,12 @@ A peon program is compiled into a tightly packed sequence of bytes that contain
the VM needs to execute said program. There is no dependence between the frontend and the backend outside of the the VM needs to execute said program. There is no dependence between the frontend and the backend outside of the
bytecode format (which is implemented in a separate serializer module) to allow for maximum modularity. bytecode format (which is implemented in a separate serializer module) to allow for maximum modularity.
A peon bytecode dump contains: A peon bytecode file contains the following:
- Constants - Constants
- The bytecode itself - The program's code
- Debugging information - Debugging information (file and version metadata, module info. Optional)
- File and version metadata
## File Headers ## File Headers
@ -34,7 +35,7 @@ in release builds.
### Line data segment ### Line data segment
The line data segment contains information about each instruction in the code segment and associates them The line data segment contains information about each instruction in the code segment and associates them
1:1 with a line number in the original source file for easier debugging using run-length encoding. The section's 1:1 with a line number in the original source file for easier debugging using run-length encoding. The segment's
size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
in this segment can be decoded as explained in [this file](../src/frontend/compiler/targgets/bytecode/opcodes.nim#L29), which is quoted in this segment can be decoded as explained in [this file](../src/frontend/compiler/targgets/bytecode/opcodes.nim#L29), which is quoted
below: below:
@ -57,7 +58,7 @@ below:
This segment contains details about each function in the original file. The segment's size is fixed and is encoded at the This segment contains details about each function in the original file. The segment's size is fixed and is encoded at the
beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data in this segment can be decoded as explained beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data in this segment can be decoded as explained
in [this file](../src/frontend/compiler/targgets/bytecode/opcodes.nim#L39), which is quoted below: in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L39), which is quoted below:
``` ```
[...] [...]
@ -74,6 +75,26 @@ in [this file](../src/frontend/compiler/targgets/bytecode/opcodes.nim#L39), whic
[...] [...]
``` ```
### Modules segment
This segment contains details about the modules that make up the original source code which produced a given bytecode dump.
The data in this segment can be decoded as explained in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L49), which is quoted below:
```
[...]
## modules contains information about all the peon modules that the compiler has encountered,
## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
## Python, peon does not produce a bytecode file for each separate module it compiles: everything
## is contained within a single binary blob. While this simplifies the implementation and makes
## bytecode files entirely "self-hosted", it also means that the original module information is
## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
[...]
```
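The layout described in the quoted comment can be sketched as a small decoder. This is only an illustration, not peon's actual serializer: `ModuleInfo`, `readUInt` and `decodeModules` are hypothetical names, little-endian byte order is an assumption (the quoted text does not state one), and the 4-byte prefix is assumed to count the bytes that follow it.

```nim
type
  ModuleInfo = object
    ## Hypothetical container for one decoded entry (not a peon type)
    start, stop: int  # Bytecode offsets where the module begins/ends
    name: string      # Module name, ASCII

proc readUInt(data: openArray[uint8], pos: var int, size: int): int =
  ## Reads a `size`-byte unsigned integer at `pos` and advances `pos`
  ## past it. Little-endian order is an assumption of this sketch
  for i in 0..<size:
    result = result or (int(data[pos + i]) shl (8 * i))
  pos += size

proc decodeModules(segment: openArray[uint8]): seq[ModuleInfo] =
  ## Decodes a modules segment that begins with its 4-byte size prefix
  var pos = 0
  let size = readUInt(segment, pos, 4)   # Assumed: payload size in bytes
  let endPos = pos + size
  while pos < endPos:
    var entry: ModuleInfo
    entry.start = readUInt(segment, pos, 3)  # 3-byte start offset
    entry.stop = readUInt(segment, pos, 3)   # 3-byte end offset
    let nameLen = readUInt(segment, pos, 2)  # 2-byte name length
    for i in 0..<nameLen:
      entry.name.add(char(segment[pos + i]))
    pos += nameLen
    result.add(entry)

when isMainModule:
  # One hand-built entry: size = 12, start = 0, end = 41, name = "main"
  let segment = @[12'u8, 0, 0, 0,  0, 0, 0,  41, 0, 0,  4, 0,
                  uint8('m'), uint8('a'), uint8('i'), uint8('n')]
  for m in decodeModules(segment):
    echo m.name, ": ", m.start, "..", m.stop
```

Each entry costs 8 bytes of fixed overhead plus the name, so the 2-byte length prefix caps module names at 65535 bytes.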
## Constant segment ## Constant segment
The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded
@ -87,6 +108,6 @@ real-world scenarios it likely won't be.
## Code segment ## Code segment
The code segment contains the linear sequence of bytecode instructions of a peon program. It is to be read directly The code segment contains the linear sequence of bytecode instructions of a peon program to be fed directly to
and without modifications. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes peon's virtual machine. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes
(i.e. a single 24 bit integer). All the instructions are documented [here](../src/frontend/compiler/targgets/bytecode/opcodes.nim) (i.e. a single 24 bit integer). All the instructions are documented [here](../src/frontend/compiler/targgets/bytecode/opcodes.nim)

View File

@ -68,7 +68,8 @@ type
## this system and is not handled ## this system and is not handled
## manually by the VM ## manually by the VM
bytesAllocated: tuple[total, current: int] bytesAllocated: tuple[total, current: int]
cycles: int when debugGC or debugAlloc:
cycles: int
nextGC: int nextGC: int
pointers: HashSet[uint64] pointers: HashSet[uint64]
PeonVM* = object PeonVM* = object
@ -93,9 +94,10 @@ type
frames: seq[uint64] # Stores the bottom of stack frames frames: seq[uint64] # Stores the bottom of stack frames
results: seq[uint64] # Stores function return values results: seq[uint64] # Stores function return values
gc: PeonGC # A reference to the VM's garbage collector gc: PeonGC # A reference to the VM's garbage collector
breakpoints: seq[uint64] # Breakpoints where we call our debugger when debugVM:
debugNext: bool # Whether to debug the next instruction breakpoints: seq[uint64] # Breakpoints where we call our debugger
lastDebugCommand: string # The last debugging command input by the user debugNext: bool # Whether to debug the next instruction
lastDebugCommand: string # The last debugging command input by the user
# Implementation of peon's memory manager # Implementation of peon's memory manager
@ -105,7 +107,8 @@ proc newPeonGC*: PeonGC =
## garbage collector ## garbage collector
result.bytesAllocated = (0, 0) result.bytesAllocated = (0, 0)
result.nextGC = FirstGC result.nextGC = FirstGC
result.cycles = 0 when debugGC or debugAlloc:
result.cycles = 0
proc collect*(self: var PeonVM) proc collect*(self: var PeonVM)
@ -214,6 +217,16 @@ proc markRoots(self: var PeonVM): HashSet[ptr HeapObject] =
# will mistakenly assume the object to be reachable, potentially # will mistakenly assume the object to be reachable, potentially
# leading to a nasty memory leak. Let's just hope a 48+ bit address # leading to a nasty memory leak. Let's just hope a 48+ bit address
# space makes this occurrence rare enough not to be a problem # space makes this occurrence rare enough not to be a problem
# handles a single type (uint64), while Lox has a stack
# of heap-allocated structs (which is convenient, but slow).
# What we do instead is store all pointers allocated by us
# in a hash set and then check if any source of roots contained
# any of the integer values that we're keeping track of. Note
# that this means that if a primitive object's value happens to
# collide with an active pointer, the GC will mistakenly assume
# the object to be reachable (potentially leading to a nasty
# memory leak). Hopefully, in a 64-bit address space, this
# occurrence is rare enough for us to ignore
var result = initHashSet[uint64](self.gc.pointers.len()) var result = initHashSet[uint64](self.gc.pointers.len())
for obj in self.calls: for obj in self.calls:
if obj in self.gc.pointers: if obj in self.gc.pointers:
@ -285,7 +298,6 @@ proc sweep(self: var PeonVM) =
## during the mark phase. ## during the mark phase.
when debugGC: when debugGC:
echo "DEBUG - GC: Beginning sweeping phase" echo "DEBUG - GC: Beginning sweeping phase"
when debugGC:
var count = 0 var count = 0
var current: ptr HeapObject var current: ptr HeapObject
var freed: HashSet[uint64] var freed: HashSet[uint64]
@ -1050,10 +1062,11 @@ proc run*(self: var PeonVM, chunk: Chunk, breakpoints: seq[uint64] = @[], repl:
self.frames = @[] self.frames = @[]
self.calls = @[] self.calls = @[]
self.operands = @[] self.operands = @[]
self.breakpoints = breakpoints
self.results = @[] self.results = @[]
self.ip = 0 self.ip = 0
self.lastDebugCommand = "" when debugVM:
self.breakpoints = breakpoints
self.lastDebugCommand = ""
try: try:
self.dispatch() self.dispatch()
except NilAccessDefect: except NilAccessDefect:

View File

@ -134,7 +134,7 @@ type
node*: Declaration node*: Declaration
# Who is this name exported to? (Only makes sense if isPrivate # Who is this name exported to? (Only makes sense if isPrivate
# equals false) # equals false)
exportedTo*: HashSet[Name] exportedTo*: HashSet[string]
# Has the compiler generated this name internally or # Has the compiler generated this name internally or
# does it come from user code? # does it come from user code?
isReal*: bool isReal*: bool
@ -212,7 +212,7 @@ type
# The module importing us, if any # The module importing us, if any
parentModule*: Name parentModule*: Name
# Currently imported modules # Currently imported modules
modules*: HashSet[Name] modules*: HashSet[string]
TypedNode* = ref object TypedNode* = ref object
## A wrapper for AST nodes ## A wrapper for AST nodes
@ -354,7 +354,7 @@ proc resolve*(self: Compiler, name: string): Name =
# module, so we definitely can't # module, so we definitely can't
# use it # use it
continue continue
elif self.currentModule in obj.exportedTo: elif self.currentModule.path in obj.exportedTo:
# The name is public in its owner # The name is public in its owner
# module and said module has explicitly # module and said module has explicitly
# exported it to us: we can use it # exported it to us: we can use it
@ -713,7 +713,7 @@ method findByName*(self: Compiler, name: string): seq[Name] =
for obj in reversed(self.names): for obj in reversed(self.names):
if obj.ident.token.lexeme == name: if obj.ident.token.lexeme == name:
if obj.owner.path != self.currentModule.path: if obj.owner.path != self.currentModule.path:
if obj.isPrivate or self.currentModule notin obj.exportedTo: if obj.isPrivate or self.currentModule.path notin obj.exportedTo:
continue continue
result.add(obj) result.add(obj)
@ -727,11 +727,13 @@ method findInModule*(self: Compiler, name: string, module: Name): seq[Name] =
## the current one or not ## the current one or not
if name == "": if name == "":
for obj in reversed(self.names): for obj in reversed(self.names):
if not obj.isPrivate and obj.owner == module: if obj.owner.isNil():
continue
if not obj.isPrivate and obj.owner.path == module.path:
result.add(obj) result.add(obj)
else: else:
for obj in self.findInModule("", module): for obj in self.findInModule("", module):
if obj.ident.token.lexeme == name and self.currentModule in obj.exportedTo: if obj.ident.token.lexeme == name and self.currentModule.path in obj.exportedTo:
result.add(obj) result.add(obj)
@ -1034,7 +1036,7 @@ proc declare*(self: Compiler, node: ASTNode): Name {.discardable.} =
break break
if name.ident.token.lexeme != declaredName: if name.ident.token.lexeme != declaredName:
continue continue
if name.owner != n.owner and (name.isPrivate or n.owner notin name.exportedTo): if name.owner != n.owner and (name.isPrivate or n.owner.path notin name.exportedTo):
continue continue
if name.kind in [NameKind.Var, NameKind.Module, NameKind.CustomType, NameKind.Enum]: if name.kind in [NameKind.Var, NameKind.Module, NameKind.CustomType, NameKind.Enum]:
if name.depth < n.depth: if name.depth < n.depth:

View File

@ -124,11 +124,11 @@ type
of Reference: of Reference:
# A managed reference # A managed reference
nullable*: bool # Is null a valid value for this type? (false by default) nullable*: bool # Is null a valid value for this type? (false by default)
value*: Type # The type the reference points to value*: TypedNode # The type the reference points to
of Pointer: of Pointer:
# An unmanaged reference. Much # An unmanaged reference. Much
# like a raw pointer in C # like a raw pointer in C
data*: Type # The type we point to data*: TypedNode # The type we point to
of TypeDecl: of TypeDecl:
# A user-defined type # A user-defined type
fields*: seq[TypedArgument] # List of fields in the object. May be empty fields*: seq[TypedArgument] # List of fields in the object. May be empty
@ -317,17 +317,17 @@ proc step*(self: Compiler): ASTNode {.inline.} =
# Some forward declarations # Some forward declarations
proc compareUnions*(self: Compiler, a, b: seq[tuple[match: bool, kind: Type]]): bool proc compareUnions*(self: Compiler, a, b: seq[tuple[match: bool, kind: Type]]): bool
proc expression*(self: Compiler, node: Expression, compile: bool = true): Type {.discardable.} = nil proc expression*(self: Compiler, node: Expression, compile: bool = true): TypedNode {.discardable.} = nil
proc identifier*(self: Compiler, node: IdentExpr, name: Name = nil, compile: bool = true, strict: bool = true): Type {.discardable.} = nil proc identifier*(self: Compiler, node: IdentExpr, name: Name = nil, compile: bool = true, strict: bool = true): TypedNode {.discardable.} = nil
proc call*(self: Compiler, node: CallExpr, compile: bool = true): Type {.discardable.} = nil proc call*(self: Compiler, node: CallExpr, compile: bool = true): TypedNode {.discardable.} = nil
proc getItemExpr*(self: Compiler, node: GetItemExpr, compile: bool = true, matching: Type = nil): Type {.discardable.} = nil proc getItemExpr*(self: Compiler, node: GetItemExpr, compile: bool = true, matching: Type = nil): TypedNode {.discardable.} = nil
proc unary*(self: Compiler, node: UnaryExpr, compile: bool = true): Type {.discardable.} = nil proc unary*(self: Compiler, node: UnaryExpr, compile: bool = true): TypedNode {.discardable.} = nil
proc binary*(self: Compiler, node: BinaryExpr, compile: bool = true): Type {.discardable.} = nil proc binary*(self: Compiler, node: BinaryExpr, compile: bool = true): TypedNode {.discardable.} = nil
proc lambdaExpr*(self: Compiler, node: LambdaExpr, compile: bool = true): Type {.discardable.} = nil proc lambdaExpr*(self: Compiler, node: LambdaExpr, compile: bool = true): TypedNode {.discardable.} = nil
proc literal*(self: Compiler, node: ASTNode, compile: bool = true): Type {.discardable.} = nil proc literal*(self: Compiler, node: ASTNode, compile: bool = true): TypedNode {.discardable.} = nil
proc infer*(self: Compiler, node: LiteralExpr): Type proc infer*(self: Compiler, node: LiteralExpr): TypedNode
proc infer*(self: Compiler, node: Expression): Type proc infer*(self: Compiler, node: Expression): TypedNode
proc inferOrError*(self: Compiler, node: Expression): Type proc inferOrError*(self: Compiler, node: Expression): TypedNode
proc findByName*(self: Compiler, name: string): seq[Name] proc findByName*(self: Compiler, name: string): seq[Name]
proc findInModule*(self: Compiler, name: string, module: Name): seq[Name] proc findInModule*(self: Compiler, name: string, module: Name): seq[Name]
proc findByType*(self: Compiler, name: string, kind: Type): seq[Name] proc findByType*(self: Compiler, name: string, kind: Type): seq[Name]
@ -420,7 +420,7 @@ proc compare*(self: Compiler, a, b: Type): bool =
# a and b are of either of the two # a and b are of either of the two
# types in this branch, so we just need # types in this branch, so we just need
# to compare their values # to compare their values
return self.compare(a.value, b.value) return self.compare(a.value.value, b.value.value)
of Function: of Function:
# Functions are a bit trickier to compare # Functions are a bit trickier to compare
if a.arguments.len() != b.arguments.len(): if a.arguments.len() != b.arguments.len():
@ -569,7 +569,7 @@ proc toIntrinsic*(name: string): Type =
return Type(kind: String) return Type(kind: String)
proc infer*(self: Compiler, node: LiteralExpr): Type = proc infer*(self: Compiler, node: LiteralExpr): TypedNode =
## Infers the type of a given literal expression ## Infers the type of a given literal expression
if node.isNil(): if node.isNil():
return nil return nil
@ -577,32 +577,32 @@ proc infer*(self: Compiler, node: LiteralExpr): Type =
of intExpr, binExpr, octExpr, hexExpr: of intExpr, binExpr, octExpr, hexExpr:
let size = node.token.lexeme.split("'") let size = node.token.lexeme.split("'")
if size.len() == 1: if size.len() == 1:
return Type(kind: Int64) return TypedNode(node: node, value: Type(kind: Int64))
let typ = size[1].toIntrinsic() let typ = size[1].toIntrinsic()
if not self.compare(typ, nil): if not self.compare(typ, nil):
return typ return TypedNode(node: node, value: typ)
else: else:
self.error(&"invalid type specifier '{size[1]}' for int", node) self.error(&"invalid type specifier '{size[1]}' for int", node)
of floatExpr: of floatExpr:
let size = node.token.lexeme.split("'") let size = node.token.lexeme.split("'")
if size.len() == 1: if size.len() == 1:
return Type(kind: Float64) return TypedNode(node: node, value: Type(kind: Float64))
let typ = size[1].toIntrinsic() let typ = size[1].toIntrinsic()
if not typ.isNil(): if not typ.isNil():
return typ return TypedNode(node: node, value: typ)
else: else:
self.error(&"invalid type specifier '{size[1]}' for float", node) self.error(&"invalid type specifier '{size[1]}' for float", node)
of trueExpr: of trueExpr:
return Type(kind: Bool) return TypedNode(node: node, value: Type(kind: Bool))
of falseExpr: of falseExpr:
return Type(kind: Bool) return TypedNode(node: node, value: Type(kind: Bool))
of strExpr: of strExpr:
return Type(kind: String) return TypedNode(node: node, value: Type(kind: String))
else: else:
discard # Unreachable discard # Unreachable
proc infer*(self: Compiler, node: Expression): Type = proc infer*(self: Compiler, node: Expression): TypedNode =
## Infers the type of a given expression and ## Infers the type of a given expression and
## returns it ## returns it
if node.isNil(): if node.isNil():
@ -621,9 +621,9 @@ proc infer*(self: Compiler, node: Expression): Type =
of NodeKind.callExpr: of NodeKind.callExpr:
result = self.call(CallExpr(node), compile=false) result = self.call(CallExpr(node), compile=false)
of NodeKind.refExpr: of NodeKind.refExpr:
result = Type(kind: Reference, value: self.infer(Ref(node).value)) result = TypedNode(node: node, value: Type(kind: Reference, value: self.infer(Ref(node).value)))
of NodeKind.ptrExpr: of NodeKind.ptrExpr:
result = Type(kind: Pointer, data: self.infer(Ptr(node).value)) result = TypedNode(node: node, value: Type(kind: Pointer, data: self.infer(Ptr(node).value)))
of NodeKind.groupingExpr: of NodeKind.groupingExpr:
result = self.infer(GroupingExpr(node).expression) result = self.infer(GroupingExpr(node).expression)
of NodeKind.getItemExpr: of NodeKind.getItemExpr:
@ -634,7 +634,7 @@ proc infer*(self: Compiler, node: Expression): Type =
discard # TODO discard # TODO
proc inferOrError*(self: Compiler, node: Expression): Type = proc inferOrError*(self: Compiler, node: Expression): TypedNode =
## Attempts to infer the type of ## Attempts to infer the type of
## the given expression and raises an ## the given expression and raises an
## error if it fails ## error if it fails
@ -648,16 +648,16 @@ proc stringify*(self: Compiler, typ: Type): string =
## type object ## type object
if typ.isNil(): if typ.isNil():
return "nil" return "nil"
case typ.value.kind: case typ.kind:
of Int8, UInt8, Int16, UInt16, Int32, of Int8, UInt8, Int16, UInt16, Int32,
UInt32, Int64, UInt64, Float32, Float64, UInt32, Int64, UInt64, Float32, Float64,
Char, Byte, String, Nil, TypeKind.Nan, Bool, Char, Byte, String, Nil, TypeKind.Nan, Bool,
TypeKind.Inf, Auto: TypeKind.Inf, Auto:
result &= ($typ.value.kind).toLowerAscii() result &= ($typ.kind).toLowerAscii()
of Pointer: of Pointer:
result &= &"ptr {self.stringify(typ.value)}" result &= &"ptr {self.stringify(typ)}"
of Reference: of Reference:
result &= &"ref {self.stringify(typ.value)}" result &= &"ref {self.stringify(typ)}"
of Any: of Any:
return "any" return "any"
of Union: of Union:
@ -770,9 +770,9 @@ proc check*(self: Compiler, term: Expression, kind: Type) {.inline.} =
## Raises an error if appropriate and returns ## Raises an error if appropriate and returns
## otherwise ## otherwise
let k = self.inferOrError(term) let k = self.inferOrError(term)
if not self.compare(k, kind): if not self.compare(k.value, kind):
self.error(&"expecting value of type {self.stringify(kind)}, got {self.stringify(k)}", term) self.error(&"expecting value of type {self.stringify(kind)}, got {self.stringify(k)}", term)
elif k.kind == Any and kind.kind != Any: elif k.value.kind == Any and kind.kind != Any:
self.error(&"any is not a valid type in this context") self.error(&"any is not a valid type in this context")
@ -857,7 +857,7 @@ proc unpackGenerics*(self: Compiler, condition: Expression, list: var seq[tuple[
## Recursively unpacks a type constraint in a generic type ## Recursively unpacks a type constraint in a generic type
case condition.kind: case condition.kind:
of identExpr: of identExpr:
list.add((accept, self.inferOrError(condition))) list.add((accept, self.inferOrError(condition).value))
if list[^1].kind.kind == Auto: if list[^1].kind.kind == Auto:
self.error("automatic types cannot be used within generics", condition) self.error("automatic types cannot be used within generics", condition)
of binaryExpr: of binaryExpr:
@ -883,7 +883,7 @@ proc unpackUnion*(self: Compiler, condition: Expression, list: var seq[tuple[mat
## Recursively unpacks a type union ## Recursively unpacks a type union
case condition.kind: case condition.kind:
of identExpr: of identExpr:
list.add((accept, self.inferOrError(condition))) list.add((accept, self.inferOrError(condition).value))
of binaryExpr: of binaryExpr:
let condition = BinaryExpr(condition) let condition = BinaryExpr(condition)
case condition.operator.lexeme: case condition.operator.lexeme:
@ -966,13 +966,13 @@ proc declare*(self: Compiler, node: ASTNode): Name {.discardable.} =
n.isGeneric = true n.isGeneric = true
var typ: Type var typ: Type
for argument in node.arguments: for argument in node.arguments:
typ = self.infer(argument.valueType) typ = self.infer(argument.valueType).value
if not typ.isNil() and typ.kind == Auto: if not typ.isNil() and typ.kind == Auto:
n.obj.value.isAuto = true n.obj.value.isAuto = true
if n.isGeneric: if n.isGeneric:
self.error("automatic types cannot be used within generics", argument.valueType) self.error("automatic types cannot be used within generics", argument.valueType)
break break
typ = self.infer(node.returnType) typ = self.infer(node.returnType).value
if not typ.isNil() and typ.kind == Auto: if not typ.isNil() and typ.kind == Auto:
n.obj.value.isAuto = true n.obj.value.isAuto = true
if n.isGeneric: if n.isGeneric:
@ -1023,7 +1023,7 @@ proc declare*(self: Compiler, node: ASTNode): Name {.discardable.} =
else: else:
case node.value.kind: case node.value.kind:
of identExpr: of identExpr:
n.obj.value = self.inferOrError(node.value) n.obj.value = self.inferOrError(node.value).value
of binaryExpr: of binaryExpr:
# Type union # Type union
n.obj.value = Type(kind: Union, types: @[]) n.obj.value = Type(kind: Union, types: @[])

View File

@ -46,10 +46,21 @@ type
## - After that follows the argument count as a 1 byte integer ## - After that follows the argument count as a 1 byte integer
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with ## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
## its size as a 2-byte integer ## its size as a 2-byte integer
## modules contains information about all the peon modules that the compiler has encountered,
## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
## Python, peon does not produce a bytecode file for each separate module it compiles: everything
## is contained within a single binary blob. While this simplifies the implementation and makes
## bytecode files entirely "self-hosted", it also means that the original module information is
## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
consts*: seq[uint8] consts*: seq[uint8]
code*: seq[uint8] code*: seq[uint8]
lines*: seq[int] lines*: seq[int]
functions*: seq[uint8] functions*: seq[uint8]
modules*: seq[uint8]
OpCode* {.pure.} = enum OpCode* {.pure.} = enum
## Enum of Peon's bytecode opcodes ## Enum of Peon's bytecode opcodes

View File

@ -1006,7 +1006,7 @@ proc terminateProgram(self: BytecodeCompiler, pos: int) =
self.emitByte(ReplExit, self.peek().token.line) self.emitByte(ReplExit, self.peek().token.line)
else: else:
self.emitByte(OpCode.Return, self.peek().token.line) self.emitByte(OpCode.Return, self.peek().token.line)
self.emitByte(0, self.peek().token.line) # Entry point has no return value (TODO: Add easter eggs, cuz why not) self.emitByte(0, self.peek().token.line) # Entry point has no return value
self.patchReturnAddress(pos) self.patchReturnAddress(pos)
@ -1478,8 +1478,9 @@ method lambdaExpr(self: BytecodeCompiler, node: LambdaExpr, compile: bool = true
line: node.token.line, line: node.token.line,
kind: NameKind.Function, kind: NameKind.Function,
belongsTo: function, belongsTo: function,
isReal: true) isReal: true,
if compile and node notin self.lambdas: )
if compile and node notin self.lambdas and not node.body.isNil():
self.lambdas.add(node) self.lambdas.add(node)
let jmp = self.emitJump(JumpForwards, node.token.line) let jmp = self.emitJump(JumpForwards, node.token.line)
if BlockStmt(node.body).code.len() == 0: if BlockStmt(node.body).code.len() == 0:
@ -1687,7 +1688,7 @@ proc importStmt(self: BytecodeCompiler, node: ImportStmt, compile: bool = true)
# Importing a module automatically exports # Importing a module automatically exports
# its public names to us # its public names to us
for name in self.findInModule("", module): for name in self.findInModule("", module):
name.exportedTo.incl(self.currentModule) name.exportedTo.incl(self.currentModule.path)
except IOError: except IOError:
self.error(&"could not import '{module.ident.token.lexeme}': {getCurrentExceptionMsg()}") self.error(&"could not import '{module.ident.token.lexeme}': {getCurrentExceptionMsg()}")
except OSError: except OSError:
@ -1705,22 +1706,22 @@ proc exportStmt(self: BytecodeCompiler, node: ExportStmt, compile: bool = true)
var name = self.resolveOrError(node.name) var name = self.resolveOrError(node.name)
if name.isPrivate: if name.isPrivate:
self.error("cannot export private names") self.error("cannot export private names")
name.exportedTo.incl(self.parentModule) name.exportedTo.incl(self.parentModule.path)
case name.kind: case name.kind:
of NameKind.Module: of NameKind.Module:
# We need to export everything # We need to export everything
# this module defines! # this module defines!
for name in self.findInModule("", name): for name in self.findInModule("", name):
name.exportedTo.incl(self.parentModule) name.exportedTo.incl(self.parentModule.path)
of NameKind.Function: of NameKind.Function:
# Only exporting a single function (or, well # Only exporting a single function (or, well
# all of its implementations) # all of its implementations)
for name in self.findByName(name.ident.token.lexeme): for name in self.findByName(name.ident.token.lexeme):
if name.kind != NameKind.Function: if name.kind != NameKind.Function:
continue continue
name.exportedTo.incl(self.parentModule) name.exportedTo.incl(self.parentModule.path)
else: else:
discard self.error("unsupported export type")
proc breakStmt(self: BytecodeCompiler, node: BreakStmt) = proc breakStmt(self: BytecodeCompiler, node: BreakStmt) =
@ -2073,6 +2074,7 @@ proc compile*(self: BytecodeCompiler, ast: seq[Declaration], file: string, lines
self.disabledWarnings = disabledWarnings self.disabledWarnings = disabledWarnings
self.showMismatches = showMismatches self.showMismatches = showMismatches
self.mode = mode self.mode = mode
let start = self.chunk.code.len()
if not incremental: if not incremental:
self.jumps = @[] self.jumps = @[]
let pos = self.beginProgram() let pos = self.beginProgram()
@ -2081,8 +2083,6 @@ proc compile*(self: BytecodeCompiler, ast: seq[Declaration], file: string, lines
while not self.done(): while not self.done():
self.declaration(Declaration(self.step())) self.declaration(Declaration(self.step()))
self.terminateProgram(pos) self.terminateProgram(pos)
# TODO: REPL is broken, we need a new way to make
# incremental compilation resume from where it stopped!
result = self.chunk result = self.chunk
@ -2100,7 +2100,7 @@ proc compileModule(self: BytecodeCompiler, module: Name) =
break break
elif i == searchPath.high(): elif i == searchPath.high():
self.error(&"""could not import '{path}': module not found""") self.error(&"""could not import '{path}': module not found""")
if self.modules.contains(module): if self.modules.contains(module.path):
return return
let source = readFile(path) let source = readFile(path)
let current = self.current let current = self.current
@ -2115,11 +2115,19 @@ proc compileModule(self: BytecodeCompiler, module: Name) =
self.replMode = false self.replMode = false
self.parentModule = currentModule self.parentModule = currentModule
self.currentModule = module self.currentModule = module
let start = self.chunk.code.len()
discard self.compile(self.parser.parse(self.lexer.lex(source, path), discard self.compile(self.parser.parse(self.lexer.lex(source, path),
path, self.lexer.getLines(), path, self.lexer.getLines(),
self.lexer.getSource(), persist=true), self.lexer.getSource(), persist=true),
path, self.lexer.getLines(), self.lexer.getSource(), chunk=self.chunk, incremental=true, path, self.lexer.getLines(), self.lexer.getSource(), chunk=self.chunk, incremental=true,
isMainModule=false, self.disabledWarnings, self.showMismatches, self.mode) isMainModule=false, self.disabledWarnings, self.showMismatches, self.mode)
# Mark the end of a new module
self.chunk.modules.extend(start.toTriple())
self.chunk.modules.extend(self.chunk.code.high().toTriple())
# I swear to god if someone ever creates a peon module with a name that's
# longer than 2^16 bytes I will hit them with a metal pipe. Mark my words
self.chunk.modules.extend(self.currentModule.ident.token.lexeme.len().toDouble())
self.chunk.modules.extend(self.currentModule.ident.token.lexeme.toBytes())
module.file = path module.file = path
# No need to save the old scope depth: import statements are # No need to save the old scope depth: import statements are
# only allowed at the top level! # only allowed at the top level!
@ -2133,4 +2141,4 @@ proc compileModule(self: BytecodeCompiler, module: Name) =
self.replMode = replMode self.replMode = replMode
self.lines = lines self.lines = lines
self.source = src self.source = src
self.modules.incl(module) self.modules.incl(module.path)
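
The module record appended to `chunk.modules` in the hunk above appears to be laid out as a 3-byte start offset, a 3-byte end offset, a 2-byte name length, and then the raw name bytes. A minimal sketch of that layout follows. It assumes little-endian byte order, and `toTripleSketch`, `toDoubleSketch` and `encodeModule` are hypothetical stand-ins: peon's real `toTriple()`/`toDouble()` helpers are not shown in this diff and may order the bytes differently.

```nim
# Sketch only: packs one module record as
#   start (3 bytes) | stop (3 bytes) | name length (2 bytes) | name bytes
# Assumption: little-endian byte order (not confirmed by the diff).

proc toTripleSketch(n: int): array[3, byte] =
  ## Hypothetical stand-in for toTriple(): low 24 bits of n
  [byte(n and 0xff), byte((n shr 8) and 0xff), byte((n shr 16) and 0xff)]

proc toDoubleSketch(n: int): array[2, byte] =
  ## Hypothetical stand-in for toDouble(): low 16 bits of n
  [byte(n and 0xff), byte((n shr 8) and 0xff)]

proc encodeModule(start, stop: int, name: string): seq[byte] =
  ## Builds the byte record for a single module
  # A 2-byte length field caps the name at 65535 bytes
  doAssert name.len < 65536, "module name does not fit in a 2-byte length"
  result.add(toTripleSketch(start))
  result.add(toTripleSketch(stop))
  result.add(toDoubleSketch(name.len))
  for c in name:
    result.add(byte(c))
```

Each record is 8 bytes plus the name, so a module named `ab` takes 10 bytes. The explicit length check makes the 2^16 limit mentioned in the source comment a hard failure instead of a silently truncated length.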


@ -22,12 +22,15 @@ import std/terminal
type type
Function = ref object Function = object
start, stop, bottom, argc: int start, stop, argc: int
name: string
Module = object
start, stop: int
name: string name: string
started, stopped: bool
Debugger* = ref object Debugger* = ref object
chunk: Chunk chunk: Chunk
modules: seq[Module]
functions: seq[Function] functions: seq[Function]
current: int current: int
@ -66,21 +69,38 @@ proc checkFunctionStart(self: Debugger, n: int) =
## Checks if a function begins at the given ## Checks if a function begins at the given
## bytecode offset ## bytecode offset
for i, e in self.functions: for i, e in self.functions:
if n == e.start and not (e.started or e.stopped): # Avoids duplicate output
e.started = true if n == e.start:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function Start ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ====" styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function Start ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
styledEcho fgGreen, "\t- Start offset: ", fgYellow, $e.start styledEcho fgGreen, "\t- Start offset: ", fgYellow, $e.start
styledEcho fgGreen, "\t- End offset: ", fgYellow, $e.stop styledEcho fgGreen, "\t- End offset: ", fgYellow, $e.stop
styledEcho fgGreen, "\t- Argument count: ", fgYellow, $e.argc styledEcho fgGreen, "\t- Argument count: ", fgYellow, $e.argc, "\n"
proc checkFunctionEnd(self: Debugger, n: int) = proc checkFunctionEnd(self: Debugger, n: int) =
## Checks if a function ends at the given ## Checks if a function ends at the given
## bytecode offset ## bytecode offset
for i, e in self.functions: for i, e in self.functions:
if n == e.stop and e.started and not e.stopped: if n == e.stop:
e.stopped = true
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function End ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ====" styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function End ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
proc checkModuleStart(self: Debugger, n: int) =
## Checks if a module begins at the given
## bytecode offset
for i, m in self.modules:
if m.start == n:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module Start ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
styledEcho fgGreen, "\t- Start offset: ", fgYellow, $m.start
styledEcho fgGreen, "\t- End offset: ", fgYellow, $m.stop, "\n"
proc checkModuleEnd(self: Debugger, n: int) =
## Checks if a module ends at the given
## bytecode offset
for i, m in self.modules:
if m.stop == n:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module End ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
proc simpleInstruction(self: Debugger, instruction: OpCode) = proc simpleInstruction(self: Debugger, instruction: OpCode) =
@ -94,9 +114,6 @@ proc simpleInstruction(self: Debugger, instruction: OpCode) =
else: else:
stdout.styledWriteLine(fgYellow, "No") stdout.styledWriteLine(fgYellow, "No")
self.current += 1 self.current += 1
self.checkFunctionEnd(self.current - 2)
self.checkFunctionEnd(self.current - 1)
self.checkFunctionEnd(self.current)
proc stackTripleInstruction(self: Debugger, instruction: OpCode) = proc stackTripleInstruction(self: Debugger, instruction: OpCode) =
@ -168,20 +185,27 @@ proc jumpInstruction(self: Debugger, instruction: OpCode) =
self.current += 4 self.current += 4
while self.chunk.code[self.current] == NoOp.uint8: while self.chunk.code[self.current] == NoOp.uint8:
inc(self.current) inc(self.current)
for i in countup(orig, self.current + 1):
self.checkFunctionStart(i)
proc disassembleInstruction*(self: Debugger) = proc disassembleInstruction*(self: Debugger) =
## Takes one bytecode instruction and prints it ## Takes one bytecode instruction and prints it
let opcode = OpCode(self.chunk.code[self.current])
self.checkModuleStart(self.current)
self.checkFunctionStart(self.current)
printDebug("Offset: ") printDebug("Offset: ")
stdout.styledWriteLine(fgYellow, $(self.current)) stdout.styledWriteLine(fgYellow, $(self.current))
printDebug("Line: ") printDebug("Line: ")
stdout.styledWriteLine(fgYellow, &"{self.chunk.getLine(self.current)}") stdout.styledWriteLine(fgYellow, &"{self.chunk.getLine(self.current)}")
var opcode = OpCode(self.chunk.code[self.current])
case opcode: case opcode:
of simpleInstructions: of simpleInstructions:
self.simpleInstruction(opcode) self.simpleInstruction(opcode)
# Functions (and modules) only have a single return statement at the
# end of their body, so we never execute this more than once per module/function
if opcode == Return:
# -2 to skip the hardcoded argument to return
# and the increment by simpleInstruction()
self.checkFunctionEnd(self.current - 2)
self.checkModuleEnd(self.current - 1)
of constantInstructions: of constantInstructions:
self.constantInstruction(opcode) self.constantInstruction(opcode)
of stackDoubleInstructions: of stackDoubleInstructions:
@ -197,7 +221,9 @@ proc disassembleInstruction*(self: Debugger) =
else: else:
echo &"DEBUG - Unknown opcode {opcode} at index {self.current}" echo &"DEBUG - Unknown opcode {opcode} at index {self.current}"
self.current += 1 self.current += 1
proc parseFunctions(self: Debugger) = proc parseFunctions(self: Debugger) =
## Parses function information in the chunk ## Parses function information in the chunk
@ -206,7 +232,7 @@ proc parseFunctions(self: Debugger) =
name: string name: string
idx = 0 idx = 0
size = 0 size = 0
while idx < len(self.chunk.functions) - 1: while idx < self.chunk.functions.high():
start = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple()) start = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
idx += 3 idx += 3
stop = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple()) stop = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
@ -220,15 +246,36 @@ proc parseFunctions(self: Debugger) =
self.functions.add(Function(start: start, stop: stop, argc: argc, name: name)) self.functions.add(Function(start: start, stop: stop, argc: argc, name: name))
proc parseModules(self: Debugger) =
## Parses module information in the chunk
var
start, stop: int
name: string
idx = 0
size = 0
while idx < self.chunk.modules.high():
start = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
idx += 3
stop = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
idx += 3
size = int([self.chunk.modules[idx], self.chunk.modules[idx + 1]].fromDouble())
idx += 2
name = self.chunk.modules[idx..<idx + size].fromBytes()
inc(idx, size)
self.modules.add(Module(start: start, stop: stop, name: name))
proc disassembleChunk*(self: Debugger, chunk: Chunk, name: string) = proc disassembleChunk*(self: Debugger, chunk: Chunk, name: string) =
## Takes a chunk of bytecode and prints it ## Takes a chunk of bytecode and prints it
self.chunk = chunk self.chunk = chunk
styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ====\n" styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ====\n"
self.current = 0 self.current = 0
self.parseFunctions() self.parseFunctions()
self.parseModules()
while self.current < self.chunk.code.len: while self.current < self.chunk.code.len:
self.disassembleInstruction() self.disassembleInstruction()
echo "" echo ""
styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ====" styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ===="
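
The loop in `parseModules` walks the modules segment record by record: two 3-byte offsets, a 2-byte name length, then the name itself. The sketch below shows the same walk on a plain byte array. `ModuleInfo`, `fromTripleSketch` and `decodeModules` are hypothetical names, and little-endian byte order is an assumption, since peon's `fromTriple()`/`fromDouble()` are not part of this diff.

```nim
# Sketch only: decodes a sequence of module records laid out as
#   start (3 bytes) | stop (3 bytes) | name length (2 bytes) | name bytes
# Assumption: little-endian byte order (not confirmed by the diff).

type
  ModuleInfo = object
    start, stop: int
    name: string

proc fromTripleSketch(b: openArray[byte], at: int): int =
  ## Hypothetical stand-in for fromTriple(): reads 3 bytes starting at `at`
  int(b[at]) or (int(b[at + 1]) shl 8) or (int(b[at + 2]) shl 16)

proc decodeModules(data: openArray[byte]): seq[ModuleInfo] =
  ## Walks the segment until every byte has been consumed
  var idx = 0
  while idx < data.len:
    let start = fromTripleSketch(data, idx)
    let stop = fromTripleSketch(data, idx + 3)
    let size = int(data[idx + 6]) or (int(data[idx + 7]) shl 8)
    idx += 8
    var name = newStringOfCap(size)
    for i in idx ..< idx + size:
      name.add(char(data[i]))
    idx += size
    result.add(ModuleInfo(start: start, stop: stop, name: name))
```

With bounds checking enabled, a record that is cut short raises an `IndexDefect` on the out-of-range access, which is the same failure the serializer in this change set reports as a truncated stream.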


@ -64,7 +64,8 @@ proc newSerializer*(self: Serializer = nil): Serializer =
proc writeHeaders(self: Serializer, stream: var seq[byte]) = proc writeHeaders(self: Serializer, stream: var seq[byte]) =
## Writes the Peon bytecode headers in-place into a byte stream ## Writes the Peon bytecode headers in-place into the
## given byte sequence
stream.extend(PeonBytecodeMarker.toBytes()) stream.extend(PeonBytecodeMarker.toBytes())
stream.add(byte(PEON_VERSION.major)) stream.add(byte(PEON_VERSION.major))
stream.add(byte(PEON_VERSION.minor)) stream.add(byte(PEON_VERSION.minor))
@ -77,25 +78,31 @@ proc writeHeaders(self: Serializer, stream: var seq[byte]) =
proc writeLineData(self: Serializer, stream: var seq[byte]) = proc writeLineData(self: Serializer, stream: var seq[byte]) =
## Writes line information for debugging ## Writes line information for debugging
## bytecode instructions ## bytecode instructions to the given byte
## sequence
stream.extend(len(self.chunk.lines).toQuad()) stream.extend(len(self.chunk.lines).toQuad())
for b in self.chunk.lines: for b in self.chunk.lines:
stream.extend(b.toTriple()) stream.extend(b.toTriple())
proc writeCFIData(self: Serializer, stream: var seq[byte]) = proc writeFunctions(self: Serializer, stream: var seq[byte]) =
## Writes Call Frame Information for debugging ## Writes debug info about functions to the
## functions ## given byte sequence
stream.extend(len(self.chunk.functions).toQuad()) stream.extend(len(self.chunk.functions).toQuad())
stream.extend(self.chunk.functions) stream.extend(self.chunk.functions)
proc writeConstants(self: Serializer, stream: var seq[byte]) = proc writeConstants(self: Serializer, stream: var seq[byte]) =
## Writes the constants table in-place into the ## Writes the constants table in-place into the
## given stream ## byte sequence
stream.extend(self.chunk.consts.len().toQuad()) stream.extend(self.chunk.consts.len().toQuad())
for constant in self.chunk.consts: stream.extend(self.chunk.consts)
stream.add(constant)
proc writeModules(self: Serializer, stream: var seq[byte]) =
## Writes module information to the given byte sequence
stream.extend(self.chunk.modules.len().toQuad())
stream.extend(self.chunk.modules)
proc writeCode(self: Serializer, stream: var seq[byte]) = proc writeCode(self: Serializer, stream: var seq[byte]) =
@ -106,7 +113,7 @@ proc writeCode(self: Serializer, stream: var seq[byte]) =
proc readHeaders(self: Serializer, stream: seq[byte], serialized: Serialized): int = proc readHeaders(self: Serializer, stream: seq[byte], serialized: Serialized): int =
## Reads the bytecode headers from a given stream ## Reads the bytecode headers from a given sequence
## of bytes ## of bytes
var stream = stream var stream = stream
if stream[0..<len(PeonBytecodeMarker)] != PeonBytecodeMarker.toBytes(): if stream[0..<len(PeonBytecodeMarker)] != PeonBytecodeMarker.toBytes():
@ -131,7 +138,6 @@ proc readHeaders(self: Serializer, stream: seq[byte], serialized: Serialized): i
result += 8 result += 8
proc readLineData(self: Serializer, stream: seq[byte]): int = proc readLineData(self: Serializer, stream: seq[byte]): int =
## Reads line information from a stream ## Reads line information from a stream
## of bytes ## of bytes
@ -142,10 +148,11 @@ proc readLineData(self: Serializer, stream: seq[byte]): int =
self.chunk.lines.add(int([stream[0], stream[1], stream[2]].fromTriple())) self.chunk.lines.add(int([stream[0], stream[1], stream[2]].fromTriple()))
result += 3 result += 3
stream = stream[3..^1] stream = stream[3..^1]
doAssert len(self.chunk.lines) == int(size)
proc readCFIData(self: Serializer, stream: seq[byte]): int = proc readFunctions(self: Serializer, stream: seq[byte]): int =
## Reads Call Frame Information from a stream ## Reads the function segment from a stream
## of bytes ## of bytes
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad() let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4 result += 4
@ -153,22 +160,34 @@ proc readCFIData(self: Serializer, stream: seq[byte]): int =
for i in countup(0, int(size) - 1): for i in countup(0, int(size) - 1):
self.chunk.functions.add(stream[i]) self.chunk.functions.add(stream[i])
inc(result) inc(result)
doAssert len(self.chunk.functions) == int(size)
proc readConstants(self: Serializer, stream: seq[byte]): int = proc readConstants(self: Serializer, stream: seq[byte]): int =
## Reads the constant table from the given stream ## Reads the constant table from the given
## of bytes ## byte sequence
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad() let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4 result += 4
var stream = stream[4..^1] var stream = stream[4..^1]
for i in countup(0, int(size) - 1): for i in countup(0, int(size) - 1):
self.chunk.consts.add(stream[i]) self.chunk.consts.add(stream[i])
inc(result) inc(result)
doAssert len(self.chunk.consts) == int(size)
proc readModules(self: Serializer, stream: seq[byte]): int =
## Reads module information
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4
var stream = stream[4..^1]
for i in countup(0, int(size) - 1):
self.chunk.modules.add(stream[i])
inc(result)
doAssert len(self.chunk.modules) == int(size)
proc readCode(self: Serializer, stream: seq[byte]): int = proc readCode(self: Serializer, stream: seq[byte]): int =
## Reads the bytecode from a given stream and writes ## Reads the bytecode from a given byte sequence
## it into the given chunk
let size = [stream[0], stream[1], stream[2]].fromTriple() let size = [stream[0], stream[1], stream[2]].fromTriple()
var stream = stream[3..^1] var stream = stream[3..^1]
for i in countup(0, int(size) - 1): for i in countup(0, int(size) - 1):
@ -178,13 +197,16 @@ proc readCode(self: Serializer, stream: seq[byte]): int =
proc dumpBytes*(self: Serializer, chunk: Chunk, filename: string): seq[byte] = proc dumpBytes*(self: Serializer, chunk: Chunk, filename: string): seq[byte] =
## Dumps the given bytecode and file to a sequence of bytes and returns it. ## Dumps the given chunk to a sequence of bytes and returns it.
## The filename argument is for error reporting only, use dumpFile
## to dump bytecode to a file
self.filename = filename self.filename = filename
self.chunk = chunk self.chunk = chunk
self.writeHeaders(result) self.writeHeaders(result)
self.writeLineData(result) self.writeLineData(result)
self.writeCFIData(result) self.writeFunctions(result)
self.writeConstants(result) self.writeConstants(result)
self.writeModules(result)
self.writeCode(result) self.writeCode(result)
@ -207,8 +229,9 @@ proc loadBytes*(self: Serializer, stream: seq[byte]): Serialized =
try: try:
stream = stream[self.readHeaders(stream, result)..^1] stream = stream[self.readHeaders(stream, result)..^1]
stream = stream[self.readLineData(stream)..^1] stream = stream[self.readLineData(stream)..^1]
stream = stream[self.readCFIData(stream)..^1] stream = stream[self.readFunctions(stream)..^1]
stream = stream[self.readConstants(stream)..^1] stream = stream[self.readConstants(stream)..^1]
stream = stream[self.readModules(stream)..^1]
stream = stream[self.readCode(stream)..^1] stream = stream[self.readCode(stream)..^1]
except IndexDefect: except IndexDefect:
self.error("truncated bytecode stream") self.error("truncated bytecode stream")
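
Most segments in this serializer share one shape: a 4-byte length followed by that many raw bytes, with each reader returning how many bytes it consumed so the caller can slice the stream forward. The sketch below illustrates the pattern in isolation. `writeSegment` and `readSegment` are hypothetical names, the byte order is assumed to be little-endian, and the up-front length checks are a substitution: the code in the diff relies on `doAssert` and on catching `IndexDefect` instead.

```nim
# Sketch only: a generic length-prefixed segment.
# Assumption: the 4-byte size is little-endian (peon's toQuad() may differ).

proc writeSegment(stream: var seq[byte], payload: openArray[byte]) =
  ## Appends a 4-byte length followed by the payload itself
  let n = payload.len
  for shift in [0, 8, 16, 24]:
    stream.add(byte((n shr shift) and 0xff))
  stream.add(payload)

proc readSegment(stream: openArray[byte], payload: var seq[byte]): int =
  ## Reads one segment into `payload` and returns the bytes consumed
  if stream.len < 4:
    raise newException(ValueError, "truncated segment header")
  var size = 0
  for i in 0 ..< 4:
    size = size or (int(stream[i]) shl (8 * i))
  if stream.len < 4 + size:
    raise newException(ValueError, "truncated segment body")
  for i in 4 ..< 4 + size:
    payload.add(stream[i])
  result = 4 + size
```

Because every reader reports its own consumed length, segments can be chained in a fixed order, which is why the write order in `dumpBytes` and the read order in `loadBytes` have to match exactly.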


@ -246,6 +246,11 @@ proc runFile(f: string, fromString: bool = false, dump: bool = true, breakpoints
styledEcho fgGreen, "OK" styledEcho fgGreen, "OK"
else: else:
styledEcho fgRed, "Corrupted" styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, "\t- Modules segment: ")
if serialized.chunk.modules == compiled.modules:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
if run: if run:
case backend: case backend:
of PeonBackend.Bytecode: of PeonBackend.Bytecode:


@ -1,7 +1,7 @@
import std; import std;
const max = 50000; const max = 500000;
var x = max; var x = max;
var s = "just a test"; var s = "just a test";