Fixed various bugs related to lambdas and imports. Added module info section to bytecode dumps

Merge
2023-05-22 12:57:38 +02:00 · 2023-05-09 11:00:35 +02:00
10 changed files with 239 additions and 109 deletions
--- a/docs/bytecode.md
+++ b/docs/bytecode.md
@ -1,7 +1,8 @@
 # Peon - Bytecode Specification

 This document aims to document peon's bytecode as well as how it is (de-)serialized to/from files and
-other file-like objects.
+other file-like objects. Note that the segments in a bytecode dump appear in the order they are listed
+in this document.

 ## Code Structure

@ -9,12 +10,12 @@ A peon program is compiled into a tightly packed sequence of bytes that contain
 the VM needs to execute said program. There is no dependence between the frontend and the backend outside of the
 bytecode format (which is implemented in a separate serializer module) to allow for maximum modularity.

-A peon bytecode dump contains:
+A peon bytecode file contains the following:

 - Constants
- The bytecode itself
- Debugging information
- File and version metadata 
+- The program's code
+- Debugging information (file and version metadata and module info; optional)
+

 ## File Headers

@ -34,7 +35,7 @@ in release builds.
 ### Line data segment

 The line data segment contains information about each instruction in the code segment and associates them
-1:1 with a line number in the original source file for easier debugging using run-length encoding. The section's 
+1:1 with a line number in the original source file for easier debugging using run-length encoding. The segment's 
 size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
 in this segment can be decoded as explained in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L29), which is quoted
 below:
@ -57,7 +58,7 @@ below:

 This segment contains details about each function in the original file. The segment's size is fixed and is encoded at the 
 beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data in this segment can be decoded as explained 
-in [this file](../src/frontend/compiler/targgets/bytecode/opcodes.nim#L39), which is quoted below:
+in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L39), which is quoted below:

 ```
 [...]
@ -74,6 +75,26 @@ in [this file](../src/frontend/compiler/targgets/bytecode/opcodes.nim#L39), whic
 [...]
 ```

+### Modules segment
+
+This segment contains details about the modules that make up the original source code which produced a given bytecode dump.
+The data in this segment can be decoded as explained in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L49), which is quoted below:
+```
+[...]
+## modules contains information about all the peon modules that the compiler has encountered,
+## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
+## Python, peon does not produce a bytecode file for each separate module it compiles: everything 
+## is contained within a single binary blob. While this simplifies the implementation and makes 
+## bytecode files entirely "self-hosted", it also means that the original module information is 
+## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
+## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
+## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
+## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
+## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
+[...]
+```
+
+
 ## Constant segment

 The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded
@ -87,6 +108,6 @@ real-world scenarios it likely won't be.

 ## Code segment

-The code segment contains the linear sequence of bytecode instructions of a peon program. It is to be read directly 
-and without modifications. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes 
+The code segment contains the linear sequence of bytecode instructions of a peon program to be fed directly to
+peon's virtual machine. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes 
 (i.e. a single 24 bit integer). All the instructions are documented [here](../src/frontend/compiler/targets/bytecode/opcodes.nim).
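
The modules segment layout described in this file (3-byte start offset, 3-byte end offset, 2-byte name length, then the name) can be sketched as a small round trip. This is only an illustration, not peon's serializer: `ModuleInfo`, `encodeModule` and `decodeModules` are hypothetical names, the byte helpers are local stand-ins for peon's `toTriple`/`toDouble`, and the little-endian byte order is an assumption, since the diff does not show it. The 4-byte segment size prefix is left out; only the entries are handled.

```nim
type
  ModuleInfo = object
    start, stop: int   # first and last bytecode offsets of the module
    name: string

# Hypothetical helpers: peon's real toTriple/toDouble are not shown in this
# diff, so the little-endian byte order used here is an assumption.
proc toTriple(n: int): array[3, uint8] =
  [uint8(n and 0xff), uint8((n shr 8) and 0xff), uint8((n shr 16) and 0xff)]

proc fromTriple(b: openArray[uint8]): int =
  int(b[0]) or (int(b[1]) shl 8) or (int(b[2]) shl 16)

proc toDouble(n: int): array[2, uint8] =
  [uint8(n and 0xff), uint8((n shr 8) and 0xff)]

proc fromDouble(b: openArray[uint8]): int =
  int(b[0]) or (int(b[1]) shl 8)

proc encodeModule(buf: var seq[uint8], m: ModuleInfo) =
  ## Appends one entry: start (3 bytes), stop (3 bytes),
  ## name length (2 bytes), then the name itself
  buf.add(toTriple(m.start))
  buf.add(toTriple(m.stop))
  buf.add(toDouble(m.name.len))
  for c in m.name:
    buf.add(uint8(c))

proc decodeModules(buf: seq[uint8]): seq[ModuleInfo] =
  ## Walks the entries back out in the order they were written
  var idx = 0
  while idx < buf.len:
    let start = fromTriple(buf[idx ..< idx + 3])
    let stop = fromTriple(buf[idx + 3 ..< idx + 6])
    let size = fromDouble(buf[idx + 6 ..< idx + 8])
    idx += 8
    var name = newString(size)
    for i in 0 ..< size:
      name[i] = char(buf[idx + i])
    idx += size
    result.add(ModuleInfo(start: start, stop: stop, name: name))

when isMainModule:
  var segment: seq[uint8]
  encodeModule(segment, ModuleInfo(start: 0, stop: 41, name: "std"))
  for m in decodeModules(segment):
    echo m.name, " spans ", m.start, "..", m.stop
```

Each entry costs 8 bytes plus the name, so the 2-byte length field caps module names at 65535 bytes, which matches the limit mentioned in the compiler comment.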
--- a/src/backend/vm.nim
+++ b/src/backend/vm.nim
@ -68,7 +68,8 @@ type
        ## this system and is not handled 
        ## manually by the VM
        bytesAllocated: tuple[total, current: int]
-        cycles: int
+        when debugGC or debugAlloc:
+            cycles: int
        nextGC: int
        pointers: HashSet[uint64]
    PeonVM* = object
@ -93,9 +94,10 @@ type
        frames: seq[uint64]         # Stores the bottom of stack frames
        results: seq[uint64]        # Stores function return values
        gc: PeonGC                  # A reference to the VM's garbage collector
-        breakpoints: seq[uint64]    # Breakpoints where we call our debugger
-        debugNext: bool             # Whether to debug the next instruction
-        lastDebugCommand: string    # The last debugging command input by the user
+        when debugVM:
+            breakpoints: seq[uint64]    # Breakpoints where we call our debugger
+            debugNext: bool             # Whether to debug the next instruction
+            lastDebugCommand: string    # The last debugging command input by the user


 # Implementation of peon's memory manager
@ -105,7 +107,8 @@ proc newPeonGC*: PeonGC =
    ## garbage collector
    result.bytesAllocated = (0, 0)
    result.nextGC = FirstGC
-    result.cycles = 0
+    when debugGC or debugAlloc:
+        result.cycles = 0


 proc collect*(self: var PeonVM)
@ -214,6 +217,16 @@ proc markRoots(self: var PeonVM): HashSet[ptr HeapObject] =
    # will mistakenly assume the object to be reachable, potentially 
    # leading to a nasty memory leak. Let's just hope a 48+ bit address 
    # space makes this occurrence rare enough not to be a problem
+    # handles a single type (uint64), while Lox has a stack
+    # of heap-allocated structs (which is convenient, but slow).
+    # What we do instead is store all pointers allocated by us
+    # in a hash set and then check if any source of roots contained
+    # any of the integer values that we're keeping track of. Note
+    # that this means that if a primitive object's value happens to
+    # collide with an active pointer, the GC will mistakenly assume
+    # the object to be reachable (potentially leading to a nasty
+    # memory leak). Hopefully, in a 64-bit address space, this
+    # occurrence is rare enough for us to ignore
    var result = initHashSet[uint64](self.gc.pointers.len())
    for obj in self.calls:
        if obj in self.gc.pointers:
@ -285,7 +298,6 @@ proc sweep(self: var PeonVM) =
    ## during the mark phase.
    when debugGC:
        echo "DEBUG - GC: Beginning sweeping phase"
-    when debugGC:
        var count = 0
    var current: ptr HeapObject
    var freed: HashSet[uint64]
@ -1050,10 +1062,11 @@ proc run*(self: var PeonVM, chunk: Chunk, breakpoints: seq[uint64] = @[], repl:
    self.frames = @[]
    self.calls = @[]
    self.operands = @[]
-    self.breakpoints = breakpoints
    self.results = @[]
    self.ip = 0
-    self.lastDebugCommand = ""
+    when debugVM:
+        self.breakpoints = breakpoints
+        self.lastDebugCommand = ""
    try:
        self.dispatch()
    except NilAccessDefect:
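
The comment in `markRoots` describes a conservative scheme: every allocated address is kept in a hash set, and any raw 64-bit value found in a root source that matches one of those addresses is treated as a live pointer. A minimal sketch of that idea, assuming a single flat stack of `uint64` values (the standalone signature below is hypothetical and much simpler than the VM's real one):

```nim
import std/sets

proc markRoots(stack: openArray[uint64], tracked: HashSet[uint64]): HashSet[uint64] =
  ## Collects every tracked address that appears as a raw value on the stack.
  ## Conservative: a plain integer equal to a live address also counts as a root.
  result = initHashSet[uint64]()
  for value in stack:
    if value in tracked:
      result.incl(value)

when isMainModule:
  let tracked = toHashSet([0x1000'u64, 0x2000'u64])
  # 4096 is meant as an ordinary integer here, but it equals 0x1000,
  # so the object at that address is kept alive by accident
  let stack = [7'u64, 0x2000'u64, 4096'u64]
  doAssert markRoots(stack, tracked).len == 2
```

The accidental match in the example is the false positive the comment warns about: the object is never freed while that integer stays on the stack, which leaks memory but never frees something still in use.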
--- a/src/frontend/compiler/compiler.nim
+++ b/src/frontend/compiler/compiler.nim
@ -134,7 +134,7 @@ type
        node*: Declaration
        # Who is this name exported to? (Only makes sense if isPrivate
        # equals false)
-        exportedTo*: HashSet[Name]
+        exportedTo*: HashSet[string]
        # Has the compiler generated this name internally or
        # does it come from user code?
        isReal*: bool
@ -212,7 +212,7 @@ type
        # The module importing us, if any
        parentModule*: Name
        # Currently imported modules
-        modules*: HashSet[Name]
+        modules*: HashSet[string]

    TypedNode* = ref object
        ## A wrapper for AST nodes
@ -354,7 +354,7 @@ proc resolve*(self: Compiler, name: string): Name =
                    # module, so we definitely can't
                    # use it
                    continue
-                elif self.currentModule in obj.exportedTo:
+                elif self.currentModule.path in obj.exportedTo:
                    # The name is public in its owner
                    # module and said module has explicitly
                    # exported it to us: we can use it
@ -713,7 +713,7 @@ method findByName*(self: Compiler, name: string): seq[Name] =
    for obj in reversed(self.names):
        if obj.ident.token.lexeme == name:
            if obj.owner.path != self.currentModule.path:
-                if obj.isPrivate or self.currentModule notin obj.exportedTo:
+                if obj.isPrivate or self.currentModule.path notin obj.exportedTo:
                    continue
            result.add(obj)

@ -727,11 +727,13 @@ method findInModule*(self: Compiler, name: string, module: Name): seq[Name] =
    ## the current one or not
    if name == "":
        for obj in reversed(self.names):
-            if not obj.isPrivate and obj.owner == module:
+            if obj.owner.isNil():
+                continue
+            if not obj.isPrivate and obj.owner.path == module.path:
                result.add(obj)
    else:
        for obj in self.findInModule("", module):
-            if obj.ident.token.lexeme == name and self.currentModule in obj.exportedTo:
+            if obj.ident.token.lexeme == name and self.currentModule.path in obj.exportedTo:
                result.add(obj)


@ -1034,7 +1036,7 @@ proc declare*(self: Compiler, node: ASTNode): Name {.discardable.} =
            break
        if name.ident.token.lexeme != declaredName:
            continue
-        if name.owner != n.owner and (name.isPrivate or n.owner notin name.exportedTo):
+        if name.owner != n.owner and (name.isPrivate or n.owner.path notin name.exportedTo):
            continue
        if name.kind in [NameKind.Var, NameKind.Module, NameKind.CustomType, NameKind.Enum]:
            if name.depth < n.depth:
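
The hunks above switch `exportedTo` and `modules` from sets of `Name` objects to sets of module paths, so visibility checks now compare strings. A rough sketch of that check, assuming a heavily simplified `Name` type (the fields and the `isVisible` helper below are hypothetical and only mirror the conditions visible in `findByName`):

```nim
import std/sets

type
  Name = ref object
    ident: string
    ownerPath: string            # path of the module that declared the name
    isPrivate: bool
    exportedTo: HashSet[string]  # paths of the modules allowed to see it

proc isVisible(name: Name, currentModulePath: string): bool =
  ## A name is usable inside its own module, or elsewhere when it is
  ## public and has been exported to the module asking for it
  if name.ownerPath == currentModulePath:
    return true
  not name.isPrivate and currentModulePath in name.exportedTo

when isMainModule:
  let n = Name(ident: "foo", ownerPath: "lib/a.pn", isPrivate: false,
               exportedTo: initHashSet[string]())
  doAssert not isVisible(n, "main.pn")
  # Importing a module exports its public names to the importer
  n.exportedTo.incl("main.pn")
  doAssert isVisible(n, "main.pn")
```

Keying by path means two distinct `Name` objects that refer to the same module file are treated as the same module.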
--- a/src/frontend/compiler/newcompiler.nim
+++ b/src/frontend/compiler/newcompiler.nim
@ -124,11 +124,11 @@ type
            of Reference:
                # A managed reference
                nullable*: bool    # Is null a valid value for this type? (false by default)
-                value*: Type       # The type the reference points to
+                value*: TypedNode  # The type the reference points to
            of Pointer:
                # An unmanaged reference. Much
                # like a raw pointer in C
-                data*: Type     # The type we point to
+                data*: TypedNode   # The type we point to
            of TypeDecl:
                # A user-defined type
                fields*: seq[TypedArgument]  # List of fields in the object. May be empty
@ -317,17 +317,17 @@ proc step*(self: Compiler): ASTNode {.inline.} =

 # Some forward declarations
 proc compareUnions*(self: Compiler, a, b: seq[tuple[match: bool, kind: Type]]): bool
-proc expression*(self: Compiler, node: Expression, compile: bool = true): Type {.discardable.} = nil
-proc identifier*(self: Compiler, node: IdentExpr, name: Name = nil, compile: bool = true, strict: bool = true): Type {.discardable.} = nil
-proc call*(self: Compiler, node: CallExpr, compile: bool = true): Type {.discardable.} = nil
-proc getItemExpr*(self: Compiler, node: GetItemExpr, compile: bool = true, matching: Type = nil): Type {.discardable.} = nil
-proc unary*(self: Compiler, node: UnaryExpr, compile: bool = true): Type {.discardable.} = nil
-proc binary*(self: Compiler, node: BinaryExpr, compile: bool = true): Type {.discardable.} = nil
-proc lambdaExpr*(self: Compiler, node: LambdaExpr, compile: bool = true): Type {.discardable.} = nil
-proc literal*(self: Compiler, node: ASTNode, compile: bool = true): Type {.discardable.} = nil
-proc infer*(self: Compiler, node: LiteralExpr): Type
-proc infer*(self: Compiler, node: Expression): Type
-proc inferOrError*(self: Compiler, node: Expression): Type
+proc expression*(self: Compiler, node: Expression, compile: bool = true): TypedNode {.discardable.} = nil
+proc identifier*(self: Compiler, node: IdentExpr, name: Name = nil, compile: bool = true, strict: bool = true): TypedNode {.discardable.} = nil
+proc call*(self: Compiler, node: CallExpr, compile: bool = true): TypedNode {.discardable.} = nil
+proc getItemExpr*(self: Compiler, node: GetItemExpr, compile: bool = true, matching: Type = nil): TypedNode {.discardable.} = nil
+proc unary*(self: Compiler, node: UnaryExpr, compile: bool = true): TypedNode {.discardable.} = nil
+proc binary*(self: Compiler, node: BinaryExpr, compile: bool = true): TypedNode {.discardable.} = nil
+proc lambdaExpr*(self: Compiler, node: LambdaExpr, compile: bool = true): TypedNode {.discardable.} = nil
+proc literal*(self: Compiler, node: ASTNode, compile: bool = true): TypedNode {.discardable.} = nil
+proc infer*(self: Compiler, node: LiteralExpr): TypedNode
+proc infer*(self: Compiler, node: Expression): TypedNode
+proc inferOrError*(self: Compiler, node: Expression): TypedNode
 proc findByName*(self: Compiler, name: string): seq[Name]
 proc findInModule*(self: Compiler, name: string, module: Name): seq[Name]
 proc findByType*(self: Compiler, name: string, kind: Type): seq[Name]
@ -420,7 +420,7 @@ proc compare*(self: Compiler, a, b: Type): bool =
                # a and b are of either of the two
                # types in this branch, so we just need
                # to compare their values
-                return self.compare(a.value, b.value)
+                return self.compare(a.value.value, b.value.value)
            of Function:
                # Functions are a bit trickier to compare
                if a.arguments.len() != b.arguments.len():
@ -569,7 +569,7 @@ proc toIntrinsic*(name: string): Type =
        return Type(kind: String)


-proc infer*(self: Compiler, node: LiteralExpr): Type =
+proc infer*(self: Compiler, node: LiteralExpr): TypedNode =
    ## Infers the type of a given literal expression
    if node.isNil():
        return nil
@ -577,32 +577,32 @@ proc infer*(self: Compiler, node: LiteralExpr): Type =
        of intExpr, binExpr, octExpr, hexExpr:
            let size = node.token.lexeme.split("'")
            if size.len() == 1:
-                return Type(kind: Int64)
+                return TypedNode(node: node, value: Type(kind: Int64))
            let typ = size[1].toIntrinsic()
            if not self.compare(typ, nil):
-                return typ
+                return TypedNode(node: node, value: typ)
            else:
                self.error(&"invalid type specifier '{size[1]}' for int", node)
        of floatExpr:
            let size = node.token.lexeme.split("'")
            if size.len() == 1:
-                return Type(kind: Float64)
+                return  TypedNode(node: node, value: Type(kind: Float64))
            let typ = size[1].toIntrinsic()
            if not typ.isNil():
-                return typ
+                return  TypedNode(node: node, value: typ)
            else:
                self.error(&"invalid type specifier '{size[1]}' for float", node)
        of trueExpr:
-            return Type(kind: Bool)
+            return  TypedNode(node: node, value: Type(kind: Bool))
        of falseExpr:
-            return Type(kind: Bool)
+            return  TypedNode(node: node, value: Type(kind: Bool))
        of strExpr:
-            return Type(kind: String)
+            return  TypedNode(node: node, value: Type(kind: String))
        else:
            discard # Unreachable


-proc infer*(self: Compiler, node: Expression): Type  =
+proc infer*(self: Compiler, node: Expression): TypedNode =
    ## Infers the type of a given expression and
    ## returns it
    if node.isNil():
@ -621,9 +621,9 @@ proc infer*(self: Compiler, node: Expression): Type  =
        of NodeKind.callExpr:
            result = self.call(CallExpr(node), compile=false)
        of NodeKind.refExpr:
-            result = Type(kind: Reference, value: self.infer(Ref(node).value))
+            result =  TypedNode(node: node, value: Type(kind: Reference, value: self.infer(Ref(node).value)))
        of NodeKind.ptrExpr:
-            result = Type(kind: Pointer, data: self.infer(Ptr(node).value))
+            result =  TypedNode(node: node, value: Type(kind: Pointer, data: self.infer(Ptr(node).value)))
        of NodeKind.groupingExpr:
            result = self.infer(GroupingExpr(node).expression)
        of NodeKind.getItemExpr:
@ -634,7 +634,7 @@ proc infer*(self: Compiler, node: Expression): Type  =
            discard # TODO


-proc inferOrError*(self: Compiler, node: Expression): Type =
+proc inferOrError*(self: Compiler, node: Expression): TypedNode =
    ## Attempts to infer the type of
    ## the given expression and raises an 
    ## error if it fails
@ -648,16 +648,16 @@ proc stringify*(self: Compiler, typ: Type): string =
    ## type object
    if typ.isNil():
        return "nil"
-    case typ.value.kind:
+    case typ.kind:
        of Int8, UInt8, Int16, UInt16, Int32,
           UInt32, Int64, UInt64, Float32, Float64,
           Char, Byte, String, Nil, TypeKind.Nan, Bool,
           TypeKind.Inf, Auto:
-            result &= ($typ.value.kind).toLowerAscii()
+            result &= ($typ.kind).toLowerAscii()
        of Pointer:
-            result &= &"ptr {self.stringify(typ.value)}"
+            result &= &"ptr {self.stringify(typ.data.value)}"
        of Reference:
-            result &= &"ref {self.stringify(typ.value)}"
+            result &= &"ref {self.stringify(typ.value.value)}"
        of Any:
            return "any"
        of Union:
@ -770,9 +770,9 @@ proc check*(self: Compiler, term: Expression, kind: Type) {.inline.} =
    ## Raises an error if appropriate and returns
    ## otherwise
    let k = self.inferOrError(term)
-    if not self.compare(k, kind):
+    if not self.compare(k.value, kind):
        self.error(&"expecting value of type {self.stringify(kind)}, got {self.stringify(k)}", term)
-    elif k.kind == Any and kind.kind != Any:
+    elif k.value.kind == Any and kind.kind != Any:
        self.error(&"any is not a valid type in this context")


@ -857,7 +857,7 @@ proc unpackGenerics*(self: Compiler, condition: Expression, list: var seq[tuple[
    ## Recursively unpacks a type constraint in a generic type
    case condition.kind:
        of identExpr:
-            list.add((accept, self.inferOrError(condition)))
+            list.add((accept, self.inferOrError(condition).value))
            if list[^1].kind.kind == Auto:
                self.error("automatic types cannot be used within generics", condition)
        of binaryExpr:
@ -883,7 +883,7 @@ proc unpackUnion*(self: Compiler, condition: Expression, list: var seq[tuple[mat
    ## Recursively unpacks a type union
    case condition.kind:
        of identExpr:
-            list.add((accept, self.inferOrError(condition)))
+            list.add((accept, self.inferOrError(condition).value))
        of binaryExpr:
            let condition = BinaryExpr(condition)
            case condition.operator.lexeme:
@ -966,13 +966,13 @@ proc declare*(self: Compiler, node: ASTNode): Name {.discardable.} =
                n.isGeneric = true
            var typ: Type
            for argument in node.arguments:
-                typ = self.infer(argument.valueType)
+                typ = self.infer(argument.valueType).value
                if not typ.isNil() and typ.kind == Auto:
                    n.obj.value.isAuto = true
                    if n.isGeneric:
                        self.error("automatic types cannot be used within generics", argument.valueType)
                    break
-            typ = self.infer(node.returnType)
+            typ = self.infer(node.returnType).value
            if not typ.isNil() and typ.kind == Auto:
                n.obj.value.isAuto = true
                if n.isGeneric:
@ -1023,7 +1023,7 @@ proc declare*(self: Compiler, node: ASTNode): Name {.discardable.} =
            else:
                case node.value.kind:
                    of identExpr:
-                        n.obj.value = self.inferOrError(node.value)
+                        n.obj.value = self.inferOrError(node.value).value
                    of binaryExpr:
                        # Type union
                        n.obj.value = Type(kind: Union, types: @[])
--- a/src/frontend/compiler/targets/bytecode/opcodes.nim
+++ b/src/frontend/compiler/targets/bytecode/opcodes.nim
@ -46,10 +46,21 @@ type
        ## - After that follows the argument count as a 1 byte integer
        ## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
        ##   its size as a 2-byte integer
+        ## modules contains information about all the peon modules that the compiler has encountered,
+        ## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
+        ## Python, peon does not produce a bytecode file for each separate module it compiles: everything 
+        ## is contained within a single binary blob. While this simplifies the implementation and makes 
+        ## bytecode files entirely "self-hosted", it also means that the original module information is 
+        ## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
+        ## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
+        ## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
+        ## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
+        ## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
        consts*: seq[uint8]
        code*: seq[uint8]
        lines*: seq[int]
        functions*: seq[uint8]
+        modules*: seq[uint8]

    OpCode* {.pure.} = enum
        ## Enum of Peon's bytecode opcodes
--- a/src/frontend/compiler/targets/bytecode/target.nim
+++ b/src/frontend/compiler/targets/bytecode/target.nim
@ -1006,7 +1006,7 @@ proc terminateProgram(self: BytecodeCompiler, pos: int) =
        self.emitByte(ReplExit, self.peek().token.line)
    else:
        self.emitByte(OpCode.Return, self.peek().token.line)
-        self.emitByte(0, self.peek().token.line)   # Entry point has no return value (TODO: Add easter eggs, cuz why not)
+        self.emitByte(0, self.peek().token.line)   # Entry point has no return value
        self.patchReturnAddress(pos)


@ -1478,8 +1478,9 @@ method lambdaExpr(self: BytecodeCompiler, node: LambdaExpr, compile: bool = true
                                line: node.token.line,
                                kind: NameKind.Function,
                                belongsTo: function,
-                                isReal: true)
-    if compile and node notin self.lambdas:
+                                isReal: true,
+                               )
+    if compile and node notin self.lambdas and not node.body.isNil():
        self.lambdas.add(node)
        let jmp = self.emitJump(JumpForwards, node.token.line)
        if BlockStmt(node.body).code.len() == 0:
@ -1687,7 +1688,7 @@ proc importStmt(self: BytecodeCompiler, node: ImportStmt, compile: bool = true)
            # Importing a module automatically exports
            # its public names to us
            for name in self.findInModule("", module):
-                name.exportedTo.incl(self.currentModule)
+                name.exportedTo.incl(self.currentModule.path)
    except IOError:
        self.error(&"could not import '{module.ident.token.lexeme}': {getCurrentExceptionMsg()}")
    except OSError:
@ -1705,22 +1706,22 @@ proc exportStmt(self: BytecodeCompiler, node: ExportStmt, compile: bool = true)
    var name = self.resolveOrError(node.name)
    if name.isPrivate:
        self.error("cannot export private names")
-    name.exportedTo.incl(self.parentModule)
+    name.exportedTo.incl(self.parentModule.path)
    case name.kind:
        of NameKind.Module:
            # We need to export everything
            # this module defines!
            for name in self.findInModule("", name):
-                name.exportedTo.incl(self.parentModule)
+                name.exportedTo.incl(self.parentModule.path)
        of NameKind.Function:
            # Only exporting a single function (or, well
            # all of its implementations)
            for name in self.findByName(name.ident.token.lexeme):
                if name.kind != NameKind.Function:
                    continue
-                name.exportedTo.incl(self.parentModule)
+                name.exportedTo.incl(self.parentModule.path)
        else:
-            discard
+            self.error("unsupported export type")


 proc breakStmt(self: BytecodeCompiler, node: BreakStmt) =
@ -2073,6 +2074,7 @@ proc compile*(self: BytecodeCompiler, ast: seq[Declaration], file: string, lines
    self.disabledWarnings = disabledWarnings
    self.showMismatches = showMismatches
    self.mode = mode
+    let start = self.chunk.code.len()
    if not incremental:
        self.jumps = @[]
    let pos = self.beginProgram()
@ -2081,8 +2083,6 @@ proc compile*(self: BytecodeCompiler, ast: seq[Declaration], file: string, lines
    while not self.done():
        self.declaration(Declaration(self.step()))
    self.terminateProgram(pos)
-    # TODO: REPL is broken, we need a new way to make
-    # incremental compilation resume from where it stopped!
    result = self.chunk


@ -2100,7 +2100,7 @@ proc compileModule(self: BytecodeCompiler, module: Name) =
            break
        elif i == searchPath.high():
            self.error(&"""could not import '{path}': module not found""")
-    if self.modules.contains(module):
+    if self.modules.contains(module.path):
        return
    let source = readFile(path)
    let current = self.current
@ -2115,11 +2115,19 @@ proc compileModule(self: BytecodeCompiler, module: Name) =
    self.replMode = false
    self.parentModule = currentModule
    self.currentModule = module
+    let start = self.chunk.code.len()
    discard self.compile(self.parser.parse(self.lexer.lex(source, path), 
                                           path, self.lexer.getLines(), 
                                           self.lexer.getSource(), persist=true), 
                         path, self.lexer.getLines(), self.lexer.getSource(), chunk=self.chunk, incremental=true,
                         isMainModule=false, self.disabledWarnings, self.showMismatches, self.mode)
+    # Mark the end of a new module
+    self.chunk.modules.extend(start.toTriple())
+    self.chunk.modules.extend(self.chunk.code.high().toTriple())
+    # I swear to god if someone ever creates a peon module with a name that's
+    # longer than 2^16 bytes I will hit them with a metal pipe. Mark my words
+    self.chunk.modules.extend(self.currentModule.ident.token.lexeme.len().toDouble())
+    self.chunk.modules.extend(self.currentModule.ident.token.lexeme.toBytes())
    module.file = path
    # No need to save the old scope depth: import statements are
    # only allowed at the top level!
@ -2133,4 +2141,4 @@ proc compileModule(self: BytecodeCompiler, module: Name) =
    self.replMode = replMode
    self.lines = lines
    self.source = src
-    self.modules.incl(module)
+    self.modules.incl(module.path)
--- a/src/frontend/compiler/targets/bytecode/util/debugger.nim
+++ b/src/frontend/compiler/targets/bytecode/util/debugger.nim
@ -22,12 +22,15 @@ import std/terminal


 type 
-    Function = ref object
-        start, stop, bottom, argc: int
+    Function = object
+        start, stop, argc: int
+        name: string
+    Module = object
+        start, stop: int
        name: string
-        started, stopped: bool
    Debugger* = ref object
        chunk: Chunk
+        modules: seq[Module]
        functions: seq[Function]
        current: int

@ -66,21 +69,38 @@ proc checkFunctionStart(self: Debugger, n: int) =
    ## Checks if a function begins at the given
    ## bytecode offset
    for i, e in self.functions:
-        if n == e.start and not (e.started or e.stopped):
-            e.started = true
+        # Avoids duplicate output
+        if n == e.start:
            styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function Start ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
            styledEcho fgGreen, "\t- Start offset: ", fgYellow, $e.start
            styledEcho fgGreen, "\t- End offset: ", fgYellow, $e.stop
-            styledEcho fgGreen, "\t- Argument count: ", fgYellow, $e.argc
+            styledEcho fgGreen, "\t- Argument count: ", fgYellow, $e.argc, "\n"


 proc checkFunctionEnd(self: Debugger, n: int) =
    ## Checks if a function ends at the given
    ## bytecode offset
    for i, e in self.functions:
-        if n == e.stop and e.started and not e.stopped:
-            e.stopped = true
+        if n == e.stop:
            styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function End ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
+
+
+proc checkModuleStart(self: Debugger, n: int) =
+    ## Checks if a module begins at the given
+    ## bytecode offset
+    for i, m in self.modules:
+        if m.start == n:
+            styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module Start ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
+            styledEcho fgGreen, "\t- Start offset: ", fgYellow, $m.start
+            styledEcho fgGreen, "\t- End offset: ", fgYellow, $m.stop, "\n"
+
+
+proc checkModuleEnd(self: Debugger, n: int) =
+    ## Checks if a module ends at the given
+    ## bytecode offset
+    for i, m in self.modules:
+        if m.stop == n:
+            styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module End ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
    

 proc simpleInstruction(self: Debugger, instruction: OpCode) =
@ -94,9 +114,6 @@ proc simpleInstruction(self: Debugger, instruction: OpCode) =
        else:
            stdout.styledWriteLine(fgYellow, "No")
        self.current += 1
-        self.checkFunctionEnd(self.current - 2)
-        self.checkFunctionEnd(self.current - 1)
-        self.checkFunctionEnd(self.current)


 proc stackTripleInstruction(self: Debugger, instruction: OpCode) =
@ -168,20 +185,27 @@ proc jumpInstruction(self: Debugger, instruction: OpCode) =
    self.current += 4
    while self.chunk.code[self.current] == NoOp.uint8:
        inc(self.current)
-    for i in countup(orig, self.current + 1):
-        self.checkFunctionStart(i)
    

 proc disassembleInstruction*(self: Debugger) =
    ## Takes one bytecode instruction and prints it
+    let opcode = OpCode(self.chunk.code[self.current])
+    self.checkModuleStart(self.current)
+    self.checkFunctionStart(self.current)
    printDebug("Offset: ")
    stdout.styledWriteLine(fgYellow, $(self.current))
    printDebug("Line: ")
    stdout.styledWriteLine(fgYellow, &"{self.chunk.getLine(self.current)}")
-    var opcode = OpCode(self.chunk.code[self.current])
    case opcode:
        of simpleInstructions:
            self.simpleInstruction(opcode)
+            # Functions (and modules) only have a single return statement at the
+            # end of their body, so we never execute this more than once per module/function
+            if opcode == Return:
+                # -2 to skip the hardcoded argument to return
+                # and the increment by simpleInstruction()
+                self.checkFunctionEnd(self.current - 2)
+                self.checkModuleEnd(self.current - 1)
        of constantInstructions:
            self.constantInstruction(opcode)
        of stackDoubleInstructions:
@ -197,7 +221,9 @@ proc disassembleInstruction*(self: Debugger) =
        else:
            echo &"DEBUG - Unknown opcode {opcode} at index {self.current}"
            self.current += 1
-    
+
+
+

 proc parseFunctions(self: Debugger) =
    ## Parses function information in the chunk
@ -206,7 +232,7 @@ proc parseFunctions(self: Debugger) =
        name: string
        idx = 0
        size = 0
-    while idx < len(self.chunk.functions) - 1:
+    while idx < self.chunk.functions.high():
        start = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
        idx += 3
        stop = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
@ -220,15 +246,36 @@ proc parseFunctions(self: Debugger) =
        self.functions.add(Function(start: start, stop: stop, argc: argc, name: name))


+proc parseModules(self: Debugger) =
+    ## Parses module information in the chunk
+    var 
+        start, stop: int
+        name: string
+        idx = 0
+        size = 0
+    while idx < self.chunk.modules.high():
+        start = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
+        idx += 3
+        stop = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
+        idx += 3
+        size = int([self.chunk.modules[idx], self.chunk.modules[idx + 1]].fromDouble())
+        idx += 2
+        name = self.chunk.modules[idx..<idx + size].fromBytes()
+        inc(idx, size)
+        self.modules.add(Module(start: start, stop: stop, name: name))
+
+
 proc disassembleChunk*(self: Debugger, chunk: Chunk, name: string) =
    ## Takes a chunk of bytecode and prints it
    self.chunk = chunk
    styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ====\n"
    self.current = 0
    self.parseFunctions()
+    self.parseModules()
    while self.current < self.chunk.code.len:
        self.disassembleInstruction()
        echo ""
+    
    styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ===="


--- a/src/frontend/compiler/targets/bytecode/util/serializer.nim
+++ b/src/frontend/compiler/targets/bytecode/util/serializer.nim
@ -64,7 +64,8 @@ proc newSerializer*(self: Serializer = nil): Serializer =


 proc writeHeaders(self: Serializer, stream: var seq[byte]) =
-    ## Writes the Peon bytecode headers in-place into a byte stream
+    ## Writes the Peon bytecode headers in-place into the
+    ## given byte sequence
    stream.extend(PeonBytecodeMarker.toBytes())
    stream.add(byte(PEON_VERSION.major))
    stream.add(byte(PEON_VERSION.minor))
@ -77,25 +78,31 @@ proc writeHeaders(self: Serializer, stream: var seq[byte]) =

 proc writeLineData(self: Serializer, stream: var seq[byte]) =
    ## Writes line information for debugging 
-    ## bytecode instructions
+    ## bytecode instructions to the given byte
+    ## sequence
    stream.extend(len(self.chunk.lines).toQuad())
    for b in self.chunk.lines:
        stream.extend(b.toTriple())


-proc writeCFIData(self: Serializer, stream: var seq[byte]) =
-    ## Writes Call Frame Information for debugging
-    ## functions
+proc writeFunctions(self: Serializer, stream: var seq[byte]) =
+    ## Writes debug info about functions to the
+    ## given byte sequence
    stream.extend(len(self.chunk.functions).toQuad())
    stream.extend(self.chunk.functions)


 proc writeConstants(self: Serializer, stream: var seq[byte]) =
    ## Writes the constants table in-place into the 
-    ## given stream
+    ## byte sequence
    stream.extend(self.chunk.consts.len().toQuad())
-    for constant in self.chunk.consts:
-        stream.add(constant)
+    stream.extend(self.chunk.consts)
+
+
+proc writeModules(self: Serializer, stream: var seq[byte]) =
+    ## Writes module information to the given byte sequence
+    stream.extend(self.chunk.modules.len().toQuad())
+    stream.extend(self.chunk.modules)


 proc writeCode(self: Serializer, stream: var seq[byte]) =
@ -106,7 +113,7 @@ proc writeCode(self: Serializer, stream: var seq[byte]) =


 proc readHeaders(self: Serializer, stream: seq[byte], serialized: Serialized): int =
-    ## Reads the bytecode headers from a given stream
+    ## Reads the bytecode headers from a given sequence
    ## of bytes
    var stream = stream
    if stream[0..<len(PeonBytecodeMarker)] != PeonBytecodeMarker.toBytes():
@ -131,7 +138,6 @@ proc readHeaders(self: Serializer, stream: seq[byte], serialized: Serialized): i
    result += 8


-
 proc readLineData(self: Serializer, stream: seq[byte]): int =
    ## Reads line information from a stream
    ## of bytes
@ -142,10 +148,11 @@ proc readLineData(self: Serializer, stream: seq[byte]): int =
        self.chunk.lines.add(int([stream[0], stream[1], stream[2]].fromTriple()))
        result +=  3
        stream = stream[3..^1]
+    doAssert len(self.chunk.lines) == int(size)


-proc readCFIData(self: Serializer, stream: seq[byte]): int =
-    ## Reads Call Frame Information from a stream
+proc readFunctions(self: Serializer, stream: seq[byte]): int =
+    ## Reads the function segment from a stream
    ## of bytes
    let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
    result += 4
@ -153,22 +160,34 @@ proc readCFIData(self: Serializer, stream: seq[byte]): int =
    for i in countup(0, int(size) - 1):
        self.chunk.functions.add(stream[i])
        inc(result)
+    doAssert len(self.chunk.functions) == int(size)


 proc readConstants(self: Serializer, stream: seq[byte]): int =
-    ## Reads the constant table from the given stream 
-    ## of bytes
+    ## Reads the constant table from the given 
+    ## byte sequence
    let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
    result += 4
    var stream = stream[4..^1]
    for i in countup(0, int(size) - 1):
        self.chunk.consts.add(stream[i])
        inc(result)
+    doAssert len(self.chunk.consts) == int(size)
+
+
+proc readModules(self: Serializer, stream: seq[byte]): int =
+    ## Reads module information from the given byte sequence
+    let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
+    result += 4
+    var stream = stream[4..^1]
+    for i in countup(0, int(size) - 1):
+        self.chunk.modules.add(stream[i])
+        inc(result)
+    doAssert len(self.chunk.modules) == int(size)


 proc readCode(self: Serializer, stream: seq[byte]): int =
-    ## Reads the bytecode from a given stream and writes
-    ## it into the given chunk
+    ## Reads the bytecode from a given byte sequence
    let size = [stream[0], stream[1], stream[2]].fromTriple()
    var stream = stream[3..^1]
    for i in countup(0, int(size) - 1):
@ -178,13 +197,16 @@ proc readCode(self: Serializer, stream: seq[byte]): int =


 proc dumpBytes*(self: Serializer, chunk: Chunk, filename: string): seq[byte] =
-    ## Dumps the given bytecode and file to a sequence of bytes and returns it.
+    ## Dumps the given chunk to a sequence of bytes and returns it.
+    ## The filename argument is for error reporting only; use dumpFile
+    ## to dump bytecode to a file
    self.filename = filename
    self.chunk = chunk
    self.writeHeaders(result)
    self.writeLineData(result)
-    self.writeCFIData(result)
+    self.writeFunctions(result)
    self.writeConstants(result)
+    self.writeModules(result)
    self.writeCode(result)


@ -207,8 +229,9 @@ proc loadBytes*(self: Serializer, stream: seq[byte]): Serialized =
    try:
        stream = stream[self.readHeaders(stream, result)..^1]
        stream = stream[self.readLineData(stream)..^1]
-        stream = stream[self.readCFIData(stream)..^1]
+        stream = stream[self.readFunctions(stream)..^1]
        stream = stream[self.readConstants(stream)..^1]
+        stream = stream[self.readModules(stream)..^1]
        stream = stream[self.readCode(stream)..^1]
    except IndexDefect:
        self.error("truncated bytecode stream")
--- a/src/main.nim
+++ b/src/main.nim
@ -246,6 +246,11 @@ proc runFile(f: string, fromString: bool = false, dump: bool = true, breakpoints
                    styledEcho fgGreen, "OK"
                else:
                    styledEcho fgRed, "Corrupted"
+                stdout.styledWrite(fgBlue, "\t- Modules segment: ")
+                if serialized.chunk.modules == compiled.modules:
+                    styledEcho fgGreen, "OK"
+                else:
+                    styledEcho fgRed, "Corrupted"    
        if run:
            case backend:
                of PeonBackend.Bytecode:
--- a/tests/gc.pn
+++ b/tests/gc.pn
@ -1,7 +1,7 @@
 import std;


-const max = 50000;
+const max = 500000;

 var x = max;
 var s = "just a test";
Author	SHA1	Message	Date
Mattia Giambirtone	40d0f23135	Fixed various bugs related to lambdas and imports. Added module info section to bytecode dumps	2023-05-22 12:57:38 +02:00
Mattia Giambirtone	20da594116	Merge	2023-05-09 11:00:35 +02:00
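
The entry layout that `parseModules` walks in the debugger diff above (3-byte start offset, 3-byte end offset, 2-byte name length, then the name bytes) can be sketched on its own. This is a minimal illustration, not peon's actual code: `ModuleInfo`, `readUint` and `parseModuleSegment` are hypothetical names standing in for the `Module` type and the `fromTriple`/`fromDouble`/`fromBytes` helpers, and the little-endian byte order is an assumption that the diff does not confirm.

```nim
# Hypothetical sketch: ModuleInfo, readUint and parseModuleSegment are
# illustrative stand-ins, not part of peon. Little-endian order is assumed.
type ModuleInfo = object
  start, stop: int
  name: string

proc readUint(data: openArray[byte], idx, width: int): int =
  ## Reads `width` bytes starting at `idx` as a little-endian unsigned integer
  for i in 0 ..< width:
    result = result or (int(data[idx + i]) shl (8 * i))

proc parseModuleSegment(data: openArray[byte]): seq[ModuleInfo] =
  ## Walks the segment entry by entry: 3-byte start, 3-byte stop,
  ## 2-byte name length, then the name itself
  var idx = 0
  while idx < data.len:
    var m = ModuleInfo()
    m.start = readUint(data, idx, 3)
    m.stop = readUint(data, idx + 3, 3)
    let size = readUint(data, idx + 6, 2)
    idx += 8
    m.name = newString(size)
    for i in 0 ..< size:
      m.name[i] = char(data[idx + i])
    idx += size
    result.add(m)

when isMainModule:
  # One entry: start = 0, stop = 258 (bytes 2, 1, 0), name = "std"
  let segment = @[0'u8, 0, 0, 2, 1, 0, 3, 0, byte('s'), byte('t'), byte('d')]
  let modules = parseModuleSegment(segment)
  doAssert modules.len == 1
  doAssert modules[0].start == 0
  doAssert modules[0].stop == 258
  doAssert modules[0].name == "std"
```

Because each entry carries its own name length, entries can be read back to back with no separator. The 4-byte size that `writeModules` emits in front of the segment only tells the reader how many bytes to hand to a parser like this one.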