Compare commits

...

13 Commits

16 changed files with 1506 additions and 285 deletions

View File

@ -14,7 +14,8 @@ Peon is a multi-paradigm, statically-typed programming language inspired by C, N
features such as automatic type inference, parametrically polymorphic generic types, pure functions, closures, interfaces, single inheritance,
reference types, templates, coroutines, raw pointers and exceptions.
The memory management model is rather simple: a Mark and Sweep garbage collector is employed to reclaim unused memory.
The memory management model is rather simple: a Mark and Sweep garbage collector is employed to reclaim unused memory, although more garbage
collection strategies (such as generational GC or deferred reference counting) are planned to be added in the future.
Peon features a native cooperative concurrency model designed to take advantage of the inherent waiting of typical I/O workloads, without the use of more than one OS thread (wherever possible), allowing for much greater efficiency and a smaller memory footprint. The asynchronous model used forces developers to write code that is both easy to reason about, thanks to the [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) model that is core to peon's async event loop implementation, and works as expected every time (without dropping signals, exceptions, or task return values).
@ -27,21 +28,13 @@ In peon, all objects are first-class (this includes functions, iterators, closur
**Disclaimer 1**: The project is still in its very early days: lots of stuff is not implemented, a work in progress or
otherwise outright broken. Feel free to report bugs!
**Disclaimer 2**: Currently the REPL is very basic (it adds your code to previous input plus a newline, as if it was compiling a new file every time),
because incremental compilation is designed for modules and it doesn't play well with the interactive nature of a REPL session. To show the current state
of the REPL, type `#show` (this will print all the code that has been typed so far), while to reset everything, type `#reset`. You can also type
`#clear` if you want a clean slate to type in, but note that it won't reset the REPL state. If adding a new piece of code causes compilation to fail, the REPL will not add the last piece of code to the input so you can type it again and recompile without having to exit the program and start from scratch. You can move through the code using left/right arrows and go to a new line by pressing Ctrl+Enter. Using the up/down keys on your keyboard
will move through the input history (which is never reset). Also note that UTF-8 is currently unsupported in the REPL (it will be soon though!)
**Disclaimer 3**: Currently, the `std` module has to be _always_ imported explicitly for even the most basic snippets to work. This is because intrinsic types and builtin operators are defined within it: if it is not imported, peon won't even know how to parse `2 + 2` (and even if it could, it would have no idea what the type of the expression would be). You can have a look at the [peon standard library](src/peon/stdlib) to see how the builtins are defined (be aware that they heavily rely on compiler black magic to work) and can even provide your own implementation if you're so inclined.
**Disclaimer 2**: Currently, the `std` module has to be _always_ imported explicitly for even the most basic snippets to work. This is because intrinsic types and builtin operators are defined within it: if it is not imported, peon won't even know how to parse `2 + 2` (and even if it could, it would have no idea what the type of the expression would be). You can have a look at the [peon standard library](src/peon/stdlib) to see how the builtins are defined (be aware that they heavily rely on compiler black magic to work) and can even provide your own implementation if you're so inclined.
### TODO List
In no particular order, here's a list of stuff that's done/to do (might be incomplete/out of date):
- User-defined types
- User-defined types
- Function calls ✅
- Control flow (if-then-else, switch) ✅
- Looping (while) ✅
@ -57,7 +50,6 @@ In no particular order, here's a list of stuff that's done/to do (might be incom
- Named scopes/blocks ✅
- Inheritance
- Interfaces
- Indexing operator
- Generics ✅
- Automatic types ✅
- Iterators/Generators
@ -76,12 +68,14 @@ In no particular order, here's a list of stuff that's done/to do (might be incom
Here's a random list of high-level features I would like peon to have and that I think are kinda neat (some may
have been implemented already):
- Reference types are not nullable by default (must use `#pragma[nullable]`)
- The `commutative` pragma, which lets you define just one implementation of an operator
and have it become commutative
- Easy C/Nim interop via FFI
- C/C++ backend
- Nim backend
- [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) (must-have!)
- Simple OOP (with multiple dispatch!)
- RTTI, with methods that dispatch at runtime based on the true type of a value
- RTTI, with methods that dispatch at runtime based on the true (aka runtime) type of a value
- Limited compile-time evaluation (embed the Peon VM in the C/C++/Nim backend and use that to execute peon code at compile time)
@ -134,5 +128,7 @@ out for yourself. Fortunately, the process is quite straightforward:
automate this soon, but as of right now the work is all manual (and it's part of the fun, IMHO ;))
__Note__: On Linux, peon will also look into `~/.local/peon/stdlib`
If you've done everything right, you should be able to run `peon` in your terminal and have it drop you into the REPL. Good
luck and have fun!

View File

@ -1,7 +1,8 @@
# Peon - Bytecode Specification
This document describes peon's bytecode as well as how it is (de-)serialized to/from files and
other file-like objects.
other file-like objects. Note that the segments in a bytecode dump appear in the order they are listed
in this document.
## Code Structure
@ -9,12 +10,12 @@ A peon program is compiled into a tightly packed sequence of bytes that contain
the VM needs to execute said program. There is no dependence between the frontend and the backend outside of the
bytecode format (which is implemented in a separate serializer module) to allow for maximum modularity.
A peon bytecode dump contains:
A peon bytecode file contains the following:
- Constants
- The bytecode itself
- Debugging information
- File and version metadata
- The program's code
- Debugging information (file and version metadata, module info; optional)
## File Headers
@ -34,7 +35,7 @@ in release builds.
### Line data segment
The line data segment contains information about each instruction in the code segment, associating each one,
via run-length encoding, 1:1 with a line number in the original source file for easier debugging. The section's
via run-length encoding, 1:1 with a line number in the original source file for easier debugging. The segment's
size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
in this segment can be decoded as explained in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L29), which is quoted
below:
@ -57,7 +58,7 @@ below:
This segment contains details about each function in the original file. The segment's size is fixed and is encoded at the
beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data in this segment can be decoded as explained
in [this file](../src/frontend/compiler/targgets/bytecode/opcodes.nim#L39), which is quoted below:
in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L39), which is quoted below:
```
[...]
@ -74,6 +75,26 @@ in [this file](../src/frontend/compiler/targgets/bytecode/opcodes.nim#L39), whic
[...]
```
### Modules segment
This segment contains details about the modules that make up the original source code which produced a given bytecode dump.
The data in this segment can be decoded as explained in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L49), which is quoted below:
```
[...]
## modules contains information about all the peon modules that the compiler has encountered,
## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
## Python, peon does not produce a bytecode file for each separate module it compiles: everything
## is contained within a single binary blob. While this simplifies the implementation and makes
## bytecode files entirely "self-hosted", it also means that the original module information is
## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
[...]
```
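To make the layout above concrete, here is a minimal, self-contained decoding sketch in Nim. The names (`ModuleInfo`, `readUInt`, `decodeModules`) and the big-endian byte order are assumptions made purely for illustration; peon's own serializer and its `fromTriple`/`fromDouble` helpers remain the authoritative definition of the encoding.
```
type ModuleInfo = object
  start, stop: int   # Bytecode offsets where the module begins/ends
  name: string       # The module's name, encoded as ASCII

proc readUInt(data: openArray[uint8], idx: var int, width: int): int =
  ## Reads `width` bytes starting at `idx` as an unsigned integer
  ## (big-endian assumed) and advances the index past them
  for _ in 0 ..< width:
    result = (result shl 8) or int(data[idx])
    inc idx

proc decodeModules(segment: openArray[uint8]): seq[ModuleInfo] =
  ## Walks the modules segment: a 3-byte start offset, a 3-byte end
  ## offset, a 2-byte name length and the name itself, repeated until
  ## the segment is exhausted
  var idx = 0
  while idx < segment.len:
    var m: ModuleInfo
    m.start = readUInt(segment, idx, 3)
    m.stop = readUInt(segment, idx, 3)
    let nameLen = readUInt(segment, idx, 2)
    for i in 0 ..< nameLen:
      m.name.add(char(segment[idx + i]))
    idx += nameLen
    result.add(m)
```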
## Constant segment
The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded
@ -87,6 +108,6 @@ real-world scenarios it likely won't be.
## Code segment
The code segment contains the linear sequence of bytecode instructions of a peon program. It is to be read directly
and without modifications. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes
The code segment contains the linear sequence of bytecode instructions of a peon program to be fed directly to
peon's virtual machine. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes
(i.e. a single 24 bit integer). All the instructions are documented [here](../src/frontend/compiler/targets/bytecode/opcodes.nim)
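As a companion to the decoding sketch in the modules section, this is how the 24-bit length prefix could be consumed, again assuming big-endian byte order; the serializer's `readCode` is the real implementation.
```
proc readCodeSegment(stream: openArray[uint8]): seq[uint8] =
  ## Reads the 3-byte (24-bit) size prefix and then copies that many
  ## instruction bytes verbatim
  let size = (int(stream[0]) shl 16) or (int(stream[1]) shl 8) or int(stream[2])
  for i in 3 ..< 3 + size:
    result.add(stream[i])
```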

View File

@ -68,7 +68,8 @@ type
## this system and is not handled
## manually by the VM
bytesAllocated: tuple[total, current: int]
cycles: int
when debugGC or debugAlloc:
cycles: int
nextGC: int
pointers: HashSet[uint64]
PeonVM* = object
@ -93,9 +94,10 @@ type
frames: seq[uint64] # Stores the bottom of stack frames
results: seq[uint64] # Stores function return values
gc: PeonGC # A reference to the VM's garbage collector
breakpoints: seq[uint64] # Breakpoints where we call our debugger
debugNext: bool # Whether to debug the next instruction
lastDebugCommand: string # The last debugging command input by the user
when debugVM:
breakpoints: seq[uint64] # Breakpoints where we call our debugger
debugNext: bool # Whether to debug the next instruction
lastDebugCommand: string # The last debugging command input by the user
# Implementation of peon's memory manager
@ -105,25 +107,17 @@ proc newPeonGC*: PeonGC =
## garbage collector
result.bytesAllocated = (0, 0)
result.nextGC = FirstGC
result.cycles = 0
when debugGC or debugAlloc:
result.cycles = 0
proc collect*(self: var PeonVM)
# Our pointer tagging routines
template tag(p: untyped): untyped = cast[pointer](cast[uint64](p) or (1'u64 shl 63'u64))
template untag(p: untyped): untyped = cast[pointer](cast[uint64](p) and 0x7fffffffffffffff'u64)
template getTag(p: untyped): untyped = (p and (1'u64 shl 63'u64)) == 0
proc reallocate*(self: var PeonVM, p: pointer, oldSize: int, newSize: int): pointer =
## Simple wrapper around realloc with
## built-in garbage collection. Callers
## should keep in mind that the returned
## pointer is tagged (bit 63 is set to 1)
## and should be passed to untag() before
## being dereferenced or otherwise used
## built-in garbage collection
self.gc.bytesAllocated.current += newSize - oldSize
try:
when debugMem:
@ -147,7 +141,7 @@ proc reallocate*(self: var PeonVM, p: pointer, oldSize: int, newSize: int): poin
else:
if self.gc.bytesAllocated.current >= self.gc.nextGC:
self.collect()
result = tag(realloc(untag(p), newSize))
result = realloc(p, newSize)
except NilAccessDefect:
stderr.writeLine("Peon: could not manage memory, segmentation fault")
quit(139) # For now, there's not much we can do if we can't get the memory we need, so we exit
@ -178,12 +172,12 @@ proc allocate(self: var PeonVM, kind: ObjectKind, size: typedesc, count: int): p
## Allocates an object on the heap and adds its
## location to the internal pointer list of the
## garbage collector
result = cast[ptr HeapObject](untag(self.reallocate(nil, 0, sizeof(HeapObject))))
result = cast[ptr HeapObject](self.reallocate(nil, 0, sizeof(HeapObject)))
setkind(result[], kind, kind)
result.marked = false
case kind:
of String:
result.str = cast[ptr UncheckedArray[char]](untag(self.reallocate(nil, 0, sizeof(size) * count)))
result.str = cast[ptr UncheckedArray[char]](self.reallocate(nil, 0, sizeof(size) * count))
result.len = count
else:
discard # TODO
@ -213,30 +207,33 @@ proc markRoots(self: var PeonVM): HashSet[ptr HeapObject] =
# Unlike what Bob does in his book, we keep track
# of objects another way, mainly due to the difference
# of our respective designs. Specifically, our VM only
# handles a single type (uint64) while Lox stores all objects
# in heap-allocated structs (which is convenient, but slow).
# What we do is store the pointers to the objects we allocated in
# a hash set and then, at collection time, do a set difference
# between the reachable objects and the whole set and discard
# whatever is left; Unfortunately, this means that if a primitive
# object's value happens to collide with an active pointer the GC
# will mistakenly assume the object to be reachable, potentially
# leading to a nasty memory leak. Let's just hope a 48+ bit address
# space makes this occurrence rare enough not to be a problem
# handles a single type (uint64), while Lox has a stack
# of heap-allocated structs (which is convenient, but slow).
# The previous implementation would just store all pointers
# allocated by us in a hash set and then check if any source
# of roots contained any of the integer values that it was
# keeping track of, but this meant that if a primitive object's
# value happened to collide with an active pointer the GC would
# mistakenly assume the object was reachable, potentially leading
# to a nasty memory leak. The current implementation uses pointer
# tagging: we know that modern CPUs never use bit 63 in addresses,
# so if it is not set we know it cannot be a pointer, and if it is set we
# just need to check if it's in our list of active addresses or not.
# This should resolve the potential memory leak (hopefully)
# What we do instead is store all pointers allocated by us
# in a hash set and then check if any source of roots contained
# any of the integer values that we're keeping track of. Note
# that this means that if a primitive object's value happens to
# collide with an active pointer, the GC will mistakenly assume
# the object to be reachable (potentially leading to a nasty
# memory leak). Hopefully, in a 64-bit address space, this
# occurrence is rare enough for us to ignore
var result = initHashSet[uint64](self.gc.pointers.len())
for obj in self.calls:
if not obj.getTag():
continue
if obj in self.gc.pointers:
result.incl(obj)
for obj in self.operands:
if not obj.getTag():
continue
if obj in self.gc.pointers:
result.incl(obj)
result.incl(obj)
var obj: ptr HeapObject
for p in result:
obj = cast[ptr HeapObject](p)
@ -301,7 +298,6 @@ proc sweep(self: var PeonVM) =
## during the mark phase.
when debugGC:
echo "DEBUG - GC: Beginning sweeping phase"
when debugGC:
var count = 0
var current: ptr HeapObject
var freed: HashSet[uint64]
@ -380,19 +376,19 @@ proc newPeonVM*: PeonVM =
# Getters for singleton types
{.push inline.}
proc getNil*(self: var PeonVM): uint64 = self.cache[2]
func getNil*(self: var PeonVM): uint64 = self.cache[2]
proc getBool*(self: var PeonVM, value: bool): uint64 =
func getBool*(self: var PeonVM, value: bool): uint64 =
if value:
return self.cache[1]
return self.cache[0]
proc getInf*(self: var PeonVM, positive: bool): uint64 =
func getInf*(self: var PeonVM, positive: bool): uint64 =
if positive:
return self.cache[3]
return self.cache[4]
proc getNan*(self: var PeonVM): uint64 = self.cache[5]
func getNan*(self: var PeonVM): uint64 = self.cache[5]
# Thanks to nim's *genius* idea of making x > y a template
@ -402,11 +398,11 @@ proc getNan*(self: var PeonVM): uint64 = self.cache[5]
# and https://github.com/nim-lang/Nim/issues/10425 and try not to
# bang your head against the nearest wall), we need a custom operator
# that preserves the natural order of evaluation
proc `!>`[T](a, b: T): auto {.inline.} =
func `!>`[T](a, b: T): auto =
b < a
proc `!>=`[T](a, b: T): auto {.inline, used.} =
proc `!>=`[T](a, b: T): auto {.used.} =
b <= a
@ -414,26 +410,26 @@ proc `!>=`[T](a, b: T): auto {.inline, used.} =
# that go through the (get|set|peek)c wrappers are frame-relative,
# meaning that the given index is added to the current stack frame's
# bottom to obtain an absolute stack index
proc push(self: var PeonVM, obj: uint64) =
func push(self: var PeonVM, obj: uint64) =
## Pushes a value object onto the
## operand stack
self.operands.add(obj)
proc pop(self: var PeonVM): uint64 =
func pop(self: var PeonVM): uint64 =
## Pops a value off the operand
## stack and returns it
return self.operands.pop()
proc peekb(self: PeonVM, distance: BackwardsIndex = ^1): uint64 =
func peekb(self: PeonVM, distance: BackwardsIndex = ^1): uint64 =
## Returns the value at the given (backwards)
## distance from the top of the operand stack
## without consuming it
return self.operands[distance]
proc peek(self: PeonVM, distance: int = 0): uint64 =
func peek(self: PeonVM, distance: int = 0): uint64 =
## Returns the value at the given
## distance from the top of the
## operand stack without consuming it
@ -442,33 +438,33 @@ proc peek(self: PeonVM, distance: int = 0): uint64 =
return self.operands[self.operands.high() + distance]
proc pushc(self: var PeonVM, val: uint64) =
func pushc(self: var PeonVM, val: uint64) =
## Pushes a value onto the
## call stack
self.calls.add(val)
proc popc(self: var PeonVM): uint64 =
func popc(self: var PeonVM): uint64 =
## Pops a value off the call
## stack and returns it
return self.calls.pop()
proc peekc(self: PeonVM, distance: int = 0): uint64 {.used.} =
func peekc(self: PeonVM, distance: int = 0): uint64 {.used.} =
## Returns the value at the given
## distance from the top of the
## call stack without consuming it
return self.calls[self.calls.high() + distance]
proc getc(self: PeonVM, idx: int): uint64 =
func getc(self: PeonVM, idx: int): uint64 =
## Getter method that abstracts
## indexing our call stack through
## stack frames
return self.calls[idx.uint64 + self.frames[^1]]
proc setc(self: var PeonVM, idx: int, val: uint64) =
func setc(self: var PeonVM, idx: int, val: uint64) =
## Setter method that abstracts
## indexing our call stack through
## stack frames
@ -700,7 +696,7 @@ proc dispatch*(self: var PeonVM) =
while true:
{.computedgoto.} # https://nim-lang.org/docs/manual.html#pragmas-computedgoto-pragma
when debugVM:
if self.ip in self.breakpoints or self.breakpoints.len() == 0 or self.debugNext:
if self.ip in self.breakpoints or self.debugNext:
self.debug()
instruction = OpCode(self.readByte())
case instruction:
@ -768,6 +764,10 @@ proc dispatch*(self: var PeonVM) =
# not needed there anymore
discard self.pop()
discard self.pop()
of ReplExit:
# Preserves the VM's state for the next
# execution. Used in the REPL
return
of Return:
# Returns from a function.
# Every peon program is wrapped
@ -829,9 +829,13 @@ proc dispatch*(self: var PeonVM) =
# not a great idea)
self.pushc(self.pop())
of LoadVar:
# Pushes a variable from the call stack
# Pushes a local variable from the call stack
# onto the operand stack
self.push(self.getc(self.readLong().int))
of LoadGlobal:
# Pushes a global variable from the call stack
# onto the operand stack
self.push(self.calls[self.readLong().int])
of NoOp:
# Does nothing
continue
@ -1002,6 +1006,8 @@ proc dispatch*(self: var PeonVM) =
self.push(self.getBool(cast[float32](self.pop()) !>= cast[float32](self.pop())))
of Float32LessOrEqual:
self.push(self.getBool(cast[float32](self.pop()) <= cast[float32](self.pop())))
of Identity:
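# Pops two values off the operand stack and pushes whether
# they are bit-for-bit identical (pointer equality for heap objects)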
self.push(cast[uint64](self.pop() == self.pop()))
# Print opcodes
of PrintInt64:
echo cast[int64](self.pop())
@ -1050,23 +1056,41 @@ proc dispatch*(self: var PeonVM) =
discard
proc run*(self: var PeonVM, chunk: Chunk, breakpoints: seq[uint64] = @[]) =
proc run*(self: var PeonVM, chunk: Chunk, breakpoints: seq[uint64] = @[], repl: bool = false) =
## Executes a piece of Peon bytecode
self.chunk = chunk
self.frames = @[]
self.calls = @[]
self.operands = @[]
self.breakpoints = breakpoints
self.results = @[]
self.ip = 0
self.lastDebugCommand = ""
when debugVM:
self.breakpoints = breakpoints
self.lastDebugCommand = ""
try:
self.dispatch()
except NilAccessDefect:
stderr.writeLine("Memory Access Violation: SIGSEGV")
quit(1)
if not repl:
# We clean up after ourselves!
self.collect()
proc resume*(self: var PeonVM, chunk: Chunk) =
## Resumes execution of the given chunk (which
## may have changed since the last call to run()).
## No other state mutation occurs and all stacks as
## well as other metadata are left intact. This should
## not be used directly unless you know what you're
## doing, as incremental compilation support is very
## experimental and highly unstable
self.chunk = chunk
try:
self.dispatch()
except NilAccessDefect:
stderr.writeLine("Memory Access Violation: SIGSEGV")
quit(1)
# We clean up after ourselves!
self.collect()
{.pop.}

View File

@ -15,14 +15,14 @@
import strformat
# These variables can be tweaked to debug and test various components of the toolchain
const debugLexer* {.booldefine.} = false # Print the tokenizer's output
const debugParser* {.booldefine.} = false # Print the AST generated by the parser
const debugCompiler* {.booldefine.} = false # Disassemble and/or print the code generated by the compiler
var debugLexer* = false # Print the tokenizer's output
var debugParser* = false # Print the AST generated by the parser
var debugCompiler* = false # Disassemble and/or print the code generated by the compiler
const debugVM* {.booldefine.} = false # Enable the runtime debugger in the bytecode VM
const debugGC* {.booldefine.} = false # Debug the Garbage Collector (extremely verbose)
const debugAlloc* {.booldefine.} = false # Trace object allocation (extremely verbose)
const debugMem* {.booldefine.} = false # Debug the memory allocator (extremely verbose)
const debugSerializer* {.booldefine.} = false # Validate the bytecode serializer's output
var debugSerializer* = false # Validate the bytecode serializer's output
const debugStressGC* {.booldefine.} = false # Make the GC run a collection at every allocation (VERY SLOW!)
const debugMarkGC* {.booldefine.} = false # Trace the marking phase object by object (extremely verbose)
const PeonBytecodeMarker* = "PEON_BYTECODE" # Magic value at the beginning of bytecode files
@ -70,8 +70,11 @@ Options
yes/on and no/off
--noWarn Disable a specific warning (for example, --noWarn:unusedVariable)
--showMismatches Show all mismatches when function dispatching fails (output is really verbose)
--backend Select the compilation backend (valid values are: 'c', 'cpp' and 'bytecode'). Note
--backend Select the compilation backend (valid values are: 'c' and 'bytecode'). Note
that the REPL always uses the bytecode target. Defaults to 'bytecode'
-o, --output Rename the output file with this value (with --backend:bytecode, a '.pbc' extension
is added if not already present)
--debug-dump Debug the bytecode serializer. Only makes sense with --backend:bytecode
--debug-lexer Debug the peon lexer
--debug-parser Debug the peon parser
"""

View File

@ -12,19 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import std/tables
import std/strformat
import std/algorithm
@ -52,7 +40,7 @@ export ast, token, symbols, config, errors
type
PeonBackend* = enum
## An enumeration of the peon backends
Bytecode, NativeC, NativeCpp
Bytecode, NativeC
PragmaKind* = enum
## An enumeration of pragma types
@ -146,7 +134,7 @@ type
node*: Declaration
# Who is this name exported to? (Only makes sense if isPrivate
# equals false)
exportedTo*: HashSet[Name]
exportedTo*: HashSet[string]
# Has the compiler generated this name internally or
# does it come from user code?
isReal*: bool
@ -224,7 +212,7 @@ type
# The module importing us, if any
parentModule*: Name
# Currently imported modules
modules*: HashSet[Name]
modules*: HashSet[string]
TypedNode* = ref object
## A wrapper for AST nodes
@ -353,11 +341,9 @@ proc step*(self: Compiler): ASTNode {.inline.} =
# and can be reused across multiple compilation backends
proc resolve*(self: Compiler, name: string): Name =
## Traverses all existing namespaces and returns
## the first object with the given name. Returns
## nil when the name can't be found. Note that
## when a type or function declaration is first
## resolved, it is also compiled on-the-fly
## Traverses all existing namespaces in reverse order
## and returns the first object with the given name.
## Returns nil when the name can't be found
for obj in reversed(self.names):
if obj.ident.token.lexeme == name:
if obj.owner.path != self.currentModule.path:
@ -368,11 +354,12 @@ proc resolve*(self: Compiler, name: string): Name =
# module, so we definitely can't
# use it
continue
elif self.currentModule in obj.exportedTo:
elif self.currentModule.path in obj.exportedTo:
# The name is public in its owner
# module and said module has explicitly
# exported it to us: we can use it
result = obj
result.resolved = true
break
# If the name is public but not exported in
# its owner module, then we act as if it's
@ -382,6 +369,7 @@ proc resolve*(self: Compiler, name: string): Name =
# might not want to also have access to C's and D's
# names as they might clash with its own stuff)
continue
# We own this name, so we can definitely access it
result = obj
result.resolved = true
break
@ -725,7 +713,7 @@ method findByName*(self: Compiler, name: string): seq[Name] =
for obj in reversed(self.names):
if obj.ident.token.lexeme == name:
if obj.owner.path != self.currentModule.path:
if obj.isPrivate or self.currentModule notin obj.exportedTo:
if obj.isPrivate or self.currentModule.path notin obj.exportedTo:
continue
result.add(obj)
@ -739,11 +727,13 @@ method findInModule*(self: Compiler, name: string, module: Name): seq[Name] =
## the current one or not
if name == "":
for obj in reversed(self.names):
if not obj.isPrivate and obj.owner == module:
if obj.owner.isNil():
continue
if not obj.isPrivate and obj.owner.path == module.path:
result.add(obj)
else:
for obj in self.findInModule("", module):
if obj.ident.token.lexeme == name and self.currentModule in obj.exportedTo:
if obj.ident.token.lexeme == name and self.currentModule.path in obj.exportedTo:
result.add(obj)
@ -1046,7 +1036,7 @@ proc declare*(self: Compiler, node: ASTNode): Name {.discardable.} =
break
if name.ident.token.lexeme != declaredName:
continue
if name.owner != n.owner and (name.isPrivate or n.owner notin name.exportedTo):
if name.owner != n.owner and (name.isPrivate or n.owner.path notin name.exportedTo):
continue
if name.kind in [NameKind.Var, NameKind.Module, NameKind.CustomType, NameKind.Enum]:
if name.depth < n.depth:

File diff suppressed because it is too large

View File

@ -46,10 +46,21 @@ type
## - After that follows the argument count as a 1 byte integer
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
## its size as a 2-byte integer
## modules contains information about all the peon modules that the compiler has encountered,
## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
## Python, peon does not produce a bytecode file for each separate module it compiles: everything
## is contained within a single binary blob. While this simplifies the implementation and makes
## bytecode files entirely "self-hosted", it also means that the original module information is
## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
consts*: seq[uint8]
code*: seq[uint8]
lines*: seq[int]
functions*: seq[uint8]
modules*: seq[uint8]
OpCode* {.pure.} = enum
## Enum of Peon's bytecode opcodes
@ -136,6 +147,7 @@ type
Float32GreaterOrEqual,
Float32LessOrEqual,
LogicalNot,
Identity, # Pointer equality
## Print opcodes
PrintInt64,
PrintUInt64,
@ -188,7 +200,9 @@ type
PushC, # Pop off the operand stack onto the call stack
SysClock64, # Pushes the output of a monotonic clock on the stack
LoadTOS, # Pushes the top of the call stack onto the operand stack
DupTop # Duplicates the top of the operand stack onto the operand stack
DupTop, # Duplicates the top of the operand stack onto the operand stack
ReplExit, # Exits the VM immediately, leaving its state intact. Used in the REPL
LoadGlobal # Loads a global variable
# We group instructions by their operation/operand types for easier handling when debugging
@ -267,7 +281,9 @@ const simpleInstructions* = {Return, LoadNil,
Float32LessThan,
Float32GreaterOrEqual,
Float32LessOrEqual,
DupTop
DupTop,
ReplExit,
Identity
}
# Constant instructions are instructions that operate on the bytecode constant table
@ -280,7 +296,7 @@ const constantInstructions* = {LoadInt64, LoadUInt64,
# Stack triple instructions operate on the stack at arbitrary offsets and pop arguments off of it in the form
# of 24 bit integers
const stackTripleInstructions* = {StoreVar, LoadVar, }
const stackTripleInstructions* = {StoreVar, LoadVar, LoadGlobal}
# Stack double instructions operate on the stack at arbitrary offsets and pop arguments off of it in the form
# of 16 bit integers

View File

@ -461,7 +461,8 @@ proc handleBuiltinFunction(self: BytecodeCompiler, fn: Type, args: seq[Expressio
"PrintString": PrintString,
"SysClock64": SysClock64,
"LogicalNot": LogicalNot,
"NegInf": LoadNInf
"NegInf": LoadNInf,
"Identity": Identity
}.to_table()
if fn.builtinOp == "print":
let typ = self.inferOrError(args[0])
@ -565,6 +566,8 @@ proc endScope(self: BytecodeCompiler) =
var names: seq[Name] = @[]
var popCount = 0
for name in self.names:
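# In REPL mode, top-level (global) names are never popped,
# so they survive across successive inputs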
if self.replMode and name.depth == 0:
continue
# We only pop names in scopes deeper than ours
if name.depth > self.depth:
if name.depth == 0 and not self.isMainModule:
@ -999,9 +1002,12 @@ proc terminateProgram(self: BytecodeCompiler, pos: int) =
## Utility to terminate a peon program
self.patchForwardDeclarations()
self.endScope()
self.emitByte(OpCode.Return, self.peek().token.line)
self.emitByte(0, self.peek().token.line) # Entry point has no return value (TODO: Add easter eggs, cuz why not)
self.patchReturnAddress(pos)
if self.replMode:
self.emitByte(ReplExit, self.peek().token.line)
else:
self.emitByte(OpCode.Return, self.peek().token.line)
self.emitByte(0, self.peek().token.line) # Entry point has no return value
self.patchReturnAddress(pos)
proc beginProgram(self: BytecodeCompiler): int =
@ -1228,10 +1234,14 @@ method identifier(self: BytecodeCompiler, node: IdentExpr, name: Name = nil, com
if not s.belongsTo.isNil() and s.belongsTo.valueType.fun.kind == funDecl and FunDecl(s.belongsTo.valueType.fun).isTemplate:
discard
else:
# Loads a regular variable from the current frame
self.emitByte(LoadVar, s.ident.token.line)
# No need to check for -1 here: we already did a nil check above!
self.emitBytes(s.position.toTriple(), s.ident.token.line)
if s.depth > 0:
# Loads a regular variable from the current frame
self.emitByte(LoadVar, s.ident.token.line)
# No need to check for -1 here: we already did a nil check above!
self.emitBytes(s.position.toTriple(), s.ident.token.line)
else:
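# Depth 0 means the name is a global: LoadGlobal indexes the call
# stack with an absolute position rather than one relative to the
# current stack frame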
self.emitByte(LoadGlobal, s.ident.token.line)
self.emitBytes(s.position.toTriple(), s.ident.token.line)
method assignment(self: BytecodeCompiler, node: ASTNode, compile: bool = true): Type {.discardable.} =
@ -1468,8 +1478,9 @@ method lambdaExpr(self: BytecodeCompiler, node: LambdaExpr, compile: bool = true
line: node.token.line,
kind: NameKind.Function,
belongsTo: function,
isReal: true)
if compile and node notin self.lambdas:
isReal: true,
)
if compile and node notin self.lambdas and not node.body.isNil():
self.lambdas.add(node)
let jmp = self.emitJump(JumpForwards, node.token.line)
if BlockStmt(node.body).code.len() == 0:
@ -1677,7 +1688,7 @@ proc importStmt(self: BytecodeCompiler, node: ImportStmt, compile: bool = true)
# Importing a module automatically exports
# its public names to us
for name in self.findInModule("", module):
name.exportedTo.incl(self.currentModule)
name.exportedTo.incl(self.currentModule.path)
except IOError:
self.error(&"could not import '{module.ident.token.lexeme}': {getCurrentExceptionMsg()}")
except OSError:
@ -1695,22 +1706,22 @@ proc exportStmt(self: BytecodeCompiler, node: ExportStmt, compile: bool = true)
var name = self.resolveOrError(node.name)
if name.isPrivate:
self.error("cannot export private names")
name.exportedTo.incl(self.parentModule)
name.exportedTo.incl(self.parentModule.path)
case name.kind:
of NameKind.Module:
# We need to export everything
# this module defines!
for name in self.findInModule("", name):
name.exportedTo.incl(self.parentModule)
name.exportedTo.incl(self.parentModule.path)
of NameKind.Function:
# Only exporting a single function (or, well
# all of its implementations)
for name in self.findByName(name.ident.token.lexeme):
if name.kind != NameKind.Function:
continue
name.exportedTo.incl(self.parentModule)
name.exportedTo.incl(self.parentModule.path)
else:
discard
self.error("unsupported export type")
proc breakStmt(self: BytecodeCompiler, node: BreakStmt) =
@ -1972,12 +1983,12 @@ proc funDecl(self: BytecodeCompiler, node: FunDecl, name: Name) =
self.patchJump(jump)
self.endScope()
# Terminates the function's context
let stop = self.chunk.code.len().toTriple()
self.emitByte(OpCode.Return, self.peek().token.line)
if hasVal:
self.emitByte(1, self.peek().token.line)
else:
self.emitByte(0, self.peek().token.line)
let stop = self.chunk.code.len().toTriple()
self.chunk.functions[idx] = stop[0]
self.chunk.functions[idx + 1] = stop[1]
self.chunk.functions[idx + 2] = stop[2]
@ -2046,26 +2057,32 @@ proc compile*(self: BytecodeCompiler, ast: seq[Declaration], file: string, lines
self.chunk = newChunk()
else:
self.chunk = chunk
self.ast = ast
self.file = file
self.depth = 0
self.currentFunction = nil
self.current = 0
self.lines = lines
self.source = source
if self.replMode:
self.ast &= ast
self.source &= "\n" & source
self.lines &= lines
else:
self.ast = ast
self.current = 0
self.stackIndex = 1
self.lines = lines
self.source = source
self.isMainModule = isMainModule
self.disabledWarnings = disabledWarnings
self.showMismatches = showMismatches
self.mode = mode
self.stackIndex = 1
let start = self.chunk.code.len()
if not incremental:
self.jumps = @[]
let pos = self.beginProgram()
let idx = self.stackIndex
self.stackIndex = idx
while not self.done():
self.declaration(Declaration(self.step()))
self.terminateProgram(pos)
# TODO: REPL is broken, we need a new way to make
# incremental compilation resume from where it stopped!
result = self.chunk
@ -2083,7 +2100,7 @@ proc compileModule(self: BytecodeCompiler, module: Name) =
break
elif i == searchPath.high():
self.error(&"""could not import '{path}': module not found""")
if self.modules.contains(module):
if self.modules.contains(module.path):
return
let source = readFile(path)
let current = self.current
@ -2094,13 +2111,23 @@ proc compileModule(self: BytecodeCompiler, module: Name) =
let currentModule = self.currentModule
let mainModule = self.isMainModule
let parentModule = self.parentModule
let replMode = self.replMode
self.replMode = false
self.parentModule = currentModule
self.currentModule = module
let start = self.chunk.code.len()
discard self.compile(self.parser.parse(self.lexer.lex(source, path),
path, self.lexer.getLines(),
self.lexer.getSource(), persist=true),
path, self.lexer.getLines(), self.lexer.getSource(), chunk=self.chunk, incremental=true,
isMainModule=false, self.disabledWarnings, self.showMismatches, self.mode)
# Mark the end of a new module
self.chunk.modules.extend(start.toTriple())
self.chunk.modules.extend(self.chunk.code.high().toTriple())
# I swear to god if someone ever creates a peon module with a name that's
# longer than 2^16 bytes I will hit them with a metal pipe. Mark my words
self.chunk.modules.extend(self.currentModule.ident.token.lexeme.len().toDouble())
self.chunk.modules.extend(self.currentModule.ident.token.lexeme.toBytes())
module.file = path
# No need to save the old scope depth: import statements are
# only allowed at the top level!
@ -2111,6 +2138,7 @@ proc compileModule(self: BytecodeCompiler, module: Name) =
self.currentModule = currentModule
self.isMainModule = mainModule
self.parentModule = parentModule
self.replMode = replMode
self.lines = lines
self.source = src
self.modules.incl(module)
self.modules.incl(module.path)

View File

@ -22,12 +22,15 @@ import std/terminal
type
Function = ref object
start, stop, bottom, argc: int
Function = object
start, stop, argc: int
name: string
Module = object
start, stop: int
name: string
started, stopped: bool
Debugger* = ref object
chunk: Chunk
modules: seq[Module]
functions: seq[Function]
current: int
@ -66,21 +69,38 @@ proc checkFunctionStart(self: Debugger, n: int) =
## Checks if a function begins at the given
## bytecode offset
for i, e in self.functions:
if n == e.start and not (e.started or e.stopped):
e.started = true
# Avoids duplicate output
if n == e.start:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function Start ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
styledEcho fgGreen, "\t- Start offset: ", fgYellow, $e.start
styledEcho fgGreen, "\t- End offset: ", fgYellow, $e.stop
styledEcho fgGreen, "\t- Argument count: ", fgYellow, $e.argc
styledEcho fgGreen, "\t- Argument count: ", fgYellow, $e.argc, "\n"
proc checkFunctionEnd(self: Debugger, n: int) =
## Checks if a function ends at the given
## bytecode offset
for i, e in self.functions:
if n == e.stop and e.started and not e.stopped:
e.stopped = true
if n == e.stop:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function End ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
proc checkModuleStart(self: Debugger, n: int) =
## Checks if a module begins at the given
## bytecode offset
for i, m in self.modules:
if m.start == n:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module Start ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
styledEcho fgGreen, "\t- Start offset: ", fgYellow, $m.start
styledEcho fgGreen, "\t- End offset: ", fgYellow, $m.stop, "\n"
proc checkModuleEnd(self: Debugger, n: int) =
## Checks if a module ends at the given
## bytecode offset
for i, m in self.modules:
if m.stop == n:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module End ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
proc simpleInstruction(self: Debugger, instruction: OpCode) =
@ -94,9 +114,6 @@ proc simpleInstruction(self: Debugger, instruction: OpCode) =
else:
stdout.styledWriteLine(fgYellow, "No")
self.current += 1
self.checkFunctionEnd(self.current - 2)
self.checkFunctionEnd(self.current - 1)
self.checkFunctionEnd(self.current)
proc stackTripleInstruction(self: Debugger, instruction: OpCode) =
@ -168,20 +185,27 @@ proc jumpInstruction(self: Debugger, instruction: OpCode) =
self.current += 4
while self.chunk.code[self.current] == NoOp.uint8:
inc(self.current)
for i in countup(orig, self.current + 1):
self.checkFunctionStart(i)
proc disassembleInstruction*(self: Debugger) =
## Takes one bytecode instruction and prints it
let opcode = OpCode(self.chunk.code[self.current])
self.checkModuleStart(self.current)
self.checkFunctionStart(self.current)
printDebug("Offset: ")
stdout.styledWriteLine(fgYellow, $(self.current))
printDebug("Line: ")
stdout.styledWriteLine(fgYellow, &"{self.chunk.getLine(self.current)}")
var opcode = OpCode(self.chunk.code[self.current])
case opcode:
of simpleInstructions:
self.simpleInstruction(opcode)
# Functions (and modules) only have a single return statement at the
# end of their body, so we never execute this more than once per module/function
if opcode == Return:
# -2 to skip the hardcoded argument to return
# and the increment by simpleInstruction()
self.checkFunctionEnd(self.current - 2)
self.checkModuleEnd(self.current - 1)
of constantInstructions:
self.constantInstruction(opcode)
of stackDoubleInstructions:
@ -197,7 +221,9 @@ proc disassembleInstruction*(self: Debugger) =
else:
echo &"DEBUG - Unknown opcode {opcode} at index {self.current}"
self.current += 1
proc parseFunctions(self: Debugger) =
## Parses function information in the chunk
@ -206,7 +232,7 @@ proc parseFunctions(self: Debugger) =
name: string
idx = 0
size = 0
while idx < len(self.chunk.functions) - 1:
while idx < self.chunk.functions.high():
start = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
idx += 3
stop = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
@ -220,15 +246,36 @@ proc parseFunctions(self: Debugger) =
self.functions.add(Function(start: start, stop: stop, argc: argc, name: name))
proc parseModules(self: Debugger) =
## Parses module information in the chunk
var
start, stop: int
name: string
idx = 0
size = 0
while idx < self.chunk.modules.high():
start = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
idx += 3
stop = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
idx += 3
size = int([self.chunk.modules[idx], self.chunk.modules[idx + 1]].fromDouble())
idx += 2
name = self.chunk.modules[idx..<idx + size].fromBytes()
inc(idx, size)
self.modules.add(Module(start: start, stop: stop, name: name))
proc disassembleChunk*(self: Debugger, chunk: Chunk, name: string) =
## Takes a chunk of bytecode and prints it
self.chunk = chunk
styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ====\n"
self.current = 0
self.parseFunctions()
self.parseModules()
while self.current < self.chunk.code.len:
self.disassembleInstruction()
echo ""
styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ===="

View File

@ -64,7 +64,8 @@ proc newSerializer*(self: Serializer = nil): Serializer =
proc writeHeaders(self: Serializer, stream: var seq[byte]) =
## Writes the Peon bytecode headers in-place into a byte stream
## Writes the Peon bytecode headers in-place into the
## given byte sequence
stream.extend(PeonBytecodeMarker.toBytes())
stream.add(byte(PEON_VERSION.major))
stream.add(byte(PEON_VERSION.minor))
@ -77,25 +78,31 @@ proc writeHeaders(self: Serializer, stream: var seq[byte]) =
proc writeLineData(self: Serializer, stream: var seq[byte]) =
## Writes line information for debugging
## bytecode instructions
## bytecode instructions to the given byte
## sequence
stream.extend(len(self.chunk.lines).toQuad())
for b in self.chunk.lines:
stream.extend(b.toTriple())
proc writeCFIData(self: Serializer, stream: var seq[byte]) =
## Writes Call Frame Information for debugging
## functions
proc writeFunctions(self: Serializer, stream: var seq[byte]) =
## Writes debug info about functions to the
## given byte sequence
stream.extend(len(self.chunk.functions).toQuad())
stream.extend(self.chunk.functions)
proc writeConstants(self: Serializer, stream: var seq[byte]) =
## Writes the constants table in-place into the
## given stream
## byte sequence
stream.extend(self.chunk.consts.len().toQuad())
for constant in self.chunk.consts:
stream.add(constant)
stream.extend(self.chunk.consts)
proc writeModules(self: Serializer, stream: var seq[byte]) =
## Writes module information to the given stream
stream.extend(self.chunk.modules.len().toQuad())
stream.extend(self.chunk.modules)
proc writeCode(self: Serializer, stream: var seq[byte]) =
@ -106,7 +113,7 @@ proc writeCode(self: Serializer, stream: var seq[byte]) =
proc readHeaders(self: Serializer, stream: seq[byte], serialized: Serialized): int =
## Reads the bytecode headers from a given stream
## Reads the bytecode headers from a given sequence
## of bytes
var stream = stream
if stream[0..<len(PeonBytecodeMarker)] != PeonBytecodeMarker.toBytes():
@ -131,7 +138,6 @@ proc readHeaders(self: Serializer, stream: seq[byte], serialized: Serialized): i
result += 8
proc readLineData(self: Serializer, stream: seq[byte]): int =
## Reads line information from a stream
## of bytes
@ -142,10 +148,11 @@ proc readLineData(self: Serializer, stream: seq[byte]): int =
self.chunk.lines.add(int([stream[0], stream[1], stream[2]].fromTriple()))
result += 3
stream = stream[3..^1]
doAssert len(self.chunk.lines) == int(size)
proc readCFIData(self: Serializer, stream: seq[byte]): int =
## Reads Call Frame Information from a stream
proc readFunctions(self: Serializer, stream: seq[byte]): int =
## Reads the function segment from a stream
## of bytes
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4
@ -153,22 +160,34 @@ proc readCFIData(self: Serializer, stream: seq[byte]): int =
for i in countup(0, int(size) - 1):
self.chunk.functions.add(stream[i])
inc(result)
doAssert len(self.chunk.functions) == int(size)
proc readConstants(self: Serializer, stream: seq[byte]): int =
## Reads the constant table from the given stream
## of bytes
## Reads the constant table from the given
## byte sequence
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4
var stream = stream[4..^1]
for i in countup(0, int(size) - 1):
self.chunk.consts.add(stream[i])
inc(result)
doAssert len(self.chunk.consts) == int(size)
proc readModules(self: Serializer, stream: seq[byte]): int =
## Reads module information
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4
var stream = stream[4..^1]
for i in countup(0, int(size) - 1):
self.chunk.modules.add(stream[i])
inc(result)
doAssert len(self.chunk.modules) == int(size)
proc readCode(self: Serializer, stream: seq[byte]): int =
## Reads the bytecode from a given stream and writes
## it into the given chunk
## Reads the bytecode from a given byte sequence
let size = [stream[0], stream[1], stream[2]].fromTriple()
var stream = stream[3..^1]
for i in countup(0, int(size) - 1):
@ -178,13 +197,16 @@ proc readCode(self: Serializer, stream: seq[byte]): int =
proc dumpBytes*(self: Serializer, chunk: Chunk, filename: string): seq[byte] =
## Dumps the given bytecode and file to a sequence of bytes and returns it.
## Dumps the given chunk to a sequence of bytes and returns it.
## The filename argument is for error reporting only, use dumpFile
## to dump bytecode to a file
self.filename = filename
self.chunk = chunk
self.writeHeaders(result)
self.writeLineData(result)
self.writeCFIData(result)
self.writeFunctions(result)
self.writeConstants(result)
self.writeModules(result)
self.writeCode(result)
@ -207,8 +229,9 @@ proc loadBytes*(self: Serializer, stream: seq[byte]): Serialized =
try:
stream = stream[self.readHeaders(stream, result)..^1]
stream = stream[self.readLineData(stream)..^1]
stream = stream[self.readCFIData(stream)..^1]
stream = stream[self.readFunctions(stream)..^1]
stream = stream[self.readConstants(stream)..^1]
stream = stream[self.readModules(stream)..^1]
stream = stream[self.readCode(stream)..^1]
except IndexDefect:
self.error("truncated bytecode stream")

View File

@ -16,6 +16,7 @@
import std/strformat
import std/strutils
import std/tables
import std/os
@ -31,9 +32,6 @@ export token, ast, errors
type
LoopContext {.pure.} = enum
Loop, None
Precedence {.pure.} = enum
## Operator precedence
## clearly stolen from
@ -66,18 +64,16 @@ type
# Only meaningful for parse errors
file: string
# The list of tokens representing
# the source code to be parsed.
# In most cases, those will come
# from the builtin lexer, but this
# behavior is not enforced and the
# tokenizer is entirely separate from
# the parser
# the source code to be parsed
tokens: seq[Token]
# Little internal attribute that tells
# us if we're inside a loop or not. This
# allows us to detect errors like break
# being used outside loops
currentLoop: LoopContext
# Just like scope depth tells us how
# many nested scopes are above us, the
# loop depth tells us how many nested
# loops are above us. It's just a simple
# way of statically detecting stuff like
# the break statement being used outside
# loops. Maybe a bit overkill for a parser?
loopDepth: int
# Stores the current function
# being parsed. This is a reference
# to either a FunDecl or LambdaExpr
@ -96,8 +92,13 @@ type
lines: seq[tuple[start, stop: int]]
# The source of the current module
source: string
# Keeps track of imported modules
modules: seq[tuple[name: string, loaded: bool]]
# Keeps track of imported modules.
# The key is the module's fully qualified
# path, while the boolean indicates whether
# it has been fully loaded. This is useful
# to avoid importing a module twice and to
# detect recursive dependency cycles
modules: TableRef[string, bool]
ParseError* = ref object of PeonException
## A parsing exception
parser*: Parser
@ -140,7 +141,7 @@ proc newOperatorTable: OperatorTable =
result.tokens = @[]
for prec in Precedence:
result.precedence[prec] = @[]
# These operators are currently not built-in
# These operators are currently hardcoded
# due to compiler limitations
result.addOperator("=")
result.addOperator(".")
@ -161,11 +162,12 @@ proc newParser*: Parser =
result.file = ""
result.tokens = @[]
result.currentFunction = nil
result.currentLoop = LoopContext.None
result.loopDepth = 0
result.scopeDepth = 0
result.operators = newOperatorTable()
result.tree = @[]
result.source = ""
result.modules = newTable[string, bool]()
# Public getters for improved error formatting
@ -180,7 +182,7 @@ template endOfLine(msg: string, tok: Token = nil) = self.expect(Semicolon, msg,
proc peek(self: Parser, distance: int = 0): Token =
proc peek(self: Parser, distance: int = 0): Token {.inline.} =
## Peeks at the token at the given distance.
## If the distance is out of bounds, an EOF
## token is returned. A negative distance may
@ -201,7 +203,7 @@ proc done(self: Parser): bool {.inline.} =
result = self.peek().kind == EndOfFile
proc step(self: Parser, n: int = 1): Token =
proc step(self: Parser, n: int = 1): Token {.inline.} =
## Steps n tokens into the input,
## returning the last consumed one
if self.done():
@ -227,7 +229,7 @@ proc error(self: Parser, message: string, token: Token = nil) {.raises: [ParseEr
# as a symbol and in the cases where we need a specific token we just match the string
# directly
proc check[T: TokenType or string](self: Parser, kind: T,
distance: int = 0): bool =
distance: int = 0): bool {.inline.} =
## Checks if the given token at the given distance
## matches the expected kind and returns a boolean.
## The distance parameter is passed directly to
@ -239,7 +241,7 @@ proc check[T: TokenType or string](self: Parser, kind: T,
self.peek(distance).lexeme == kind
proc check[T: TokenType or string](self: Parser, kind: openarray[T]): bool =
proc check[T: TokenType or string](self: Parser, kind: openarray[T]): bool {.inline.} =
## Calls self.check() in a loop with each entry of
## the given openarray of token kinds and returns
## at the first match. Note that this assumes
@ -251,7 +253,7 @@ proc check[T: TokenType or string](self: Parser, kind: openarray[T]): bool =
return false
proc match[T: TokenType or string](self: Parser, kind: T): bool =
proc match[T: TokenType or string](self: Parser, kind: T): bool {.inline.} =
## Behaves like self.check(), except that when a token
## matches it is also consumed
if self.check(kind):
@ -261,7 +263,7 @@ proc match[T: TokenType or string](self: Parser, kind: T): bool =
result = false
proc match[T: TokenType or string](self: Parser, kind: openarray[T]): bool =
proc match[T: TokenType or string](self: Parser, kind: openarray[T]): bool {.inline.} =
## Calls self.match() in a loop with each entry of
## the given openarray of token kinds and returns
## at the first match. Note that this assumes
@ -273,7 +275,7 @@ proc match[T: TokenType or string](self: Parser, kind: openarray[T]): bool =
result = false
proc expect[T: TokenType or string](self: Parser, kind: T, message: string = "", token: Token = nil) =
proc expect[T: TokenType or string](self: Parser, kind: T, message: string = "", token: Token = nil) {.inline.} =
## Behaves like self.match(), except that
## when a token doesn't match, an error
## is raised. If no error message is
@ -285,7 +287,7 @@ proc expect[T: TokenType or string](self: Parser, kind: T, message: string = "",
self.error(message)
proc expect[T: TokenType or string](self: Parser, kind: openarray[T], message: string = "", token: Token = nil) {.used.} =
proc expect[T: TokenType or string](self: Parser, kind: openarray[T], message: string = "", token: Token = nil) {.inline, used.} =
## Behaves like self.expect(), except that
## an error is raised only if none of the
## given token kinds matches
@ -307,6 +309,7 @@ proc funDecl(self: Parser, isAsync: bool = false, isGenerator: bool = false,
isLambda: bool = false, isOperator: bool = false, isTemplate: bool = false): Declaration
proc declaration(self: Parser): Declaration
proc parse*(self: Parser, tokens: seq[Token], file: string, lines: seq[tuple[start, stop: int]], source: string, persist: bool = false): seq[Declaration]
proc findOperators(self: Parser, tokens: seq[Token])
# End of forward declarations
@ -436,7 +439,7 @@ proc makeCall(self: Parser, callee: Expression): CallExpr =
proc parseGenericArgs(self: Parser) =
## Parses function generic arguments
## like function[type](arg)
discard
discard # TODO
proc call(self: Parser): Expression =
@ -596,12 +599,12 @@ proc assertStmt(self: Parser): Statement =
result.file = self.file
proc beginScope(self: Parser) =
proc beginScope(self: Parser) {.inline.} =
## Begins a new lexical scope
inc(self.scopeDepth)
proc endScope(self: Parser) =
proc endScope(self: Parser) {.inline.} =
## Ends a new lexical scope
dec(self.scopeDepth)
@ -631,8 +634,7 @@ proc namedBlockStmt(self: Parser): Statement =
self.expect(Identifier, "expecting block name after 'block'")
var name = newIdentExpr(self.peek(-1), self.scopeDepth)
name.file = self.file
let enclosingLoop = self.currentLoop
self.currentLoop = Loop
inc(self.loopDepth)
self.expect(LeftBrace, "expecting '{' after 'block'")
while not self.check(RightBrace) and not self.done():
code.add(self.declaration())
@ -642,14 +644,14 @@ proc namedBlockStmt(self: Parser): Statement =
result = newNamedBlockStmt(code, name, tok)
result.file = self.file
self.endScope()
self.currentLoop = enclosingLoop
dec(self.loopDepth)
proc breakStmt(self: Parser): Statement =
## Parses break statements
let tok = self.peek(-1)
var label: IdentExpr
if self.currentLoop != Loop:
if self.loopDepth == 0:
self.error("'break' cannot be used outside loops")
if self.match(Identifier):
label = newIdentExpr(self.peek(-1), self.scopeDepth)
@ -673,7 +675,7 @@ proc continueStmt(self: Parser): Statement =
## Parses continue statements
let tok = self.peek(-1)
var label: IdentExpr
if self.currentLoop != Loop:
if self.loopDepth == 0:
self.error("'continue' cannot be used outside loops")
if self.match(Identifier):
label = newIdentExpr(self.peek(-1), self.scopeDepth)
@ -747,8 +749,7 @@ proc raiseStmt(self: Parser): Statement =
proc forEachStmt(self: Parser): Statement =
## Parses C#-like foreach loops
let tok = self.peek(-1)
let enclosingLoop = self.currentLoop
self.currentLoop = Loop
inc(self.loopDepth)
self.expect(Identifier)
let identifier = newIdentExpr(self.peek(-1), self.scopeDepth)
self.expect("in")
@ -756,10 +757,7 @@ proc forEachStmt(self: Parser): Statement =
self.expect(LeftBrace)
result = newForEachStmt(identifier, expression, self.blockStmt(), tok)
result.file = self.file
self.currentLoop = enclosingLoop
proc findOperators(self: Parser, tokens: seq[Token])
dec(self.loopDepth)
proc importStmt(self: Parser, fromStmt: bool = false): Statement =
@ -806,6 +804,10 @@ proc importStmt(self: Parser, fromStmt: bool = false): Statement =
break
elif i == searchPath.high():
self.error(&"""could not import '{path}': module not found""")
if not self.modules.getOrDefault(path, true):
self.error(&"coult not import '{path}' (recursive dependency detected)")
else:
self.modules[path] = false
try:
var source = readFile(path)
var tree = self.tree
@ -819,6 +821,8 @@ proc importStmt(self: Parser, fromStmt: bool = false): Statement =
self.tree = tree
self.current = current
self.tokens = tokens
# Module has been fully loaded and can now be used
self.modules[path] = true
except IOError:
self.error(&"could not import '{path}': {getCurrentExceptionMsg()}")
except OSError:
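The import machinery above prevents circular imports by recording each module's load state in the new `modules` table: an entry is set to `false` when the import starts and flipped to `true` once the module has been fully parsed, so re-entering a path that is still `false` signals a recursive dependency. A minimal standalone sketch of the same pattern (the helper proc is illustrative; only the `modules` table and the `false`/`true` convention come from the diff):

    import std/tables

    var modules = newTable[string, bool]()

    proc importModule(path: string) =
      # A `false` entry means this module is still being loaded:
      # seeing it again mid-load implies a cycle.
      if not modules.getOrDefault(path, true):
        raise newException(ValueError, "recursive dependency detected: " & path)
      modules[path] = false
      # ... lex, parse and splice the module's declarations here ...
      modules[path] = true  # fully loaded, safe to import from now on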
@ -859,14 +863,13 @@ proc whileStmt(self: Parser): Statement =
## Parses a C-style while loop statement
let tok = self.peek(-1)
self.beginScope()
let enclosingLoop = self.currentLoop
inc(self.loopDepth)
let condition = self.expression()
self.expect(LeftBrace)
self.currentLoop = Loop
result = newWhileStmt(condition, self.blockStmt(), tok)
result.file = self.file
self.currentLoop = enclosingLoop
self.endScope()
dec(self.loopDepth)
proc ifStmt(self: Parser): Statement =
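The hunks above replace the saved-and-restored `currentLoop` enum with a plain `loopDepth` counter: every loop-like construct (`while`, `foreach`, named blocks) increments it on entry and decrements it on exit, and `break`/`continue` only check whether the counter is zero. A counter nests naturally, so there is no enclosing state to stash and restore. A minimal sketch of the idea (illustrative names, not the parser's actual API):

    var loopDepth = 0

    proc enterLoop() {.inline.} = inc(loopDepth)  # entering a while/foreach/named block
    proc exitLoop() {.inline.} = dec(loopDepth)   # leaving it (nested loops just stack)

    proc checkJump(keyword: string) =
      # The check performed by breakStmt() and continueStmt()
      if loopDepth == 0:
        raise newException(ValueError, "'" & keyword & "' cannot be used outside loops")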
@ -1049,7 +1052,7 @@ proc parseFunExpr(self: Parser): LambdaExpr =
proc parseGenericConstraint(self: Parser): Expression =
## Recursivelt parses a generic constraint
## Recursively parses a generic constraint
## and returns it as an expression
result = self.expression() # First value is always an identifier of some sort
if not self.check(RightBracket):
@ -1301,6 +1304,7 @@ proc typeDecl(self: Parser): TypeDecl =
var generics: seq[tuple[name: IdentExpr, cond: Expression]] = @[]
var pragmas: seq[Pragma] = @[]
result = newTypeDecl(name, fields, defaults, isPrivate, token, pragmas, generics, nil, false, false)
result.file = self.file
if self.match(LeftBracket):
self.parseGenerics(result)
self.expect("=", "expecting '=' after type name")
@ -1315,7 +1319,6 @@ proc typeDecl(self: Parser): TypeDecl =
result.isEnum = true
of "object":
discard self.step()
discard # Default case
else:
hasNone = true
if hasNone:
@ -1334,7 +1337,7 @@ proc typeDecl(self: Parser): TypeDecl =
self.expect(LeftBrace, "expecting '{' after type declaration")
if self.match(TokenType.Pragma):
for pragma in self.parsePragmas():
pragmas.add(pragma)
result.pragmas.add(pragma)
var
argName: IdentExpr
argPrivate: bool
@ -1356,8 +1359,6 @@ proc typeDecl(self: Parser): TypeDecl =
else:
if not self.check(RightBrace):
self.expect(",", "expecting comma after enum field declaration")
result.pragmas = pragmas
result.file = self.file
proc declaration(self: Parser): Declaration =
@ -1420,11 +1421,12 @@ proc parse*(self: Parser, tokens: seq[Token], file: string, lines: seq[tuple[sta
self.lines = lines
self.current = 0
self.scopeDepth = 0
self.currentLoop = LoopContext.None
self.loopDepth = 0
self.currentFunction = nil
self.tree = @[]
if not persist:
self.operators = newOperatorTable()
self.modules = newTable[string, bool]()
self.findOperators(tokens)
while not self.done():
self.tree.add(self.declaration())

View File

@ -51,28 +51,28 @@ proc getLineEditor: LineEditor =
result.bindHistory(history)
proc repl(warnings: seq[WarningKind] = @[], mismatches: bool = false, mode: CompileMode = Debug) =
proc repl(warnings: seq[WarningKind] = @[], mismatches: bool = false, mode: CompileMode = Debug, breakpoints: seq[uint64] = @[]) =
styledEcho fgMagenta, "Welcome into the peon REPL!"
var
keep = true
tokens: seq[Token] = @[]
tree: seq[Declaration] = @[]
compiler = newBytecodeCompiler(replMode=true)
compiled: Chunk
compiled: Chunk = newChunk()
serialized: Serialized
tokenizer = newLexer()
vm = newPeonVM()
parser = newParser()
debugger = newDebugger()
serializer = newSerializer()
editor = getLineEditor()
input: string
current: string
first: bool = false
tokenizer.fillSymbolTable()
editor.bindEvent(jeQuit):
stdout.styledWriteLine(fgGreen, "Goodbye!")
keep = false
input = ""
current = ""
editor.bindKey("ctrl+a"):
editor.content.home()
editor.bindKey("ctrl+e"):
@ -80,21 +80,15 @@ proc repl(warnings: seq[WarningKind] = @[], mismatches: bool = false, mode: Comp
while keep:
try:
input = editor.read()
if input == "#reset":
compiled = newChunk()
current = ""
continue
elif input == "#show":
echo current
elif input == "#clear":
if input == "#clear":
stdout.write("\x1Bc")
continue
elif input == "":
continue
tokens = tokenizer.lex(current & input & "\n", "stdin")
tokens = tokenizer.lex(input, "stdin")
if tokens.len() == 0:
continue
when debugLexer:
if debugLexer:
styledEcho fgCyan, "Tokenization step:"
for i, token in tokens:
if i == tokens.high():
@ -102,22 +96,22 @@ proc repl(warnings: seq[WarningKind] = @[], mismatches: bool = false, mode: Comp
break
styledEcho fgGreen, "\t", $token
echo ""
tree = newParser().parse(tokens, "stdin", tokenizer.getLines(), current & input & "\n")
tree = parser.parse(tokens, "stdin", tokenizer.getLines(), input, persist=true)
if tree.len() == 0:
continue
when debugParser:
if debugParser:
styledEcho fgCyan, "Parsing step:"
for node in tree:
styledEcho fgGreen, "\t", $node
echo ""
compiled = newBytecodeCompiler(replMode=true).compile(tree, "stdin", tokenizer.getLines(), current & input & "\n", showMismatches=mismatches, disabledWarnings=warnings, mode=mode)
when debugCompiler:
compiled = compiler.compile(tree, "stdin", tokenizer.getLines(), input, chunk=compiled, showMismatches=mismatches, disabledWarnings=warnings, mode=mode, incremental=true)
if debugCompiler:
styledEcho fgCyan, "Compilation step:\n"
debugger.disassembleChunk(compiled, "stdin")
echo ""
serialized = serializer.loadBytes(serializer.dumpBytes(compiled, "stdin"))
when debugSerializer:
if debugSerializer:
styledEcho fgCyan, "Serialization step: "
styledEcho fgBlue, "\t- Peon version: ", fgYellow, &"{serialized.version.major}.{serialized.version.minor}.{serialized.version.patch}", fgBlue, " (commit ", fgYellow, serialized.commit[0..8], fgBlue, ") on branch ", fgYellow, serialized.branch
stdout.styledWriteLine(fgBlue, "\t- Compilation date & time: ", fgYellow, fromUnix(serialized.compileDate).format("d/M/yyyy HH:mm:ss"))
@ -141,8 +135,11 @@ proc repl(warnings: seq[WarningKind] = @[], mismatches: bool = false, mode: Comp
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
vm.run(serialized.chunk)
current &= input & "\n"
if not first:
vm.run(serialized.chunk, repl=true, breakpoints=breakpoints)
first = true
else:
vm.resume(serialized.chunk)
except LexingError:
print(LexingError(getCurrentException()))
except ParseError:
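Taken together, the REPL changes in this hunk switch from "re-compile everything typed so far" to true incremental compilation: one persistent parser, compiler and chunk are reused across inputs, each new line is compiled into the existing chunk, and after the first execution the VM is resumed on the updated chunk instead of being restarted. Condensed from the code above (some arguments and all error handling omitted), each iteration roughly does:

    tokens = tokenizer.lex(input, "stdin")
    tree = parser.parse(tokens, "stdin", tokenizer.getLines(), input, persist=true)
    compiled = compiler.compile(tree, "stdin", tokenizer.getLines(), input,
                                chunk=compiled, incremental=true)
    serialized = serializer.loadBytes(serializer.dumpBytes(compiled, "stdin"))
    if not first:
      vm.run(serialized.chunk, repl=true, breakpoints=breakpoints)  # first input: start the VM
      first = true
    else:
      vm.resume(serialized.chunk)  # later inputs: keep the existing VM state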
@ -157,7 +154,7 @@ proc repl(warnings: seq[WarningKind] = @[], mismatches: bool = false, mode: Comp
quit(0)
proc runFile(f: string, fromString: bool = false, dump: bool = true, breakpoints: seq[uint64] = @[], dis: bool = false,
proc runFile(f: string, fromString: bool = false, dump: bool = true, breakpoints: seq[uint64] = @[],
warnings: seq[WarningKind] = @[], mismatches: bool = false, mode: CompileMode = Debug, run: bool = true,
backend: PeonBackend = PeonBackend.Bytecode, output: string) =
var
@ -186,7 +183,7 @@ proc runFile(f: string, fromString: bool = false, dump: bool = true, breakpoints
tokens = tokenizer.lex(input, f)
if tokens.len() == 0:
return
when debugLexer:
if debugLexer:
styledEcho fgCyan, "Tokenization step:"
for i, token in tokens:
if i == tokens.high():
@ -197,7 +194,7 @@ proc runFile(f: string, fromString: bool = false, dump: bool = true, breakpoints
tree = parser.parse(tokens, f, tokenizer.getLines(), input)
if tree.len() == 0:
return
when debugParser:
if debugParser:
styledEcho fgCyan, "Parsing step:"
for node in tree:
styledEcho fgGreen, "\t", $node
@ -205,11 +202,9 @@ proc runFile(f: string, fromString: bool = false, dump: bool = true, breakpoints
case backend:
of PeonBackend.Bytecode:
compiled = compiler.compile(tree, f, tokenizer.getLines(), input, disabledWarnings=warnings, showMismatches=mismatches, mode=mode)
when debugCompiler:
if debugCompiler:
styledEcho fgCyan, "Compilation step:\n"
debugger.disassembleChunk(compiled, f)
if dis:
debugger.disassembleChunk(compiled, f)
var path = splitFile(if output.len() > 0: output else: f).dir
if path.len() > 0:
path &= "/"
@ -224,31 +219,35 @@ proc runFile(f: string, fromString: bool = false, dump: bool = true, breakpoints
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "the selected backend is not implemented yet")
elif backend == PeonBackend.Bytecode:
serialized = serializer.loadFile(f)
if backend == PeonBackend.Bytecode:
when debugSerializer:
styledEcho fgCyan, "Serialization step: "
styledEcho fgBlue, "\t- Peon version: ", fgYellow, &"{serialized.version.major}.{serialized.version.minor}.{serialized.version.patch}", fgBlue, " (commit ", fgYellow, serialized.commit[0..8], fgBlue, ") on branch ", fgYellow, serialized.branch
stdout.styledWriteLine(fgBlue, "\t- Compilation date & time: ", fgYellow, fromUnix(serialized.compileDate).format("d/M/yyyy HH:mm:ss"))
stdout.styledWrite(fgBlue, &"\t- Constants segment: ")
if serialized.chunk.consts == compiled.consts:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, &"\t- Code segment: ")
if serialized.chunk.code == compiled.code:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, "\t- Line info segment: ")
if serialized.chunk.lines == compiled.lines:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, "\t- Functions segment: ")
if serialized.chunk.functions == compiled.functions:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
if backend == PeonBackend.Bytecode and debugSerializer:
styledEcho fgCyan, "Serialization step: "
styledEcho fgBlue, "\t- Peon version: ", fgYellow, &"{serialized.version.major}.{serialized.version.minor}.{serialized.version.patch}", fgBlue, " (commit ", fgYellow, serialized.commit[0..8], fgBlue, ") on branch ", fgYellow, serialized.branch
stdout.styledWriteLine(fgBlue, "\t- Compilation date & time: ", fgYellow, fromUnix(serialized.compileDate).format("d/M/yyyy HH:mm:ss"))
stdout.styledWrite(fgBlue, &"\t- Constants segment: ")
if serialized.chunk.consts == compiled.consts:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, &"\t- Code segment: ")
if serialized.chunk.code == compiled.code:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, "\t- Line info segment: ")
if serialized.chunk.lines == compiled.lines:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, "\t- Functions segment: ")
if serialized.chunk.functions == compiled.functions:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, "\t- Modules segment: ")
if serialized.chunk.modules == compiled.modules:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
if run:
case backend:
of PeonBackend.Bytecode:
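Both code paths validate the serializer with the same round-trip check: the freshly compiled chunk is serialized, loaded back, and every segment of the result is compared against the original, the new modules segment included. Condensed to its shape (the REPL variant, which round-trips through bytes rather than a file; the loop is only a compact restatement of the comparisons shown above):

    serialized = serializer.loadBytes(serializer.dumpBytes(compiled, "stdin"))
    for (name, ok) in [("constants", serialized.chunk.consts == compiled.consts),
                       ("code", serialized.chunk.code == compiled.code),
                       ("line info", serialized.chunk.lines == compiled.lines),
                       ("functions", serialized.chunk.functions == compiled.functions),
                       ("modules", serialized.chunk.modules == compiled.modules)]:
      echo name, ": ", (if ok: "OK" else: "Corrupted")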
@ -284,7 +283,6 @@ when isMainModule:
var dump: bool = true
var warnings: seq[WarningKind] = @[]
var breaks: seq[uint64] = @[]
var dis: bool = false
var mismatches: bool = false
var mode: CompileMode = CompileMode.Debug
var run: bool = true
@ -350,7 +348,7 @@ when isMainModule:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, &"invalid breakpoint value '{point}'")
quit()
of "disassemble":
dis = true
debugCompiler = true
of "compile":
run = false
of "output":
@ -361,8 +359,12 @@ when isMainModule:
backend = PeonBackend.Bytecode
of "c":
backend = PeonBackend.NativeC
of "cpp":
backend = PeonBackend.NativeCpp
of "debug-dump":
debugSerializer = true
of "debug-lexer":
debugLexer = true
of "debug-parser":
debugParser = true
else:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, &"unknown option '{key}'")
quit()
@ -403,14 +405,16 @@ when isMainModule:
of "c":
run = false
of "d":
dis = true
debugCompiler = true
else:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, &"unknown option '{key}'")
quit()
else:
echo "usage: peon [options] [filename.pn]"
quit()
if breaks.len() == 0 and debugVM:
breaks.add(0)
if file == "":
repl(warnings, mismatches, mode)
repl(warnings, mismatches, mode, breaks)
else:
runFile(file, fromString, dump, breaks, dis, warnings, mismatches, mode, run, backend, output)
runFile(file, fromString, dump, breaks, warnings, mismatches, mode, run, backend, output)

View File

@ -2,6 +2,11 @@
import values;
operator `is`*[T: any](a, b: T): bool {
#pragma[magic: "Identity", pure]
}
operator `>`*[T: UnsignedInteger](a, b: T): bool {
#pragma[magic: "GreaterThan", pure]
}
@ -12,7 +17,7 @@ operator `<`*[T: UnsignedInteger](a, b: T): bool {
}
operator `==`*[T: Number | inf](a, b: T): bool {
operator `==`*[T: Number | inf | bool](a, b: T): bool {
#pragma[magic: "Equal", pure]
}

View File

@ -16,4 +16,9 @@ export comparisons;
var version* = 1;
var _private = 5; # Invisible outside the module (underscore is to silence warning)
var test* = 0x60;
var test* = 0x60;
fn testGlobals*: bool {
return version == 1 and _private == 5 and test == 0x60;
}

View File

@ -1,4 +1,5 @@
import std;
import time;
fn fib(n: int): int {
@ -10,7 +11,7 @@ fn fib(n: int): int {
print("Computing the value of fib(37)");
var x = clock();
var x = time.clock();
print(fib(37));
print(clock() - x);
print(time.clock() - x);
print("Done!");

View File

@ -1,7 +1,7 @@
import std;
const max = 50000;
const max = 500000;
var x = max;
var s = "just a test";