Compare commits

...

47 Commits

Author SHA1 Message Date
Mattia Giambirtone ecbdf120e3
More lexer bug fixes and tests. Minor changes to error reporting. Added intrinsic aliases 2024-02-23 13:15:11 +01:00
Mattia Giambirtone 92993535d7
Fix bugs that weren't producing expression statements properly 2024-02-23 13:15:10 +01:00
Mattia Giambirtone 20cca2c185
Add --cacheDir option 2024-02-23 13:15:10 +01:00
Mattia Giambirtone b0b7739a63
Change build directory 2024-02-23 13:15:10 +01:00
Mattia Giambirtone c5450a9e19
Build task now fails if tests fail 2024-02-23 13:15:10 +01:00
Mattia Giambirtone f1d2386175
Fix stringWithEscapes test and add more unicode tests 2024-02-23 13:15:10 +01:00
Mattia Giambirtone 6db44570ae
Improve unicode support in the tokenizer and add more tests 2024-02-23 13:15:09 +01:00
Mattia Giambirtone e061bb399b
Added allTokens test 2024-02-23 13:15:09 +01:00
Mattia Giambirtone 79f3803328
Improve test suite and fix bugs in the tokenizer 2024-02-23 13:15:09 +01:00
Mattia Giambirtone 3b603d1fdf
Minor formatting changes to test outcome printing (again) 2024-02-23 13:15:09 +01:00
Mattia Giambirtone 0ec377b308
Minor formatting changes to test outcome printing 2024-02-23 13:15:08 +01:00
Mattia Giambirtone 40cbed2b19
Improve test suite with outcome management 2024-02-23 13:15:08 +01:00
Mattia Giambirtone 41abf59395
Add initial documentation to test suite 2024-02-23 13:15:08 +01:00
Mattia Giambirtone b2efb1c9b5
Update copyright/license notices & README with build instructions. Made peon buildable via nimble build 2024-02-23 13:15:08 +01:00
Mattia Giambirtone eb8f7c0a51
Update README 2024-02-23 13:15:07 +01:00
Mattia Giambirtone 31ee29538e
Improve error handling and error messages 2024-02-23 13:15:07 +01:00
Mattia Giambirtone c3bac2cf46
Add comments to generics example 2024-02-23 13:15:07 +01:00
Mattia Giambirtone ee90dad3d2
Turn peon into a proper nimble package and add initial test suite 2024-02-23 13:15:07 +01:00
Mattia Giambirtone 887d1ce8f5
Added --noGen option 2024-02-23 13:15:06 +01:00
Mattia Giambirtone 3f0a4708d3
Rework generic replacement mechanism 2024-02-23 13:15:06 +01:00
Mattia Giambirtone c0bd1daebf
Significant parser refactoring and cleanup 2024-02-23 13:15:06 +01:00
Mattia Giambirtone d04f412347
Add extra generic test and comments 2024-02-23 13:15:06 +01:00
Mattia Giambirtone 2f74c23774
Whoops 2024-02-23 13:15:05 +01:00
Mattia Giambirtone 60d9b3c37e
More fixes to generics 2024-02-23 13:15:05 +01:00
Mattia Giambirtone 34d5f77f65
Added test for generics 2024-02-23 13:15:05 +01:00
Mattia Giambirtone a6a944a4fa
Completely rework generics 2024-02-23 13:15:05 +01:00
Mattia Giambirtone 838fc3d5a1
preparation for type system overhaul (please send help) 2024-02-23 13:15:04 +01:00
Mattia Giambirtone 83051d67f8
Further work on porting the bytecode target 2024-02-23 13:15:04 +01:00
Mattia Giambirtone 6181c49f1f
Initial work on porting the bytecode backend to peon 0.2 2024-02-23 13:15:04 +01:00
Mattia Giambirtone 8b39cc3bc0
Add parser support for selective import statements 2024-02-23 13:15:04 +01:00
Mattia Giambirtone 3ad22dea12
Improve separation by splitting types from type checker 2024-02-23 13:15:03 +01:00
Mattia Giambirtone e11ada2fec
Minor review and improvements 2024-02-23 13:15:03 +01:00
Mattia Giambirtone 13eea04e74
Fix parsing bug with type declarations 2024-02-23 13:15:03 +01:00
Mattia Giambirtone 8cac75ecef
Minor fixes 2024-02-23 13:15:03 +01:00
Mattia Giambirtone f7f6ae052f
Add missing file 2024-02-23 13:15:02 +01:00
Mattia Giambirtone f2a23b8b77
Minor refactoring of components and names 2024-02-23 13:15:02 +01:00
Mattia Giambirtone 4c8cf89c8e
Remove unused bytecode VM 2024-02-23 13:15:02 +01:00
Mattia Giambirtone 525a11adad
Minor refactoring in preparation for additional modules 2024-02-23 13:15:02 +01:00
Mattia Giambirtone f5d091bb9b
The compiler no longer emits warnings that should be emitted by a control flow analyzer. Major cleanup and refactoring 2024-02-23 13:15:01 +01:00
Mattia Giambirtone db41234ee0
Further work on function calls 2024-02-23 13:15:01 +01:00
Mattia Giambirtone f34b71ec0b
Improvements to type signature matching and fixed some bugs 2024-02-23 13:15:01 +01:00
Mattia Giambirtone eccb2b5372
Fixed issue when comparing integers and floats of different sizes/signedness 2024-02-23 13:15:01 +01:00
Mattia Giambirtone 3d0f35489c
Initial work on refactoring call resolution 2024-02-23 13:15:00 +01:00
Mattia Giambirtone 9547d2c2bd
Fixes to variable declarations 2024-02-23 13:15:00 +01:00
Mattia Giambirtone 60d3c25e17
Improvements and refactoring 2024-02-23 13:15:00 +01:00
Mattia Giambirtone eaa5c7ada8
Added support for nullable types and const pointers, fixed some bugs in the parser, ported old peon entry point code 2024-02-23 13:15:00 +01:00
Mattia Giambirtone b6b3f67204
Revert to old README 2024-02-23 13:14:59 +01:00
27 changed files with 5357 additions and 3132 deletions

.gitignore

@@ -143,3 +143,7 @@ dmypy.json
# Cython debug symbols
cython_debug/
tests/test.pn
# Binary stuff
bin/

README.md

@@ -1,126 +1,22 @@
# The peon programming language
# peon-rewrite
Peon is a modern, multi-paradigm, async-first programming language with a focus on correctness and speed.
[Go to the Manual](docs/manual.md)
Work in progress for Peon 0.2.x
## What's peon?
## What changed
__Note__: For simplicity, this section uses the present tense even though some of what it describes is not implemented yet.
- Peon will no longer use a runtime GC. Instead, the memory model will use ~~lifetimes with regions~~ -- actually, peon will use
[generational references](https://verdagon.dev/blog/generational-references) instead (they're way cooler IMHO)
- The compiler has been completely overhauled and no longer handles any code generation (in fact, there is currently no code generation
at all, just a parser and a type checker). This allows for true multi-backend support and improves separation of concerns,
because all the code generation logic intertwined with the typechecking was driving me insane (please do send help)
## Build
Just run `nimble build`. It should grab all the dependencies for you and produce a `peon` binary in your current working directory
Peon is a multi-paradigm, statically-typed programming language inspired by C, Nim, Python, Rust and C++: it supports modern, high-level
features such as automatic type inference, parametrically polymorphic generic types, pure functions, closures, interfaces, single inheritance,
reference types, templates, coroutines, raw pointers and exceptions.
## Tests
The memory management model is rather simple: a Mark and Sweep garbage collector is employed to reclaim unused memory, although more garbage
collection strategies (such as generational GC or deferred reference counting) are planned to be added in the future.
Peon features a native cooperative concurrency model designed to take advantage of the inherent waiting of typical I/O workloads, without using more than one OS thread (wherever possible), allowing for much greater efficiency and a smaller memory footprint. The asynchronous model forces developers to write code that is both easy to reason about, thanks to the [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) model at the core of peon's async event loop implementation, and that works as expected every time (without dropping signals, exceptions, or task return values).
Other notable features are the ability to define (and overload) custom operators with ease by implementing them as language-level functions, [Universal function call syntax](https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax), [Name stropping](https://en.wikipedia.org/wiki/Stropping_(syntax)) and named scopes.
In peon, all objects are first-class (this includes functions, iterators, closures and coroutines).
## Disclaimers
**Disclaimer 1**: The project is still in its very early days: lots of stuff is not implemented, a work in progress or
otherwise outright broken. Feel free to report bugs!
**Disclaimer 2**: Currently, the `std` module has to be _always_ imported explicitly for even the most basic snippets to work. This is because intrinsic types and builtin operators are defined within it: if it is not imported, peon won't even know how to parse `2 + 2` (and even if it could, it would have no idea what the type of the expression would be). You can have a look at the [peon standard library](src/peon/stdlib) to see how the builtins are defined (be aware that they heavily rely on compiler black magic to work) and can even provide your own implementation if you're so inclined.
### TODO List
In no particular order, here's a list of stuff that's done/to do (might be incomplete/out of date):
- User-defined types
- Function calls ✅
- Control flow (if-then-else, switch) ✅
- Looping (while) ✅
- Iteration (foreach)
- Type conversions
- Type casting
- Intrinsics ✅
- Type unions ✅
- Functions ✅
- Closures
- Managed references
- Unmanaged references
- Named scopes/blocks ✅
- Inheritance
- Interfaces
- Generics ✅
- Automatic types ✅
- Iterators/Generators
- Coroutines
- Pragmas ✅
- Attribute resolution ✅
- Universal Function Call Syntax
- Import system ✅
- Exceptions
- Templates (_not_ like C++ templates) ✅
- Optimizations (constant folding, branch and dead code elimination, inlining)
## Feature wishlist
Here's a random list of high-level features I would like peon to have and that I think are kinda neat (some may
have been implemented already):
- Reference types are not nullable by default (must use `#pragma[nullable]`)
- The `commutative` pragma, which lets you define just one implementation of an operator
and have it automatically become commutative
- Easy C/Nim interop via FFI
- C/C++ backend
- Nim backend
- [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) (must-have!)
- Simple OOP (with multiple dispatch!)
- RTTI, with methods that dispatch at runtime based on the true (aka runtime) type of a value
- Limited compile-time evaluation (embed the Peon VM in the C/C++/Nim backend and use that to execute peon code at compile time)
## The name
The name for peon comes from [Productive2's](https://git.nocturn9x.space/prod2) genius cute brain, and is a result of shortening
the name of the fastest animal on earth: the **Pe**regrine Falc**on**. I guess I wanted this to mean peon will be blazing fast (I
certainly hope so!)
# Peon needs you.
No, but really. I need help. This project is huge and (IMHO) awesome, but there's a lot of non-trivial work to do and doing
it with other people is just plain more fun and rewarding. If you want to get involved, definitely try [contacting](https://nocturn9x.space/contact) me
or open an issue/PR!
# Credits
- Araq, for creating the amazing language that is [Nim](https://nim-lang.org) (as well as all of its contributors!)
- Guido Van Rossum, aka the chad who created [Python](https://python.org) and its awesome community and resources
- The Nim community and contributors, for making Nim what it is today
- Bob Nystrom, for his amazing [book](https://craftinginterpreters.com) that inspired me
and taught me how to actually make a programming language (kinda, I'm still very dumb)
- [Njsmith](https://vorpus.org/), for his awesome articles on structured concurrency
- All the amazing people in the [r/ProgrammingLanguages](https://reddit.com/r/ProgrammingLanguages) subreddit and its [Discord](https://discord.gg/tuFCPmB7Un) server
- [Art](https://git.nocturn9x.space/art) <3
- Everyone who listened (and still listens) to me ramble about compilers, programming languages and the like (and for giving me ideas and testing peon!)
- ... More? (I'd thank the contributors but it's just me :P)
- Me! I guess
## Ok, cool, how do I use it?
Great question! If this README somehow didn't turn you away already (thanks, by the way), then you may want to try peon
out for yourself. Fortunately, the process is quite straightforward:
- First, you're gonna have to install [Nim](https://nim-lang.org/), the language peon is written in. I highly recommend
using [choosenim](https://github.com/dom96/choosenim) to manage your Nim installations as it makes switching between them and updating them a breeze
- Then, clone this repository and compile peon in release mode with `nim c -d:release --passC:"-flto" -o:peon src/main`, which should produce a `peon` binary
ready for you to play with (if your C toolchain doesn't support LTO you can just omit the `--passC` option, although that would be pretty unusual for
a modern linker)
- If you want to move the executable to a different directory (say, into your `PATH`), you should copy peon's standard
library (found in `/src/peon/stdlib`) into a known folder, edit the `moduleLookupPaths` variable inside `src/config.nim`
by adding said folder to it so that the peon compiler knows where to find modules when you `import std;` and then recompile
peon. Hopefully I will automate this soon, but as of right now the work is all manual
__Note__: On Linux, peon will also look into `~/.local/peon/stdlib` by default, so you can just create the `~/.local/peon` folder and copy `src/peon/stdlib` there
Peon is starting to get large enough to need an automated test suite (wow, much fancy, such cool), so you can run it with `nimble test`.
The actual tests don't use testament because I have a severe case of NIH syndrome, sorry folks!


@@ -1 +1,2 @@
--hints:off --deepCopy:on --experimental:strictFuncs --exceptions:setjmp
path="src"

peon.nimble

@@ -0,0 +1,22 @@
# Package
version = "0.1.0"
author = "nocturn9x"
description = "A rewrite of Peon 0.1.x"
license = "Apache-2.0"
srcDir = "src"
bin = @["peon"]
binDir = "bin"
# Dependencies
requires "nim >= 2.1.1"
requires "jale >= 0.1.1"
before build:
exec "nimble test"
task test, "Runs the test suite":
exec "nim r tests/tokenize.nim"


@@ -0,0 +1,464 @@
import frontend/compiler/typechecker
import backend/bytecode/opcodes
import backend/bytecode/tooling/multibyte
import errors
import std/strutils
import std/parseutils
import std/tables
import std/strformat
type
FunctionWrapper = ref object
## A wrapper around a typed function
## declaration. This is necessary to
## carry bytecode-specific information
## regarding this function along with
## the typed declaration itself
decl: TypedFunDecl
# The location where the function's code
# begins and ends
location: tuple[start, stop: int]
BytecodeGenerator* = ref object
## A bytecode generator
# The piece of code we compile into
chunk: Chunk
# The current size of the call
# stack (which is always known
# statically)
stackSize: int
# Stores the position of all jumps
jumps: seq[tuple[patched: bool, offset: int]]
# Metadata regarding function locations (used to construct
# the debugging fields in the resulting bytecode)
functions: seq[tuple[start, stop, pos: int, fn: Name]]
# Used for error reporting
currentFile: string
currentNode: TypedNode
# The typechecker used to validate the peon code we're generating
# bytecode for
typeChecker: TypeChecker
proc newBytecodeGenerator*: BytecodeGenerator =
## Initializes a new, blank bytecode
## generator
result = BytecodeGenerator()
proc generateExpression(self: BytecodeGenerator, expression: TypedExpr)
proc error(self: BytecodeGenerator, msg: string, typedNode: TypedNode = nil) =
## Raises a generic peon exception
var typedNode = typedNode
var file = self.currentFile
if typedNode.isNil():
typedNode = self.currentNode
if file == "" and typedNode.node.isDecl():
file = TypedDecl(typedNode).name.owner.ident.token.lexeme
raise CodeGenError(msg: msg, line: typedNode.node.token.line, file: file)
proc emitByte(self: BytecodeGenerator, byt: OpCode | uint8, line: int) {.inline.} =
## Emits a single byte, writing it to
## the current chunk being compiled
self.chunk.write(uint8(byt), line)
proc emitBytes(self: BytecodeGenerator, bytarr: openarray[OpCode | uint8], line: int) {.inline.} =
## Handy helper method to write arbitrary bytes into
## the current chunk, calling emitByte on each of its
## elements
for b in bytarr:
self.emitByte(b, line)
proc makeConstant(self: BytecodeGenerator, value: TypedExpr): array[3, uint8] =
## Adds a constant to the current chunk's constant table
## and returns its index as a 3-byte array of uint8s
var lit: string
if value.kind.kind == Integer:
lit = value.node.token.lexeme
if lit.contains("'"):
var idx = lit.high()
while lit[idx] != '\'':
lit = lit[0..^2]
dec(idx)
lit = lit[0..^2]
case value.kind.kind:
of Integer:
case value.kind.size:
of Tiny:
result = self.chunk.writeConstant([uint8(parseInt(lit))])
of Short:
result = self.chunk.writeConstant(parseInt(lit).toDouble())
of Long:
result = self.chunk.writeConstant(parseInt(lit).toQuad())
of LongLong:
if value.kind.signed:
result = self.chunk.writeConstant(parseInt(lit).toLong())
else:
result = self.chunk.writeConstant(parseBiggestUInt(lit).toLong())
of String:
result = self.chunk.writeConstant(value.node.token.lexeme[1..^1].toBytes())
of Float:
case value.kind.width:
of Half:
var f: float = 0.0
discard parseFloat(value.node.token.lexeme, f)
result = self.chunk.writeConstant(cast[array[4, uint8]](float32(f)))
of Full:
var f: float = 0.0
discard parseFloat(value.node.token.lexeme, f)
result = self.chunk.writeConstant(cast[array[8, uint8]](f))
else:
discard
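The `toTriple`/`toQuad`/`toLong` helpers from the multibyte tooling split integers into fixed-width byte arrays; a 3-byte index can address up to 2**24 - 1 = 16777215 constants, which is also why string constants are capped at 16777215 bytes further down. A minimal Python sketch of the idea (the big-endian byte order here is an assumption; peon's actual `multibyte` module may differ):

```python
def to_bytes(value: int, width: int) -> list[int]:
    # Split a non-negative integer into `width` bytes, most significant first
    # (the real byte order used by peon's multibyte module is an assumption).
    return [(value >> (8 * i)) & 0xFF for i in reversed(range(width))]

def from_bytes(data: list[int]) -> int:
    # Inverse operation: reassemble the integer from its bytes.
    result = 0
    for b in data:
        result = (result << 8) | b
    return result

# A "triple" (3 bytes) maxes out at 16777215, the string-size limit
# enforced by the generator.
assert to_bytes(16777215, 3) == [255, 255, 255]
```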
proc emitConstant(self: BytecodeGenerator, expression: TypedExpr) =
## Emits a constant instruction along
## with its operand
let
typ = expression.kind
node = expression.node
case typ.kind:
of Integer:
case typ.size:
of LongLong:
if typ.signed:
self.emitByte(LoadInt64, node.token.line)
else:
self.emitByte(LoadUInt64, node.token.line)
of Long:
if typ.signed:
self.emitByte(LoadInt32, node.token.line)
else:
self.emitByte(LoadUInt32, node.token.line)
of Short:
if typ.signed:
self.emitByte(LoadInt16, node.token.line)
else:
self.emitByte(LoadUInt16, node.token.line)
of Tiny:
if typ.signed:
self.emitByte(LoadInt8, node.token.line)
else:
self.emitByte(LoadUInt8, node.token.line)
of String:
self.emitByte(LoadString, node.token.line)
let str = LiteralExpr(node).literal.lexeme
if str.len() >= 16777216:
self.error("string constants cannot be larger than 16777215 bytes", expression)
self.emitBytes((str.len() - 2).toTriple(), node.token.line)
of Float:
case typ.width:
of Half:
self.emitByte(LoadFloat32, node.token.line)
of Full:
self.emitByte(LoadFloat64, node.token.line)
else:
discard # TODO
self.emitBytes(self.makeConstant(expression), node.token.line)
proc setJump(self: BytecodeGenerator, offset: int, jmp: array[3, uint8]) =
## Sets a jump at the given
## offset to the given value
self.chunk.code[offset + 1] = jmp[0]
self.chunk.code[offset + 2] = jmp[1]
self.chunk.code[offset + 3] = jmp[2]
proc setJump(self: BytecodeGenerator, offset: int, jmp: seq[uint8]) =
## Sets a jump at the given
## offset to the given value
self.chunk.code[offset + 1] = jmp[0]
self.chunk.code[offset + 2] = jmp[1]
self.chunk.code[offset + 3] = jmp[2]
proc emitJump(self: BytecodeGenerator, opcode: OpCode, line: int): int =
## Emits a dummy jump offset to be patched later
## and returns a unique identifier for that jump
## to be passed to patchJump
self.emitByte(opcode, line)
self.jumps.add((patched: false, offset: self.chunk.code.high()))
self.emitBytes(0.toTriple(), line)
result = self.jumps.high()
proc patchJump(self: BytecodeGenerator, offset: int) =
## Patches a previously emitted relative
## jump using emitJump
var jump: int = self.chunk.code.len() - self.jumps[offset].offset
if jump < 0:
self.error("jump size cannot be negative (This is an internal error and most likely a bug)")
if jump > 16777215:
# TODO: Emit consecutive jumps using insertAt
self.error("cannot jump more than 16777215 instructions")
if jump > 0:
self.setJump(self.jumps[offset].offset, (jump - 4).toTriple())
self.jumps[offset].patched = true
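The emit/patch pair above is classic backpatching: emit the jump opcode with a dummy 3-byte operand, remember its offset, and overwrite the operand once the target is known. A simplified Python sketch (the 3-byte operand and the `- 4` adjustment mirror the code above; everything else is an illustrative simplification, not peon's real encoding):

```python
class JumpPatcher:
    def __init__(self):
        self.code: list[int] = []
        self.jumps: list[dict] = []

    def emit_jump(self, opcode: int) -> int:
        # Emit the opcode plus a dummy 3-byte operand, returning a
        # jump id to hand back to patch_jump later.
        self.code.append(opcode)
        self.jumps.append({"patched": False, "offset": len(self.code) - 1})
        self.code += [0, 0, 0]
        return len(self.jumps) - 1

    def patch_jump(self, jump_id: int):
        # Compute the relative distance and overwrite the dummy operand.
        offset = self.jumps[jump_id]["offset"]
        distance = len(self.code) - offset
        assert 0 <= distance <= 16777215, "jump distance out of range"
        operand = distance - 4  # skip the opcode and its 3 operand bytes
        self.code[offset + 1:offset + 4] = [
            (operand >> 16) & 0xFF, (operand >> 8) & 0xFF, operand & 0xFF
        ]
        self.jumps[jump_id]["patched"] = True
```

For example, patching after emitting five more bytes yields an operand of 5: the distance from the opcode to the end of the code is 9, minus the 4 bytes of the jump instruction itself.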
proc handleBuiltinFunction(self: BytecodeGenerator, fn: FunctionWrapper, args: seq[TypedExpr], line: int) =
## Emits instructions for builtin functions
## such as addition or subtraction
var builtinOp: string
for pragma in FunDecl(fn.decl.node).pragmas:
if pragma.name.token.lexeme == "magic":
builtinOp = pragma.args[0].token.lexeme
if builtinOp notin ["LogicalOr", "LogicalAnd"]:
if len(args) == 2:
self.generateExpression(args[1])
self.generateExpression(args[0])
elif len(args) == 1:
self.generateExpression(args[0])
const codes: Table[string, OpCode] = {"Negate": Negate,
"NegateFloat32": NegateFloat32,
"NegateFloat64": NegateFloat64,
"Add": Add,
"Subtract": Subtract,
"Divide": Divide,
"Multiply": Multiply,
"SignedDivide": SignedDivide,
"AddFloat64": AddFloat64,
"SubtractFloat64": SubtractFloat64,
"DivideFloat64": DivideFloat64,
"MultiplyFloat64": MultiplyFloat64,
"AddFloat32": AddFloat32,
"SubtractFloat32": SubtractFloat32,
"DivideFloat32": DivideFloat32,
"MultiplyFloat32": MultiplyFloat32,
"Pow": Pow,
"SignedPow": SignedPow,
"PowFloat32": PowFloat32,
"PowFloat64": PowFloat64,
"Mod": Mod,
"SignedMod": SignedMod,
"ModFloat32": ModFloat32,
"ModFloat64": ModFloat64,
"Or": Or,
"And": And,
"Xor": Xor,
"Not": Not,
"LShift": LShift,
"RShift": RShift,
"Equal": Equal,
"NotEqual": NotEqual,
"LessThan": LessThan,
"GreaterThan": GreaterThan,
"LessOrEqual": LessOrEqual,
"GreaterOrEqual": GreaterOrEqual,
"SignedLessThan": SignedLessThan,
"SignedGreaterThan": SignedGreaterThan,
"SignedLessOrEqual": SignedLessOrEqual,
"SignedGreaterOrEqual": SignedGreaterOrEqual,
"Float32LessThan": Float32LessThan,
"Float32GreaterThan": Float32GreaterThan,
"Float32LessOrEqual": Float32LessOrEqual,
"Float32GreaterOrEqual": Float32GreaterOrEqual,
"Float64LessThan": Float64LessThan,
"Float64GreaterThan": Float64GreaterThan,
"Float64LessOrEqual": Float64LessOrEqual,
"Float64GreaterOrEqual": Float64GreaterOrEqual,
"PrintString": PrintString,
"SysClock64": SysClock64,
"LogicalNot": LogicalNot,
"NegInf": LoadNInf,
"Identity": Identity
}.toTable()
if builtinOp == "print":
let typ = args[0].kind
case typ.kind:
of Integer:
case typ.size:
of LongLong:
if typ.signed:
self.emitByte(PrintInt64, line)
else:
self.emitByte(PrintUInt64, line)
of Long:
if typ.signed:
self.emitByte(PrintInt32, line)
else:
self.emitByte(PrintUInt32, line)
of Short:
if typ.signed:
self.emitByte(PrintInt16, line)
else:
self.emitByte(PrintUInt16, line)
of Tiny:
if typ.signed:
self.emitByte(PrintInt8, line)
else:
self.emitByte(PrintUInt8, line)
of Float:
case typ.width:
of Full:
self.emitByte(PrintFloat64, line)
of Half:
self.emitByte(PrintFloat32, line)
of String:
self.emitByte(PrintString, line)
of Boolean:
self.emitByte(PrintBool, line)
of TypeKind.Nan:
self.emitByte(PrintNan, line)
of TypeKind.Infinity:
self.emitByte(PrintInf, line)
of Function:
self.emitByte(LoadString, line)
var loc: string = fn.location.start.toHex()
while loc[0] == '0' and loc.len() > 1:
loc = loc[1..^1]
var str: string
if typ.isLambda:
str = &"anonymous function at 0x{loc}"
else:
str = &"function '{FunDecl(fn.decl.node).name.token.lexeme}' at 0x{loc}"
self.emitBytes(str.len().toTriple(), line)
self.emitBytes(self.chunk.writeConstant(str.toBytes()), line)
self.emitByte(PrintString, line)
else:
self.error(&"invalid type {self.typechecker.stringify(typ)} for built-in 'print'", args[0])
return
if builtinOp in codes:
self.emitByte(codes[builtinOp], line)
return
# Some builtin operations are slightly more complex
# so we handle them separately
case builtinOp:
of "LogicalOr":
self.generateExpression(args[0])
let jump = self.emitJump(JumpIfTrue, line)
self.generateExpression(args[1])
self.patchJump(jump)
of "LogicalAnd":
self.generateExpression(args[0])
let jump = self.emitJump(JumpIfFalseOrPop, line)
self.generateExpression(args[1])
self.patchJump(jump)
of "cast":
# Type casts are a merely compile-time construct:
# they don't produce any code at runtime because
# the underlying data representation does not change!
# The only reason why there's a "cast" pragma is to
# make it so that the peon stub can have no body
discard
else:
self.error(&"unknown built-in: '{builtinOp}'")
proc patchReturnAddress(self: BytecodeGenerator, pos: int) =
## Patches the return address of a function
## call
let address = self.chunk.code.len().toLong()
self.chunk.consts[pos] = address[0]
self.chunk.consts[pos + 1] = address[1]
self.chunk.consts[pos + 2] = address[2]
self.chunk.consts[pos + 3] = address[3]
self.chunk.consts[pos + 4] = address[4]
self.chunk.consts[pos + 5] = address[5]
self.chunk.consts[pos + 6] = address[6]
self.chunk.consts[pos + 7] = address[7]
proc generateLiteral(self: BytecodeGenerator, literal: TypedExpr) =
## Emits code for literals
let
typ = literal.kind
node = literal.node
case typ.kind:
of Integer, Float:
# No need to do any input validation here: the typechecker
# has graciously done all the work for us! :)
self.emitConstant(literal)
of Infinity:
if typ.positive:
self.emitByte(LoadInf, node.token.line)
else:
self.emitByte(LoadNInf, node.token.line)
of NaN:
self.emitByte(LoadNaN, node.token.line)
else:
self.error(&"Unknown typed node of type {node.kind} at generateLiteral()")
proc generateUnary(self: BytecodeGenerator, expression: TypedExpr) =
## Emits code for unary expressions
discard # TODO
proc generateBinary(self: BytecodeGenerator, expression: TypedExpr) =
## Emits code for binary expressions
discard # TODO
proc generateExpression(self: BytecodeGenerator, expression: TypedExpr) =
## Emits code for expressions
if expression.node.isConst():
self.generateLiteral(expression)
else:
let node = expression.node
case node.kind:
of unaryExpr:
self.generateUnary(expression)
of binaryExpr:
self.generateBinary(expression)
else:
self.error(&"Unknown typed node of type {node.kind} at generateExpression()")
proc beginProgram(self: BytecodeGenerator): int =
## Emits boilerplate code to set up
## a peon program
self.emitByte(LoadUInt64, 1)
# The initial jump address is always the same
self.emitBytes(self.chunk.writeConstant(12.toLong()), 1)
self.emitByte(LoadUInt64, 1)
# We emit a dummy return address which is patched later
self.emitBytes(self.chunk.writeConstant(0.toLong()), 1)
result = self.chunk.consts.len() - 8
self.emitByte(Call, 1)
self.emitBytes(0.toTriple(), 1)
proc endProgram(self: BytecodeGenerator, pos: int) =
## Emits boilerplate code to tear down
## a peon program
self.emitByte(OpCode.Return, self.currentNode.node.token.line)
# Entry point has no return value
self.emitByte(0, self.currentNode.node.token.line)
# Patch the return address now that we know the boundaries
# of the function
self.patchReturnAddress(pos)
proc generate*(self: BytecodeGenerator, compiled: seq[TypedNode], typeChecker: TypeChecker): Chunk =
## Turn the given compilation output
## into a bytecode chunk
self.chunk = newChunk()
self.typeChecker = typeChecker
let offset = self.beginProgram()
self.currentFile = typeChecker.getFile()
for typedNode in compiled:
self.currentNode = typedNode
let currentFile = self.currentFile
if self.currentNode.node.isDecl():
self.currentFile = TypedDecl(typedNode).name.module.ident.token.lexeme
case typedNode.node.kind:
of exprStmt:
self.generateExpression(TypedExprStmt(typedNode).expression)
self.emitByte(Pop, typedNode.node.token.line)
else:
self.error(&"Unknown typed node of type {typedNode.node.kind} at generate()")
self.currentFile = currentFile
self.endProgram(offset)
result = self.chunk


@@ -0,0 +1,373 @@
# Copyright 2023 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Low level bytecode implementation details
import backend/bytecode/tooling/multibyte
type
Chunk* = ref object
## A piece of bytecode.
## consts is the code's constants table.
## code is the linear sequence of compiled bytecode instructions.
## lines maps bytecode instructions to line numbers using Run
## Length Encoding. Instructions are encoded in groups whose structure
## follows the following schema:
## - The first integer represents the line number
## - The second integer represents the number of
## instructions on that line
## For example, if lines equals [1, 5], it means that there are 5 instructions
## at line 1, meaning that all instructions in code[0..4] belong to the same line.
## This is more efficient than using the naive approach, which would encode
## the same line number multiple times and waste considerable amounts of space.
## functions encodes the following information:
## - Function name
## - Argument count
## - Function boundaries
## The encoding is the following:
## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
## - After that follows the argument count as a 1 byte integer
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
## its size as a 2-byte integer
## modules contains information about all the peon modules that the compiler has encountered,
## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
## Python, peon does not produce a bytecode file for each separate module it compiles: everything
## is contained within a single binary blob. While this simplifies the implementation and makes
## bytecode files entirely "self-hosted", it also means that the original module information is
## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
consts*: seq[uint8]
code*: seq[uint8]
lines*: seq[int]
functions*: seq[uint8]
modules*: seq[uint8]
OpCode* {.pure.} = enum
## Enum of Peon's bytecode opcodes
# Note: x represents the argument
# to unary opcodes, while a and b
# represent arguments to binary
# opcodes. Other variable names (c, d, ...)
# may be used for more complex opcodes.
# Some opcodes (e.g. jumps), take arguments in
# the form of 16 or 24 bit numbers that are defined
# statically at compilation time into the bytecode
# These push a constant at position x in the
# constant table onto the stack
LoadInt64 = 0u8,
LoadUInt64,
LoadInt32,
LoadUInt32,
LoadInt16,
LoadUInt16,
LoadInt8,
LoadUInt8,
LoadFloat64,
LoadFloat32,
LoadString,
## Singleton opcodes (each of them pushes a constant singleton on the operand stack)
LoadNil,
LoadTrue,
LoadFalse,
LoadNan,
LoadInf,
LoadNInf,
## Operations on primitive types
Negate,
NegateFloat64,
NegateFloat32,
Add,
Subtract,
Multiply,
Divide,
SignedDivide,
AddFloat64,
SubtractFloat64,
MultiplyFloat64,
DivideFloat64,
AddFloat32,
SubtractFloat32,
MultiplyFloat32,
DivideFloat32,
Pow,
SignedPow,
Mod,
SignedMod,
PowFloat64,
PowFloat32,
ModFloat64,
ModFloat32,
LShift,
RShift,
Xor,
Or,
And,
Not,
Equal,
NotEqual,
GreaterThan,
LessThan,
GreaterOrEqual,
LessOrEqual,
SignedGreaterThan,
SignedLessThan,
SignedGreaterOrEqual,
SignedLessOrEqual,
Float64GreaterThan,
Float64LessThan,
Float64GreaterOrEqual,
Float64LessOrEqual,
Float32GreaterThan,
Float32LessThan,
Float32GreaterOrEqual,
Float32LessOrEqual,
LogicalNot,
Identity, # Pointer equality
## Print opcodes
PrintInt64,
PrintUInt64,
PrintInt32,
PrintUInt32,
PrintInt16,
PrintUint16,
PrintInt8,
PrintUInt8,
PrintFloat64,
PrintFloat32,
PrintHex,
PrintBool,
PrintNan,
PrintInf,
PrintString,
## Basic stack operations
Pop, # Pops an element off the operand stack and discards it
PopN, # Pops x elements off the call stack (optimization for exiting local scopes which usually pop many elements)
## Name resolution/handling
LoadAttribute, # Pushes the attribute b of object a onto the stack
LoadVar, # Pushes the object at position x in the stack onto the stack
StoreVar, # Stores the value of b at position a in the stack
AddVar, # An optimization for StoreVar (used when the variable is first declared)
## Looping and jumping
Jump, # Absolute, unconditional jump into the bytecode
JumpForwards, # Relative, unconditional, positive jump in the bytecode
JumpBackwards, # Relative, unconditional, negative jump in the bytecode
JumpIfFalse, # Jumps to a relative index in the bytecode if x is false
JumpIfTrue, # Jumps to a relative index in the bytecode if x is true
JumpIfFalsePop, # Like JumpIfFalse, but also pops off the stack (regardless of truthiness). Optimization for if statements
JumpIfFalseOrPop, # Jumps to an absolute index in the bytecode if x is false and pops otherwise (used for logical and)
## Functions
Call, # Calls a function and initiates a new stack frame
Return, # Terminates the current function
SetResult, # Sets the result of the current function
## Exception handling
Raise, # Raises exception x or re-raises active exception if x is nil
BeginTry, # Initiates an exception handling context
FinishTry, # Closes the current exception handling context
## Generators
Yield, # Yields control from a generator back to the caller
## Coroutines
Await, # Calls an asynchronous function
## Misc
Assert, # Raises an exception if x is false
NoOp, # Just a no-op
PopC, # Pop a value off the call stack and discard it
PushC, # Pop a value off the operand stack and push it onto the call stack
SysClock64, # Pushes the output of a monotonic clock on the stack
LoadTOS, # Pushes the top of the call stack onto the operand stack
DupTop, # Duplicates the top of the operand stack onto the operand stack
LoadGlobal # Loads a global variable
# We group instructions by their operation/operand types for easier handling when debugging
# Simple instructions encompass instructions that push onto/pop off the stack unconditionally (True, False, Pop, etc.)
const simpleInstructions* = {Return, LoadNil,
LoadTrue, LoadFalse,
LoadNan, LoadInf,
Pop, Raise, LoadNInf,
BeginTry, FinishTry, Yield,
Await, NoOp, SetResult,
PopC, PushC, SysClock64,
Negate,
NegateFloat64,
NegateFloat32,
Add,
Subtract,
Multiply,
Divide,
SignedDivide,
AddFloat64,
SubtractFloat64,
MultiplyFloat64,
DivideFloat64,
AddFloat32,
SubtractFloat32,
MultiplyFloat32,
DivideFloat32,
Pow,
SignedPow,
Mod,
SignedMod,
PowFloat64,
PowFloat32,
ModFloat64,
ModFloat32,
LShift,
RSHift,
Xor,
Or,
And,
Not,
Equal,
NotEqual,
GreaterThan,
LessThan,
GreaterOrEqual,
LessOrEqual,
PrintInt64,
PrintUInt64,
PrintInt32,
PrintUInt32,
PrintInt16,
PrintUint16,
PrintInt8,
PrintUInt8,
PrintFloat64,
PrintFloat32,
PrintHex,
PrintBool,
PrintNan,
PrintInf,
PrintString,
LogicalNot,
AddVar,
LoadTOS,
SignedGreaterThan,
SignedLessThan,
SignedGreaterOrEqual,
SignedLessOrEqual,
Float64GreaterThan,
Float64LessThan,
Float64GreaterOrEqual,
Float64LessOrEqual,
Float32GreaterThan,
Float32LessThan,
Float32GreaterOrEqual,
Float32LessOrEqual,
DupTop,
Identity
}
# Constant instructions are instructions that operate on the bytecode constant table
const constantInstructions* = {LoadInt64, LoadUInt64,
LoadInt32, LoadUInt32,
LoadInt16, LoadUInt16,
LoadInt8, LoadUInt8,
LoadFloat64, LoadFloat32,
LoadString}
# Stack triple instructions operate on the stack at arbitrary offsets and take their argument
# as a 24-bit integer
const stackTripleInstructions* = {StoreVar, LoadVar, LoadGlobal}
# Stack double instructions operate on the stack at arbitrary offsets and take their argument
# as a 16-bit integer
const stackDoubleInstructions* = {}
# Argument double instructions take hardcoded arguments as 16-bit integers
const argumentDoubleInstructions* = {PopN}
# Jump instructions jump at relative or absolute bytecode offsets
const jumpInstructions* = {Jump, JumpIfFalse, JumpIfFalsePop,
JumpForwards, JumpBackwards,
JumpIfTrue, JumpIfFalseOrPop}
proc newChunk*: Chunk =
## Initializes a new, empty chunk
result = Chunk(consts: @[], code: @[], lines: @[], functions: @[], modules: @[])
proc write*(self: Chunk, newByte: uint8, line: int) =
## Adds the given instruction at the provided line number
## to the given chunk object
assert line > 0, "line must be greater than zero"
if self.lines.high() >= 1 and self.lines[^2] == line:
self.lines[^1] += 1
else:
self.lines.add(line)
self.lines.add(1)
self.code.add(newByte)
proc write*(self: Chunk, bytes: openarray[uint8], line: int) =
## Calls self.write() in a loop with all members of the
## given array
for cByte in bytes:
self.write(cByte, line)
proc write*(self: Chunk, newByte: OpCode, line: int) =
## Adds the given instruction at the provided line number
## to the given chunk object
self.write(uint8(newByte), line)
proc write*(self: Chunk, bytes: openarray[OpCode], line: int) =
## Calls write in a loop with all members of the given
## array
for cByte in bytes:
self.write(uint8(cByte), line)
proc getLine*(self: Chunk, idx: int): int =
## Returns the associated line of a given
## instruction index
if self.lines.len < 2:
raise newException(IndexDefect, "the chunk object is empty")
var
count: int
current: int = 0
for n in countup(0, self.lines.high(), 2):
count = self.lines[n + 1]
if idx in current..<current + count:
return self.lines[n]
current += count
raise newException(IndexDefect, "index out of range")
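The write/getLine pair above run-length encodes line information: self.lines stores flat [line, count] pairs, one pair per run of instructions sharing a line. A minimal sketch of the same scheme in Python (function names are illustrative, not part of peon):

```python
def add_line(lines: list[int], line: int) -> None:
    # lines is stored as [line, count, line, count, ...]
    if len(lines) >= 2 and lines[-2] == line:
        lines[-1] += 1  # extend the current run
    else:
        lines.extend([line, 1])  # start a new run

def get_line(lines: list[int], idx: int) -> int:
    # walk the runs until the one containing instruction idx
    current = 0
    for n in range(0, len(lines), 2):
        count = lines[n + 1]
        if current <= idx < current + count:
            return lines[n]
        current += count
    raise IndexError("instruction index out of range")
```

This keeps line data proportional to the number of distinct source lines rather than the number of emitted bytes.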
proc getIdx*(self: Chunk, line: int): int =
## Gets the index into self.lines
## where the line counter for the given
## line is located
# lines is stored as [line, count] pairs, so the counter
# for a line sits immediately after the line number itself
for i, v in self.lines:
if (i and 1) == 0 and v == line:
return i + 1
proc writeConstant*(self: Chunk, data: openarray[uint8]): array[3, uint8] =
## Writes a series of bytes to the chunk's constant
## table and returns the index of the first byte as
## an array of 3 bytes
result = self.consts.len().toTriple()
for b in data:
self.consts.add(b)
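The module-segment layout documented at the top of the chunk (3-byte start offset, 3-byte end offset, then the name prefixed with a 2-byte size) can be sketched as follows. This is an illustration in Python, assuming little-endian byte order, which is what the cast-based fromTriple/fromDouble helpers produce on common platforms:

```python
def encode_entry(start: int, stop: int, name: str) -> bytes:
    # 3-byte start offset, 3-byte end offset,
    # 2-byte name length, then the ASCII name itself
    return (start.to_bytes(3, "little")
            + stop.to_bytes(3, "little")
            + len(name).to_bytes(2, "little")
            + name.encode("ascii"))

def decode_entry(data: bytes) -> tuple[int, int, str]:
    # mirror of encode_entry: fixed-size fields first, then the name
    start = int.from_bytes(data[0:3], "little")
    stop = int.from_bytes(data[3:6], "little")
    size = int.from_bytes(data[6:8], "little")
    name = data[8:8 + size].decode("ascii")
    return start, stop, name
```

Function entries follow the same pattern with an extra 1-byte argument count between the end offset and the name length.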


@@ -0,0 +1,277 @@
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import std/strformat
import std/terminal
import backend/bytecode/opcodes
import backend/bytecode/tooling/multibyte
type
Function = object
start, stop, argc: int
name: string
Module = object
start, stop: int
name: string
BytecodeDebugger* = ref object
chunk: Chunk
modules: seq[Module]
functions: seq[Function]
current: int
proc newBytecodeDebugger*: BytecodeDebugger =
## Initializes a new, empty
## debugger object
new(result)
result.functions = @[]
proc nl = stdout.write("\n")
proc printDebug(s: string, newline: bool = false) =
stdout.styledWrite(fgMagenta, "DEBUG - Disassembler -> ")
stdout.styledWrite(fgGreen, s)
if newline:
nl()
proc printName(opcode: OpCode, newline: bool = false) =
stdout.styledWrite(fgRed, $opcode, " (", fgYellow, $uint8(opcode), fgRed, ")")
if newline:
nl()
proc printInstruction(instruction: OpCode, newline: bool = false) =
printDebug("Instruction: ")
printName(instruction)
if newline:
nl()
proc checkFunctionStart(self: BytecodeDebugger, n: int) =
## Checks if a function begins at the given
## bytecode offset
for i, e in self.functions:
# Avoids duplicate output
if n == e.start:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function Start ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
styledEcho fgGreen, "\t- Start offset: ", fgYellow, $e.start
styledEcho fgGreen, "\t- End offset: ", fgYellow, $e.stop
styledEcho fgGreen, "\t- Argument count: ", fgYellow, $e.argc, "\n"
proc checkFunctionEnd(self: BytecodeDebugger, n: int) =
## Checks if a function ends at the given
## bytecode offset
for i, e in self.functions:
if n == e.stop:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function End ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
proc checkModuleStart(self: BytecodeDebugger, n: int) =
## Checks if a module begins at the given
## bytecode offset
for i, m in self.modules:
if m.start == n:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module Start ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
styledEcho fgGreen, "\t- Start offset: ", fgYellow, $m.start
styledEcho fgGreen, "\t- End offset: ", fgYellow, $m.stop, "\n"
proc checkModuleEnd(self: BytecodeDebugger, n: int) =
## Checks if a module ends at the given
## bytecode offset
for i, m in self.modules:
if m.stop == n:
styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module End ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
proc simpleInstruction(self: BytecodeDebugger, instruction: OpCode) =
## Debugs simple instructions
printInstruction(instruction, true)
self.current += 1
if instruction == Return:
printDebug("Void: ")
if self.chunk.code[self.current] == 0:
stdout.styledWriteLine(fgYellow, "Yes")
else:
stdout.styledWriteLine(fgYellow, "No")
self.current += 1
proc stackTripleInstruction(self: BytecodeDebugger, instruction: OpCode) =
## Debugs instructions that operate on a single value on the stack using a 24-bit operand
var slot = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
printInstruction(instruction)
stdout.styledWriteLine(fgGreen, &", points to index ", fgYellow, $slot)
self.current += 4
proc stackDoubleInstruction(self: BytecodeDebugger, instruction: OpCode) =
## Debugs instructions that operate on a single value on the stack using a 16-bit operand
var slot = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2]].fromDouble()
printInstruction(instruction)
stdout.write(&", points to index ")
stdout.styledWriteLine(fgGreen, &", points to index ", fgYellow, $slot)
self.current += 3
proc argumentDoubleInstruction(self: BytecodeDebugger, instruction: OpCode) =
## Debugs instructions that operate on a hardcoded value on the stack using a 16-bit operand
var slot = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2]].fromDouble()
printInstruction(instruction)
stdout.styledWriteLine(fgGreen, &", has argument ", fgYellow, $slot)
self.current += 3
proc argumentTripleInstruction(self: BytecodeDebugger, instruction: OpCode) {.used.} =
## Debugs instructions that operate on a hardcoded value on the stack using a 24-bit operand
var slot = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
printInstruction(instruction)
stdout.styledWriteLine(fgGreen, ", has argument ", fgYellow, $slot)
self.current += 4
proc callInstruction(self: BytecodeDebugger, instruction: OpCode) =
## Debugs function calls
var size = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
self.current += 3
printInstruction(instruction)
styledEcho fgGreen, &", creates frame of size ", fgYellow, $(size + 2), fgGreen
self.current += 1
proc constantInstruction(self: BytecodeDebugger, instruction: OpCode) =
## Debugs instructions that operate on the constant table
var size: uint
if instruction == LoadString:
size = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
self.current += 3
var constant = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
printInstruction(instruction)
stdout.styledWrite(fgGreen, &", points to constant at position ", fgYellow, $constant)
self.current += 4
if instruction == LoadString:
stdout.styledWriteLine(fgGreen, " of length ", fgYellow, $size)
else:
stdout.write("\n")
proc jumpInstruction(self: BytecodeDebugger, instruction: OpCode) =
## Debugs jumps
var jump = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple().int()
printInstruction(instruction, true)
printDebug("Jump size: ")
stdout.styledWrite(fgYellow, $jump)
nl()
self.current += 4
while self.current < self.chunk.code.len and self.chunk.code[self.current] == NoOp.uint8:
inc(self.current)
proc disassembleInstruction*(self: BytecodeDebugger) =
## Takes one bytecode instruction and prints it
let opcode = OpCode(self.chunk.code[self.current])
self.checkModuleStart(self.current)
self.checkFunctionStart(self.current)
printDebug("Offset: ")
stdout.styledWriteLine(fgYellow, $(self.current))
printDebug("Line: ")
stdout.styledWriteLine(fgYellow, &"{self.chunk.getLine(self.current)}")
case opcode:
of simpleInstructions:
self.simpleInstruction(opcode)
# Functions (and modules) only have a single return statement at the
# end of their body, so we never execute this more than once per module/function
if opcode == Return:
# -2 to skip the hardcoded argument to return
# and the increment by simpleInstruction()
self.checkFunctionEnd(self.current - 2)
self.checkModuleEnd(self.current - 1)
of constantInstructions:
self.constantInstruction(opcode)
of stackDoubleInstructions:
self.stackDoubleInstruction(opcode)
of stackTripleInstructions:
self.stackTripleInstruction(opcode)
of argumentDoubleInstructions:
self.argumentDoubleInstruction(opcode)
of Call:
self.callInstruction(opcode)
of jumpInstructions:
self.jumpInstruction(opcode)
else:
echo &"DEBUG - Unknown opcode {opcode} at index {self.current}"
self.current += 1
proc parseFunctions(self: BytecodeDebugger) =
## Parses function information in the chunk
var
start, stop, argc: int
name: string
idx = 0
size = 0
while idx < self.chunk.functions.high():
start = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
idx += 3
stop = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
idx += 3
argc = int(self.chunk.functions[idx])
inc(idx)
size = int([self.chunk.functions[idx], self.chunk.functions[idx + 1]].fromDouble())
idx += 2
name = self.chunk.functions[idx..<idx + size].fromBytes()
inc(idx, size)
self.functions.add(Function(start: start, stop: stop, argc: argc, name: name))
proc parseModules(self: BytecodeDebugger) =
## Parses module information in the chunk
var
start, stop: int
name: string
idx = 0
size = 0
while idx < self.chunk.modules.high():
start = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
idx += 3
stop = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
idx += 3
size = int([self.chunk.modules[idx], self.chunk.modules[idx + 1]].fromDouble())
idx += 2
name = self.chunk.modules[idx..<idx + size].fromBytes()
inc(idx, size)
self.modules.add(Module(start: start, stop: stop, name: name))
proc disassembleChunk*(self: BytecodeDebugger, chunk: Chunk, name: string) =
## Takes a chunk of bytecode and prints it
self.chunk = chunk
styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ====\n"
self.current = 0
self.parseFunctions()
self.parseModules()
while self.current < self.chunk.code.len:
self.disassembleInstruction()
echo ""
styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ===="


@@ -0,0 +1,87 @@
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Utilities to handle multibyte sequences
proc toDouble*(input: int | uint | uint16): array[2, uint8] =
## Converts an unsigned integer
## to an array[2, uint8]
result = cast[array[2, uint8]](uint16(input))
proc toTriple*(input: uint | int): array[3, uint8] =
## Converts an unsigned integer to an array[3, uint8]
result = cast[array[3, uint8]](uint(input))
proc toQuad*(input: int | uint | uint16 | uint32): array[4, uint8] =
## Converts an unsigned integer to an array[4, uint8]
result = cast[array[4, uint8]](uint(input))
proc toLong*(input: int | uint | uint16 | uint32 | uint64): array[8, uint8] =
## Converts an unsigned integer to an array[8, uint8]
result = cast[array[8, uint8]](uint(input))
proc fromDouble*(input: array[2, uint8]): uint16 =
## Rebuilds the output of toDouble into
## an uint16
copyMem(result.addr, unsafeAddr(input), sizeof(uint16))
proc fromTriple*(input: array[3, uint8]): uint =
## Rebuilds the output of toTriple into
## an uint
copyMem(result.addr, unsafeAddr(input), sizeof(uint8) * 3)
proc fromQuad*(input: array[4, uint8]): uint =
## Rebuilds the output of toQuad into
## a uint
copyMem(result.addr, unsafeAddr(input), sizeof(uint32))
proc fromLong*(input: array[8, uint8]): uint =
## Rebuilds the output of toLong into
## a uint
copyMem(result.addr, unsafeAddr(input), sizeof(uint64))
proc toBytes*(s: string): seq[byte] =
## Converts a string into a sequence
## of bytes
for c in s:
result.add(byte(c))
proc toBytes*(s: int): array[8, uint8] =
## Converts an integer into an array[8, uint8]
result = cast[array[8, uint8]](s)
proc fromBytes*(input: seq[byte]): string =
## Converts a sequence of bytes to
## a string
var i = 0
while i < input.len():
result.add(char(input[i]))
inc(i)
proc extend*[T](s: var seq[T], a: openarray[T]) =
## Extends s with the elements of a
for e in a:
s.add(e)


@@ -0,0 +1,253 @@
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Implementation of the peon bytecode serializer
import std/strformat
import std/strutils
import std/times
import config
import errors
import backend/bytecode/tooling/multibyte
import backend/bytecode/opcodes
type
BytecodeSerializer* = ref object
file: string
filename: string
chunk: Chunk
SerializedBytecode* = ref object
## Wrapper returned by
## the Serializer.read*
## procedures to store
## metadata
version*: tuple[major, minor, patch: int]
branch*: string
commit*: string
compileDate*: int
chunk*: Chunk
size*: int
SerializationError* = ref object of PeonException
proc `$`*(self: SerializedBytecode): string =
result = &"SerializedBytecode(version={self.version.major}.{self.version.minor}.{self.version.patch}, branch={self.branch}, commit={self.commit}, date={self.compileDate}, chunk={self.chunk[]})"
proc error(self: BytecodeSerializer, message: string) =
## Raises a formatted SerializationError exception
raise SerializationError(msg: message, file: self.filename)
proc newBytecodeSerializer*(self: BytecodeSerializer = nil): BytecodeSerializer =
new(result)
if self != nil:
result = self
result.file = ""
result.filename = ""
result.chunk = nil
proc writeHeaders(self: BytecodeSerializer, stream: var seq[byte]) =
## Writes the Peon bytecode headers in-place into the
## given byte sequence
stream.extend(PeonBytecodeMarker.toBytes())
stream.add(byte(PEON_VERSION.major))
stream.add(byte(PEON_VERSION.minor))
stream.add(byte(PEON_VERSION.patch))
stream.add(byte(len(PEON_BRANCH)))
stream.extend(PEON_BRANCH.toBytes())
stream.extend(PEON_COMMIT_HASH.toBytes())
stream.extend(getTime().toUnixFloat().int().toBytes())
proc writeLineData(self: BytecodeSerializer, stream: var seq[byte]) =
## Writes line information for debugging
## bytecode instructions to the given byte
## sequence
stream.extend(len(self.chunk.lines).toQuad())
for b in self.chunk.lines:
stream.extend(b.toTriple())
proc writeFunctions(self: BytecodeSerializer, stream: var seq[byte]) =
## Writes debug info about functions to the
## given byte sequence
stream.extend(len(self.chunk.functions).toQuad())
stream.extend(self.chunk.functions)
proc writeConstants(self: BytecodeSerializer, stream: var seq[byte]) =
## Writes the constants table in-place into the
## byte sequence
stream.extend(self.chunk.consts.len().toQuad())
stream.extend(self.chunk.consts)
proc writeModules(self: BytecodeSerializer, stream: var seq[byte]) =
## Writes module information to the given stream
stream.extend(self.chunk.modules.len().toQuad())
stream.extend(self.chunk.modules)
proc writeCode(self: BytecodeSerializer, stream: var seq[byte]) =
## Writes the bytecode from the given chunk to the
## given source stream
stream.extend(self.chunk.code.len.toTriple())
stream.extend(self.chunk.code)
proc readHeaders(self: BytecodeSerializer, stream: seq[byte], serialized: SerializedBytecode): int =
## Reads the bytecode headers from a given sequence
## of bytes
var stream = stream
if stream[0..<len(PeonBytecodeMarker)] != PeonBytecodeMarker.toBytes():
self.error("malformed bytecode marker")
result += len(PeonBytecodeMarker)
stream = stream[len(PeonBytecodeMarker)..^1]
serialized.version = (major: int(stream[0]), minor: int(stream[1]), patch: int(stream[2]))
stream = stream[3..^1]
result += 3
let branchLength = stream[0]
stream = stream[1..^1]
result += 1
serialized.branch = stream[0..<branchLength].fromBytes()
stream = stream[branchLength..^1]
result += int(branchLength)
serialized.commit = stream[0..<40].fromBytes().toLowerAscii()
stream = stream[40..^1]
result += 40
serialized.compileDate = int(fromLong([stream[0], stream[1], stream[2],
stream[3], stream[4], stream[5], stream[6], stream[7]]))
stream = stream[8..^1]
result += 8
proc readLineData(self: BytecodeSerializer, stream: seq[byte]): int =
## Reads line information from a stream
## of bytes
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4
var stream = stream[4..^1]
for i in countup(0, int(size) - 1):
self.chunk.lines.add(int([stream[0], stream[1], stream[2]].fromTriple()))
result += 3
stream = stream[3..^1]
doAssert len(self.chunk.lines) == int(size)
proc readFunctions(self: BytecodeSerializer, stream: seq[byte]): int =
## Reads the function segment from a stream
## of bytes
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4
var stream = stream[4..^1]
for i in countup(0, int(size) - 1):
self.chunk.functions.add(stream[i])
inc(result)
doAssert len(self.chunk.functions) == int(size)
proc readConstants(self: BytecodeSerializer, stream: seq[byte]): int =
## Reads the constant table from the given
## byte sequence
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4
var stream = stream[4..^1]
for i in countup(0, int(size) - 1):
self.chunk.consts.add(stream[i])
inc(result)
doAssert len(self.chunk.consts) == int(size)
proc readModules(self: BytecodeSerializer, stream: seq[byte]): int =
## Reads module information
let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
result += 4
var stream = stream[4..^1]
for i in countup(0, int(size) - 1):
self.chunk.modules.add(stream[i])
inc(result)
doAssert len(self.chunk.modules) == int(size)
proc readCode(self: BytecodeSerializer, stream: seq[byte]): int =
## Reads the bytecode from a given byte sequence
let size = [stream[0], stream[1], stream[2]].fromTriple()
var stream = stream[3..^1]
for i in countup(0, int(size) - 1):
self.chunk.code.add(stream[i])
doAssert len(self.chunk.code) == int(size)
return int(size)
proc dumpBytes*(self: BytecodeSerializer, chunk: Chunk, filename: string): seq[byte] =
## Dumps the given chunk to a sequence of bytes and returns it.
## The filename argument is for error reporting only, use dumpFile
## to dump bytecode to a file
self.filename = filename
self.chunk = chunk
self.writeHeaders(result)
self.writeLineData(result)
self.writeFunctions(result)
self.writeConstants(result)
self.writeModules(result)
self.writeCode(result)
proc dumpFile*(self: BytecodeSerializer, chunk: Chunk, filename, dest: string) =
## Dumps the result of dumpBytes to a file at dest
var fp = open(dest, fmWrite)
defer: fp.close()
let data = self.dumpBytes(chunk, filename)
discard fp.writeBytes(data, 0, len(data))
proc loadBytes*(self: BytecodeSerializer, stream: seq[byte]): SerializedBytecode =
## Loads the result from dumpBytes to a Serializer object
## for use in the VM or for inspection
discard self.newBytecodeSerializer()
new(result)
result.chunk = newChunk()
result.size = stream.len()
self.chunk = result.chunk
var stream = stream
try:
stream = stream[self.readHeaders(stream, result)..^1]
stream = stream[self.readLineData(stream)..^1]
stream = stream[self.readFunctions(stream)..^1]
stream = stream[self.readConstants(stream)..^1]
stream = stream[self.readModules(stream)..^1]
stream = stream[self.readCode(stream)..^1]
except IndexDefect:
self.error("truncated bytecode stream")
except AssertionDefect:
self.error(&"corrupted bytecode stream: {getCurrentExceptionMsg()}")
proc loadFile*(self: BytecodeSerializer, src: string): SerializedBytecode =
## Loads a bytecode file
var fp = open(src, fmRead)
defer: fp.close()
let size = fp.getFileSize()
var pos = 0'i64
var data: seq[byte] = newSeqOfCap[byte](size)
for _ in 0..<size:
data.add(0)
while pos < size:
discard fp.readBytes(data, pos, size)
pos = fp.getFilePos()
return self.loadBytes(data)
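Putting writeHeaders/readHeaders together, the header layout is: the bytecode marker, three version bytes, a 1-byte branch-name length, the branch name, a 40-character commit hash, and an 8-byte timestamp. A Python sketch of the round trip; the marker value here is a placeholder (the real one comes from peon's config module) and little-endian byte order is an assumption:

```python
MARKER = b"PEON"  # placeholder; the real value is config's PeonBytecodeMarker

def write_headers(version, branch, commit, date):
    out = bytearray(MARKER)
    out += bytes(version)               # major, minor, patch (one byte each)
    out.append(len(branch))             # 1-byte branch name length
    out += branch.encode("ascii")
    out += commit.encode("ascii")       # 40-character commit hash
    out += date.to_bytes(8, "little")   # 8-byte UNIX timestamp
    return bytes(out)

def read_headers(data):
    # mirror of write_headers, advancing a cursor through the fields
    if data[:len(MARKER)] != MARKER:
        raise ValueError("malformed bytecode marker")
    pos = len(MARKER)
    version = tuple(data[pos:pos + 3]); pos += 3
    blen = data[pos]; pos += 1
    branch = data[pos:pos + blen].decode("ascii"); pos += blen
    commit = data[pos:pos + 40].decode("ascii"); pos += 40
    date = int.from_bytes(data[pos:pos + 8], "little")
    return version, branch, commit, date
```

The line, function, constant, module, and code segments then follow, each prefixed with its own size field.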


@@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -32,12 +32,11 @@ import std/sets
import std/monotimes
when debugVM or debugMem or debugGC or debugAlloc:
import std/sequtils
import std/terminal
import frontend/compiler/targets/bytecode/opcodes
import frontend/compiler/targets/bytecode/util/multibyte
import backend/bytecode/opcodes
import backend/bytecode/tooling/multibyte
when debugVM:
@@ -50,7 +49,7 @@ type
## peon objects
String, List,
Dict, Tuple,
CustomType,
Structure,
HeapObject* = object
## A tagged box for a heap-allocated
## peon object
@@ -114,7 +113,6 @@ proc newPeonGC*: PeonGC =
proc collect*(self: var PeonVM)
proc reallocate*(self: var PeonVM, p: pointer, oldSize: int, newSize: int): pointer =
## Simple wrapper around realloc with
## built-in garbage collection
@@ -217,27 +215,15 @@ proc markRoots(self: var PeonVM): HashSet[ptr HeapObject] =
# will mistakenly assume the object to be reachable, potentially
# leading to a nasty memory leak. Let's just hope a 48+ bit address
# space makes this occurrence rare enough not to be a problem
# handles a single type (uint64), while Lox has a stack
# of heap-allocated structs (which is convenient, but slow).
# What we do instead is store all pointers allocated by us
# in a hash set and then check if any source of roots contained
# any of the integer values that we're keeping track of. Note
# that this means that if a primitive object's value happens to
# collide with an active pointer, the GC will mistakenly assume
# the object to be reachable (potentially leading to a nasty
# memory leak). Hopefully, in a 64-bit address space, this
# occurrence is rare enough for us to ignore
var result = initHashSet[uint64](self.gc.pointers.len())
result = initHashSet[ptr HeapObject](self.gc.pointers.len())
for obj in self.calls:
if obj in self.gc.pointers:
result.incl(obj)
result.incl(cast[ptr HeapObject](obj))
for obj in self.operands:
if obj in self.gc.pointers:
result.incl(obj)
var obj: ptr HeapObject
result.incl(cast[ptr HeapObject](obj))
for p in result:
obj = cast[ptr HeapObject](p)
if obj.mark():
if p.mark():
when debugMarkGC:
echo &"DEBUG - GC: Marked object: {obj[]}"
when debugGC:
@@ -270,10 +256,7 @@ proc trace(self: var PeonVM, roots: HashSet[ptr HeapObject]) =
proc free(self: var PeonVM, obj: ptr HeapObject) =
## Frees a single heap-allocated
## peon object and all the memory
## it directly or indirectly owns. Note
## that the pointer itself is not released
## from the GC's internal table and must be
## handled by the caller
## it directly or indirectly owns
when debugAlloc:
echo &"DEBUG - GC: Freeing object: {obj[]}"
case obj.kind:
@@ -350,22 +333,11 @@ proc collect(self: var PeonVM) =
# Implementation of the peon VM
proc initCache*(self: var PeonVM) =
## Initializes the VM's
## singletons cache
self.cache[0] = 0x0 # False
self.cache[1] = 0x1 # True
self.cache[2] = 0x2 # Nil
self.cache[3] = 0x3 # Positive inf
self.cache[4] = 0x4 # Negative inf
self.cache[5] = 0x5 # NaN
proc newPeonVM*: PeonVM =
## Initializes a new, blank VM
## for executing Peon bytecode
result.ip = 0
result.initCache()
result.gc = newPeonGC()
result.frames = @[]
result.operands = @[]
@@ -380,15 +352,16 @@ func getNil*(self: var PeonVM): uint64 = self.cache[2]
func getBool*(self: var PeonVM, value: bool): uint64 =
if value:
return self.cache[1]
return self.cache[0]
return 0
return 1
func getInf*(self: var PeonVM, positive: bool): uint64 =
if positive:
return self.cache[3]
return self.cache[4]
return cast[uint64](Inf)
return cast[uint64](-Inf)
func getNan*(self: var PeonVM): uint64 = self.cache[5]
func getNan*(self: var PeonVM): uint64 = cast[uint64](NaN)
# Thanks to nim's *genius* idea of making x > y a template
@@ -842,17 +815,9 @@ proc dispatch*(self: var PeonVM) {.inline.} =
# Pops a value off the operand stack
discard self.pop()
of PushC:
# Pushes a value from the operand stack
# onto the call stack
# Pops a value off the operand stack
# and pushes it onto the call stack
self.pushc(self.pop())
of PopRepl:
# Pops a peon object off the
# operand stack and prints it.
# Used in interactive REPL mode
if self.frames.len() !> 1:
discard self.pop()
continue
echo self.pop()
of PopN:
# Pops N elements off the call stack
for _ in 0..<int(self.readShort()):
@@ -1003,7 +968,10 @@ proc dispatch*(self: var PeonVM) {.inline.} =
of Float32LessOrEqual:
self.push(self.getBool(cast[float32](self.pop()) <= cast[float32](self.pop())))
of Identity:
# Identity is implemented simply as pointer equality :)
self.push(cast[uint64](self.pop() == self.pop()))
of LogicalNot:
self.push(uint64(not self.pop().bool))
# Print opcodes
of PrintInt64:
echo cast[int64](self.pop())
@@ -1033,7 +1001,7 @@ proc dispatch*(self: var PeonVM) {.inline.} =
else:
echo "false"
of PrintInf:
if self.pop() == 0x3:
if self.pop() == self.getInf(positive=true):
echo "inf"
else:
echo "-inf"
@@ -1046,8 +1014,6 @@ proc dispatch*(self: var PeonVM) {.inline.} =
stdout.write("\n")
of SysClock64:
self.push(cast[uint64](getMonoTime().ticks.float() / 1_000_000_000))
of LogicalNot:
self.push(uint64(not self.pop().bool))
else:
discard
@@ -1066,11 +1032,11 @@ proc run*(self: var PeonVM, chunk: Chunk, breakpoints: seq[uint64] = @[], repl:
try:
self.dispatch()
except Defect as e:
stderr.writeLine(&"Fatal error at bytecode offset {self.ip - 1}: {e.name} -> {e.msg}")
stderr.writeLine(&"VM: Fatal error at bytecode offset {self.ip - 1}: {e.name} -> {e.msg}")
except CatchableError as e:
stderr.writeLine(&"Fatal error at bytecode offset {self.ip - 1}: {e.name} -> {e.msg}")
stderr.writeLine(&"VM: Fatal error at bytecode offset {self.ip - 1}: {e.name} -> {e.msg}")
except NilAccessDefect:
stderr.writeLine(&"Memory Access Violation (bytecode offset {self.ip}): SIGSEGV")
stderr.writeLine(&"VM: Memory Access Violation (bytecode offset {self.ip}): SIGSEGV")
quit(1)
if not repl:
# We clean up after ourselves!
@ -1095,4 +1061,4 @@ proc resume*(self: var PeonVM, chunk: Chunk) =
quit(1)
{.pop.}
{.pop.}


@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -16,26 +16,33 @@ import std/strformat
import std/os
# These variables can be tweaked to debug and test various components of the toolchain
var debugLexer* = false # Print the tokenizer's output
var debugParser* = false # Print the AST generated by the parser
type
PeonBackend* = enum
Bytecode,
NativeC # Coming soon
# These variables can be tweaked to debug and test various components of the toolchain. Do not modify them directly,
# use the command-line options instead (or -d:option=value for constants)
var debugLexer* = false # Print the tokenizer's output (main module only)
var debugParser* = false # Print the AST generated by the parser (main module only)
var debugTypeChecker* = false # Debug the typechecker's output (main module only)
var debugCompiler* = false # Disassemble and/or print the code generated by the compiler
var debugSerializer* = false # Validate the bytecode serializer's output
const debugVM* {.booldefine.} = false # Enable the runtime debugger in the bytecode VM
const debugGC* {.booldefine.} = false # Debug the Garbage Collector (extremely verbose)
const debugAlloc* {.booldefine.} = false # Trace object allocation (extremely verbose)
const debugMem* {.booldefine.} = false # Debug the memory allocator (extremely verbose)
var debugSerializer* = false # Validate the bytecode serializer's output
const debugStressGC* {.booldefine.} = false # Make the GC run a collection at every allocation (VERY SLOW!)
const debugMarkGC* {.booldefine.} = false # Trace the marking phase object by object (extremely verbose)
const PeonBytecodeMarker* = "PEON_BYTECODE" # Magic value at the beginning of bytecode files
const HeapGrowFactor* = 2 # The growth factor used by the GC to schedule the next collection
const FirstGC* = 1024 * 1024; # How many bytes to allocate before running the first GC
const enableVMChecks* {.booldefine.} = true; # Enables all types of compiler (nim-wise) checks in the VM
const enableVMChecks* {.booldefine.} = true; # Enables all types of compiler checks in the VM
# List of paths where peon looks for modules, in order (empty path means current directory, which always takes precedence)
const moduleLookupPaths*: seq[string] = @["", "src/peon/stdlib", absolutePath(joinPath(".local", "peon", "stdlib"), getenv("HOME"))]
when HeapGrowFactor <= 1:
{.fatal: "Heap growth factor must be > 1".}
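The GC constants above drive a standard grow-factor schedule: collect once FirstGC bytes have been allocated, then set the next threshold relative to the heap size times HeapGrowFactor. A sketch of that scheduling logic (the constants come from the source; the bookkeeping around them is assumed):

```python
# Assumed grow-factor GC schedule; only HeapGrowFactor and FirstGC
# are taken from the source configuration above.
HEAP_GROW_FACTOR = 2
FIRST_GC = 1024 * 1024  # bytes allocated before the first collection

class GCSchedule:
    def __init__(self):
        self.allocated = 0
        self.next_gc = FIRST_GC
        self.collections = 0

    def allocate(self, nbytes: int) -> None:
        self.allocated += nbytes
        if self.allocated >= self.next_gc:
            self.collect()

    def collect(self) -> None:
        self.collections += 1
        # Schedule the next collection relative to the current heap size
        self.next_gc = max(FIRST_GC, self.allocated * HEAP_GROW_FACTOR)

sched = GCSchedule()
sched.allocate(FIRST_GC)  # crosses the first threshold
print(sched.collections)  # 1
print(sched.next_gc)      # 2097152
```

A factor <= 1 would schedule the next collection at or below the current heap size, collecting on every allocation, which is why the configuration rejects it at compile time.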
const PeonVersion* = (major: 0, minor: 1, patch: 0)
const PeonVersion* = (major: 0, minor: 2, patch: 0)
const PeonRelease* = "alpha"
const PeonCommitHash* = staticExec("git rev-parse HEAD")
const PeonBranch* = staticExec("git symbolic-ref HEAD 2>/dev/null | cut -f 3 -d /")
@ -45,37 +52,44 @@ const HelpMessage* = """The peon programming language, Copyright (C) 2023 Mattia
This program is free software, see the license distributed with this program or check
http://www.apache.org/licenses/LICENSE-2.0 for more info.
Note: This is very much a work in progress
Basic Usage
-----------
$ peon file.pn Run the given Peon source file
$ peon file.pbc Run the given Peon bytecode file
peon [options] <file>[.pn] Run the given peon file
peon [options] file.pbc Run the given peon bytecode file
Options
-------
-h, --help Show this help text and exit
-v, --version Print the current peon version and exit
-s, --string Execute the passed string as if it was a file
-n, --noDump Don't dump the result of compilation to a file.
Note that no dump is created when using -s/--string
-b, --breakpoints Run the debugger at specific bytecode offsets (comma-separated).
Only available with --target:bytecode and when compiled with VM
debugging on (-d:debugVM at build time)
-d, --disassemble Disassemble the output of compilation (only makes sense with --target:bytecode)
-m, --mode Set the compilation mode. Acceptable values are 'debug' and
'release'. Defaults to 'debug'
-c, --compile Compile the code, but do not execute it. Useful along with -d
-w, --warnings Turn warnings on or off (default: on). Acceptable values are
yes/on and no/off
--noWarn Disable a specific warning (for example, --noWarn:unusedVariable)
--showMismatches Show all mismatches when function dispatching fails (output is really verbose)
--target Select the compilation target (valid values are: 'c' and 'bytecode'). Defaults to
'bytecode'
-o, --output Rename the output file with this value (with --target:bytecode, a '.pbc' extension
is added if not already present)
--debug-dump Debug the bytecode serializer. Only makes sense with --target:bytecode
--debug-lexer Show the lexer's output
--debug-parser Show the parser's output
-h, --help Show this help text and exit
-v, --version Print the current peon version and exit
-w, --warnings Turn warnings on or off (default: on). Acceptable values are
yes/on and no/off
--noWarn Disable a specific warning (example: --noWarn:UserWarning)
--noGen Don't generate any code (i.e. stop at the typechecking stage)
--showMismatches Show all mismatches when function dispatching fails (output is really verbose)
--debugLexer Show the lexer's output
--debugParser Show the parser's output
--debugTypeChecker Show the typechecker's output
--debugCompiler Show the generated code (backend-specific)
--listWarns Show a list of all warnings
-b, --backend Select the compilation backend. Currently only supports 'bytecode' (the default)
-c, --compile Compile the code, but do not run the main module
-o, --output Rename the output executable to this (a "bc" extension is added for bytecode files,
if not already present)
-s, --string Run the given string as if it were a file (the filename is set to '<string>')
--cacheDir Specify a directory where the peon compiler will dump code generation results
to speed up subsequent builds. Defaults to ".buildcache"
The following options are specific to the 'bytecode' backend:
-n, --noDump Do not dump bytecode files to the source directory. Note that
no files are dumped when using -s/--string
--breakpoints Set debugging breakpoints at the given bytecode offsets.
Input should be a comma-separated list of positive integers
(spacing is irrelevant). Only works if peon was compiled with
-d:debugVM
"""


@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -19,3 +19,6 @@ type
## peon failure (not to be used directly)
file*: string # The file where the error occurred
line*: int # The line where the error occurred
CodeGenError* = ref object of PeonException
## An exception for a code generation failure

File diff suppressed because it is too large


@ -1,71 +0,0 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## The code generator for translating peon to C code
import std/tables
import std/strformat
import std/algorithm
import std/parseutils
import std/strutils
import std/sequtils
import std/sets
import std/os
import frontend/compiler/compiler
import frontend/parsing/lexer
import frontend/parsing/parser
import frontend/parsing/ast
type
CompilerFunc = object
## An internal compiler function called
## by pragmas
kind: PragmaKind
handler: proc (self: NativeCCompiler, pragma: Pragma, name: Name)
NativeCCompiler* = ref object of Compiler
## The peon to C compiler
# Compiler procedures called by pragmas
compilerProcs: TableRef[string, CompilerFunc]
proc newNativeCCompiler*(replMode: bool = false): NativeCCompiler =
## Initializes a new, blank, NativeCCompiler
## object
new(result)
result.ast = @[]
result.current = 0
result.file = ""
result.names = @[]
result.depth = 0
result.lines = @[]
result.currentFunction = nil
result.replMode = replMode
result.currentModule = nil
result.compilerProcs = newTable[string, CompilerFunc]()
result.source = ""
result.lexer = newLexer()
result.lexer.fillSymbolTable()
result.parser = newParser()
result.isMainModule = false
result.disabledWarnings = @[]
method literal*(self: Compiler, node: ASTNode, compile: bool = true): Type {.discardable.} =
## Compiles literal expressions

File diff suppressed because it is too large


@ -0,0 +1,308 @@
import errors
import frontend/parsing/parser
import std/tables
export ast, errors
type
IntegerSize* = enum
## Integer size enumeration
Tiny = 8
Short = 16
Long = 32
LongLong = 64
FloatSize* = enum
## Float size enumeration
Half = 32
Full = 64
TypeKind* = enum
## Enumeration of compile-time types
Integer,
Float,
String,
NaN,
Infinity,
Boolean,
Any,
Typevar,
Auto,
Byte,
Char,
Structure,
EnumEntry,
Reference,
Pointer,
Union,
Function,
Lent,
Const,
Generic
Type* = ref object
## A compile-time type
# Is this a type constant?
constant*: bool
# Can it be mutated?
mutable*: bool
# Is it a compiler intrinsic?
intrinsic*: bool
# Mapping of generic names to types
genericTypes*: TableRef[string, Type]
genericValues*: TableRef[string, Type]
# Type pragmas
pragmas*: TableRef[string, Pragma]
case kind*: TypeKind
of Integer:
signed*: bool
size*: IntegerSize
of Float:
width*: FloatSize
of Infinity:
positive*: bool
of Function:
isLambda*: bool
isGenerator*: bool
isCoroutine*: bool
isAuto*: bool
parameters*: TypeSignature
returnType*: Type
unsafe*: bool
of Typevar:
wrapped*: Type
of Structure:
name*: string
fields*: TableRef[string, Type]
parent*: Type
interfaces*: seq[Type]
isEnum*: bool
of Reference, Pointer, Lent, Const:
value*: Type
of Generic, Union:
types*: seq[tuple[match: bool, kind: Type, value: Expression]]
else:
discard
WarningKind* {.pure.} = enum
## A warning enumeration type
UserWarning
NameKind* {.pure.} = enum
## A name enumeration type
Default, Var, Module
Name* = ref object
## A generic name object
# Type of the identifier (NOT of the value!)
case kind*: NameKind
of Module:
path*: string
# Full absolute path of the module,
# including the extension
absPath*: string
# Just for easier lookup, it's all
# pointers anyway
names*: TableRef[string, Name]
of NameKind.Var:
# If the variable's value is another
# name, this attribute contains its
# name object. This is useful for things
# like assigning functions to variables and
# then calling the variable like it's the
# original function
assignedName*: Name
else:
discard
# The name's identifier
ident*: IdentExpr
# Owner of the identifier (module)
module*: Name
# File where the name is declared
file*: string
# Scope depth
depth*: int
# Is this name private?
isPrivate*: bool
# The type of the name's associated
# value
valueType*: Type
# The function that "owns" this name (may be nil!)
owner*: Name
# Where is this node declared in its file?
line*: int
# The AST node associated with this node. This
# is needed because we typecheck function and type
# declarations only if, and when, they're actually
# used
node*: Declaration
# Who is this name exported to? (Only makes sense if isPrivate
# equals false)
exportedTo*: seq[Name]
TypeSignature* = seq[tuple[name: string, kind: Type, default: TypedExpr]]
## Our typed AST representation
TypedNode* = ref object of RootObj
## A generic typed AST node
node*: ASTNode
TypedExpr* = ref object of TypedNode
## A generic typed expression
kind*: Type
TypedUnaryExpr* = ref object of TypedExpr
## A generic typed unary expression
a*: TypedExpr
TypedBinaryExpr* = ref object of TypedUnaryExpr
## A generic typed binary expression
b*: TypedExpr
TypedIdentExpr* = ref object of TypedExpr
## A typed identifier expression
name*: Name
TypedCallExpr* = ref object of TypedExpr
## A typed function call expression
callee*: Name
args*: seq[tuple[name: string, kind: Type, default: TypedExpr]]
TypedDecl* = ref object of TypedNode
## A typed declaration node
name*: Name # The declaration's name object
TypedVarDecl* = ref object of TypedDecl
## A typed variable declaration node
init*: TypedExpr
TypedTypeDecl* = ref object of TypedDecl
## A typed type declaration node
fields*: TableRef[string, TypedExpr]
parent*: Name
interfaces*: seq[TypedTypeDecl]
TypedEnumDecl* = ref object of TypedTypeDecl
## A typed enum declaration node
enumeration*: Type
variants: seq[TypedTypeDecl]
TypedFunDecl* = ref object of TypedDecl
## A typed function declaration
args*: seq[tuple[name: Name, default: TypedExpr]]
body*: TypedBlockStmt
TypedStmt* = ref object of TypedNode
## A typed statement node
TypedExprStmt* = ref object of TypedStmt
expression*: TypedExpr
TypedBlockStmt* = ref object of TypedStmt
## A typed block statement
body*: seq[TypedNode]
TypedIfStmt* = ref object of TypedStmt
## A typed if statement node
thenBranch*: TypedBlockStmt
elseBranch*: TypedBlockStmt
condition*: TypedExpr
TypedWhileStmt* = ref object of TypedStmt
## A typed while statement node
body*: TypedBlockStmt
condition*: TypedExpr
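The typed-AST types above all share one shape: each Typed* node wraps the original untyped ASTNode and, for expressions, carries the resolved Type. A tiny Python mirror of that layering (class names other than those in the source are illustrative):

```python
from dataclasses import dataclass, field
from typing import Optional

# Minimal mirror of the TypedNode/TypedExpr hierarchy, showing how a
# typed layer wraps untyped AST nodes. Purely illustrative.

@dataclass
class Type:
    kind: str                       # e.g. "Integer", "Float", "Function"
    returnType: Optional["Type"] = None

@dataclass
class TypedNode:
    node: object                    # the underlying (untyped) AST node

@dataclass
class TypedExpr(TypedNode):
    kind: Optional[Type] = None     # the expression's resolved type

@dataclass
class TypedCallExpr(TypedExpr):
    callee: object = None
    args: list = field(default_factory=list)

# A call expression's own type is its callee's return type,
# matching newTypedCallExpr below
fn_type = Type(kind="Function", returnType=Type(kind="Integer"))
call = TypedCallExpr(node="dummy-ast-node", kind=fn_type.returnType)
print(call.kind.kind)  # Integer
```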
proc newTypedNode*(node: ASTNode): TypedNode =
## Initializes a new typed node
new(result)
result.node = node
proc newTypedExpr*(node: Expression, kind: Type): TypedExpr =
## Initializes a new typed expression
result = TypedExpr(node: node, kind: kind)
proc newTypedDecl*(node: Declaration, name: Name): TypedDecl =
## Initializes a new typed declaration
result = TypedDecl(node: node, name: name)
proc newTypedTypeDecl*(node: TypeDecl, name: Name, fields: TableRef[string, TypedExpr], parent: Name): TypedTypeDecl =
## Initializes a new typed type declaration
result = TypedTypeDecl(node: node, name: name, fields: fields, parent: parent)
proc newTypedEnumDecl*(node: TypeDecl, name: Name, variants: seq[TypedTypeDecl], enumeration: Type): TypedEnumDecl =
## Initializes a new typed enum declaration
result = TypedEnumDecl(node: node, name: name, variants: variants, enumeration: enumeration)
proc newTypedFunDecl*(node: FunDecl, name: Name, body: TypedBlockStmt): TypedFunDecl =
## Initializes a new typed function declaration
result = TypedFunDecl(node: node, name: name, body: body)
proc newTypedVarDecl*(node: VarDecl, name: Name, init: TypedExpr): TypedVarDecl =
## Initializes a new typed variable declaration
result = TypedVarDecl(node: node, name: name, init: init)
proc newTypedIdentExpr*(node: IdentExpr, name: Name): TypedIdentExpr =
## Initializes a new typed identifier expression
result = TypedIdentExpr(node: node, name: name, kind: name.valueType)
proc newTypedUnaryExpr*(node: UnaryExpr, kind: Type, a: TypedExpr): TypedUnaryExpr =
## Initializes a new typed unary expression
result = TypedUnaryExpr(node: node, a: a, kind: kind)
proc newTypedBinaryExpr*(node: UnaryExpr, kind: Type, a, b: TypedExpr): TypedBinaryExpr =
## Initializes a new typed binary expression
result = TypedBinaryExpr(node: node, a: a, b: b, kind: kind)
proc newTypedCallExpr*(node: CallExpr, callee: Name,
args: seq[tuple[name: string, kind: Type, default: TypedExpr]]): TypedCallExpr =
## Initializes a new typed function call expression
result = TypedCallExpr(node: node, callee: callee, args: args, kind: callee.valueType.returnType)
proc newTypedBlockStmt*(node: BlockStmt, body: seq[TypedNode]): TypedBlockStmt =
## Initializes a new typed block statement
result = TypedBlockStmt(node: node, body: body)
proc newTypedWhileStmt*(node: WhileStmt, body: TypedBlockStmt, condition: TypedExpr): TypedWhileStmt =
## Initializes a new typed while statement
result = TypedWhileStmt(node: node, body: body, condition: condition)
proc newTypedIfStmt*(node: IfStmt, thenBranch, elseBranch: TypedBlockStmt, condition: TypedExpr): TypedIfStmt =
## Initializes a new typed if statement
result = TypedIfStmt(node: node, thenBranch: thenBranch,
elseBranch: elseBranch, condition: condition)
proc getName*(self: TypedNode): Name =
## Gets the name object associated with the
## given typed node, if it has any
case self.node.kind:
of identExpr:
result = TypedIdentExpr(self).name
of NodeKind.funDecl, NodeKind.varDecl, NodeKind.typeDecl:
result = TypedDecl(self).name
else:
result = nil # TODO

File diff suppressed because it is too large


@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -39,6 +39,10 @@ type
# purposes
keywords: TableRef[string, TokenType]
symbols: TableRef[string, TokenType]
StringParseMode = enum
Default, Raw, Format, Byte
Lexer* = ref object
## A lexer object
symbols*: SymbolTable
@ -53,6 +57,7 @@ type
linePos: int
lineCurrent: int
spaces: int
LexingError* = ref object of PeonException
## A lexing exception
lexer*: Lexer
@ -233,8 +238,7 @@ proc peek(self: Lexer, distance: int = 0, length: int = 1): string =
## may be empty
var i = distance
while len(result) < length:
if self.done() or self.current + i > self.source.high() or
self.current + i < 0:
if self.current + i > self.source.high() or self.current + i < 0:
break
else:
result.add(self.source[self.current + i])
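The simplified bounds check keeps peek's contract at the edges of the input: the lookahead stops early and may return fewer characters than requested, or none at all. A small Python model of that contract (illustration only, not the lexer's code):

```python
def peek(source: str, current: int, distance: int = 0, length: int = 1) -> str:
    """Return up to `length` characters starting `distance` away from
    `current`, stopping early at either end of the input."""
    result = ""
    i = distance
    while len(result) < length:
        pos = current + i
        if pos > len(source) - 1 or pos < 0:
            break
        result += source[pos]
        i += 1
    return result

print(peek("abc", 0, length=2))     # "ab"
print(peek("abc", 2, length=5))     # "c" (truncated at EOF)
print(peek("abc", 0, distance=-1))  # "" (before start of input)
```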
@ -315,17 +319,26 @@ proc parseEscape(self: Lexer) =
## likely be soon. Another notable limitation is that
## \xhhh and \nnn are limited to the size of a char
## (i.e. uint8, or 256 values)
# TODO: Modifying the source is a bad idea. Currently commenting out
# the code in here and just using it for validation purposes
case self.peek()[0]: # We use a char instead of a string because of how case statements handle ranges with strings
# (i.e. not well, given they crash the C code generator)
of 'a':
self.source[self.current] = cast[char](0x07)
# self.source[self.current] = cast[char](0x07)
discard
of 'b':
self.source[self.current] = cast[char](0x7f)
# self.source[self.current] = cast[char](0x7f)
discard
of 'e':
self.source[self.current] = cast[char](0x1B)
# self.source[self.current] = cast[char](0x1B)
discard
of 'f':
self.source[self.current] = cast[char](0x0C)
# self.source[self.current] = cast[char](0x0C)
discard
of 'n':
#[
when defined(windows):
# We natively convert LF to CRLF on Windows, and
# gotta thank Microsoft for the extra boilerplate!
@ -336,53 +349,59 @@ proc parseEscape(self: Lexer) =
self.source[self.current] = cast[char](0x0A)
when defined(linux):
self.source[self.current] = cast[char](0X0D)
]#
discard
of 'r':
self.source[self.current] = cast[char](0x0D)
# self.source[self.current] = cast[char](0x0D)
discard
of 't':
self.source[self.current] = cast[char](0x09)
# self.source[self.current] = cast[char](0x09)
discard
of 'v':
self.source[self.current] = cast[char](0x0B)
# self.source[self.current] = cast[char](0x0B)
discard
of '"':
self.source[self.current] = '"'
# self.source[self.current] = '"'
discard
of '\'':
self.source[self.current] = '\''
# self.source[self.current] = '\''
discard
of '\\':
self.source[self.current] = cast[char](0x5C)
# self.source[self.current] = cast[char](0x5C)
discard
of '0'..'9': # This is the reason we're using char instead of string. See https://github.com/nim-lang/Nim/issues/19678
var code = ""
var value = 0
var i = self.current
while i < self.source.high() and (let c = self.source[
i].toLowerAscii(); c in '0'..'7') and len(code) < 3:
while i < self.source.high() and (let c = self.source[i].toLowerAscii(); c in '0'..'7') and len(code) < 3:
code &= self.source[i]
i += 1
assert parseOct(code, value) == code.len()
if value > uint8.high().int:
self.error("escape sequence value too large (> 255)")
self.source[self.current] = cast[char](value)
# self.source[self.current] = cast[char](value)
of 'u', 'U':
self.error("unicode escape sequences are not supported (yet)")
self.error("unicode escape sequences are not supported yet")
of 'x':
var code = ""
var value = 0
var i = self.current
while i < self.source.high() and (let c = self.source[
i].toLowerAscii(); c in 'a'..'f' or c in '0'..'9'):
while i < self.source.high() and (let c = self.source[i].toLowerAscii(); c in 'a'..'f' or c in '0'..'9'):
code &= self.source[i]
i += 1
assert parseHex(code, value) == code.len()
if value > uint8.high().int:
self.error("escape sequence value too large (> 255)")
self.source[self.current] = cast[char](value)
# self.source[self.current] = cast[char](value)
else:
self.error(&"invalid escape sequence '\\{self.peek()}'")
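The numeric branches above cap \nnn at three octal digits and reject any value that does not fit in a char (> 255). A compact Python illustration of that validation (a hypothetical helper, not the lexer's implementation):

```python
def validate_escape(seq: str) -> int:
    """Validate the body of a numeric escape sequence ('nnn' octal or
    'xhh' hex) and return its value; raise if it exceeds a char (255).
    Octal sequences are capped at three digits, as in the lexer."""
    if seq.startswith("x"):
        value = int(seq[1:], 16)
    else:
        value = int(seq[:3], 8)
    if value > 0xFF:
        raise ValueError("escape sequence value too large (> 255)")
    return value

print(validate_escape("101"))  # octal 101 -> 65 ('A')
print(validate_escape("x41"))  # hex 41   -> 65 ('A')
```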
proc parseString(self: Lexer, delimiter: string, mode: string = "single") =
## Parses string literals. They can be expressed using matching pairs
## of either single or double quotes. Most C-style escape sequences are
## supported, moreover, a specific prefix may be prepended
## to the string to instruct the lexer on how to parse it:
proc parseString(self: Lexer, delimiter: string, mode: StringParseMode = Default) =
## Parses string and character literals. They can be expressed using
## matching pairs of double or single quotes respectively. Most C-style
## escape sequences are supported, moreover, a specific prefix may be
## prepended to the string to instruct the lexer on how to parse it:
## - b -> declares a byte string, where each character is
## interpreted as an integer instead of a character
## - r -> declares a raw string literal, where escape sequences
@ -397,56 +416,45 @@ proc parseString(self: Lexer, delimiter: string, mode: string = "single") =
## strings, so a multi-line string prefixed with the "r" modifier
## is redundant, although multi-line byte/format strings are supported
var slen = 0
while not self.check(delimiter) and not self.done():
if self.match("\n"):
if mode == "multi":
self.incLine()
else:
self.error("unexpected EOL while parsing string literal")
if mode in ["raw", "multi"]:
while not self.check(delimiter) and not self.done():
inc(slen)
if mode == Raw:
discard self.step()
elif self.match("\\"):
# This madness here serves to get rid of the slash, since \x is mapped
# to a one-byte sequence but the string '\x' is actually 2 bytes (or more,
# depending on the specific escape sequence)
self.source = self.source[0..<self.current] & self.source[
self.current + 1..^1]
self.parseEscape()
if mode == "format" and self.match("{"):
discard self.step()
continue
elif mode == Format:
if self.match("{"):
self.source = self.source[0..<self.current] & self.source[
self.current + 1..^1]
continue
while not self.check(["}", "\""]):
discard self.step()
if self.check("\""):
self.error("unclosed '{' in format string")
elif mode == "format" and self.check("}"):
if not self.check("}", 1):
if self.match("{"):
continue
while not self.check(["}", "\""]):
discard self.step()
if self.check("\""):
self.error("unclosed '{' in format string")
elif self.check("}") and not self.check("}", 1):
self.error("unmatched '}' in format string")
else:
self.source = self.source[0..<self.current] & self.source[
self.current + 1..^1]
discard self.step()
inc(slen)
if slen > 1 and delimiter == "'":
self.error("invalid character literal (length must be one!)")
if mode == "multi":
if not self.match(delimiter.repeat(3)):
self.error("unexpected EOL while parsing multi-line string literal")
elif self.done() and self.peek(-1) != delimiter:
self.error("unexpected EOF while parsing string literal")
if self.done() and not self.match(delimiter):
if delimiter == "'":
self.error("unexpected EOF while parsing character literal")
else:
self.error("unexpected EOF while parsing string literal")
else:
discard self.step()
if delimiter == "\"":
if delimiter != "'":
self.createToken(String)
else:
if slen == 0:
self.error("character literal cannot be of length zero")
elif slen > 1:
self.error("invalid character literal (length must be one!)")
self.createToken(Char)
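The format-string branch above enforces brace pairing: '{{' and '}}' stand for literal braces, an unterminated '{' is an error, and a bare '}' is an error. A simplified Python illustration of those checks (not the actual lexer logic, which rewrites the source in place):

```python
def check_format_braces(s: str) -> None:
    """Reject unclosed '{' and unmatched '}' the way the format-string
    branch does; '{{' and '}}' escape literal braces."""
    i = 0
    while i < len(s):
        c = s[i]
        if c == "{":
            if i + 1 < len(s) and s[i + 1] == "{":  # escaped literal '{'
                i += 2
                continue
            end = s.find("}", i + 1)
            if end == -1:
                raise ValueError("unclosed '{' in format string")
            i = end + 1
        elif c == "}":
            if i + 1 < len(s) and s[i + 1] == "}":  # escaped literal '}'
                i += 2
                continue
            raise ValueError("unmatched '}' in format string")
        else:
            i += 1

check_format_braces("Hi {name}!")   # ok
check_format_braces("{{literal}}")  # ok
```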
proc parseBinary(self: Lexer) =
## Parses binary numbers
while self.peek().isDigit():
while self.peek().isDigit() and not self.done():
if not self.check(["0", "1"]):
self.error(&"invalid digit '{self.peek()}' in binary literal")
discard self.step()
@ -454,7 +462,7 @@ proc parseBinary(self: Lexer) =
proc parseOctal(self: Lexer) =
## Parses octal numbers
while self.peek().isDigit():
while self.peek().isDigit() and not self.done():
if self.peek() notin "0".."7":
self.error(&"invalid digit '{self.peek()}' in octal literal")
discard self.step()
@ -462,7 +470,7 @@ proc parseOctal(self: Lexer) =
proc parseHex(self: Lexer) =
## Parses hexadecimal numbers
while self.peek().isAlphaNumeric():
while self.peek().isAlphaNumeric() and not self.done():
if not self.peek().isDigit() and self.peek().toLowerAscii() notin "a".."f":
self.error(&"invalid hexadecimal literal")
discard self.step()
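parseBinary, parseOctal and parseHex all share the same shape: consume characters while rejecting any digit that is invalid for the base. A compact Python illustration of that per-base check (a hypothetical helper, not the lexer's code):

```python
import string

# Allowed digits per literal base, mirroring parseBinary/parseOctal/parseHex
BASE_DIGITS = {
    2: set("01"),
    8: set("01234567"),
    16: set(string.hexdigits.lower()),
}

def validate_literal(digits: str, base: int) -> int:
    """Check every digit against the base and return the value."""
    for d in digits.lower():
        if d not in BASE_DIGITS[base]:
            raise ValueError(f"invalid digit '{d}' in base-{base} literal")
    return int(digits, base)

print(validate_literal("1010", 2))  # 10
print(validate_literal("777", 8))   # 511
print(validate_literal("FF", 16))   # 255
```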
@ -508,7 +516,7 @@ proc parseNumber(self: Lexer) =
elif self.check("."):
# TODO: Is there a better way?
discard self.step()
if not isDigit(self.peek()):
if not isDigit(self.peek()) or self.done():
self.error("invalid float number literal")
kind = Float
while isDigit(self.peek()) and not self.done():
@ -526,18 +534,18 @@ proc parseNumber(self: Lexer) =
proc parseBackticks(self: Lexer) =
## Parses tokens surrounded
## by backticks. This may be used
## for name stropping as well as to
## reimplement existing operators
## (e.g. +, -, etc.) without the
## parser complaining about syntax
## errors
## Parses any character surrounded
## by backticks and produces a single
## identifier. This allows using any
## otherwise "illegal" character as part
## of the identifier (like unicode runes),
## except for newlines, tabs, carriage returns
## and other useless/confusing escape sequences
## like \e and \f
while not self.match("`") and not self.done():
if self.peek().isAlphaNumeric() or self.symbols.existsSymbol(self.peek()):
discard self.step()
continue
self.error(&"unexpected character: '{self.peek()}'")
if self.match(["\n", "\t", "\e", "\r"]):
self.error(&"unexpected character in stropped identifier: '{self.peek()}'")
discard self.step()
self.createToken(Identifier)
# Strips the backticks
self.tokens[^1].lexeme = self.tokens[^1].lexeme[1..^2]
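The reworked parseBackticks accepts any character between the backticks except a handful of control characters, then emits a single Identifier token with the backticks stripped. An illustrative Python model of that behaviour (simplified; not the lexer's implementation):

```python
def strop_identifier(source: str, start: int) -> str:
    """Consume a backtick-quoted identifier beginning at source[start]
    (which must be '`') and return the name with the backticks
    stripped, rejecting newlines, tabs, carriage returns and escapes."""
    assert source[start] == "`"
    i = start + 1
    while i < len(source) and source[i] != "`":
        if source[i] in "\n\t\r\x1b":
            raise ValueError(
                f"unexpected character in stropped identifier: {source[i]!r}")
        i += 1
    if i >= len(source):
        raise ValueError("unexpected EOF in stropped identifier")
    return source[start + 1:i]

print(strop_identifier("`hello world`", 0))  # hello world
print(strop_identifier("x = `+`", 4))        # +
```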
@ -545,9 +553,9 @@ proc parseBackticks(self: Lexer) =
proc parseIdentifier(self: Lexer) =
## Parses keywords and identifiers.
## Note that multi-character tokens
## (aka UTF runes) are not supported
## by design and *will* break things
## This function handles ASCII characters
## only. For unicode support, parseBackticks
## is used instead
while (self.peek().isAlphaNumeric() or self.check("_")) and not self.done():
discard self.step()
let name: string = self.source[self.start..<self.current]
@ -586,13 +594,12 @@ proc next(self: Lexer) =
self.parseBackticks()
elif self.match(["\"", "'"]):
# String or character literal
var mode = "single"
var delimiter = self.peek(-1)
if self.peek(-1) != "'" and self.check(self.peek(-1)) and self.check(
self.peek(-1), 1):
# Multiline strings start with 3 quotes
discard self.step(2)
mode = "multi"
self.parseString(self.peek(-1), mode)
delimiter.add(self.step(2))
self.parseString(self.peek(-1), Default)
elif self.peek().isDigit():
discard self.step() # Needed because parseNumber reads the next
# character to tell the base of the number
@ -600,13 +607,19 @@ proc next(self: Lexer) =
self.parseNumber()
elif self.peek().isAlphaNumeric() and self.check(["\"", "'"], 1):
# Prefixed string literal (i.e. f"Hi {name}!")
var mode = Default
var delimiter = self.step()
if self.peek(-1) != "'" and self.check(self.peek(-1)) and self.check(
self.peek(-1), 1):
# Multiline strings start with 3 quotes
delimiter.add(self.step(2))
case self.step():
of "r":
self.parseString(self.step(), "raw")
self.parseString(delimiter, Raw)
of "b":
self.parseString(self.step(), "bytes")
self.parseString(self.step(), Byte)
of "f":
self.parseString(self.step(), "format")
self.parseString(self.step(), Format)
else:
self.error(&"unknown string prefix '{self.peek(-1)}'")
elif self.peek().isAlphaNumeric() or self.check("_"):
@ -641,8 +654,10 @@ proc next(self: Lexer) =
return
dec(n)
# We just assume what we have in front of us
# is a symbol
discard self.step()
# is a symbol and parse as much as possible (i.e.
# until a space is found)
while not self.check(" ") and not self.done():
discard self.step()
self.createToken(Symbol)

File diff suppressed because it is too large


@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -34,14 +34,14 @@ type
Function, Break, Continue,
Var, Let, Const, Return,
Coroutine, Generator, Import,
Raise, Assert, Await, Foreach,
Assert, Await, Foreach,
Yield, Type, Operator, Case,
Enum, From, Ptr, Ref, Object,
Enum, Ptr, Ref, Object,
Export, Block, Switch, Lent
# Literal types
Integer, Float, String, Identifier,
Binary, Octal, Hex, Char
Binary, Octal, Hex, Char, Nan, Inf
# Brackets, parentheses,
# operators and others
@ -80,4 +80,4 @@ proc `$`*(self: Token): string =
proc `==`*(self, other: Token): bool =
## Returns self == other
return self.kind == other.kind and self.lexeme == other.lexeme
return self.kind == other.kind and self.lexeme == other.lexeme


@ -1,70 +0,0 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import util/fmterr
import util/symbols
import frontend/parsing/lexer
import frontend/parsing/parser
import frontend/compiler/compiler
import std/strformat
proc `$`(self: TypedNode): string =
if self.node.isConst():
var self = TypedExpr(self)
return &"{self.node}: {self.kind[]}"
case self.node.kind:
of varDecl, typeDecl, funDecl:
var self = TypedDecl(self)
result = &"{self.name[]}: {self.name.valueType[]}"
of identExpr, binaryExpr, unaryExpr:
var self = TypedExpr(self)
result &= &"{self.node}: {self.kind[]}"
else:
result = &"{self.node}: ? ({self.node.kind})"
proc main =
var
lexer = newLexer()
parser = newParser()
compiler = newPeonCompiler()
source: string
file = "test.pn"
lexer.fillSymbolTable()
while true:
stdout.write(">>> ")
stdout.flushFile()
try:
source = stdin.readLine()
for typedNode in compiler.compile(parser.parse(lexer.lex(source, file), file, lexer.getLines(), lexer.getSource()), lexer.getFile(), lexer.getSource(),
showMismatches=true):
echo &"{typedNode.node} -> {compiler.stringify(typedNode)}\n"
except IOError:
echo ""
break
except LexingError as exc:
print(exc)
except ParseError as exc:
print(exc)
except CompileError as exc:
print(exc)
when isMainModule:
setControlCHook(proc () {.noconv.} = echo ""; quit(0))
main()

src/peon.nim (new file, 413 lines)

@ -0,0 +1,413 @@
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import config
import util/fmterr
import util/symbols
import frontend/parsing/lexer
import frontend/parsing/parser
import frontend/compiler/typechecker
import backend/bytecode/codegen/generator
import backend/bytecode/tooling/serializer
import backend/bytecode/opcodes
import backend/bytecode/tooling/debugger
import backend/bytecode/vm
import std/os
import std/parseopt
import std/strutils
import std/terminal
import std/strformat
import std/times
# Thanks art <3
#[
import jale/editor as ed
import jale/templates
import jale/plugin/defaults
import jale/plugin/editor_history
import jale/keycodes
import jale/multiline
proc getLineEditor: LineEditor =
result = newLineEditor()
result.prompt = "=> "
result.populateDefaults()
let history = result.plugHistory()
result.bindHistory(history)
]#
proc `$`(self: TypedNode): string =
if self.node.isConst():
var self = TypedExpr(self)
return &"{self.node}: {self.kind[]}"
case self.node.kind:
of varDecl, typeDecl, funDecl:
var self = TypedDecl(self)
result = &"{self.name[]}: {self.name.valueType[]}"
of identExpr, binaryExpr, unaryExpr:
var self = TypedExpr(self)
result &= &"{self.node}: {self.kind[]}"
else:
result = &"{self.node}: ? ({self.node.kind})"
proc runFile(filename: string, fromString: bool = false, dump: bool = true, generate: bool = true, breakpoints: seq[uint64] = @[],
disabledWarnings: seq[WarningKind] = @[], mismatches: bool = false, run: bool = true,
backend: PeonBackend = PeonBackend.Bytecode, output: string, cacheDir: string) =
var
tokens: seq[Token]
tree: ParseTree
typedNodes: seq[TypedNode]
tokenizer = newLexer()
parser = newParser()
typeChecker = newTypeChecker()
input: string
filename = filename
isBinary = false
output = output
tokenizer.fillSymbolTable()
try:
if not fromString and filename.endsWith(".pbc"):
isBinary = true
if fromString:
input = filename
filename = "<string>"
else:
input = readFile(filename)
if not isBinary:
tokens = tokenizer.lex(input, filename)
if tokens.len() == 0:
return
if debugLexer:
styledEcho fgCyan, "Tokenizer output:"
for i, token in tokens:
if i == tokens.high():
# Who cares about EOF?
break
styledEcho fgGreen, "\t", $token
echo ""
tree = parser.parse(tokens, filename, tokenizer.getLines(), input)
if tree.len() == 0:
return
if debugParser:
styledEcho fgCyan, "Parser output:"
for node in tree:
styledEcho fgGreen, "\t", $node
echo ""
typedNodes = typeChecker.validate(tree, filename, tokenizer.getSource(), mismatches, disabledWarnings)
if debugTypeChecker:
styledEcho fgCyan, "Typechecker output:"
for typedNode in typedNodes:
case typedNode.node.kind:
of exprStmt:
# *Technically* an expression statement has no type, but that isn't really useful for debug
# purposes, so we print the type of the expression within it instead
let exprNode = TypedExprStmt(typedNode).expression
styledEcho fgGreen, &"\t{typedNode.node} (inner) -> {typeChecker.stringify(exprNode.kind)}\n"
else:
styledEcho fgGreen, &"\t{typedNode.node} -> {typeChecker.stringify(typedNode)}\n"
if not generate:
return
case backend:
of PeonBackend.Bytecode:
var
debugger = newBytecodeDebugger()
generator = newBytecodeGenerator()
serializer = newBytecodeSerializer()
vm = newPeonVM()
chunk: Chunk = newChunk()
serialized: SerializedBytecode
if not isBinary:
chunk = generator.generate(typedNodes, typeChecker)
serialized = serializer.loadBytes(serializer.dumpBytes(chunk, filename))
else:
serialized = serializer.loadFile(filename)
chunk = serialized.chunk
if dump and not fromString:
if not output.endsWith(".pbc"):
output.add(".pbc")
if not dirExists(cacheDir):
createDir(cacheDir)
serializer.dumpFile(chunk, joinPath(cacheDir, filename), output)
if debugCompiler:
styledEcho fgCyan, "Disassembler output below"
debugger.disassembleChunk(chunk, filename)
if debugSerializer:
styledEcho fgCyan, "Serializer checks: "
styledEcho fgBlue, "\t- Peon version: ", fgYellow, &"{serialized.version.major}.{serialized.version.minor}.{serialized.version.patch}", fgBlue, " (commit ", fgYellow, serialized.commit[0..8], fgBlue, ") on branch ", fgYellow, serialized.branch
stdout.styledWriteLine(fgBlue, "\t- Compilation date & time: ", fgYellow, fromUnix(serialized.compileDate).format("d/M/yyyy HH:mm:ss"))
stdout.styledWriteLine(fgBlue, "\t- Total binary size: ", fgYellow, formatSize(serialized.size))
stdout.styledWrite(fgBlue, &"\t- Constants segment: ")
if serialized.chunk.consts == chunk.consts:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, &"\t- Code segment: ")
if serialized.chunk.code == chunk.code:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, "\t- Line info segment: ")
if serialized.chunk.lines == chunk.lines:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, "\t- Functions segment: ")
if serialized.chunk.functions == chunk.functions:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
stdout.styledWrite(fgBlue, "\t- Modules segment: ")
if serialized.chunk.modules == chunk.modules:
styledEcho fgGreen, "OK"
else:
styledEcho fgRed, "Corrupted"
if run:
vm.run(chunk, breakpoints, repl=false)
else:
discard
except LexingError as exc:
print(exc)
except ParseError as exc:
print(exc)
except TypeCheckError as exc:
print(exc)
except CodeGenError as exc:
var file = exc.file
if file notin ["<string>", ""]:
file = relativePath(file, getCurrentDir())
stderr.styledWriteLine(fgRed, styleBright, "Error while generating code for ", fgYellow, file, fgDefault, &": {exc.msg}")
except SerializationError as exc:
var file = exc.file
if file notin ["<string>", ""]:
file = relativePath(file, getCurrentDir())
stderr.styledWriteLine(fgRed, styleBright, "Error while (de-)serializing ", fgYellow, file, fgDefault, &": {exc.msg}")
except IOError as exc:
stderr.styledWriteLine(fgRed, styleBright, "Error while trying to read ", fgYellow, filename, fgDefault, &": {exc.msg}")
except OSError as exc:
stderr.styledWriteLine(fgRed, styleBright, "Error while trying to read ", fgYellow, filename, fgDefault, &": {exc.msg} ({osErrorMsg(osLastError())})",
fgRed, "[errno ", fgYellow, $osLastError(), fgRed, "]")
#[
proc repl(warnings: seq[WarningKind] = @[], showMismatches: bool = false) =
var
keep = true
tokens: seq[Token]
tree: ParseTree
typeChecker = newTypeChecker()
lexer = newLexer()
parser = newParser()
editor = getLineEditor()
input: string
lexer.fillSymbolTable()
editor.bindEvent(jeQuit):
stdout.styledWriteLine(fgGreen, "Goodbye!")
keep = false
input = ""
editor.bindKey("ctrl+a"):
editor.content.home()
editor.bindKey("ctrl+e"):
editor.content.`end`()
while keep:
try:
input = editor.read()
if input == "#clear":
stdout.write("\x1Bc")
continue
elif input == "":
continue
tokens = lexer.lex(input, "stdin")
if tokens.len() == 0:
continue
if debugLexer:
styledEcho fgCyan, "Tokenizer output:"
for i, token in tokens:
if i == tokens.high():
# Who cares about EOF?
break
styledEcho fgGreen, "\t", $token
echo ""
tree = parser.parse(tokens, "stdin", lexer.getLines(), input, persist=true)
if tree.len() == 0:
continue
if debugParser:
styledEcho fgCyan, "Parser output:"
for node in tree:
styledEcho fgGreen, "\t", $node
echo ""
if debugTypeChecker:
styledEcho fgCyan, "Typechecker output:"
for typedNode in typeChecker.validate(parser.parse(lexer.lex(input, "<stdin>"), lexer.getFile(), lexer.getLines(), lexer.getSource()),
lexer.getFile(), lexer.getSource(), showMismatches=showMismatches, disabledWarnings=warnings):
if debugTypeChecker:
styledEcho fgGreen, &"\t{typedNode.node} -> {typeChecker.stringify(typedNode)}\n"
echo ""
except LexingError:
print(LexingError(getCurrentException()))
except ParseError:
print(ParseError(getCurrentException()))
except TypeCheckError:
print(TypeCheckError(getCurrentException()))
quit(0)
]#
when isMainModule:
setControlCHook(proc () {.noconv.} = quit(0))
var
optParser = initOptParser(commandLineParams())
file: string
fromString: bool
dump = true
warnings: seq[WarningKind] = @[]
showMismatches = false
cachePath: string = ".buildcache"
#mode: CompileMode = CompileMode.Debug
run = true
generateCode = true
backend: PeonBackend
output: string
breakpoints: seq[uint64]
for kind, key, value in optParser.getopt():
case kind:
of cmdArgument:
file = key
of cmdLongOption:
case key:
#[
of "mode":
if value.toLowerAscii() == "release":
mode = CompileMode.Release
elif value.toLowerAscii() == "debug":
discard
else:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "invalid value for option 'mode' (valid options are: debug, release)")
quit()
]#
of "help":
echo HELP_MESSAGE
quit()
of "version":
echo PEON_VERSION_STRING
quit()
of "string":
file = key
fromString = true
of "noGen":
generateCode = false
of "noDump":
dump = false
of "warnings":
if value.toLowerAscii() in ["yes", "on"]:
warnings = @[]
elif value.toLowerAscii() in ["no", "off"]:
for warning in WarningKind:
warnings.add(warning)
else:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "invalid value for option 'warnings' (valid options are: yes, on, no, off)")
quit()
of "showMismatches":
showMismatches = true
of "noWarn":
case value:
of "UserWarning":
warnings.add(UserWarning)
else:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "invalid warning name for option 'noWarn'")
quit()
of "listWarns":
echo "Currently supported warnings: "
for warning in WarningKind:
echo &" - {warning}"
quit(0)
of "debugTypeChecker":
debugTypeChecker = true
of "debugCompiler":
debugCompiler = true
of "debugSerializer":
debugSerializer = true
of "compile":
run = false
of "output":
output = value
of "backend":
case value:
of "bytecode":
backend = PeonBackend.Bytecode
of "c":
backend = PeonBackend.NativeC
of "debug-dump":
debugSerializer = true
of "debugLexer":
debugLexer = true
of "debugParser":
debugParser = true
of "cachePath":
cachePath = value
of "breakpoints":
when debugVM:
for point in value.strip(chars={' '}).split(","):
try:
breakpoints.add(parseBiggestUInt(point))
except ValueError:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, &"invalid breakpoint value '{point}'")
quit()
when not debugVM:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "VM debugging is off, cannot set breakpoints (recompile with -d:debugVM to fix this)")
quit()
else:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, &"unknown long option '{key}'")
quit()
of cmdShortOption:
case key:
of "o":
output = value
of "h":
echo HELP_MESSAGE
quit()
of "v":
echo PEON_VERSION_STRING
quit()
of "s":
file = key
fromString = true
of "n":
dump = false
of "w":
if value.toLowerAscii() in ["yes", "on"]:
warnings = @[]
elif value.toLowerAscii() in ["no", "off"]:
for warning in WarningKind:
warnings.add(warning)
else:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "invalid value for option 'w' (valid options are: yes, on, no, off)")
quit()
of "c":
run = false
else:
stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, &"unknown short option '{key}'")
quit()
else:
echo "usage: peon [options] [filename.pn]"
quit()
if file == "":
echo "Sorry, the REPL is broken :("
# repl(warnings, showMismatches, backend, dump)
else:
runFile(file, fromString, dump, generateCode, breakpoints, warnings, showMismatches, run, backend, output, cachePath)
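The `debugSerializer` branch of `runFile` above verifies a dump/load round trip segment by segment (constants, code, lines, functions, modules), printing "OK" or "Corrupted" per segment. The idea can be sketched language-agnostically; this is an illustrative Python sketch (all names hypothetical, JSON standing in for peon's binary serializer), not peon's actual implementation:

```python
import json

# Illustrative stand-in for peon's Chunk: each segment is a plain list.
def dump_bytes(chunk: dict) -> bytes:
    return json.dumps(chunk, sort_keys=True).encode()

def load_bytes(blob: bytes) -> dict:
    return json.loads(blob.decode())

def integrity_report(original: dict, roundtripped: dict) -> dict:
    # One "OK"/"Corrupted" verdict per segment, compared independently,
    # mirroring the per-segment checks in runFile
    return {seg: "OK" if roundtripped.get(seg) == original[seg] else "Corrupted"
            for seg in original}

chunk = {"consts": [1, 2], "code": [0, 7, 9], "lines": [1, 1, 2],
         "functions": [], "modules": ["main"]}
report = integrity_report(chunk, load_bytes(dump_bytes(chunk)))
```

Comparing each segment separately (rather than the whole chunk at once) makes it immediately clear which part of the binary was damaged, which is exactly why the real code prints five verdicts instead of one.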


@@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -12,12 +12,15 @@
# See the License for the specific language governing permissions and
# limitations under the License.
## Utilities to print formatted error messages to stderr
import frontend/compiler/compiler
## Utilities to format peon exceptions into human-readable error messages
## and print them
import frontend/compiler/typechecker
import frontend/parsing/parser
import frontend/parsing/lexer
import errors
export errors
import std/os
import std/terminal
@@ -25,36 +28,37 @@ import std/strutils
import std/strformat
proc printError(file, line: string, lineNo: int, pos: tuple[start, stop: int], fn: Declaration, msg: string) =
## Internal helper to print a formatted error message
## to stderr
stderr.styledWrite(fgRed, styleBright, "Error in ", fgYellow, &"{file}:{lineNo}:{pos.start}")
proc formatError*(outFile = stderr, file, line: string, lineNo: int, pos: tuple[start, stop: int], fn: Declaration, msg: string, includeSource = true) =
## Helper to write a formatted error message to the given file object
outFile.styledWrite(fgRed, styleBright, "Error in ", fgYellow, &"{file}:{lineNo}:{pos.start}")
if not fn.isNil() and fn.kind == funDecl:
# Error occurred inside a (named) function
stderr.styledWrite(fgRed, styleBright, " in function ", fgYellow, FunDecl(fn).name.token.lexeme)
stderr.styledWriteLine(styleBright, fgDefault, ": ", msg)
if line.len() > 0:
stderr.styledWrite(fgRed, styleBright, "Source line: ", resetStyle, fgDefault, line[0..<pos.start])
if pos.stop == line.len():
stderr.styledWrite(fgRed, styleUnderscore, line[pos.start..<pos.stop])
stderr.styledWriteLine(fgDefault, line[pos.stop..^1])
outFile.styledWriteLine(styleBright, fgDefault, ": ", msg)
if line.len() > 0 and includeSource:
# Print the line where the error occurred and underline the exact node that caused
# the error. Might be inaccurate, but definitely better than nothing
outFile.styledWrite(fgRed, styleBright, "Source line: ", resetStyle, fgDefault, line[0..<pos.start])
outFile.styledWrite(fgRed, styleUnderscore, line[pos.start..pos.stop])
if pos.stop + 1 <= line.high():
outFile.styledWriteLine(fgDefault, line[pos.stop + 1..^1])
else:
stderr.styledWrite(fgRed, styleUnderscore, line[pos.start..pos.stop])
stderr.styledWriteLine(fgDefault, line[pos.stop + 1..^1])
outFile.styledWriteLine(fgDefault, "")
proc print*(exc: CompileError) =
proc print*(exc: TypeCheckError, includeSource = true) =
## Prints a formatted error message
## for compilation errors to stderr
## for type checking errors to stderr
var file = exc.file
var contents = ""
case exc.line:
of -1: discard
of 0: contents = exc.compiler.getSource().strip(chars={'\n'}).splitLines()[exc.line]
else: contents = exc.compiler.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
printError(file, contents, exc.line, exc.node.getRelativeBoundaries(), exc.function, exc.msg)
of 0: contents = exc.instance.getSource().strip(chars={'\n'}).splitLines()[exc.line]
else: contents = exc.instance.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
formatError(stderr, file, contents, exc.line, exc.node.getRelativeBoundaries(), exc.function, exc.msg, includeSource)
proc print*(exc: ParseError) =
proc print*(exc: ParseError, includeSource = true) =
## Prints a formatted error message
## for parsing errors to stderr
var file = exc.file
@@ -65,10 +69,10 @@ proc print*(exc: ParseError) =
contents = exc.parser.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
else:
contents = ""
printError(file, contents, exc.line, exc.token.relPos, exc.parser.getCurrentFunction(), exc.msg)
formatError(stderr, file, contents, exc.line, exc.token.relPos, nil, exc.msg, includeSource)
proc print*(exc: LexingError) =
proc print*(exc: LexingError, includeSource = true) =
## Prints a formatted error message
## for lexing errors to stderr
var file = exc.file
@@ -76,8 +80,8 @@ proc print*(exc: LexingError) =
file = relativePath(exc.file, getCurrentDir())
var contents = ""
if exc.line != -1:
contents = exc.lexer.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
contents = exc.lexer.getSource().splitLines()[exc.line - 1]
else:
contents = ""
printError(file, contents, exc.line, exc.pos, nil, exc.msg)
formatError(stderr, file, contents, exc.line, exc.pos, nil, exc.msg, includeSource)


@@ -1,47 +1,131 @@
import ../frontend/parsing/lexer
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import frontend/parsing/lexer
import std/tables
export tables
var tokens* = {"{": TokenType.LeftBrace,
"}": TokenType.RightBrace,
"(": TokenType.LeftParen,
")": TokenType.RightParen,
"[": TokenType.LeftBracket,
"]": TokenType.RightBracket,
".": TokenType.Dot,
",": TokenType.Comma,
";": TokenType.Semicolon,
"type": TokenType.Type,
"enum": TokenType.Enum,
"case": TokenType.Case,
"operator": TokenType.Operator,
"generator": TokenType.Generator,
"fn": TokenType.Function,
"coroutine": TokenType.Coroutine,
"break": TokenType.Break,
"continue": TokenType.Continue,
"while": TokenType.While,
"for": TokenType.For,
"foreach": TokenType.Foreach,
"if": TokenType.If,
"else": TokenType.Else,
"await": TokenType.Await,
"assert": TokenType.Assert,
"const": TokenType.Const,
"let": TokenType.Let,
"var": TokenType.Var,
"import": TokenType.Import,
"yield": TokenType.Yield,
"return": TokenType.Return,
"object": TokenType.Object,
"export": TokenType.Export,
"block": TokenType.Block,
"switch": TokenType.Switch,
"lent": TokenType.Lent,
"true": TokenType.True,
"false": TokenType.False,
"inf": TokenType.Inf,
"ptr": TokenType.Ptr,
"nan": TokenType.Nan,
}.toTable()
for sym in [">", "<", "=", "~", "/", "+", "-", "_", "*", "?", "@", ":", "==", "!=",
">=", "<=", "+=", "-=", "/=", "*=", "**=", "!", "%", "&", "|", "^",
">>", "<<"]:
tokens[sym] = TokenType.Symbol
proc fillSymbolTable*(tokenizer: Lexer) =
## Initializes the Lexer's symbol
## table with builtin symbols and
## keywords
# Specialized symbols for which we need a specific token type
# for easier handling in the parser (it's nicer to use enum members
# rather than strings whenever possible)
tokenizer.symbols.addSymbol("{", TokenType.LeftBrace)
tokenizer.symbols.addSymbol("}", TokenType.RightBrace)
tokenizer.symbols.addSymbol("(", TokenType.LeftParen)
tokenizer.symbols.addSymbol(")", TokenType.RightParen)
tokenizer.symbols.addSymbol("[", TokenType.LeftBracket)
tokenizer.symbols.addSymbol("]", TokenType.RightBracket)
tokenizer.symbols.addSymbol(".", TokenType.Dot)
tokenizer.symbols.addSymbol(",", TokenType.Comma)
tokenizer.symbols.addSymbol(";", TokenType.Semicolon)
# 1-byte symbols
tokenizer.symbols.addSymbol("{", LeftBrace)
tokenizer.symbols.addSymbol("}", RightBrace)
tokenizer.symbols.addSymbol("(", LeftParen)
tokenizer.symbols.addSymbol(")", RightParen)
tokenizer.symbols.addSymbol("[", LeftBracket)
tokenizer.symbols.addSymbol("]", RightBracket)
tokenizer.symbols.addSymbol(".", Dot)
tokenizer.symbols.addSymbol(",", Comma)
tokenizer.symbols.addSymbol(";", Semicolon)
# Keywords
# Generic symbols spare us the need to create a gazillion members of the
# TokenType enum. These are also not handled directly in the parser, but
# rather processed as classes of operators based on precedence, so using
# strings is less of a concern
for sym in [">", "<", "=", "~", "/", "+", "-", "_", "*", "?", "@", ":", "==", "!=",
">=", "<=", "+=", "-=", "/=", "*=", "**=", "!", "%", "&", "|", "^",
">>", "<<"]:
tokenizer.symbols.addSymbol(sym, TokenType.Symbol)
# Keywords. We differentiate keywords from symbols because they have priority
# over the latter, and also because the lexer internally uses the symbol map to do
# maximal matching and it's helpful not to increase the amount of substrings we
# need to check (especially because keywords match exactly and uniquely, while symbols
# can share substrings)
tokenizer.symbols.addKeyword("type", TokenType.Type)
tokenizer.symbols.addKeyword("enum", Enum)
tokenizer.symbols.addKeyword("case", Case)
tokenizer.symbols.addKeyword("operator", Operator)
tokenizer.symbols.addKeyword("generator", Generator)
tokenizer.symbols.addKeyword("enum", TokenType.Enum)
tokenizer.symbols.addKeyword("case", TokenType.Case)
tokenizer.symbols.addKeyword("operator", TokenType.Operator)
tokenizer.symbols.addKeyword("generator", TokenType.Generator)
tokenizer.symbols.addKeyword("fn", TokenType.Function)
tokenizer.symbols.addKeyword("coroutine", Coroutine)
tokenizer.symbols.addKeyword("coroutine", TokenType.Coroutine)
tokenizer.symbols.addKeyword("break", TokenType.Break)
tokenizer.symbols.addKeyword("continue", Continue)
tokenizer.symbols.addKeyword("while", While)
tokenizer.symbols.addKeyword("for", For)
tokenizer.symbols.addKeyword("foreach", Foreach)
tokenizer.symbols.addKeyword("if", If)
tokenizer.symbols.addKeyword("else", Else)
tokenizer.symbols.addKeyword("continue", TokenType.Continue)
tokenizer.symbols.addKeyword("while", TokenType.While)
tokenizer.symbols.addKeyword("for", TokenType.For)
tokenizer.symbols.addKeyword("foreach", TokenType.Foreach)
tokenizer.symbols.addKeyword("if", TokenType.If)
tokenizer.symbols.addKeyword("else", TokenType.Else)
tokenizer.symbols.addKeyword("await", TokenType.Await)
tokenizer.symbols.addKeyword("raise", TokenType.Raise)
tokenizer.symbols.addKeyword("assert", TokenType.Assert)
tokenizer.symbols.addKeyword("const", Const)
tokenizer.symbols.addKeyword("let", Let)
tokenizer.symbols.addKeyword("const", TokenType.Const)
tokenizer.symbols.addKeyword("let", TokenType.Let)
tokenizer.symbols.addKeyword("var", TokenType.Var)
tokenizer.symbols.addKeyword("import", Import)
tokenizer.symbols.addKeyword("import", TokenType.Import)
tokenizer.symbols.addKeyword("yield", TokenType.Yield)
tokenizer.symbols.addKeyword("return", TokenType.Return)
tokenizer.symbols.addKeyword("object", Object)
tokenizer.symbols.addKeyword("export", Export)
tokenizer.symbols.addKeyword("object", TokenType.Object)
tokenizer.symbols.addKeyword("export", TokenType.Export)
tokenizer.symbols.addKeyword("block", TokenType.Block)
tokenizer.symbols.addKeyword("switch", TokenType.Switch)
tokenizer.symbols.addKeyword("lent", TokenType.Lent)
@@ -50,11 +134,9 @@ proc fillSymbolTable*(tokenizer: Lexer) =
# but we don't need to care about that until
# we're in the parsing/compilation steps so
# it's fine
tokenizer.symbols.addKeyword("true", True)
tokenizer.symbols.addKeyword("false", False)
tokenizer.symbols.addKeyword("true", TokenType.True)
tokenizer.symbols.addKeyword("false", TokenType.False)
tokenizer.symbols.addKeyword("ref", TokenType.Ref)
tokenizer.symbols.addKeyword("ptr", TokenType.Ptr)
for sym in [">", "<", "=", "~", "/", "+", "-", "_", "*", "?", "@", ":", "==", "!=",
">=", "<=", "+=", "-=", "/=", "*=", "**=", "!", "%", "&", "|", "^",
">>", "<<"]:
tokenizer.symbols.addSymbol(sym, Symbol)
tokenizer.symbols.addKeyword("nan", TokenType.Nan)
tokenizer.symbols.addKeyword("inf", TokenType.Inf)

src/util/testing.nim Normal file (286 lines)

@@ -0,0 +1,286 @@
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Peon's own custom test suite. Because it's much better to spend a month rolling your
## own solution rather than spending 2 hours learning testament. Yeah, I suffer from NIH
## syndrome, so?
import std/strformat
import std/strutils
import std/sequtils
import frontend/parsing/lexer
import util/symbols
type
TestStatus* = enum
## Test status enumeration
Init, Running, Success,
Failed, Crashed,
TimedOut, Skipped
TestKind* = enum
## Test type enumeration
Tokenizer, Parser, TypeChecker,
Runtime
TestRunner = proc (suite: TestSuite, test: Test)
# Represents a test outcome. The exc field contains
# the exception raised during the test, if any. The
# error field indicates whether the test errored out
# or not. If exc is non-null and error is false, this
# means the error was expected behavior
TestOutcome = tuple[error: bool, exc: ref Exception, line: int, location: tuple[start, stop: int]]
Test* {.inheritable.} = ref object
## A generic test object
skip*: bool # Skip running this test if true
name*: string # Test name. Only useful for displaying purposes
kind*: TestKind # Test kind (tokenizer, parser, compiler, etc.)
source*: string # The source input of the test. Usually peon code
status*: TestStatus # The test's current state
expected*: TestStatus # The test's expected final state after run()
outcome*: TestOutcome # The test's outcome
runnerFunc: TestRunner # The test's internal runner function
reason*: string # A human readable reason why the test failed
TokenizerTest* = ref object of Test
## A tokenization test. Allows specifying
## a desired error message and error location
## upon tokenization failure
message: string
location: tuple[start, stop: int]
line: int
lexer: Lexer
tokens: seq[TokenType]
TestSuite* = ref object
## A suite of tests
tests*: seq[Test]
proc `$`(self: tuple[start, stop: int]): string =
if self == (-1, -1):
result = "none"
else:
result = &"(start={self.start}, stop={self.stop})"
proc `$`(self: TestOutcome): string =
result &= &"Outcome(error={self.error}"
if not self.exc.isNil():
var name = ($self.exc.name).split(":")[0]
result &= &", exc=(name='{name}', msg='{self.exc.msg}')"
if self.line != -1:
result &= &", line={self.line}"
if self.location != (-1, -1):
result &= &", location={self.location}"
result &= ")"
proc `$`*(self: Test): string =
case self.kind:
of Tokenizer:
var self = TokenizerTest(self)
return &"TokenizerTest(name='{self.name}', status={self.status}, outcome={self.outcome}, source='{self.source.escape()}', location={self.location}, message='{self.message}')"
else:
# TODO
return ""
proc setup(self: TokenizerTest) =
self.lexer = newLexer()
self.lexer.fillSymbolTable()
proc tokenizeSucceedsRunner(suite: TestSuite, test: Test) =
## Runs a tokenization test that is expected to succeed
## and checks that it returns the tokens we expect
var test = TokenizerTest(test)
test.setup()
try:
let tokens = test.lexer.lex(test.source, test.name)
if tokens.len() != test.tokens.len():
test.status = Failed
test.reason = &"Number of provided tokens ({test.tokens.len()}) does not match number of returned tokens ({tokens.len()})"
return
var i = 0
for (token, kind) in zip(tokens, test.tokens):
if token.kind != kind:
test.status = Failed
test.reason = &"Token type mismatch at #{i}: expected {kind}, got {token.kind}"
return
inc(i)
except LexingError:
var exc = LexingError(getCurrentException())
test.outcome.location = exc.pos
test.outcome.line = exc.line
test.status = Failed
test.outcome.error = true
test.outcome.exc = getCurrentException()
return
except CatchableError:
test.status = Crashed
test.outcome.error = true
test.outcome.exc = getCurrentException()
return
test.status = Success
proc tokenizeFailsRunner(suite: TestSuite, test: Test) =
## Runs a tokenization test that is expected to fail
## and checks that it does so in the way we expect
var test = TokenizerTest(test)
test.setup()
try:
discard test.lexer.lex(test.source, test.name)
except LexingError:
var exc = LexingError(getCurrentException())
test.outcome.location = exc.pos
test.outcome.line = exc.line
if exc.pos == test.location and exc.line == test.line and exc.msg == test.message:
test.status = Success
else:
test.status = Failed
test.outcome.error = true
test.outcome.exc = getCurrentException()
return
except CatchableError:
test.status = Crashed
test.outcome.error = true
test.outcome.exc = getCurrentException()
return
test.status = Failed
proc newTestSuite*: TestSuite =
## Creates a new test suite
new(result)
proc addTest*(self: TestSuite, test: Test) =
## Adds a test to the test suite
self.tests.add(test)
proc addTests*(self: TestSuite, tests: openarray[Test]) =
## Adds the given tests to the test suite
for test in tests:
self.addTest(test)
proc removeTest*(self: TestSuite, test: Test) =
## Removes the given test from the test suite
self.tests.delete(self.tests.find(test))
proc removeTests*(self: TestSuite, tests: openarray[Test]) =
## Removes the given tests from the test suite
for test in tests:
self.removeTest(test)
proc newTokenizeTest(name, source: string, skip = false): TokenizerTest =
## Internal helper to initialize a tokenization test
new(result)
result.name = name
result.kind = Tokenizer
result.status = Init
result.source = source
result.skip = skip
result.line = -1
result.outcome.line = -1
result.outcome.location = (-1, -1)
result.location = (-1, -1)
result.message = ""
proc testTokenizeSucceeds*(name, source: string, tokens: seq[TokenType], skip = false): Test =
## Creates a new tokenizer test that is expected to succeed.
## The type of each token returned by the tokenizer is matched
## against the given list of token types: the test only succeeds
## if no discrepancies are found
var test = newTokenizeTest(name, source, skip)
test.runnerFunc = tokenizeSucceedsRunner
test.tokens = tokens
result = Test(test)
result.expected = Success
proc testTokenizeFails*(name, source: string, message: string, line: int, location: tuple[start, stop: int], skip = false): Test =
## Creates a new tokenizer test that is expected to fail with the
## given error message and at the given location
var test = newTokenizeTest(name, source, skip)
test.runnerFunc = tokenizeFailsRunner
test.message = message
test.location = location
test.line = line
result = Test(test)
result.expected = Failed
proc run*(self: TestSuite) =
## Runs the test suite to completion,
## sequentially
for test in self.tests:
if test.skip:
test.status = Skipped
continue
test.runnerFunc(self, test)
proc successful*(self: TestSuite): bool =
## Returns whether the test suite completed
## successfully or not. If called before run(),
## this function returns false. Skipped tests
## do not affect the outcome of this function
result = true
for test in self.tests:
if test.status in [Skipped, Success]:
continue
result = false
break
proc getExpectedException(self: TokenizerTest): ref Exception =
## Gets the exception that we expect to be
## raised by the test. Could be nil if we
## expect no errors
if self.expected == Success:
return nil
return LexingError(msg: self.message, line: self.line, file: self.name, lexer: self.lexer, pos: self.location)
proc getExpectedOutcome(self: TokenizerTest): TestOutcome =
## Gets the expected outcome of a tokenization test
if self.expected == Success:
return (false, self.getExpectedException(), -1, (-1, -1))
else:
return (true, self.getExpectedException(), self.line, self.location)
proc getExpectedOutcome*(self: Test): TestOutcome =
## Returns the expected outcome of a test
doAssert self.expected in [Success, Failed], "expected outcome is neither Success nor Failed: wtf?"
case self.kind:
of Tokenizer:
return TokenizerTest(self).getExpectedOutcome()
else:
# TODO
discard

tests/generics.pn Normal file (52 lines)

@@ -0,0 +1,52 @@
type int64 = object {
#pragma[magic: "int64"]
}
# In peon, all objects are "first class", meaning they can be passed around as
# values: just like you can pass around instances of the int64 type (1, 2, etc.),
# peon allows you to pass around the int64 type itself. This is awesome for expressiveness,
# but it creates a few ambiguities when trying to figure out whether "int64" means "a value
# of type int64" or "the type int64 itself". For this reason, generic declarations split their
# arguments into two parts: generic types and generic values. Generic types go in between
# angle brackets, while generic values go in between square brackets. This means that the type declared
# below has one generic argument T, which is the integer type itself, and another generic argument V, which
# is a value of type int64. This fixes the ambiguity and keeps the generic instantiation syntax as simple
# as possible. This syntax is very useful in cases like the built-in array type: it allows its syntax
# to be just array[T, N], where T is the type of its elements and N is its size
type Test<T: int64>[V: int64] = object {
    ## This structure holds both the int64 type
    ## and a value of type int64
    typeObj: T;
    value: V;
}
type Test2<T: Test>[V: Test] = object {
    ## This structure holds both the Test
    ## type and a (concrete) instance of Test
    typeObj: T;
    value: V;
}
# Feel free to uncomment these and see how the typechecker reacts (hopefully it fails lol)
Test[int64, 1]; # Works: int64 is a type and 1 is a value of type int64
# Test[int64, int64]; # Error: expecting an expression of type int64, got typevar[int64] instead
# Test[1, int64]; # Error: expecting an expression of type typevar[int64], got int64 instead
Test2[Test, Test[int64, 1]]; # This also works. Nested generic instantiations go brrrr
# Test2[Test[int64, 1], Test]; # Error: expecting an expression of type typevar[Test<T: typevar[int64]>[V: int64]], got Test<T: typevar[int64]>[V: int64] instead
# P.S.: You might be wondering "what the hell is a typevar?". Good question!
# A typevar is a special built-in type that represents a... type. Yeah, not very
# useful, hm? Think of it like this: when you declare an object Foo, the object you
# get by referencing it is of type typevar[Foo]: this means "Foo is a type". When you
# construct an instance, say x, of Foo, its type is just Foo. Typevars are mostly needed
# in places where you want to enforce that something must be a type and not a value. A
# name of type typevar[int64 | int32], for example, means "I want this thing to be either
# the int32 type or the int64 type, and NOT an instance of them". If you've ever used Python,
# you can think of typevar as the "type" class, but on steroids.
# P.P.S.: A typevar is a generic type, so normally it wouldn't be possible to use it by itself.
# For convenience, however, peon allows the use of a bare typevar by replacing it with
# typevar[any] (any is another special built-in type that means "anything that has a type")
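The Python comparison drawn above can be sketched concretely. This is plain illustrative Python, not peon: the built-in `type` class plays the role of peon's typevar, and `Foo`/`takes_a_type` are made-up names for the example:

```python
# Illustrating the typevar analogy with Python's "type" class:
# referencing a class gives you a type object, while constructing
# it gives you a value of that type.

class Foo:
    pass

x = Foo()

# Foo itself is a type (roughly what peon writes as typevar[Foo])...
assert isinstance(Foo, type)
# ...while x is an instance of Foo, i.e. a value, not a type.
assert isinstance(x, Foo)
assert not isinstance(x, type)

def takes_a_type(t: type) -> str:
    """Accepts only types, not values: a rough analogue of a
    parameter constrained to a typevar in peon."""
    if not isinstance(t, type):
        raise TypeError("expected a type, not a value")
    return t.__name__

print(takes_a_type(int))  # fine: int is a type
print(takes_a_type(Foo))  # fine: Foo is a type
```

The analogy is loose (Python checks this at runtime, peon at compile time), but the type/value distinction it enforces is the same one the square/angle bracket split encodes.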

tests/tokenize.nim Normal file

@ -0,0 +1,86 @@
import util/testing
import util/fmterr
import util/symbols

import frontend/parsing/lexer

import std/strformat


when isMainModule:
    var suite = newTestSuite()
    suite.addTests(
        [
            testTokenizeSucceeds("emptyFile", "", @[TokenType.EndOfFile]),
            testTokenizeSucceeds("newLine", "\n", @[TokenType.EndOfFile]),
            testTokenizeSucceeds("carriageReturn", "\r", @[TokenType.EndOfFile]),
            testTokenizeSucceeds("emptyString", "\"\"", @[TokenType.String, TokenType.EndOfFile]),
            testTokenizeSucceeds("escapedSingleQuote", "'\\''", @[TokenType.Char, TokenType.EndOfFile]),
            testTokenizeSucceeds("escapedDoubleQuote", """ "\"" """, @[TokenType.String, TokenType.EndOfFile]),
            testTokenizeSucceeds("bareUnicode", "🌎 😂 👩‍👩‍👦‍👦", @[TokenType.Symbol, TokenType.Symbol, TokenType.Symbol, TokenType.EndOfFile]),
            testTokenizeSucceeds("stroppedSingleUnicode", "`🌎` `😂` `👩‍👩‍👦‍👦`", @[TokenType.Identifier, TokenType.Identifier, TokenType.Identifier, TokenType.EndOfFile]),
            testTokenizeSucceeds("stroppedMultiUnicode", "`🌎🌎` `😂😂` `👩‍👩‍👦‍👦👩‍👩‍👦‍👦`", @[TokenType.Identifier, TokenType.Identifier, TokenType.Identifier, TokenType.EndOfFile]),
            testTokenizeSucceeds("stringWithEscapes", """ "\n\t\r\e\f" """, @[TokenType.String, TokenType.EndOfFile]),
            testTokenizeSucceeds("allIntegers", "1 0x1 0o1 0b1", @[TokenType.Integer, TokenType.Hex, TokenType.Octal, TokenType.Binary, TokenType.EndOfFile]),
            testTokenizeSucceeds("sizedNumbers", "1'u8 0x1'i8 0o1'i64 0b1'u32 2.0'f32 1e5'f64 1E5'f32 1.5e4'f64 1.5E4'f32",
                                 @[TokenType.Integer, TokenType.Hex, TokenType.Octal, TokenType.Binary,
                                   TokenType.Float, TokenType.Float, TokenType.Float, TokenType.Float, TokenType.Float,
                                   TokenType.EndOfFile]),
            testTokenizeSucceeds("allFloats", "1.0 1e5 1E5 1.5e4 1.5E4", @[TokenType.Float, TokenType.Float, TokenType.Float,
                                                                           TokenType.Float, TokenType.Float, TokenType.EndOfFile]),
            testTokenizeFails("invalidFloatEndsWithDot", "2.", "invalid float number literal", line=1, location=(0, 1)),
            testTokenizeFails("invalidFloatSpuriousChars", "2.f", "invalid float number literal", line=1, location=(0, 1)),
            testTokenizeFails("unterminatedChar", "'", "unexpected EOF while parsing character literal", line=1, location=(0, 0)),
            testTokenizeFails("emptyChar", "''", "character literal cannot be of length zero", line=1, location=(0, 1)),
            testTokenizeFails("charTooLong", "'ab'", "invalid character literal (length must be one!)", line=1, location=(0, 3)),
            testTokenizeFails("unterminatedString", "\"", "unexpected EOF while parsing string literal", line=1, location=(0, 0)),
            testTokenizeFails("unterminatedCharWithExtraContent", "'o;", "unexpected EOF while parsing character literal", line=1, location=(0, 2)),
            testTokenizeFails("unterminatedStringWithExtraContent", "\"o;", "unexpected EOF while parsing string literal", line=1, location=(0, 2)),
            testTokenizeFails("unterminatedCharWithNewline", "'\\n;", "unexpected EOF while parsing character literal", line=1, location=(0, 3)),
            testTokenizeFails("unterminatedStringWithNewline", "\"\\n;", "unexpected EOF while parsing string literal", line=1, location=(0, 3)),
            testTokenizeFails("illegalTabs", "\t", "tabs are not allowed in peon code, use spaces for indentation instead", line=1, location=(0, 0)),
        ]
    )
    var allTokens = ""
    var allTokensList = newSeqOfCap[TokenType](symbols.tokens.len())
    for lexeme in symbols.tokens.keys():
        allTokens.add(&"{lexeme} ")
        if lexeme == "_":
            # Due to how the lexer is designed, a bare underscore is
            # parsed as an identifier rather than a symbol
            allTokensList.add(TokenType.Identifier)
        else:
            allTokensList.add(symbols.tokens[lexeme])
    allTokensList.add(TokenType.EndOfFile)
    suite.addTest(testTokenizeSucceeds("allTokens", allTokens, allTokensList))
    const skippedChars = [';', '\'', '\n', '\\', '\t', '\e', '\a', '\r']
    var
        characters = ""
        charTokens = newSeqOfCap[TokenType](256)
    for value in 0..255:
        charTokens.add(TokenType.Char)
        if char(value) in skippedChars:
            # These cases are special and we handle them separately,
            # appending their escaped forms below
            continue
        characters.add(&"'{char(value)}'")
    charTokens.add(TokenType.EndOfFile)
    characters.add("""';' '\'' '\n' '\\' '\t' '\e' '\a' '\r'""")
    suite.addTest(testTokenizeSucceeds("allCharacters", characters, charTokens))
    suite.run()
    echo "Tokenization test results: "
    for test in suite.tests:
        echo &"  - {test.name} -> {test.status}"
        if test.status in [Failed, Crashed]:
            echo "    Details:"
            echo &"      - Outcome: {test.outcome}"
            echo &"      - Expected state: {test.expected}"
            echo &"      - Expected outcome: {test.getExpectedOutcome()}"
            echo &"\n    The test failed for the following reason: {test.reason}\n"
            if not test.outcome.exc.isNil():
                echo "\n    Formatted error message follows\n"
                print(LexingError(test.outcome.exc))
                echo "\n    Formatted error message ends here\n"
    if suite.successful():
        echo "OK: All tokenizer tests were successful"
        quit(0)
    quit(-1)