Compare commits
47 Commits: 6296341cb9 ... ecbdf120e3
Author | SHA1 | Date |
---|---|---|
Mattia Giambirtone | ecbdf120e3 | |
Mattia Giambirtone | 92993535d7 | |
Mattia Giambirtone | 20cca2c185 | |
Mattia Giambirtone | b0b7739a63 | |
Mattia Giambirtone | c5450a9e19 | |
Mattia Giambirtone | f1d2386175 | |
Mattia Giambirtone | 6db44570ae | |
Mattia Giambirtone | e061bb399b | |
Mattia Giambirtone | 79f3803328 | |
Mattia Giambirtone | 3b603d1fdf | |
Mattia Giambirtone | 0ec377b308 | |
Mattia Giambirtone | 40cbed2b19 | |
Mattia Giambirtone | 41abf59395 | |
Mattia Giambirtone | b2efb1c9b5 | |
Mattia Giambirtone | eb8f7c0a51 | |
Mattia Giambirtone | 31ee29538e | |
Mattia Giambirtone | c3bac2cf46 | |
Mattia Giambirtone | ee90dad3d2 | |
Mattia Giambirtone | 887d1ce8f5 | |
Mattia Giambirtone | 3f0a4708d3 | |
Mattia Giambirtone | c0bd1daebf | |
Mattia Giambirtone | d04f412347 | |
Mattia Giambirtone | 2f74c23774 | |
Mattia Giambirtone | 60d9b3c37e | |
Mattia Giambirtone | 34d5f77f65 | |
Mattia Giambirtone | a6a944a4fa | |
Mattia Giambirtone | 838fc3d5a1 | |
Mattia Giambirtone | 83051d67f8 | |
Mattia Giambirtone | 6181c49f1f | |
Mattia Giambirtone | 8b39cc3bc0 | |
Mattia Giambirtone | 3ad22dea12 | |
Mattia Giambirtone | e11ada2fec | |
Mattia Giambirtone | 13eea04e74 | |
Mattia Giambirtone | 8cac75ecef | |
Mattia Giambirtone | f7f6ae052f | |
Mattia Giambirtone | f2a23b8b77 | |
Mattia Giambirtone | 4c8cf89c8e | |
Mattia Giambirtone | 525a11adad | |
Mattia Giambirtone | f5d091bb9b | |
Mattia Giambirtone | db41234ee0 | |
Mattia Giambirtone | f34b71ec0b | |
Mattia Giambirtone | eccb2b5372 | |
Mattia Giambirtone | 3d0f35489c | |
Mattia Giambirtone | 9547d2c2bd | |
Mattia Giambirtone | 60d3c25e17 | |
Mattia Giambirtone | eaa5c7ada8 | |
Mattia Giambirtone | b6b3f67204 | |

```diff
@@ -143,3 +143,7 @@ dmypy.json
 # Cython debug symbols
 cython_debug/
 
+tests/test.pn
+
+# Binary stuff
+bin/
```
README.md

```diff
@@ -1,126 +1,22 @@
-# The peon programming language
+# peon-rewrite
 
-Peon is a modern, multi-paradigm, async-first programming language with a focus on correctness and speed.
-
-[Go to the Manual](docs/manual.md)
+Work in progress for Peon 0.2.x
 
-## What's peon?
+## What changed
 
+__Note__: For simplicity, the verbs in this section are in the present tense even though part of what's described here is not implemented yet.
+
+- Peon will no longer use a runtime GC. Instead, the memory model will use ~~lifetimes with regions~~-- actually, peon will use
+  [generational references](https://verdagon.dev/blog/generational-references) instead (they're way cooler IMHO)
+- The compiler has been completely overhauled and no longer handles any code generation (in fact, there is currently no code generation
+  at all, just a parser and a type checker). This allows for true multi-backend support as well as better separation of concerns,
+  because all the code generation logic intertwined with the type checking was driving me insane (please do send help)
+
+## Build
+
+Just run `nimble build`. It should grab all the dependencies for you and produce a `peon` binary in your current working directory.
+
+## Tests
+
+Peon is starting to get large enough to need an automated test suite (wow, much fancy, such cool), which you can run with `nimble test`.
+The tests don't use testament because I have a severe case of NIH syndrome, sorry folks!
-
-Peon is a multi-paradigm, statically-typed programming language inspired by C, Nim, Python, Rust and C++: it supports modern, high-level
-features such as automatic type inference, parametrically polymorphic generic types, pure functions, closures, interfaces, single inheritance,
-reference types, templates, coroutines, raw pointers and exceptions.
-
-The memory management model is rather simple: a Mark and Sweep garbage collector is employed to reclaim unused memory, although more garbage
-collection strategies (such as generational GC or deferred reference counting) are planned to be added in the future.
-
-Peon features a native cooperative concurrency model designed to take advantage of the inherent waiting of typical I/O workloads, without the use of more than one OS thread (wherever possible), allowing for much greater efficiency and a smaller memory footprint. The asynchronous model used forces developers to write code that is both easy to reason about, thanks to the [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) model that is core to peon's async event loop implementation, and works as expected every time (without dropping signals, exceptions, or task return values).
-
-Other notable features are the ability to define (and overload) custom operators with ease by implementing them as language-level functions, [Universal function call syntax](https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax), [Name stropping](https://en.wikipedia.org/wiki/Stropping_(syntax)) and named scopes.
-
-In peon, all objects are first-class (this includes functions, iterators, closures and coroutines).
-
-## Disclaimers
-
-**Disclaimer 1**: The project is still in its very early days: lots of stuff is not implemented, a work in progress or
-otherwise outright broken. Feel free to report bugs!
-
-**Disclaimer 2**: Currently, the `std` module has to _always_ be imported explicitly for even the most basic snippets to work. This is because intrinsic types and builtin operators are defined within it: if it is not imported, peon won't even know how to parse `2 + 2` (and even if it could, it would have no idea what the type of the expression would be). You can have a look at the [peon standard library](src/peon/stdlib) to see how the builtins are defined (be aware that they rely heavily on compiler black magic to work) and can even provide your own implementation if you're so inclined.
-
-### TODO List
-
-In no particular order, here's a list of stuff that's done/to do (might be incomplete/out of date):
-
-- User-defined types
-- Function calls ✅
-- Control flow (if-then-else, switch) ✅
-- Looping (while) ✅
-- Iteration (foreach)
-- Type conversions
-- Type casting
-- Intrinsics ✅
-- Type unions ✅
-- Functions ✅
-- Closures
-- Managed references
-- Unmanaged references
-- Named scopes/blocks ✅
-- Inheritance
-- Interfaces
-- Generics ✅
-- Automatic types ✅
-- Iterators/Generators
-- Coroutines
-- Pragmas ✅
-- Attribute resolution ✅
-- Universal Function Call Syntax
-- Import system ✅
-- Exceptions
-- Templates (_not_ like C++ templates) ✅
-- Optimizations (constant folding, branch and dead code elimination, inlining)
-
-## Feature wishlist
-
-Here's a random list of high-level features I would like peon to have and that I think are kinda neat (some may
-have been implemented already):
-
-- Reference types are not nullable by default (must use `#pragma[nullable]`)
-- The `commutative` pragma, which lets you define just one implementation of an operator
-  and have it become commutative
-- Easy C/Nim interop via FFI
-- C/C++ backend
-- Nim backend
-- [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) (must-have!)
-- Simple OOP (with multiple dispatch!)
-- RTTI, with methods that dispatch at runtime based on the true (aka runtime) type of a value
-- Limited compile-time evaluation (embed the Peon VM in the C/C++/Nim backend and use that to execute peon code at compile time)
-
-## The name
-
-The name for peon comes from [Productive2's](https://git.nocturn9x.space/prod2) genius cute brain, and is the result of shortening
-the name of the fastest animal on earth: the **Pe**regrine Falc**on**. I guess I wanted this to mean peon will be blazing fast (I
-certainly hope so!)
-
-# Peon needs you.
-
-No, but really. I need help. This project is huge and (IMHO) awesome, but there's a lot of non-trivial work to do, and doing
-it with other people is just plain more fun and rewarding. If you want to get involved, definitely try [contacting](https://nocturn9x.space/contact) me
-or open an issue/PR!
-
-# Credits
-
-- Araq, for creating the amazing language that is [Nim](https://nim-lang.org) (as well as all of its contributors!)
-- Guido van Rossum, aka the chad who created [Python](https://python.org) and its awesome community and resources
-- The Nim community and contributors, for making Nim what it is today
-- Bob Nystrom, for his amazing [book](https://craftinginterpreters.com) that inspired me
-  and taught me how to actually make a programming language (kinda, I'm still very dumb)
-- [Njsmith](https://vorpus.org/), for his awesome articles on structured concurrency
-- All the amazing people in the [r/ProgrammingLanguages](https://reddit.com/r/ProgrammingLanguages) subreddit and its [Discord](https://discord.gg/tuFCPmB7Un) server
-- [Art](https://git.nocturn9x.space/art) <3
-- Everyone who listened (and still listens) to me ramble about compilers, programming languages and the like (and for giving me ideas and testing peon!)
-- ... More? (I'd thank the contributors, but it's just me :P)
-- Me! I guess
-
-## Ok, cool, how do I use it?
-
-Great question! If this README somehow didn't turn you away already (thanks, by the way), then you may want to try peon
-out for yourself. Fortunately, the process is quite straightforward:
-
-- First, you're gonna have to install [Nim](https://nim-lang.org/), the language peon is written in. I highly recommend
-  using [choosenim](https://github.com/dom96/choosenim) to manage your Nim installations, as it makes switching between them and updating them a breeze
-- Then, clone this repository and compile peon in release mode with `nim c -d:release --passC:"-flto" -o:peon src/main`, which should produce a `peon` binary
-  ready for you to play with (if your C toolchain doesn't support LTO then you can just omit the `--passC` option, although that would be pretty weird for
-  a modern linker)
-- If you want to move the executable to a different directory (say, into your `PATH`), you should copy peon's standard
-  library (found in `/src/peon/stdlib`) into a known folder, edit the `moduleLookupPaths` variable inside `src/config.nim`
-  by adding said folder to it so that the peon compiler knows where to find modules when you `import std;`, and then recompile
-  peon. Hopefully I will automate this soon, but as of right now the work is all manual
-
-__Note__: On Linux, peon will also look into `~/.local/peon/stdlib` by default, so you can just create the `~/.local/peon` folder and copy `src/peon/stdlib` there
```
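The generational-references bullet in the new README is the biggest design change: every allocation carries a generation counter, every reference remembers the generation it was created against, and a dereference is only valid while the two still match. A minimal illustrative sketch in Nim (hypothetical code, not peon's actual implementation; see the linked article for the real design):

```nim
type
    Slot[T] = object
        generation: uint32   # bumped every time the slot is freed
        value: T

    GenRef[T] = object
        slot: ptr Slot[T]
        generation: uint32   # the generation this reference was created at

proc deref[T](r: GenRef[T]): T =
    ## A use-after-free is caught by a single integer
    ## comparison instead of being prevented by a tracing GC
    assert r.generation == r.slot.generation, "dangling reference!"
    r.slot.value

proc free[T](s: var Slot[T]) =
    inc s.generation   # invalidates every outstanding GenRef to this slot

var slot = Slot[int](generation: 0, value: 42)
let r = GenRef[int](slot: addr slot, generation: slot.generation)
assert r.deref() == 42
slot.free()
# r.deref() would now fail the generation check
```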
nim.cfg

```diff
@@ -1 +1,2 @@
 --hints:off --deepCopy:on --experimental:strictFuncs --exceptions:setjmp
+path="src"
```
New file (+22 lines): the Nimble package definition.

```nim
# Package

version     = "0.1.0"
author      = "nocturn9x"
description = "A rewrite of Peon 0.1.x"
license     = "Apache-2.0"
srcDir      = "src"
bin         = @["peon"]
binDir      = "bin"


# Dependencies

requires "nim >= 2.1.1"
requires "jale >= 0.1.1"

before build:
    exec "nimble test"


task test, "Runs the test suite":
    exec "nim r tests/tokenize.nim"
```
New file (+464 lines): the bytecode generator.

```nim
import frontend/compiler/typechecker
import backend/bytecode/opcodes
import backend/bytecode/tooling/multibyte
import errors


import std/strutils
import std/parseutils
import std/tables
import std/strformat


type
    FunctionWrapper = ref object
        ## A wrapper around a typed function
        ## declaration. This is necessary to
        ## carry bytecode-specific information
        ## regarding this function along with
        ## the typed declaration itself
        decl: TypedFunDecl
        # The location where the function's code
        # begins and ends
        location: tuple[start, stop: int]

    BytecodeGenerator* = ref object
        ## A bytecode generator

        # The piece of code we compile into
        chunk: Chunk
        # The current size of the call
        # stack (which is always known
        # statically)
        stackSize: int
        # Stores the position of all jumps
        jumps: seq[tuple[patched: bool, offset: int]]
        # Metadata regarding function locations (used to construct
        # the debugging fields in the resulting bytecode)
        functions: seq[tuple[start, stop, pos: int, fn: Name]]
        # Used for error reporting
        currentFile: string
        currentNode: TypedNode
        # The typechecker used to validate the peon code we're generating
        # bytecode for
        typeChecker: TypeChecker


proc newBytecodeGenerator*: BytecodeGenerator =
    ## Initializes a new, blank bytecode
    ## generator
    result = BytecodeGenerator()


# Forward declaration
proc generateExpression(self: BytecodeGenerator, expression: TypedExpr)


proc error(self: BytecodeGenerator, msg: string, typedNode: TypedNode = nil) =
    ## Raises a generic peon exception
    var typedNode = typedNode
    var file = self.currentFile
    if typedNode.isNil():
        typedNode = self.currentNode
    if file == "" and typedNode.node.isDecl():
        file = TypedDecl(typedNode).name.owner.ident.token.lexeme
    raise CodeGenError(msg: msg, line: typedNode.node.token.line, file: file)


proc emitByte(self: BytecodeGenerator, byt: OpCode | uint8, line: int) {.inline.} =
    ## Emits a single byte, writing it to
    ## the current chunk being compiled
    self.chunk.write(uint8(byt), line)


proc emitBytes(self: BytecodeGenerator, bytarr: openarray[OpCode | uint8], line: int) {.inline.} =
    ## Handy helper method to write arbitrary bytes into
    ## the current chunk, calling emitByte on each of its
    ## elements
    for b in bytarr:
        self.emitByte(b, line)


proc makeConstant(self: BytecodeGenerator, value: TypedExpr): array[3, uint8] =
    ## Adds a constant to the current chunk's constant table
    ## and returns its index as a 3-byte array of uint8s
    var lit: string
    if value.kind.kind == Integer:
        lit = value.node.token.lexeme
        # Strip the optional size suffix from the literal (e.g. the 'u8 in 42'u8)
        if lit.contains("'"):
            var idx = lit.high()
            while lit[idx] != '\'':
                lit = lit[0..^2]
                dec(idx)
            lit = lit[0..^2]
    case value.kind.kind:
        of Integer:
            case value.kind.size:
                of Tiny:
                    result = self.chunk.writeConstant([uint8(parseInt(lit))])
                of Short:
                    result = self.chunk.writeConstant(parseInt(lit).toDouble())
                of Long:
                    result = self.chunk.writeConstant(parseInt(lit).toQuad())
                of LongLong:
                    if value.kind.signed:
                        result = self.chunk.writeConstant(parseInt(lit).toLong())
                    else:
                        result = self.chunk.writeConstant(parseBiggestUInt(lit).toLong())
        of String:
            result = self.chunk.writeConstant(value.node.token.lexeme[1..^1].toBytes())
        of Float:
            case value.kind.width:
                of Half:
                    var f: float = 0.0
                    discard parseFloat(value.node.token.lexeme, f)
                    result = self.chunk.writeConstant(cast[array[4, uint8]](float32(f)))
                of Full:
                    var f: float = 0.0
                    discard parseFloat(value.node.token.lexeme, f)
                    result = self.chunk.writeConstant(cast[array[8, uint8]](f))
        else:
            discard


proc emitConstant(self: BytecodeGenerator, expression: TypedExpr) =
    ## Emits a constant instruction along
    ## with its operand
    let
        typ = expression.kind
        node = expression.node
    case typ.kind:
        of Integer:
            case typ.size:
                of LongLong:
                    if typ.signed:
                        self.emitByte(LoadInt64, node.token.line)
                    else:
                        self.emitByte(LoadUInt64, node.token.line)
                of Long:
                    if typ.signed:
                        self.emitByte(LoadInt32, node.token.line)
                    else:
                        self.emitByte(LoadUInt32, node.token.line)
                of Short:
                    if typ.signed:
                        self.emitByte(LoadInt16, node.token.line)
                    else:
                        self.emitByte(LoadUInt16, node.token.line)
                of Tiny:
                    if typ.signed:
                        self.emitByte(LoadInt8, node.token.line)
                    else:
                        self.emitByte(LoadUInt8, node.token.line)
        of String:
            self.emitByte(LoadString, node.token.line)
            let str = LiteralExpr(node).literal.lexeme
            if str.len() >= 16777216:
                self.error("string constants cannot be larger than 16777215 bytes", expression)
            self.emitBytes((str.len() - 2).toTriple(), node.token.line)
        of Float:
            case typ.width:
                of Half:
                    self.emitByte(LoadFloat32, node.token.line)
                of Full:
                    self.emitByte(LoadFloat64, node.token.line)
        else:
            discard # TODO
    self.emitBytes(self.makeConstant(expression), node.token.line)


proc setJump(self: BytecodeGenerator, offset: int, jmp: array[3, uint8]) =
    ## Sets a jump at the given
    ## offset to the given value
    self.chunk.code[offset + 1] = jmp[0]
    self.chunk.code[offset + 2] = jmp[1]
    self.chunk.code[offset + 3] = jmp[2]


proc setJump(self: BytecodeGenerator, offset: int, jmp: seq[uint8]) =
    ## Sets a jump at the given
    ## offset to the given value
    self.chunk.code[offset + 1] = jmp[0]
    self.chunk.code[offset + 2] = jmp[1]
    self.chunk.code[offset + 3] = jmp[2]


proc emitJump(self: BytecodeGenerator, opcode: OpCode, line: int): int =
    ## Emits a dummy jump offset to be patched later
    ## and returns a unique identifier for that jump
    ## to be passed to patchJump
    self.emitByte(opcode, line)
    self.jumps.add((patched: false, offset: self.chunk.code.high()))
    self.emitBytes(0.toTriple(), line)
    result = self.jumps.high()


proc patchJump(self: BytecodeGenerator, offset: int) =
    ## Patches a relative jump previously
    ## emitted with emitJump
    var jump: int = self.chunk.code.len() - self.jumps[offset].offset
    if jump < 0:
        self.error("jump size cannot be negative (this is an internal error and most likely a bug)")
    if jump > 16777215:
        # TODO: Emit consecutive jumps using insertAt
        self.error("cannot jump more than 16777215 instructions")
    if jump > 0:
        self.setJump(self.jumps[offset].offset, (jump - 4).toTriple())
        self.jumps[offset].patched = true
```
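The emitJump/patchJump pair above implements classic back-patching: a jump is emitted with a three-byte placeholder operand, and once the target is known the placeholder is overwritten with the computed distance. A minimal, standalone sketch of the same idea (the opcode value and byte order here are invented for illustration; peon's real generator tracks jumps in a table, as shown above):

```nim
# Toy back-patching over a plain byte buffer
proc emitJump(code: var seq[uint8], opcode: uint8): int =
    ## Emits `opcode` followed by a 3-byte placeholder
    ## operand and returns the jump's offset
    result = code.len()
    code.add(opcode)
    code.add([0'u8, 0, 0])

proc patchJump(code: var seq[uint8], offset: int) =
    ## Overwrites the placeholder with the distance between
    ## the end of the jump instruction and the current tip
    let distance = code.len() - offset - 4   # 4 = opcode + 3 operand bytes
    code[offset + 1] = uint8((distance shr 16) and 0xff)
    code[offset + 2] = uint8((distance shr 8) and 0xff)
    code[offset + 3] = uint8(distance and 0xff)

const JumpIfFalse = 0x70'u8   # hypothetical opcode value
var code: seq[uint8]
let jmp = code.emitJump(JumpIfFalse)
code.add([0xAA'u8, 0xBB, 0xCC])   # the body that may be skipped
code.patchJump(jmp)
assert code[1..3] == @[0'u8, 0, 3]   # 3 bytes to skip
```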
```nim
proc handleBuiltinFunction(self: BytecodeGenerator, fn: FunctionWrapper, args: seq[TypedExpr], line: int) =
    ## Emits instructions for builtin functions
    ## such as addition or subtraction
    var builtinOp: string
    for pragma in FunDecl(fn.decl.node).pragmas:
        if pragma.name.token.lexeme == "magic":
            builtinOp = pragma.args[0].token.lexeme
    if builtinOp notin ["LogicalOr", "LogicalAnd"]:
        # Short-circuiting operators evaluate their
        # operands lazily; everything else is eager
        if len(args) == 2:
            self.generateExpression(args[1])
            self.generateExpression(args[0])
        elif len(args) == 1:
            self.generateExpression(args[0])
    const codes: Table[string, OpCode] = {
        "Negate": Negate, "NegateFloat32": NegateFloat32, "NegateFloat64": NegateFloat64,
        "Add": Add, "Subtract": Subtract, "Divide": Divide, "Multiply": Multiply,
        "SignedDivide": SignedDivide,
        "AddFloat64": AddFloat64, "SubtractFloat64": SubtractFloat64,
        "DivideFloat64": DivideFloat64, "MultiplyFloat64": MultiplyFloat64,
        "AddFloat32": AddFloat32, "SubtractFloat32": SubtractFloat32,
        "DivideFloat32": DivideFloat32, "MultiplyFloat32": MultiplyFloat32,
        "Pow": Pow, "SignedPow": SignedPow, "PowFloat32": PowFloat32, "PowFloat64": PowFloat64,
        "Mod": Mod, "SignedMod": SignedMod, "ModFloat32": ModFloat32, "ModFloat64": ModFloat64,
        "Or": Or, "And": And, "Xor": Xor, "Not": Not, "LShift": LShift, "RShift": RShift,
        "Equal": Equal, "NotEqual": NotEqual,
        "LessThan": LessThan, "GreaterThan": GreaterThan,
        "LessOrEqual": LessOrEqual, "GreaterOrEqual": GreaterOrEqual,
        "SignedLessThan": SignedLessThan, "SignedGreaterThan": SignedGreaterThan,
        "SignedLessOrEqual": SignedLessOrEqual, "SignedGreaterOrEqual": SignedGreaterOrEqual,
        "Float32LessThan": Float32LessThan, "Float32GreaterThan": Float32GreaterThan,
        "Float32LessOrEqual": Float32LessOrEqual, "Float32GreaterOrEqual": Float32GreaterOrEqual,
        "Float64LessThan": Float64LessThan, "Float64GreaterThan": Float64GreaterThan,
        "Float64LessOrEqual": Float64LessOrEqual, "Float64GreaterOrEqual": Float64GreaterOrEqual,
        "PrintString": PrintString, "SysClock64": SysClock64, "LogicalNot": LogicalNot,
        "NegInf": LoadNInf, "Identity": Identity
    }.toTable()
    if builtinOp == "print":
        let typ = args[0].kind
        case typ.kind:
            of Integer:
                case typ.size:
                    of LongLong:
                        if typ.signed:
                            self.emitByte(PrintInt64, line)
                        else:
                            self.emitByte(PrintUInt64, line)
                    of Long:
                        if typ.signed:
                            self.emitByte(PrintInt32, line)
                        else:
                            self.emitByte(PrintUInt32, line)
                    of Short:
                        if typ.signed:
                            self.emitByte(PrintInt16, line)
                        else:
                            self.emitByte(PrintUInt16, line)
                    of Tiny:
                        if typ.signed:
                            self.emitByte(PrintInt8, line)
                        else:
                            self.emitByte(PrintUInt8, line)
            of Float:
                case typ.width:
                    of Full:
                        self.emitByte(PrintFloat64, line)
                    of Half:
                        self.emitByte(PrintFloat32, line)
            of String:
                self.emitByte(PrintString, line)
            of Boolean:
                self.emitByte(PrintBool, line)
            of TypeKind.Nan:
                self.emitByte(PrintNan, line)
            of TypeKind.Infinity:
                self.emitByte(PrintInf, line)
            of Function:
                # There's no nice way to print a function, so we build
                # a string describing it and print that instead
                self.emitByte(LoadString, line)
                var loc: string = fn.location.start.toHex()
                # Strip the leading zeroes from the hex address
                while loc[0] == '0' and loc.len() > 1:
                    loc = loc[1..^1]
                var str: string
                if typ.isLambda:
                    str = &"anonymous function at 0x{loc}"
                else:
                    str = &"function '{FunDecl(fn.decl.node).name.token.lexeme}' at 0x{loc}"
                self.emitBytes(str.len().toTriple(), line)
                self.emitBytes(self.chunk.writeConstant(str.toBytes()), line)
                self.emitByte(PrintString, line)
            else:
                self.error(&"invalid type {self.typeChecker.stringify(typ)} for built-in 'print'", args[0])
        return
    if builtinOp in codes:
        self.emitByte(codes[builtinOp], line)
        return
    # Some builtin operations are slightly more complex,
    # so we handle them separately
    case builtinOp:
        of "LogicalOr":
            self.generateExpression(args[0])
            let jump = self.emitJump(JumpIfTrue, line)
            self.generateExpression(args[1])
            self.patchJump(jump)
        of "LogicalAnd":
            self.generateExpression(args[0])
            let jump = self.emitJump(JumpIfFalseOrPop, line)
            self.generateExpression(args[1])
            self.patchJump(jump)
        of "cast":
            # Type casts are a merely compile-time construct:
            # they don't produce any code at runtime because
            # the underlying data representation does not change!
            # The only reason why there's a "cast" pragma is to
            # make it so that the peon stub can have no body
            discard
        else:
            self.error(&"unknown built-in: '{builtinOp}'")


proc patchReturnAddress(self: BytecodeGenerator, pos: int) =
    ## Patches the return address of a function
    ## call
    let address = self.chunk.code.len().toLong()
    for i in 0..7:
        self.chunk.consts[pos + i] = address[i]


proc generateLiteral(self: BytecodeGenerator, literal: TypedExpr) =
    ## Emits code for literals
    let
        typ = literal.kind
        node = literal.node
    case typ.kind:
        of Integer, Float:
            # No need to do any input validation here: the typechecker
            # has graciously done all the work for us! :)
            self.emitConstant(literal)
        of Infinity:
            if typ.positive:
                self.emitByte(LoadInf, node.token.line)
            else:
                self.emitByte(LoadNInf, node.token.line)
        of NaN:
            self.emitByte(LoadNan, node.token.line)
        else:
            self.error(&"Unknown typed node of type {node.kind} at generateLiteral()")


proc generateUnary(self: BytecodeGenerator, expression: TypedExpr) =
    ## Emits code for unary expressions
    discard # TODO


proc generateBinary(self: BytecodeGenerator, expression: TypedExpr) =
    ## Emits code for binary expressions
    discard # TODO


proc generateExpression(self: BytecodeGenerator, expression: TypedExpr) =
    ## Emits code for expressions
    if expression.node.isConst():
        self.generateLiteral(expression)
    else:
        let node = expression.node
        case node.kind:
            of unaryExpr:
                self.generateUnary(expression)
            of binaryExpr:
                self.generateBinary(expression)
            else:
                self.error(&"Unknown typed node of type {node.kind} at generateExpression()")


proc beginProgram(self: BytecodeGenerator): int =
    ## Emits boilerplate code to set up
    ## a peon program
    # The prologue emitted here is exactly 12 bytes long (two LoadUInt64
    # instructions and one Call, each followed by a 3-byte operand),
    # which is why the initial jump address is always 12: execution
    # resumes right past the prologue
    self.emitByte(LoadUInt64, 1)
    self.emitBytes(self.chunk.writeConstant(12.toLong()), 1)
    self.emitByte(LoadUInt64, 1)
    # We emit a dummy return address which is patched later
    self.emitBytes(self.chunk.writeConstant(0.toLong()), 1)
    result = self.chunk.consts.len() - 8
    self.emitByte(Call, 1)
    self.emitBytes(0.toTriple(), 1)


proc endProgram(self: BytecodeGenerator, pos: int) =
    ## Emits boilerplate code to tear down
    ## a peon program
    self.emitByte(OpCode.Return, self.currentNode.node.token.line)
    # The entry point has no return value
    self.emitByte(0'u8, self.currentNode.node.token.line)
    # Patch the return address now that we know the boundaries
    # of the function
    self.patchReturnAddress(pos)


proc generate*(self: BytecodeGenerator, compiled: seq[TypedNode], typeChecker: TypeChecker): Chunk =
    ## Turns the given compilation output
    ## into a bytecode chunk
    self.chunk = newChunk()
    self.typeChecker = typeChecker
    let offset = self.beginProgram()
    self.currentFile = typeChecker.getFile()
    for typedNode in compiled:
        self.currentNode = typedNode
        let currentFile = self.currentFile
        if self.currentNode.node.isDecl():
            self.currentFile = TypedDecl(typedNode).name.module.ident.token.lexeme
        case typedNode.node.kind:
            of exprStmt:
                self.generateExpression(TypedExprStmt(typedNode).expression)
                # Expression statements discard their value
                self.emitByte(Pop, typedNode.node.token.line)
            else:
                self.error(&"Unknown typed node of type {typedNode.node.kind} at generate()")
        self.currentFile = currentFile
    self.endProgram(offset)
    result = self.chunk
```
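beginProgram reserves an eight-byte constant as the program's return address, and endProgram patches it via patchReturnAddress once the final code length is known: the same back-patching trick as jumps, applied to the constant table instead. A standalone sketch of the pattern (hypothetical helper names, simplified):

```nim
var consts: seq[uint8]

proc reserveLong(consts: var seq[uint8]): int =
    ## Reserves an 8-byte slot in the constant table and
    ## returns the position of its first byte
    result = consts.len()
    for _ in 0..<8:
        consts.add(0'u8)

proc patchAddress(consts: var seq[uint8], pos: int, address: uint64) =
    ## Overwrites a previously reserved slot with the real address
    let bytes = cast[array[8, uint8]](address)   # same trick as toLong()
    for i in 0..<8:
        consts[pos + i] = bytes[i]

let pos = consts.reserveLong()
# ... generate the rest of the program ...
consts.patchAddress(pos, 1234'u64)
var back: array[8, uint8]
for i in 0..<8:
    back[i] = consts[pos + i]
assert cast[uint64](back) == 1234'u64
```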
New file (+373 lines): imported elsewhere in this diff as backend/bytecode/opcodes.

```nim
# Copyright 2023 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Low level bytecode implementation details

import backend/bytecode/tooling/multibyte


type
    Chunk* = ref object
        ## A piece of bytecode.
        ## consts is the code's constants table.
        ## code is the linear sequence of compiled bytecode instructions.
        ## lines maps bytecode instructions to line numbers using Run
        ## Length Encoding. Instructions are encoded in groups whose structure
        ## follows this schema:
        ## - The first integer represents the line number
        ## - The second integer represents the number of
        ##   instructions on that line
        ## For example, if lines equals [1, 5], it means that there are 5 instructions
        ## at line 1, meaning that all instructions in code[0..4] belong to the same line.
        ## This is more efficient than the naive approach, which would encode
        ## the same line number multiple times and waste considerable amounts of space.
        ## functions encodes the following information:
        ## - Function name
        ## - Argument count
        ## - Function boundaries
        ## The encoding is the following:
        ## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
        ## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
        ## - After that follows the argument count as a 1 byte integer
        ## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
        ##   its size as a 2-byte integer
        ## modules contains information about all the peon modules that the compiler has encountered,
        ## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
        ## Python, peon does not produce a bytecode file for each separate module it compiles: everything
        ## is contained within a single binary blob. While this simplifies the implementation and makes
        ## bytecode files entirely "self-hosted", it also means that the original module information is
        ## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
        ## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
        ## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
        ## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
        ## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
        consts*: seq[uint8]
        code*: seq[uint8]
        lines*: seq[int]
        functions*: seq[uint8]
        modules*: seq[uint8]

    OpCode* {.pure.} = enum
        ## Enum of Peon's bytecode opcodes

        # Note: x represents the argument
        # to unary opcodes, while a and b
        # represent arguments to binary
        # opcodes. Other variable names (c, d, ...)
        # may be used for more complex opcodes.
        # Some opcodes (e.g. jumps) take arguments in
        # the form of 16 or 24 bit numbers that are
        # baked statically into the bytecode at compilation time

        # These push a constant at position x in the
        # constant table onto the stack
        LoadInt64 = 0u8, LoadUInt64, LoadInt32, LoadUInt32,
        LoadInt16, LoadUInt16, LoadInt8, LoadUInt8,
        LoadFloat64, LoadFloat32, LoadString,
        ## Singleton opcodes (each of them pushes a constant singleton onto the operand stack)
        LoadNil, LoadTrue, LoadFalse, LoadNan, LoadInf, LoadNInf,
        ## Operations on primitive types
        Negate, NegateFloat64, NegateFloat32,
        Add, Subtract, Multiply, Divide, SignedDivide,
        AddFloat64, SubtractFloat64, MultiplyFloat64, DivideFloat64,
        AddFloat32, SubtractFloat32, MultiplyFloat32, DivideFloat32,
        Pow, SignedPow, Mod, SignedMod,
        PowFloat64, PowFloat32, ModFloat64, ModFloat32,
        LShift, RShift, Xor, Or, And, Not,
        Equal, NotEqual, GreaterThan, LessThan, GreaterOrEqual, LessOrEqual,
        SignedGreaterThan, SignedLessThan, SignedGreaterOrEqual, SignedLessOrEqual,
        Float64GreaterThan, Float64LessThan, Float64GreaterOrEqual, Float64LessOrEqual,
        Float32GreaterThan, Float32LessThan, Float32GreaterOrEqual, Float32LessOrEqual,
        LogicalNot,
        Identity,           # Pointer equality
        ## Print opcodes
        PrintInt64, PrintUInt64, PrintInt32, PrintUInt32,
        PrintInt16, PrintUInt16, PrintInt8, PrintUInt8,
        PrintFloat64, PrintFloat32, PrintHex, PrintBool,
        PrintNan, PrintInf, PrintString,
        ## Basic stack operations
        Pop,                # Pops an element off the operand stack and discards it
        PopN,               # Pops x elements off the call stack (optimization for exiting local scopes, which usually pop many elements)
        ## Name resolution/handling
        LoadAttribute,      # Pushes the attribute b of object a onto the stack
        LoadVar,            # Pushes the object at position x in the stack onto the stack
        StoreVar,           # Stores the value of b at position a in the stack
        AddVar,             # An optimization for StoreVar (used when the variable is first declared)
        ## Looping and jumping
        Jump,               # Absolute, unconditional jump into the bytecode
        JumpForwards,       # Relative, unconditional, positive jump in the bytecode
        JumpBackwards,      # Relative, unconditional, negative jump in the bytecode
        JumpIfFalse,        # Jumps to a relative index in the bytecode if x is false
        JumpIfTrue,         # Jumps to a relative index in the bytecode if x is true
        JumpIfFalsePop,     # Like JumpIfFalse, but also pops off the stack (regardless of truthiness). Optimization for if statements
        JumpIfFalseOrPop,   # Jumps to an absolute index in the bytecode if x is false and pops otherwise (used for logical and)
        ## Functions
        Call,               # Calls a function and initiates a new stack frame
        Return,             # Terminates the current function
        SetResult,          # Sets the result of the current function
        ## Exception handling
        Raise,              # Raises exception x or re-raises the active exception if x is nil
        BeginTry,           # Initiates an exception handling context
        FinishTry,          # Closes the current exception handling context
        ## Generators
        Yield,              # Yields control from a generator back to the caller
        ## Coroutines
        Await,              # Calls an asynchronous function
        ## Misc
        Assert,             # Raises an exception if x is false
        NoOp,               # Just a no-op
        PopC,               # Pops a value off the call stack and discards it
        PushC,              # Pops a value off the operand stack and pushes it onto the call stack
        SysClock64,         # Pushes the output of a monotonic clock onto the stack
        LoadTOS,            # Pushes the top of the call stack onto the operand stack
        DupTop,             # Duplicates the top of the operand stack
        LoadGlobal          # Loads a global variable


# We group instructions by their operation/operand types for easier handling when debugging

# Simple instructions encompass instructions that push onto/pop off the stack unconditionally (True, False, Pop, etc.)
const simpleInstructions* = {Return, LoadNil, LoadTrue, LoadFalse,
                             LoadNan, LoadInf, LoadNInf, Pop, Raise,
                             BeginTry, FinishTry, Yield, Await, NoOp,
                             SetResult, PopC, PushC, SysClock64,
                             Negate, NegateFloat64, NegateFloat32,
                             Add, Subtract, Multiply, Divide, SignedDivide,
                             AddFloat64, SubtractFloat64, MultiplyFloat64, DivideFloat64,
                             AddFloat32, SubtractFloat32, MultiplyFloat32, DivideFloat32,
                             Pow, SignedPow, Mod, SignedMod,
                             PowFloat64, PowFloat32, ModFloat64, ModFloat32,
                             LShift, RShift, Xor, Or, And, Not,
                             Equal, NotEqual, GreaterThan, LessThan,
                             GreaterOrEqual, LessOrEqual,
                             PrintInt64, PrintUInt64, PrintInt32, PrintUInt32,
                             PrintInt16, PrintUInt16, PrintInt8, PrintUInt8,
                             PrintFloat64, PrintFloat32, PrintHex, PrintBool,
                             PrintNan, PrintInf, PrintString,
                             LogicalNot, AddVar, LoadTOS,
                             SignedGreaterThan, SignedLessThan,
                             SignedGreaterOrEqual, SignedLessOrEqual,
                             Float64GreaterThan, Float64LessThan,
                             Float64GreaterOrEqual, Float64LessOrEqual,
                             Float32GreaterThan, Float32LessThan,
                             Float32GreaterOrEqual, Float32LessOrEqual,
                             DupTop, Identity}

# Constant instructions are instructions that operate on the bytecode constant table
const constantInstructions* = {LoadInt64, LoadUInt64,
                               LoadInt32, LoadUInt32,
                               LoadInt16, LoadUInt16,
                               LoadInt8, LoadUInt8,
                               LoadFloat64, LoadFloat32,
                               LoadString}

# Stack triple instructions operate on the stack at arbitrary offsets and take their arguments
# in the form of 24 bit integers
const stackTripleInstructions* = {StoreVar, LoadVar, LoadGlobal}

# Stack double instructions operate on the stack at arbitrary offsets and take their arguments
# in the form of 16 bit integers
const stackDoubleInstructions* = {}

# Argument double instructions take hardcoded arguments as 16 bit integers
const argumentDoubleInstructions* = {PopN}

# Jump instructions jump at relative or absolute bytecode offsets
const jumpInstructions* = {Jump, JumpIfFalse, JumpIfFalsePop,
                           JumpForwards, JumpBackwards,
                           JumpIfTrue, JumpIfFalseOrPop}


proc newChunk*: Chunk =
    ## Initializes a new, empty chunk
    result = Chunk(consts: @[], code: @[], lines: @[], functions: @[], modules: @[])


proc write*(self: Chunk, newByte: uint8, line: int) =
    ## Adds the given instruction at the provided line number
    ## to the given chunk object
    assert line > 0, "line must be greater than zero"
    if self.lines.high() >= 1 and self.lines[^2] == line:
        # Same line as the previous instruction: just bump the counter
        self.lines[^1] += 1
    else:
        # A new line: start a new (line, count) pair
        self.lines.add(line)
        self.lines.add(1)
    self.code.add(newByte)


proc write*(self: Chunk, bytes: openarray[uint8], line: int) =
    ## Calls self.write() in a loop with all members of the
    ## given array
    for cByte in bytes:
        self.write(cByte, line)


proc write*(self: Chunk, newByte: OpCode, line: int) =
    ## Adds the given instruction at the provided line number
    ## to the given chunk object
    self.write(uint8(newByte), line)


proc write*(self: Chunk, bytes: openarray[OpCode], line: int) =
    ## Calls write in a loop with all members of the given
    ## array
    for cByte in bytes:
        self.write(uint8(cByte), line)


proc getLine*(self: Chunk, idx: int): int =
    ## Returns the line associated with a given
    ## instruction index
    if self.lines.len < 2:
        raise newException(IndexDefect, "the chunk object is empty")
    var
        count: int
        current: int = 0
    for n in countup(0, self.lines.high(), 2):
        count = self.lines[n + 1]
        # Line self.lines[n] covers instructions [current, current + count)
        if idx in current..<current + count:
            return self.lines[n]
        current += count
    raise newException(IndexDefect, "index out of range")
```
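The (line, count) run-length encoding described in the Chunk docstring is easy to see in isolation; this standalone sketch mirrors the bookkeeping done by write() above:

```nim
var lines: seq[int]

proc recordLine(lines: var seq[int], line: int) =
    ## Same RLE bookkeeping as Chunk.write() above
    if lines.len() >= 2 and lines[^2] == line:
        lines[^1] += 1
    else:
        lines.add(line)
        lines.add(1)

for line in [1, 1, 1, 2, 5, 5]:
    lines.recordLine(line)
# 3 instructions on line 1, 1 on line 2, 2 on line 5
assert lines == @[1, 3, 2, 1, 5, 2]
```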
```nim
proc getIdx*(self: Chunk, line: int): int =
    ## Gets the index into self.lines
    ## where the instruction counter for the given
    ## line is located
    for i, v in self.lines:
        # Line numbers sit at even indices, their counters right after them
        if (i and 1) == 0 and v == line:
            return i + 1


proc writeConstant*(self: Chunk, data: openarray[uint8]): array[3, uint8] =
    ## Writes a series of bytes to the chunk's constant
    ## table and returns the index of the first byte as
    ## an array of 3 bytes
    result = self.consts.len().toTriple()
    for b in data:
        self.consts.add(b)
```
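The functions debug segment documented in the Chunk type above has a fixed layout: a 3-byte start offset, a 3-byte end offset, a 1-byte argument count, a 2-byte name length, and then the name itself. Here is a standalone sketch encoding one entry by hand (the helpers below use big-endian bytes for readability; peon's own toTriple/toDouble cast in native byte order instead):

```nim
proc toTriple(n: int): array[3, uint8] =
    [uint8((n shr 16) and 0xff), uint8((n shr 8) and 0xff), uint8(n and 0xff)]

proc toDouble(n: int): array[2, uint8] =
    [uint8((n shr 8) and 0xff), uint8(n and 0xff)]

var segment: seq[uint8]
let (start, stop, argc, name) = (12, 58, 2, "sum")
segment.add(start.toTriple())        # where the function's code begins
segment.add(stop.toTriple())         # where it ends
segment.add(uint8(argc))             # argument count, one byte
segment.add(name.len().toDouble())   # name length, two bytes
for c in name:
    segment.add(uint8(c))            # the name itself, in ASCII
assert segment.len() == 3 + 3 + 1 + 2 + name.len()
```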
New file (+277 lines): the bytecode debugger (disassembler).

```nim
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import std/strformat
import std/terminal


import backend/bytecode/opcodes
import backend/bytecode/tooling/multibyte


type
    Function = object
        start, stop, argc: int
        name: string

    Module = object
        start, stop: int
        name: string

    BytecodeDebugger* = ref object
        chunk: Chunk
        modules: seq[Module]
        functions: seq[Function]
        current: int


proc newBytecodeDebugger*: BytecodeDebugger =
    ## Initializes a new, empty
    ## debugger object
    new(result)
    result.functions = @[]


proc nl = stdout.write("\n")


proc printDebug(s: string, newline: bool = false) =
    stdout.styledWrite(fgMagenta, "DEBUG - Disassembler -> ")
    stdout.styledWrite(fgGreen, s)
    if newline:
        nl()


proc printName(opcode: OpCode, newline: bool = false) =
    stdout.styledWrite(fgRed, $opcode, " (", fgYellow, $uint8(opcode), fgRed, ")")
    if newline:
        nl()


proc printInstruction(instruction: OpCode, newline: bool = false) =
    printDebug("Instruction: ")
    printName(instruction)
    if newline:
        nl()


proc checkFunctionStart(self: BytecodeDebugger, n: int) =
    ## Checks if a function begins at the given
    ## bytecode offset
    for i, e in self.functions:
        if n == e.start:
            styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function Start ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
            styledEcho fgGreen, "\t- Start offset: ", fgYellow, $e.start
            styledEcho fgGreen, "\t- End offset: ", fgYellow, $e.stop
            styledEcho fgGreen, "\t- Argument count: ", fgYellow, $e.argc, "\n"


proc checkFunctionEnd(self: BytecodeDebugger, n: int) =
    ## Checks if a function ends at the given
    ## bytecode offset
    for i, e in self.functions:
        if n == e.stop:
            styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Function End ", fgYellow, &"'{e.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="


proc checkModuleStart(self: BytecodeDebugger, n: int) =
    ## Checks if a module begins at the given
    ## bytecode offset
    for i, m in self.modules:
        if m.start == n:
            styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module Start ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="
            styledEcho fgGreen, "\t- Start offset: ", fgYellow, $m.start
            styledEcho fgGreen, "\t- End offset: ", fgYellow, $m.stop, "\n"


proc checkModuleEnd(self: BytecodeDebugger, n: int) =
    ## Checks if a module ends at the given
    ## bytecode offset
    for i, m in self.modules:
        if m.stop == n:
            styledEcho fgBlue, "\n==== Peon Bytecode Disassembler - Module End ", fgYellow, &"'{m.name}' ", fgBlue, "(", fgYellow, $i, fgBlue, ") ===="


proc simpleInstruction(self: BytecodeDebugger, instruction: OpCode) =
    ## Debugs simple instructions
    printInstruction(instruction, true)
    self.current += 1
    if instruction == Return:
        # Return carries a hardcoded byte telling whether
        # the function returns a value
        printDebug("Void: ")
        if self.chunk.code[self.current] == 0:
            stdout.styledWriteLine(fgYellow, "Yes")
        else:
            stdout.styledWriteLine(fgYellow, "No")
        self.current += 1


proc stackTripleInstruction(self: BytecodeDebugger, instruction: OpCode) =
    ## Debugs instructions that operate on a single value on the stack using a 24-bit operand
    var slot = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
    printInstruction(instruction)
    stdout.styledWriteLine(fgGreen, ", points to index ", fgYellow, $slot)
    self.current += 4


proc stackDoubleInstruction(self: BytecodeDebugger, instruction: OpCode) =
    ## Debugs instructions that operate on a single value on the stack using a 16-bit operand
    var slot = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2]].fromDouble()
    printInstruction(instruction)
    stdout.styledWriteLine(fgGreen, ", points to index ", fgYellow, $slot)
    self.current += 3


proc argumentDoubleInstruction(self: BytecodeDebugger, instruction: OpCode) =
    ## Debugs instructions that operate on a hardcoded value on the stack using a 16-bit operand
    var slot = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2]].fromDouble()
    printInstruction(instruction)
    stdout.styledWriteLine(fgGreen, ", has argument ", fgYellow, $slot)
    self.current += 3


proc argumentTripleInstruction(self: BytecodeDebugger, instruction: OpCode) {.used.} =
    ## Debugs instructions that operate on a hardcoded value on the stack using a 24-bit operand
    var slot = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
    printInstruction(instruction)
    stdout.styledWriteLine(fgGreen, ", has argument ", fgYellow, $slot)
    self.current += 4


proc callInstruction(self: BytecodeDebugger, instruction: OpCode) =
    ## Debugs function calls
    var size = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
    self.current += 3
    printInstruction(instruction)
    styledEcho fgGreen, ", creates frame of size ", fgYellow, $(size + 2)
    self.current += 1


proc constantInstruction(self: BytecodeDebugger, instruction: OpCode) =
    ## Debugs instructions that operate on the constant table
    var size: uint
    if instruction == LoadString:
        # LoadString carries an extra 3-byte length operand
        size = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
        self.current += 3
    var constant = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple()
    printInstruction(instruction)
    stdout.styledWrite(fgGreen, ", points to constant at position ", fgYellow, $constant)
    self.current += 4
    if instruction == LoadString:
        stdout.styledWriteLine(fgGreen, " of length ", fgYellow, $size)
    else:
        stdout.write("\n")


proc jumpInstruction(self: BytecodeDebugger, instruction: OpCode) =
    ## Debugs jumps
    var jump = [self.chunk.code[self.current + 1], self.chunk.code[self.current + 2], self.chunk.code[self.current + 3]].fromTriple().int()
    printInstruction(instruction, true)
    printDebug("Jump size: ")
    stdout.styledWrite(fgYellow, $jump)
    nl()
    self.current += 4
    # Skip any NoOps padding a patched jump
    while self.chunk.code[self.current] == NoOp.uint8:
        inc(self.current)


proc disassembleInstruction*(self: BytecodeDebugger) =
    ## Takes one bytecode instruction and prints it
    let opcode = OpCode(self.chunk.code[self.current])
    self.checkModuleStart(self.current)
    self.checkFunctionStart(self.current)
    printDebug("Offset: ")
    stdout.styledWriteLine(fgYellow, $(self.current))
    printDebug("Line: ")
    stdout.styledWriteLine(fgYellow, &"{self.chunk.getLine(self.current)}")
    case opcode:
        of simpleInstructions:
            self.simpleInstruction(opcode)
            # Functions (and modules) only have a single return statement at the
            # end of their body, so we never execute this more than once per module/function
            if opcode == Return:
                # -2 to skip the hardcoded argument to return
                # and the increment by simpleInstruction()
                self.checkFunctionEnd(self.current - 2)
                self.checkModuleEnd(self.current - 1)
        of constantInstructions:
            self.constantInstruction(opcode)
        of stackDoubleInstructions:
            self.stackDoubleInstruction(opcode)
        of stackTripleInstructions:
            self.stackTripleInstruction(opcode)
        of argumentDoubleInstructions:
            self.argumentDoubleInstruction(opcode)
        of Call:
            self.callInstruction(opcode)
        of jumpInstructions:
            self.jumpInstruction(opcode)
        else:
            echo &"DEBUG - Unknown opcode {opcode} at index {self.current}"
            self.current += 1


proc parseFunctions(self: BytecodeDebugger) =
    ## Parses function information in the chunk
    var
        start, stop, argc: int
        name: string
        idx = 0
        size = 0
    while idx < self.chunk.functions.high():
        start = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
        idx += 3
        stop = int([self.chunk.functions[idx], self.chunk.functions[idx + 1], self.chunk.functions[idx + 2]].fromTriple())
        idx += 3
        argc = int(self.chunk.functions[idx])
        inc(idx)
        size = int([self.chunk.functions[idx], self.chunk.functions[idx + 1]].fromDouble())
        idx += 2
        name = self.chunk.functions[idx..<idx + size].fromBytes()
        inc(idx, size)
        self.functions.add(Function(start: start, stop: stop, argc: argc, name: name))


proc parseModules(self: BytecodeDebugger) =
    ## Parses module information in the chunk
    var
        start, stop: int
        name: string
        idx = 0
        size = 0
    while idx < self.chunk.modules.high():
        start = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
        idx += 3
        stop = int([self.chunk.modules[idx], self.chunk.modules[idx + 1], self.chunk.modules[idx + 2]].fromTriple())
        idx += 3
        size = int([self.chunk.modules[idx], self.chunk.modules[idx + 1]].fromDouble())
        idx += 2
        name = self.chunk.modules[idx..<idx + size].fromBytes()
        inc(idx, size)
        self.modules.add(Module(start: start, stop: stop, name: name))


proc disassembleChunk*(self: BytecodeDebugger, chunk: Chunk, name: string) =
    ## Takes a chunk of bytecode and prints it
    self.chunk = chunk
    styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ====\n"
    self.current = 0
    self.parseFunctions()
    self.parseModules()
    while self.current < self.chunk.code.len:
        self.disassembleInstruction()
        echo ""
    styledEcho fgBlue, &"==== Peon Bytecode Disassembler - Chunk '{name}' ===="
```
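Wired together, a disassembly session looks roughly like this (a hypothetical snippet: it assumes this debugger module and backend/bytecode/opcodes are both imported):

```nim
# Build a tiny chunk by hand and disassemble it
let chunk = newChunk()
chunk.write(LoadTrue, line = 1)
chunk.write(OpCode.Return, line = 1)
chunk.write(0'u8, line = 1)   # "void" flag consumed by simpleInstruction()
newBytecodeDebugger().disassembleChunk(chunk, "example")
```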
New file (+87 lines): imported elsewhere in this diff as backend/bytecode/tooling/multibyte.

```nim
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Utilities to handle multibyte sequences


proc toDouble*(input: int | uint | uint16): array[2, uint8] =
    ## Converts an unsigned integer
    ## to an array[2, uint8]
    result = cast[array[2, uint8]](uint16(input))


proc toTriple*(input: uint | int): array[3, uint8] =
    ## Converts an unsigned integer to an array[3, uint8]
    result = cast[array[3, uint8]](uint(input))


proc toQuad*(input: int | uint | uint16 | uint32): array[4, uint8] =
    ## Converts an unsigned integer to an array[4, uint8]
    result = cast[array[4, uint8]](uint(input))


proc toLong*(input: int | uint | uint16 | uint32 | uint64): array[8, uint8] =
    ## Converts an unsigned integer to an array[8, uint8]
    result = cast[array[8, uint8]](uint(input))


proc fromDouble*(input: array[2, uint8]): uint16 =
    ## Rebuilds the output of toDouble into
    ## a uint16
    copyMem(result.addr, unsafeAddr(input), sizeof(uint16))


proc fromTriple*(input: array[3, uint8]): uint =
    ## Rebuilds the output of toTriple into
    ## a uint
    copyMem(result.addr, unsafeAddr(input), sizeof(uint8) * 3)


proc fromQuad*(input: array[4, uint8]): uint =
    ## Rebuilds the output of toQuad into
    ## a uint
    copyMem(result.addr, unsafeAddr(input), sizeof(uint32))


proc fromLong*(input: array[8, uint8]): uint =
    ## Rebuilds the output of toLong into
    ## a uint
    copyMem(result.addr, unsafeAddr(input), sizeof(uint64))


proc toBytes*(s: string): seq[byte] =
    ## Converts a string into a sequence
    ## of bytes
    for c in s:
        result.add(byte(c))


proc toBytes*(s: int): array[8, uint8] =
    ## Converts an integer into an
    ## array of 8 bytes
    result = cast[array[8, uint8]](s)


proc fromBytes*(input: seq[byte]): string =
    ## Converts a sequence of bytes to
    ## a string
    var i = 0
    while i < input.len():
        result.add(char(input[i]))
        inc(i)


proc extend*[T](s: var seq[T], a: openarray[T]) =
    ## Extends s with the elements of a
    for e in a:
        s.add(e)
```
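Since these conversions are raw casts and memory copies, they round-trip losslessly on the same machine but inherit the host's byte order, which matters once the serializer below writes them to disk. A quick sanity check, assuming this module is imported:

```nim
let n = 16777215   # the largest value a 3-byte operand can hold
assert n.toTriple().fromTriple() == uint(n)
assert 65535.toDouble().fromDouble() == 65535'u16
assert "peon".toBytes().fromBytes() == "peon"
```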
@ -0,0 +1,253 @@
|
|||
# Copyright 2024 Mattia Giambirtone & All Contributors
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
## Implementation of the peon bytecode serializer
|
||||
import std/strformat
|
||||
import std/strutils
|
||||
import std/times
|
||||
|
||||
|
||||
import config
|
||||
import errors
|
||||
import backend/bytecode/tooling/multibyte
|
||||
import backend/bytecode/opcodes
|
||||
|
||||
|
||||
type
|
||||
BytecodeSerializer* = ref object
|
||||
file: string
|
||||
filename: string
|
||||
chunk: Chunk
|
||||
SerializedBytecode* = ref object
|
||||
## Wrapper returned by
|
||||
## the Serializer.read*
|
||||
## procedures to store
|
||||
## metadata
|
||||
version*: tuple[major, minor, patch: int]
|
||||
branch*: string
|
||||
commit*: string
|
||||
compileDate*: int
|
||||
chunk*: Chunk
|
||||
size*: int
|
||||
SerializationError* = ref object of PeonException
|
||||
|
||||
|
||||
proc `$`*(self: SerializedBytecode): string =
|
||||
result = &"SerializedBytecode(version={self.version.major}.{self.version.minor}.{self.version.patch}, branch={self.branch}), commitHash={self.commit}, date={self.compileDate}, chunk={self.chunk[]}"
|
||||
|
||||
|
||||
proc error(self: BytecodeSerializer, message: string) =
|
||||
## Raises a formatted SerializationError exception
|
||||
raise SerializationError(msg: message, file: self.filename)
|
||||
|
||||
|
||||
proc newBytecodeSerializer*(self: BytecodeSerializer = nil): BytecodeSerializer =
|
||||
new(result)
|
||||
if self != nil:
|
||||
result = self
|
||||
result.file = ""
|
||||
result.filename = ""
|
||||
result.chunk = nil
|
||||
|
||||
|
||||
proc writeHeaders(self: BytecodeSerializer, stream: var seq[byte]) =
|
||||
## Writes the Peon bytecode headers in-place into the
|
||||
## given byte sequence
|
||||
stream.extend(PeonBytecodeMarker.toBytes())
|
||||
stream.add(byte(PEON_VERSION.major))
|
||||
stream.add(byte(PEON_VERSION.minor))
|
||||
stream.add(byte(PEON_VERSION.patch))
|
||||
stream.add(byte(len(PEON_BRANCH)))
|
||||
stream.extend(PEON_BRANCH.toBytes())
|
||||
stream.extend(PEON_COMMIT_HASH.toBytes())
|
||||
stream.extend(getTime().toUnixFloat().int().toBytes())
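
Reading writeHeaders together with readHeaders further down gives the on-disk header layout. A sketch of it, with offsets derived from the code above (the marker is the 13-byte "PEON_BYTECODE" constant from config):

```
offset   size  field
0        13    "PEON_BYTECODE" marker
13       1     version major
14       1     version minor
15       1     version patch
16       1     branch name length (B)
17       B     branch name
17 + B   40    commit hash (hex string)
57 + B   8     compilation date (UNIX timestamp as 8 bytes)
```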


proc writeLineData(self: BytecodeSerializer, stream: var seq[byte]) =
    ## Writes line information for debugging
    ## bytecode instructions to the given byte
    ## sequence
    stream.extend(len(self.chunk.lines).toQuad())
    for b in self.chunk.lines:
        stream.extend(b.toTriple())


proc writeFunctions(self: BytecodeSerializer, stream: var seq[byte]) =
    ## Writes debug info about functions to the
    ## given byte sequence
    stream.extend(len(self.chunk.functions).toQuad())
    stream.extend(self.chunk.functions)


proc writeConstants(self: BytecodeSerializer, stream: var seq[byte]) =
    ## Writes the constants table in-place into the
    ## byte sequence
    stream.extend(self.chunk.consts.len().toQuad())
    stream.extend(self.chunk.consts)


proc writeModules(self: BytecodeSerializer, stream: var seq[byte]) =
    ## Writes module information to the given stream
    stream.extend(self.chunk.modules.len().toQuad())
    stream.extend(self.chunk.modules)


proc writeCode(self: BytecodeSerializer, stream: var seq[byte]) =
    ## Writes the bytecode from the given chunk to the
    ## given source stream
    stream.extend(self.chunk.code.len.toTriple())
    stream.extend(self.chunk.code)


proc readHeaders(self: BytecodeSerializer, stream: seq[byte], serialized: SerializedBytecode): int =
    ## Reads the bytecode headers from a given sequence
    ## of bytes
    var stream = stream
    if stream[0..<len(PeonBytecodeMarker)] != PeonBytecodeMarker.toBytes():
        self.error("malformed bytecode marker")
    result += len(PeonBytecodeMarker)
    stream = stream[len(PeonBytecodeMarker)..^1]
    serialized.version = (major: int(stream[0]), minor: int(stream[1]), patch: int(stream[2]))
    stream = stream[3..^1]
    result += 3
    let branchLength = stream[0]
    stream = stream[1..^1]
    result += 1
    serialized.branch = stream[0..<branchLength].fromBytes()
    stream = stream[branchLength..^1]
    result += int(branchLength)
    serialized.commit = stream[0..<40].fromBytes().toLowerAscii()
    stream = stream[40..^1]
    result += 40
    serialized.compileDate = int(fromLong([stream[0], stream[1], stream[2],
                                           stream[3], stream[4], stream[5], stream[6], stream[7]]))
    stream = stream[8..^1]
    result += 8


proc readLineData(self: BytecodeSerializer, stream: seq[byte]): int =
    ## Reads line information from a stream
    ## of bytes
    let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
    result += 4
    var stream = stream[4..^1]
    for i in countup(0, int(size) - 1):
        self.chunk.lines.add(int([stream[0], stream[1], stream[2]].fromTriple()))
        result += 3
        stream = stream[3..^1]
    doAssert len(self.chunk.lines) == int(size)


proc readFunctions(self: BytecodeSerializer, stream: seq[byte]): int =
    ## Reads the function segment from a stream
    ## of bytes
    let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
    result += 4
    var stream = stream[4..^1]
    for i in countup(0, int(size) - 1):
        self.chunk.functions.add(stream[i])
        inc(result)
    doAssert len(self.chunk.functions) == int(size)


proc readConstants(self: BytecodeSerializer, stream: seq[byte]): int =
    ## Reads the constant table from the given
    ## byte sequence
    let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
    result += 4
    var stream = stream[4..^1]
    for i in countup(0, int(size) - 1):
        self.chunk.consts.add(stream[i])
        inc(result)
    doAssert len(self.chunk.consts) == int(size)


proc readModules(self: BytecodeSerializer, stream: seq[byte]): int =
    ## Reads module information
    let size = [stream[0], stream[1], stream[2], stream[3]].fromQuad()
    result += 4
    var stream = stream[4..^1]
    for i in countup(0, int(size) - 1):
        self.chunk.modules.add(stream[i])
        inc(result)
    doAssert len(self.chunk.modules) == int(size)


proc readCode(self: BytecodeSerializer, stream: seq[byte]): int =
    ## Reads the bytecode from a given byte sequence
    let size = [stream[0], stream[1], stream[2]].fromTriple()
    var stream = stream[3..^1]
    for i in countup(0, int(size) - 1):
        self.chunk.code.add(stream[i])
    doAssert len(self.chunk.code) == int(size)
    return int(size)


proc dumpBytes*(self: BytecodeSerializer, chunk: Chunk, filename: string): seq[byte] =
    ## Dumps the given chunk to a sequence of bytes and returns it.
    ## The filename argument is for error reporting only, use dumpFile
    ## to dump bytecode to a file
    self.filename = filename
    self.chunk = chunk
    self.writeHeaders(result)
    self.writeLineData(result)
    self.writeFunctions(result)
    self.writeConstants(result)
    self.writeModules(result)
    self.writeCode(result)


proc dumpFile*(self: BytecodeSerializer, chunk: Chunk, filename, dest: string) =
    ## Dumps the result of dumpBytes to a file at dest
    var fp = open(dest, fmWrite)
    defer: fp.close()
    let data = self.dumpBytes(chunk, filename)
    discard fp.writeBytes(data, 0, len(data))


proc loadBytes*(self: BytecodeSerializer, stream: seq[byte]): SerializedBytecode =
    ## Loads the result from dumpBytes to a Serializer object
    ## for use in the VM or for inspection
    discard self.newBytecodeSerializer()
    new(result)
    result.chunk = newChunk()
    result.size = stream.len()
    self.chunk = result.chunk
    var stream = stream
    try:
        stream = stream[self.readHeaders(stream, result)..^1]
        stream = stream[self.readLineData(stream)..^1]
        stream = stream[self.readFunctions(stream)..^1]
        stream = stream[self.readConstants(stream)..^1]
        stream = stream[self.readModules(stream)..^1]
        stream = stream[self.readCode(stream)..^1]
    except IndexDefect:
        self.error("truncated bytecode stream")
    except AssertionDefect:
        self.error(&"corrupted bytecode stream: {getCurrentExceptionMsg()}")


proc loadFile*(self: BytecodeSerializer, src: string): SerializedBytecode =
    ## Loads a bytecode file
    var fp = open(src, fmRead)
    defer: fp.close()
    let size = fp.getFileSize()
    var pos = 0'i64
    var data: seq[byte] = newSeqOfCap[byte](size)
    for _ in 0..<size:
        data.add(0)
    while pos < size:
        discard fp.readBytes(data, pos, size)
        pos = fp.getFilePos()
    return self.loadBytes(data)
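
A minimal sketch of the round-trip these procedures implement, mirroring what runFile in main.nim does further down (`chunk` is assumed to come from the bytecode generator):

```nim
let serializer = newBytecodeSerializer()
# Dump a compiled chunk and immediately reload it: loadBytes re-reads
# the headers and rebuilds an identical Chunk, which is how the
# serializer's output can be validated
let data = serializer.dumpBytes(chunk, "example.pn")
let restored = serializer.loadBytes(data)
assert restored.chunk.code == chunk.code
echo restored  # SerializedBytecode(version=..., branch=..., ...)
```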

@@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -32,12 +32,11 @@ import std/sets
import std/monotimes

when debugVM or debugMem or debugGC or debugAlloc:
    import std/sequtils
    import std/terminal


import frontend/compiler/targets/bytecode/opcodes
import frontend/compiler/targets/bytecode/util/multibyte
import backend/bytecode/opcodes
import backend/bytecode/tooling/multibyte


when debugVM:

@@ -50,7 +49,7 @@ type
        ## peon objects
        String, List,
        Dict, Tuple,
        CustomType,
        Structure,
    HeapObject* = object
        ## A tagged box for a heap-allocated
        ## peon object

@@ -114,7 +113,6 @@ proc newPeonGC*: PeonGC =
proc collect*(self: var PeonVM)


proc reallocate*(self: var PeonVM, p: pointer, oldSize: int, newSize: int): pointer =
    ## Simple wrapper around realloc with
    ## built-in garbage collection

@@ -217,27 +215,15 @@ proc markRoots(self: var PeonVM): HashSet[ptr HeapObject] =
    # will mistakenly assume the object to be reachable, potentially
    # leading to a nasty memory leak. Let's just hope a 48+ bit address
    # space makes this occurrence rare enough not to be a problem
    # handles a single type (uint64), while Lox has a stack
    # of heap-allocated structs (which is convenient, but slow).
    # What we do instead is store all pointers allocated by us
    # in a hash set and then check if any source of roots contained
    # any of the integer values that we're keeping track of. Note
    # that this means that if a primitive object's value happens to
    # collide with an active pointer, the GC will mistakenly assume
    # the object to be reachable (potentially leading to a nasty
    # memory leak). Hopefully, in a 64-bit address space, this
    # occurrence is rare enough for us to ignore
    var result = initHashSet[uint64](self.gc.pointers.len())
    result = initHashSet[ptr HeapObject](self.gc.pointers.len())
    for obj in self.calls:
        if obj in self.gc.pointers:
            result.incl(obj)
            result.incl(cast[ptr HeapObject](obj))
    for obj in self.operands:
        if obj in self.gc.pointers:
            result.incl(obj)
    var obj: ptr HeapObject
            result.incl(cast[ptr HeapObject](obj))
    for p in result:
        obj = cast[ptr HeapObject](p)
        if obj.mark():
        if p.mark():
            when debugMarkGC:
                echo &"DEBUG - GC: Marked object: {obj[]}"
    when debugGC:
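
The comment in this hunk describes conservative root scanning: because the VM's stacks hold plain uint64s, a slot is treated as a root only if its bit pattern matches a pointer the GC itself handed out. A self-contained sketch of that idea (standalone Nim, not the VM's actual types):

```nim
import std/sets

var allocated = initHashSet[uint64]()  # every pointer the GC handed out

proc gcAlloc(size: int): pointer =
    ## Allocates memory and remembers the pointer's integer value
    result = alloc0(size)
    allocated.incl(cast[uint64](result))

proc markRoots(stack: seq[uint64]): HashSet[uint64] =
    ## A slot is a root only if its value collides with a known
    ## allocation; a primitive that happens to equal a live pointer
    ## is a (rare) false positive, as the comment above explains
    result = initHashSet[uint64]()
    for slot in stack:
        if slot in allocated:
            result.incl(slot)
```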

@@ -270,10 +256,7 @@ proc trace(self: var PeonVM, roots: HashSet[ptr HeapObject]) =
proc free(self: var PeonVM, obj: ptr HeapObject) =
    ## Frees a single heap-allocated
    ## peon object and all the memory
    ## it directly or indirectly owns. Note
    ## that the pointer itself is not released
    ## from the GC's internal table and must be
    ## handled by the caller
    ## it directly or indirectly owns
    when debugAlloc:
        echo &"DEBUG - GC: Freeing object: {obj[]}"
    case obj.kind:

@@ -350,22 +333,11 @@ proc collect(self: var PeonVM) =

# Implementation of the peon VM

proc initCache*(self: var PeonVM) =
    ## Initializes the VM's
    ## singletons cache
    self.cache[0] = 0x0  # False
    self.cache[1] = 0x1  # True
    self.cache[2] = 0x2  # Nil
    self.cache[3] = 0x3  # Positive inf
    self.cache[4] = 0x4  # Negative inf
    self.cache[5] = 0x5  # NaN


proc newPeonVM*: PeonVM =
    ## Initializes a new, blank VM
    ## for executing Peon bytecode
    result.ip = 0
    result.initCache()
    result.gc = newPeonGC()
    result.frames = @[]
    result.operands = @[]

@@ -380,15 +352,16 @@ func getNil*(self: var PeonVM): uint64 = self.cache[2]

func getBool*(self: var PeonVM, value: bool): uint64 =
    if value:
        return self.cache[1]
    return self.cache[0]
        return 0
    return 1

func getInf*(self: var PeonVM, positive: bool): uint64 =
    if positive:
        return self.cache[3]
    return self.cache[4]
    return cast[uint64](Inf)
    return cast[uint64](-Inf)

func getNan*(self: var PeonVM): uint64 = self.cache[5]

func getNan*(self: var PeonVM): uint64 = cast[uint64](NaN)
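
This hunk drops the singleton cache in favour of returning IEEE-754 bit patterns directly: since the VM's stacks hold raw uint64s, a float singleton is just its bit representation. A small sketch of why that is equivalent (standalone Nim, assuming the host's native float layout):

```nim
let inf = cast[uint64](Inf)
assert cast[float64](inf) == Inf         # the bits round-trip exactly
assert cast[uint64](-Inf) != inf         # the sign bit distinguishes -inf
let nan = cast[uint64](NaN)
# NaN never compares equal to itself, but its bit pattern is stable,
# so the uint64 works fine as a VM-level sentinel
assert cast[float64](nan) != cast[float64](nan)
```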


# Thanks to nim's *genius* idea of making x > y a template

@@ -842,17 +815,9 @@ proc dispatch*(self: var PeonVM) {.inline.} =
            # Pops a value off the operand stack
            discard self.pop()
        of PushC:
            # Pushes a value from the operand stack
            # onto the call stack
            # Pops a value off the operand stack
            # and pushes it onto the call stack
            self.pushc(self.pop())
        of PopRepl:
            # Pops a peon object off the
            # operand stack and prints it.
            # Used in interactive REPL mode
            if self.frames.len() !> 1:
                discard self.pop()
                continue
            echo self.pop()
        of PopN:
            # Pops N elements off the call stack
            for _ in 0..<int(self.readShort()):

@@ -1003,7 +968,10 @@ proc dispatch*(self: var PeonVM) {.inline.} =
        of Float32LessOrEqual:
            self.push(self.getBool(cast[float32](self.pop()) <= cast[float32](self.pop())))
        of Identity:
            # Identity is implemented simply as pointer equality :)
            self.push(cast[uint64](self.pop() == self.pop()))
        of LogicalNot:
            self.push(uint64(not self.pop().bool))
        # Print opcodes
        of PrintInt64:
            echo cast[int64](self.pop())

@@ -1033,7 +1001,7 @@ proc dispatch*(self: var PeonVM) {.inline.} =
            else:
                echo "false"
        of PrintInf:
            if self.pop() == 0x3:
            if self.pop() == self.getInf(positive=true):
                echo "inf"
            else:
                echo "-inf"

@@ -1046,8 +1014,6 @@ proc dispatch*(self: var PeonVM) {.inline.} =
            stdout.write("\n")
        of SysClock64:
            self.push(cast[uint64](getMonoTime().ticks.float() / 1_000_000_000))
        of LogicalNot:
            self.push(uint64(not self.pop().bool))
        else:
            discard

@@ -1066,11 +1032,11 @@ proc run*(self: var PeonVM, chunk: Chunk, breakpoints: seq[uint64] = @[], repl:
    try:
        self.dispatch()
    except Defect as e:
        stderr.writeLine(&"Fatal error at bytecode offset {self.ip - 1}: {e.name} -> {e.msg}")
        stderr.writeLine(&"VM: Fatal error at bytecode offset {self.ip - 1}: {e.name} -> {e.msg}")
    except CatchableError as e:
        stderr.writeLine(&"Fatal error at bytecode offset {self.ip - 1}: {e.name} -> {e.msg}")
        stderr.writeLine(&"VM: Fatal error at bytecode offset {self.ip - 1}: {e.name} -> {e.msg}")
    except NilAccessDefect:
        stderr.writeLine(&"Memory Access Violation (bytecode offset {self.ip}): SIGSEGV")
        stderr.writeLine(&"VM: Memory Access Violation (bytecode offset {self.ip}): SIGSEGV")
        quit(1)
    if not repl:
        # We clean up after ourselves!

@@ -1095,4 +1061,4 @@ proc resume*(self: var PeonVM, chunk: Chunk) =
    quit(1)


{.pop.}
{.pop.}

@@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -16,26 +16,33 @@ import std/strformat
import std/os


# These variables can be tweaked to debug and test various components of the toolchain
var debugLexer* = false   # Print the tokenizer's output
var debugParser* = false  # Print the AST generated by the parser
type
    PeonBackend* = enum
        Bytecode,
        NativeC  # Coming soon

# These variables can be tweaked to debug and test various components of the toolchain. Do not modify them directly,
# use the command-line options instead (or -d:option=value for constants)
var debugLexer* = false        # Print the tokenizer's output (main module only)
var debugParser* = false       # Print the AST generated by the parser (main module only)
var debugTypeChecker* = false  # Debug the typechecker's output (main module only)
var debugCompiler* = false     # Disassemble and/or print the code generated by the compiler
var debugSerializer* = false   # Validate the bytecode serializer's output
const debugVM* {.booldefine.} = false        # Enable the runtime debugger in the bytecode VM
const debugGC* {.booldefine.} = false        # Debug the Garbage Collector (extremely verbose)
const debugAlloc* {.booldefine.} = false     # Trace object allocation (extremely verbose)
const debugMem* {.booldefine.} = false       # Debug the memory allocator (extremely verbose)
var debugSerializer* = false   # Validate the bytecode serializer's output
const debugStressGC* {.booldefine.} = false  # Make the GC run a collection at every allocation (VERY SLOW!)
const debugMarkGC* {.booldefine.} = false    # Trace the marking phase object by object (extremely verbose)
const PeonBytecodeMarker* = "PEON_BYTECODE"  # Magic value at the beginning of bytecode files
const HeapGrowFactor* = 2                    # The growth factor used by the GC to schedule the next collection
const FirstGC* = 1024 * 1024;                # How many bytes to allocate before running the first GC
const enableVMChecks* {.booldefine.} = true; # Enables all types of compiler (nim-wise) checks in the VM
const enableVMChecks* {.booldefine.} = true; # Enables all types of compiler checks in the VM
# List of paths where peon looks for modules, in order (empty path means current directory, which always takes precedence)
const moduleLookupPaths*: seq[string] = @["", "src/peon/stdlib", absolutePath(joinPath(".local", "peon", "stdlib"), getenv("HOME"))]
when HeapGrowFactor <= 1:
    {.fatal: "Heap growth factor must be > 1".}
const PeonVersion* = (major: 0, minor: 1, patch: 0)
const PeonVersion* = (major: 0, minor: 2, patch: 0)
const PeonRelease* = "alpha"
const PeonCommitHash* = staticExec("git rev-parse HEAD")
const PeonBranch* = staticExec("git symbolic-ref HEAD 2>/dev/null | cut -f 3 -d /")
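
The {.booldefine.} constants above are flipped at build time rather than at runtime, so the disabled branches compile away entirely. A sketch of how that works (the module name here is hypothetical):

```nim
# flags.nim -- hypothetical standalone example
const debugVM* {.booldefine.} = false

when debugVM:
    echo "VM debugging enabled at compile time"

# Built normally, the `when` branch disappears from the binary;
# built with `nim c -d:debugVM=true flags.nim` (or, presumably,
# `nimble build -d:debugVM` for the whole project) it is compiled in.
```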

@@ -45,37 +52,44 @@ const HelpMessage* = """The peon programming language, Copyright (C) 2023 Mattia
This program is free software, see the license distributed with this program or check
http://www.apache.org/licenses/LICENSE-2.0 for more info.

Note: This is very much a work in progress

Basic Usage
-----------

$ peon file.pn    Run the given Peon source file
$ peon file.pbc   Run the given Peon bytecode file
peon [options] <file>[.pn]   Run the given peon file
peon [options] file.pbc      Run the given peon bytecode file


Options
-------

-h, --help          Show this help text and exit
-v, --version       Print the current peon version and exit
-s, --string        Execute the passed string as if it was a file
-n, --noDump        Don't dump the result of compilation to a file.
                    Note that no dump is created when using -s/--string
-b, --breakpoints   Run the debugger at specific bytecode offsets (comma-separated).
                    Only available with --target:bytecode and when compiled with VM
                    debugging on (-d:debugVM at build time)
-d, --disassemble   Disassemble the output of compilation (only makes sense with --target:bytecode)
-m, --mode          Set the compilation mode. Acceptable values are 'debug' and
                    'release'. Defaults to 'debug'
-c, --compile       Compile the code, but do not execute it. Useful along with -d
-w, --warnings      Turn warnings on or off (default: on). Acceptable values are
                    yes/on and no/off
--noWarn            Disable a specific warning (for example, --noWarn:unusedVariable)
--showMismatches    Show all mismatches when function dispatching fails (output is really verbose)
--target            Select the compilation target (valid values are: 'c' and 'bytecode'). Defaults to
                    'bytecode'
-o, --output        Rename the output file with this value (with --target:bytecode, a '.pbc' extension
                    is added if not already present)
--debug-dump        Debug the bytecode serializer. Only makes sense with --target:bytecode
--debug-lexer       Show the lexer's output
--debug-parser      Show the parser's output
-h, --help            Show this help text and exit
-v, --version         Print the current peon version and exit
-s, --string          Use the passed string as if it was a file
-w, --warnings        Turn warnings on or off (default: on). Acceptable values are
                      yes/on and no/off
--noWarn              Disable a specific warning (example: --noWarn:UserWarning)
--noGen               Don't generate any code (i.e. stop at the typechecking stage)
--showMismatches      Show all mismatches when function dispatching fails (output is really verbose)
--debugLexer          Show the lexer's output
--debugParser         Show the parser's output
--debugTypeChecker    Show the typechecker's output
--debugCompiler       Show the generated code (backend-specific)
--listWarns           Show a list of all warnings
-b, --backend         Select the compilation backend. Currently only supports 'bytecode' (the default)
-c, --compile         Compile the code, but do not run the main module
-o, --output          Rename the output executable to this (a "bc" extension is added for bytecode files,
                      if not already present)
-s, --string          Run the given string as if it were a file (the filename is set to '<string>')
--cacheDir            Specify a directory where the peon compiler will dump code generation results
                      to speed up subsequent builds. Defaults to ".buildcache"

The following options are specific to the 'bytecode' backend:
-n, --noDump          Do not dump bytecode files to the source directory. Note that
                      no files are dumped when using -s/--string
--breakpoints         Set debugging breakpoints at the given bytecode offsets.
                      Input should be a comma-separated list of positive integers
                      (spacing is irrelevant). Only works if peon was compiled with
                      -d:debugVM
"""

@@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -19,3 +19,6 @@ type
        ## peon failure (not to be used directly)
        file*: string  # The file where the error occurred
        line*: int     # The line where the error occurred

    CodeGenError* = ref object of PeonException
        ## An exception for a code generation failure

File diff suppressed because it is too large
@@ -1,71 +0,0 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## The code generator for translating peon to C code
import std/tables
import std/strformat
import std/algorithm
import std/parseutils
import std/strutils
import std/sequtils
import std/sets
import std/os


import frontend/compiler/compiler
import frontend/parsing/lexer
import frontend/parsing/parser
import frontend/parsing/ast


type
    CompilerFunc = object
        ## An internal compiler function called
        ## by pragmas
        kind: PragmaKind
        handler: proc (self: NativeCCompiler, pragma: Pragma, name: Name)

    NativeCCompiler* = ref object of Compiler
        ## The peon to C compiler

        # Compiler procedures called by pragmas
        compilerProcs: TableRef[string, CompilerFunc]


proc newNativeCCompiler*(replMode: bool = false): NativeCCompiler =
    ## Initializes a new, blank, NativeCCompiler
    ## object
    new(result)
    result.ast = @[]
    result.current = 0
    result.file = ""
    result.names = @[]
    result.depth = 0
    result.lines = @[]
    result.currentFunction = nil
    result.replMode = replMode
    result.currentModule = nil
    result.compilerProcs = newTable[string, CompilerFunc]()
    result.source = ""
    result.lexer = newLexer()
    result.lexer.fillSymbolTable()
    result.parser = newParser()
    result.isMainModule = false
    result.disabledWarnings = @[]


method literal*(self: Compiler, node: ASTNode, compile: bool = true): Type {.discardable.} =
    ## Compiles literal expressions


File diff suppressed because it is too large
@@ -0,0 +1,308 @@
import errors
import frontend/parsing/parser


import std/tables


export ast, errors


type
    IntegerSize* = enum
        ## Integer size enumeration
        Tiny = 8
        Short = 16
        Long = 32
        LongLong = 64

    FloatSize* = enum
        ## Float size enumeration
        Half = 32
        Full = 64

    TypeKind* = enum
        ## Enumeration of compile-time types
        Integer,
        Float,
        String,
        NaN,
        Infinity,
        Boolean,
        Any,
        Typevar,
        Auto,
        Byte,
        Char,
        Structure,
        EnumEntry,
        Reference,
        Pointer,
        Union,
        Function,
        Lent,
        Const,
        Generic

    Type* = ref object
        ## A compile-time type

        # Is this a type constant?
        constant*: bool
        # Can it be mutated?
        mutable*: bool
        # Is it a compiler intrinsic?
        intrinsic*: bool
        # Mapping of generic names to types
        genericTypes*: TableRef[string, Type]
        genericValues*: TableRef[string, Type]
        # Type pragmas
        pragmas*: TableRef[string, Pragma]
        case kind*: TypeKind
            of Integer:
                signed*: bool
                size*: IntegerSize
            of Float:
                width*: FloatSize
            of Infinity:
                positive*: bool
            of Function:
                isLambda*: bool
                isGenerator*: bool
                isCoroutine*: bool
                isAuto*: bool
                parameters*: TypeSignature
                returnType*: Type
                unsafe*: bool
            of Typevar:
                wrapped*: Type
            of Structure:
                name*: string
                fields*: TableRef[string, Type]
                parent*: Type
                interfaces*: seq[Type]
                isEnum*: bool
            of Reference, Pointer, Lent, Const:
                value*: Type
            of Generic, Union:
                types*: seq[tuple[match: bool, kind: Type, value: Expression]]
            else:
                discard

    WarningKind* {.pure.} = enum
        ## A warning enumeration type
        UserWarning

    NameKind* {.pure.} = enum
        ## A name enumeration type
        Default, Var, Module

    Name* = ref object
        ## A generic name object

        # Type of the identifier (NOT of the value!)
        case kind*: NameKind
            of Module:
                path*: string
                # Full absolute path of the module,
                # including the extension
                absPath*: string
                # Just for easier lookup, it's all
                # pointers anyway
                names*: TableRef[string, Name]
            of NameKind.Var:
                # If the variable's value is another
                # name, this attribute contains its
                # name object. This is useful for things
                # like assigning functions to variables and
                # then calling the variable like it's the
                # original function
                assignedName*: Name
            else:
                discard
        # The name's identifier
        ident*: IdentExpr
        # Owner of the identifier (module)
        module*: Name
        # File where the name is declared
        file*: string
        # Scope depth
        depth*: int
        # Is this name private?
        isPrivate*: bool
        # The type of the name's associated
        # value
        valueType*: Type
        # The function that "owns" this name (may be nil!)
        owner*: Name
        # Where is this node declared in its file?
        line*: int
        # The AST node associated with this node. This
        # is needed because we typecheck function and type
        # declarations only if, and when, they're actually
        # used
        node*: Declaration
        # Who is this name exported to? (Only makes sense if isPrivate
        # equals false)
        exportedTo*: seq[Name]

    TypeSignature* = seq[tuple[name: string, kind: Type, default: TypedExpr]]

    ## Our typed AST representation

    TypedNode* = ref object of RootObj
        ## A generic typed AST node
        node*: ASTNode

    TypedExpr* = ref object of TypedNode
        ## A generic typed expression
        kind*: Type

    TypedUnaryExpr* = ref object of TypedExpr
        ## A generic typed unary expression
        a*: TypedExpr

    TypedBinaryExpr* = ref object of TypedUnaryExpr
        ## A generic typed binary expression
        b*: TypedExpr

    TypedIdentExpr* = ref object of TypedExpr
        ## A typed identifier expression
        name*: Name

    TypedCallExpr* = ref object of TypedExpr
        ## A typed function call expression
        callee*: Name
        args*: seq[tuple[name: string, kind: Type, default: TypedExpr]]

    TypedDecl* = ref object of TypedNode
        ## A typed declaration node
        name*: Name  # The declaration's name object

    TypedVarDecl* = ref object of TypedDecl
        ## A typed variable declaration node
        init*: TypedExpr

    TypedTypeDecl* = ref object of TypedDecl
        ## A typed type declaration node
        fields*: TableRef[string, TypedExpr]
        parent*: Name
        interfaces*: seq[TypedTypeDecl]

    TypedEnumDecl* = ref object of TypedTypeDecl
        ## A typed enum declaration node
        enumeration*: Type
        variants: seq[TypedTypeDecl]

    TypedFunDecl* = ref object of TypedDecl
        ## A typed function declaration
        args*: seq[tuple[name: Name, default: TypedExpr]]
        body*: TypedBlockStmt

    TypedStmt* = ref object of TypedNode
        ## A typed statement node

    TypedExprStmt* = ref object of TypedStmt
        ## A typed expression statement
        expression*: TypedExpr

    TypedBlockStmt* = ref object of TypedStmt
        ## A typed block statement
        body*: seq[TypedNode]

    TypedIfStmt* = ref object of TypedStmt
        ## A typed if statement node
        thenBranch*: TypedBlockStmt
        elseBranch*: TypedBlockStmt
        condition*: TypedExpr

    TypedWhileStmt* = ref object of TypedStmt
        ## A typed while statement node
        body*: TypedBlockStmt
        condition*: TypedExpr


proc newTypedNode*(node: ASTNode): TypedNode =
    ## Initializes a new typed node
    new(result)
    result.node = node


proc newTypedExpr*(node: Expression, kind: Type): TypedExpr =
    ## Initializes a new typed expression
    result = TypedExpr(node: node, kind: kind)


proc newTypedDecl*(node: Declaration, name: Name): TypedDecl =
    ## Initializes a new typed declaration
    result = TypedDecl(node: node, name: name)


proc newTypedTypeDecl*(node: TypeDecl, name: Name, fields: TableRef[string, TypedExpr], parent: Name): TypedTypeDecl =
    ## Initializes a new typed type declaration
    result = TypedTypeDecl(node: node, name: name, fields: fields, parent: parent)


proc newTypedEnumDecl*(node: TypeDecl, name: Name, variants: seq[TypedTypeDecl], enumeration: Type): TypedEnumDecl =
    ## Initializes a new typed enum declaration
    result = TypedEnumDecl(node: node, name: name, variants: variants, enumeration: enumeration)


proc newTypedFunDecl*(node: FunDecl, name: Name, body: TypedBlockStmt): TypedFunDecl =
    ## Initializes a new typed function declaration
    result = TypedFunDecl(node: node, name: name, body: body)


proc newTypedVarDecl*(node: VarDecl, name: Name, init: TypedExpr): TypedVarDecl =
    ## Initializes a new typed variable declaration
    result = TypedVarDecl(node: node, name: name, init: init)


proc newTypedIdentExpr*(node: IdentExpr, name: Name): TypedIdentExpr =
    ## Initializes a new typed identifier expression
    result = TypedIdentExpr(node: node, name: name, kind: name.valueType)


proc newTypedUnaryExpr*(node: UnaryExpr, kind: Type, a: TypedExpr): TypedUnaryExpr =
    ## Initializes a new typed unary expression
    result = TypedUnaryExpr(node: node, a: a, kind: kind)


proc newTypedBinaryExpr*(node: UnaryExpr, kind: Type, a, b: TypedExpr): TypedBinaryExpr =
    ## Initializes a new typed binary expression
    result = TypedBinaryExpr(node: node, a: a, b: b, kind: kind)


proc newTypedCallExpr*(node: CallExpr, callee: Name,
                       args: seq[tuple[name: string, kind: Type, default: TypedExpr]]): TypedCallExpr =
    ## Initializes a new typed function call expression
    result = TypedCallExpr(node: node, callee: callee, args: args, kind: callee.valueType.returnType)


proc newTypedBlockStmt*(node: BlockStmt, body: seq[TypedNode]): TypedBlockStmt =
    ## Initializes a new typed block statement
    result = TypedBlockStmt(node: node, body: body)


proc newTypedWhileStmt*(node: WhileStmt, body: TypedBlockStmt, condition: TypedExpr): TypedWhileStmt =
    ## Initializes a new typed while statement
    result = TypedWhileStmt(node: node, body: body, condition: condition)


proc newTypedIfStmt*(node: IfStmt, thenBranch, elseBranch: TypedBlockStmt, condition: TypedExpr): TypedIfStmt =
    ## Initializes a new typed if statement
    result = TypedIfStmt(node: node, thenBranch: thenBranch,
                         elseBranch: elseBranch, condition: condition)


proc getName*(self: TypedNode): Name =
    ## Gets the name object associated with the
    ## given typed node, if it has any
    case self.node.kind:
        of identExpr:
            result = TypedIdentExpr(self).name
        of NodeKind.funDecl, NodeKind.varDecl, NodeKind.typeDecl:
            result = TypedDecl(self).name
        else:
            result = nil  # TODO
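
A minimal sketch of how these pieces fit together (the AST node is left as nil for brevity; real callers pass parser nodes):

```nim
# Describe peon's signed 64-bit integer using the Type variant above
let i64 = Type(kind: Integer, signed: true, size: IntegerSize.LongLong)
# A reference type reuses the wrapped type via the `value` field
let refI64 = Type(kind: Reference, value: i64)
assert refI64.value.signed

# Wrap an (elided) expression node together with its inferred type
let typed = newTypedExpr(nil, i64)
assert typed.kind.kind == Integer
```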

File diff suppressed because it is too large
@@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -39,6 +39,10 @@ type
        # purposes
        keywords: TableRef[string, TokenType]
        symbols: TableRef[string, TokenType]

    StringParseMode = enum
        Default, Raw, Format, Byte

    Lexer* = ref object
        ## A lexer object
        symbols*: SymbolTable

@@ -53,6 +57,7 @@ type
        linePos: int
        lineCurrent: int
        spaces: int

    LexingError* = ref object of PeonException
        ## A lexing exception
        lexer*: Lexer

@@ -233,8 +238,7 @@ proc peek(self: Lexer, distance: int = 0, length: int = 1): string =
    ## may be empty
    var i = distance
    while len(result) < length:
        if self.done() or self.current + i > self.source.high() or
           self.current + i < 0:
        if self.current + i > self.source.high() or self.current + i < 0:
            break
        else:
            result.add(self.source[self.current + i])

@@ -315,17 +319,26 @@ proc parseEscape(self: Lexer) =
    ## likely be soon. Another notable limitation is that
    ## \xhhh and \nnn are limited to the size of a char
    ## (i.e. uint8, or 256 values)

    # TODO: Modifying the source is a bad idea. Currently commenting out
    # the code in here and just using it for validation purposes

    case self.peek()[0]:  # We use a char instead of a string because of how case statements handle ranges with strings
                          # (i.e. not well, given they crash the C code generator)
        of 'a':
            self.source[self.current] = cast[char](0x07)
            # self.source[self.current] = cast[char](0x07)
            discard
        of 'b':
            self.source[self.current] = cast[char](0x7f)
            # self.source[self.current] = cast[char](0x7f)
            discard
        of 'e':
            self.source[self.current] = cast[char](0x1B)
            # self.source[self.current] = cast[char](0x1B)
            discard
        of 'f':
            self.source[self.current] = cast[char](0x0C)
            # self.source[self.current] = cast[char](0x0C)
            discard
        of 'n':
            #[
            when defined(windows):
                # We natively convert LF to CRLF on Windows, and
                # gotta thank Microsoft for the extra boilerplate!

@@ -336,53 +349,59 @@ proc parseEscape(self: Lexer) =
                self.source[self.current] = cast[char](0x0A)
            when defined(linux):
                self.source[self.current] = cast[char](0X0D)
            ]#
            discard
        of 'r':
            self.source[self.current] = cast[char](0x0D)
            # self.source[self.current] = cast[char](0x0D)
            discard
        of 't':
            self.source[self.current] = cast[char](0x09)
            # self.source[self.current] = cast[char](0x09)
            discard
        of 'v':
            self.source[self.current] = cast[char](0x0B)
            # self.source[self.current] = cast[char](0x0B)
            discard
        of '"':
            self.source[self.current] = '"'
            # self.source[self.current] = '"'
            discard
        of '\'':
            self.source[self.current] = '\''
            # self.source[self.current] = '\''
            discard
        of '\\':
            self.source[self.current] = cast[char](0x5C)
            # self.source[self.current] = cast[char](0x5C)
            discard
        of '0'..'9':  # This is the reason we're using char instead of string. See https://github.com/nim-lang/Nim/issues/19678
            var code = ""
            var value = 0
            var i = self.current
            while i < self.source.high() and (let c = self.source[
                    i].toLowerAscii(); c in '0'..'7') and len(code) < 3:
            while i < self.source.high() and (let c = self.source[i].toLowerAscii(); c in '0'..'7') and len(code) < 3:
                code &= self.source[i]
                i += 1
            assert parseOct(code, value) == code.len()
            if value > uint8.high().int:
                self.error("escape sequence value too large (> 255)")
            self.source[self.current] = cast[char](value)
            # self.source[self.current] = cast[char](value)
        of 'u', 'U':
            self.error("unicode escape sequences are not supported (yet)")
            self.error("unicode escape sequences are not supported yet")
        of 'x':
            var code = ""
            var value = 0
            var i = self.current
            while i < self.source.high() and (let c = self.source[
                    i].toLowerAscii(); c in 'a'..'f' or c in '0'..'9'):
            while i < self.source.high() and (let c = self.source[i].toLowerAscii(); c in 'a'..'f' or c in '0'..'9'):
                code &= self.source[i]
                i += 1
            assert parseHex(code, value) == code.len()
            if value > uint8.high().int:
                self.error("escape sequence value too large (> 255)")
            self.source[self.current] = cast[char](value)
            # self.source[self.current] = cast[char](value)
        else:
            self.error(&"invalid escape sequence '\\{self.peek()}'")


proc parseString(self: Lexer, delimiter: string, mode: string = "single") =
    ## Parses string literals. They can be expressed using matching pairs
    ## of either single or double quotes. Most C-style escape sequences are
    ## supported, moreover, a specific prefix may be prepended
    ## to the string to instruct the lexer on how to parse it:
proc parseString(self: Lexer, delimiter: string, mode: StringParseMode = Default) =
    ## Parses string and character literals. They can be expressed using
    ## matching pairs of double or single quotes respectively. Most C-style
    ## escape sequences are supported, moreover, a specific prefix may be
    ## prepended to the string to instruct the lexer on how to parse it:
    ## - b -> declares a byte string, where each character is
    ##        interpreted as an integer instead of a character
    ## - r -> declares a raw string literal, where escape sequences

@@ -397,56 +416,45 @@ proc parseString(self: Lexer, delimiter: string, mode: string = "single") =
    ## strings, so a multi-line string prefixed with the "r" modifier
    ## is redundant, although multi-line byte/format strings are supported
    var slen = 0
    while not self.check(delimiter) and not self.done():
        if self.match("\n"):
            if mode == "multi":
                self.incLine()
            else:
                self.error("unexpected EOL while parsing string literal")
        if mode in ["raw", "multi"]:
    while not self.check(delimiter) and not self.done():
        inc(slen)
        if mode == Raw:
            discard self.step()
        elif self.match("\\"):
            # This madness here serves to get rid of the slash, since \x is mapped
            # to a one-byte sequence but the string '\x' is actually 2 bytes (or more,
            # depending on the specific escape sequence)
            self.source = self.source[0..<self.current] & self.source[
                    self.current + 1..^1]
            self.parseEscape()
        if mode == "format" and self.match("{"):
            discard self.step()
            continue
        elif mode == Format:
            if self.match("{"):
                self.source = self.source[0..<self.current] & self.source[
                        self.current + 1..^1]
                continue
            while not self.check(["}", "\""]):
                discard self.step()
            if self.check("\""):
                self.error("unclosed '{' in format string")
        elif mode == "format" and self.check("}"):
            if not self.check("}", 1):
            if self.match("{"):
                continue
            while not self.check(["}", "\""]):
                discard self.step()
            if self.check("\""):
                self.error("unclosed '{' in format string")
            elif self.check("}") and not self.check("}", 1):
                self.error("unmatched '}' in format string")
        else:
            self.source = self.source[0..<self.current] & self.source[
                    self.current + 1..^1]
            discard self.step()
            inc(slen)
    if slen > 1 and delimiter == "'":
        self.error("invalid character literal (length must be one!)")
    if mode == "multi":
        if not self.match(delimiter.repeat(3)):
            self.error("unexpected EOL while parsing multi-line string literal")
    elif self.done() and self.peek(-1) != delimiter:
        self.error("unexpected EOF while parsing string literal")
    if self.done() and not self.match(delimiter):
        if delimiter == "'":
            self.error("unexpected EOF while parsing character literal")
        else:
            self.error("unexpected EOF while parsing string literal")
    else:
        discard self.step()
    if delimiter == "\"":
    if delimiter != "'":
        self.createToken(String)
    else:
        if slen == 0:
            self.error("character literal cannot be of length zero")
        elif slen > 1:
            self.error("invalid character literal (length must be one!)")
        self.createToken(Char)


proc parseBinary(self: Lexer) =
    ## Parses binary numbers
    while self.peek().isDigit():
    while self.peek().isDigit() and not self.done():
        if not self.check(["0", "1"]):
            self.error(&"invalid digit '{self.peek()}' in binary literal")
        discard self.step()

@@ -454,7 +462,7 @@ proc parseBinary(self: Lexer) =

proc parseOctal(self: Lexer) =
    ## Parses octal numbers
    while self.peek().isDigit():
    while self.peek().isDigit() and not self.done():
        if self.peek() notin "0".."7":
            self.error(&"invalid digit '{self.peek()}' in octal literal")
        discard self.step()

@@ -462,7 +470,7 @@ proc parseOctal(self: Lexer) =

proc parseHex(self: Lexer) =
    ## Parses hexadecimal numbers
    while self.peek().isAlphaNumeric():
    while self.peek().isAlphaNumeric() and not self.done():
        if not self.peek().isDigit() and self.peek().toLowerAscii() notin "a".."f":
            self.error(&"invalid hexadecimal literal")
        discard self.step()

@@ -508,7 +516,7 @@ proc parseNumber(self: Lexer) =
    elif self.check("."):
        # TODO: Is there a better way?
        discard self.step()
        if not isDigit(self.peek()):
        if not isDigit(self.peek()) or self.done():
            self.error("invalid float number literal")
        kind = Float
        while isDigit(self.peek()) and not self.done():

@@ -526,18 +534,18 @@ proc parseNumber(self: Lexer) =


proc parseBackticks(self: Lexer) =
    ## Parses tokens surrounded
    ## by backticks. This may be used
    ## for name stropping as well as to
    ## reimplement existing operators
    ## (e.g. +, -, etc.) without the
    ## parser complaining about syntax
    ## errors
    ## Parses any character surrounded
    ## by backticks and produces a single
    ## identifier. This allows using any
    ## otherwise "illegal" character as part
    ## of the identifier (like unicode runes),
    ## except for newlines, tabs, carriage returns
    ## and other useless/confusing escape sequences
    ## like \e and \f
    while not self.match("`") and not self.done():
        if self.peek().isAlphaNumeric() or self.symbols.existsSymbol(self.peek()):
            discard self.step()
            continue
        self.error(&"unexpected character: '{self.peek()}'")
        if self.match(["\n", "\t", "\e", "\r", "\e"]):
            self.error(&"unexpected character in stropped identifier: '{self.peek()}'")
        discard self.step()
    self.createToken(Identifier)
    # Strips the backticks
    self.tokens[^1].lexeme = self.tokens[^1].lexeme[1..^2]

@@ -545,9 +553,9 @@ proc parseBackticks(self: Lexer) =

proc parseIdentifier(self: Lexer) =
    ## Parses keywords and identifiers.
    ## Note that multi-character tokens
    ## (aka UTF runes) are not supported
    ## by design and *will* break things
    ## This function handles ASCII characters
    ## only. For unicode support, parseBackticks
    ## is used instead
    while (self.peek().isAlphaNumeric() or self.check("_")) and not self.done():
        discard self.step()
    let name: string = self.source[self.start..<self.current]

@@ -586,13 +594,12 @@ proc next(self: Lexer) =
        self.parseBackticks()
    elif self.match(["\"", "'"]):
        # String or character literal
        var mode = "single"
        var delimiter = self.peek(-1)
        if self.peek(-1) != "'" and self.check(self.peek(-1)) and self.check(
                self.peek(-1), 1):
            # Multiline strings start with 3 quotes
            discard self.step(2)
            mode = "multi"
        self.parseString(self.peek(-1), mode)
            delimiter.add(self.step(2))
        self.parseString(self.peek(-1), Default)
    elif self.peek().isDigit():
        discard self.step()  # Needed because parseNumber reads the next
                             # character to tell the base of the number

@@ -600,13 +607,19 @@ proc next(self: Lexer) =
        self.parseNumber()
    elif self.peek().isAlphaNumeric() and self.check(["\"", "'"], 1):
        # Prefixed string literal (i.e. f"Hi {name}!")
        var mode = Default
        var delimiter = self.step()
        if self.peek(-1) != "'" and self.check(self.peek(-1)) and self.check(
                self.peek(-1), 1):
            # Multiline strings start with 3 quotes
            delimiter.add(self.step(2))
        case self.step():
            of "r":
                self.parseString(self.step(), "raw")
                self.parseString(delimiter, Raw)
            of "b":
                self.parseString(self.step(), "bytes")
                self.parseString(self.step(), Byte)
            of "f":
                self.parseString(self.step(), "format")
                self.parseString(self.step(), Format)
            else:
                self.error(&"unknown string prefix '{self.peek(-1)}'")
    elif self.peek().isAlphaNumeric() or self.check("_"):

@@ -641,8 +654,10 @@ proc next(self: Lexer) =
            return
        dec(n)
    # We just assume what we have in front of us
    # is a symbol
    discard self.step()
    # is a symbol and parse as much as possible (i.e.
    # until a space is found)
    while not self.check(" ") and not self.done():
        discard self.step()
    self.createToken(Symbol)


File diff suppressed because it is too large
@@ -1,4 +1,4 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -34,14 +34,14 @@ type
        Function, Break, Continue,
        Var, Let, Const, Return,
        Coroutine, Generator, Import,
        Raise, Assert, Await, Foreach,
        Assert, Await, Foreach,
        Yield, Type, Operator, Case,
        Enum, From, Ptr, Ref, Object,
        Enum, Ptr, Ref, Object,
        Export, Block, Switch, Lent

        # Literal types
        Integer, Float, String, Identifier,
        Binary, Octal, Hex, Char
        Binary, Octal, Hex, Char, Nan, Inf

        # Brackets, parentheses,
        # operators and others

@@ -80,4 +80,4 @@ proc `$`*(self: Token): string =

proc `==`*(self, other: Token): bool =
    ## Returns self == other
    return self.kind == other.kind and self.lexeme == other.lexeme

src/main.nim
@@ -1,70 +0,0 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import util/fmterr
import util/symbols
import frontend/parsing/lexer
import frontend/parsing/parser
import frontend/compiler/compiler


import std/strformat


proc `$`(self: TypedNode): string =
    if self.node.isConst():
        var self = TypedExpr(self)
        return &"{self.node}: {self.kind[]}"
    case self.node.kind:
        of varDecl, typeDecl, funDecl:
            var self = TypedDecl(self)
            result = &"{self.name[]}: {self.name.valueType[]}"
        of identExpr, binaryExpr, unaryExpr:
            var self = TypedExpr(self)
            result &= &"{self.node}: {self.kind[]}"
        else:
            result = &"{self.node}: ? ({self.node.kind})"


proc main =
    var
        lexer = newLexer()
        parser = newParser()
        compiler = newPeonCompiler()
        source: string
        file = "test.pn"
    lexer.fillSymbolTable()
    while true:
        stdout.write(">>> ")
        stdout.flushFile()
        try:
            source = stdin.readLine()
            for typedNode in compiler.compile(parser.parse(lexer.lex(source, file), file, lexer.getLines(), lexer.getSource()), lexer.getFile(), lexer.getSource(),
                                              showMismatches=true):
                echo &"{typedNode.node} -> {compiler.stringify(typedNode)}\n"
        except IOError:
            echo ""
            break
        except LexingError as exc:
            print(exc)
        except ParseError as exc:
            print(exc)
        except CompileError as exc:
            print(exc)


when isMainModule:
    setControlCHook(proc () {.noconv.} = echo ""; quit(0))
    main()
@ -0,0 +1,413 @@

# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import config
import util/fmterr
import util/symbols
import frontend/parsing/lexer
import frontend/parsing/parser
import frontend/compiler/typechecker
import backend/bytecode/codegen/generator
import backend/bytecode/tooling/serializer
import backend/bytecode/opcodes
import backend/bytecode/tooling/debugger
import backend/bytecode/vm


import std/os
import std/parseopt
import std/strutils
import std/terminal
import std/strformat
import std/times


# Thanks art <3
#[
import jale/editor as ed
import jale/templates
import jale/plugin/defaults
import jale/plugin/editor_history
import jale/keycodes
import jale/multiline


proc getLineEditor: LineEditor =
    result = newLineEditor()
    result.prompt = "=> "
    result.populateDefaults()
    let history = result.plugHistory()
    result.bindHistory(history)
]#


proc `$`(self: TypedNode): string =
    if self.node.isConst():
        var self = TypedExpr(self)
        return &"{self.node}: {self.kind[]}"
    case self.node.kind:
        of varDecl, typeDecl, funDecl:
            var self = TypedDecl(self)
            result = &"{self.name[]}: {self.name.valueType[]}"
        of identExpr, binaryExpr, unaryExpr:
            var self = TypedExpr(self)
            result &= &"{self.node}: {self.kind[]}"
        else:
            result = &"{self.node}: ? ({self.node.kind})"


proc runFile(filename: string, fromString: bool = false, dump: bool = true, generate: bool = true, breakpoints: seq[uint64] = @[],
             disabledWarnings: seq[WarningKind] = @[], mismatches: bool = false, run: bool = true,
             backend: PeonBackend = PeonBackend.Bytecode, output: string, cacheDir: string) =
    var
        tokens: seq[Token]
        tree: ParseTree
        typedNodes: seq[TypedNode]
        tokenizer = newLexer()
        parser = newParser()
        typeChecker = newTypeChecker()
        input: string
        filename = filename
        isBinary = false
        output = output
    tokenizer.fillSymbolTable()
    try:
        if not fromString and filename.endsWith(".pbc"):
            isBinary = true
        if fromString:
            input = filename
            filename = "<string>"
        else:
            input = readFile(filename)
        if not isBinary:
            tokens = tokenizer.lex(input, filename)
            if tokens.len() == 0:
                return
            if debugLexer:
                styledEcho fgCyan, "Tokenizer output:"
                for i, token in tokens:
                    if i == tokens.high():
                        # Who cares about EOF?
                        break
                    styledEcho fgGreen, "\t", $token
                echo ""
            tree = parser.parse(tokens, filename, tokenizer.getLines(), input)
            if tree.len() == 0:
                return
            if debugParser:
                styledEcho fgCyan, "Parser output:"
                for node in tree:
                    styledEcho fgGreen, "\t", $node
                echo ""
            typedNodes = typeChecker.validate(tree, filename, tokenizer.getSource(), mismatches, disabledWarnings)
            if debugTypeChecker:
                styledEcho fgCyan, "Typechecker output:"
                for typedNode in typedNodes:
                    case typedNode.node.kind:
                        of exprStmt:
                            # *Technically* an expression statement has no type, but that isn't really useful for debug
                            # purposes, so we print the type of the expression within it instead
                            let exprNode = TypedExprStmt(typedNode).expression
                            styledEcho fgGreen, &"\t{typedNode.node} (inner) -> {typeChecker.stringify(exprNode.kind)}\n"
                        else:
                            styledEcho fgGreen, &"\t{typedNode.node} -> {typeChecker.stringify(typedNode)}\n"
        if not generate:
            return
        case backend:
            of PeonBackend.Bytecode:
                var
                    debugger = newBytecodeDebugger()
                    generator = newBytecodeGenerator()
                    serializer = newBytecodeSerializer()
                    vm = newPeonVM()
                    chunk: Chunk = newChunk()
                    serialized: SerializedBytecode
                if not isBinary:
                    chunk = generator.generate(typedNodes, typeChecker)
                    serialized = serializer.loadBytes(serializer.dumpBytes(chunk, filename))
                else:
                    serialized = serializer.loadFile(filename)
                    chunk = serialized.chunk
                if dump and not fromString:
                    if not output.endsWith(".pbc"):
                        output.add(".pbc")
                    if not dirExists(cacheDir):
                        createDir(cacheDir)
                    serializer.dumpFile(chunk, joinPath(cacheDir, filename), output)
                if debugCompiler:
                    styledEcho fgCyan, "Disassembler output below"
                    debugger.disassembleChunk(chunk, filename)
                if debugSerializer:
                    styledEcho fgCyan, "Serializer checks: "
                    styledEcho fgBlue, "\t- Peon version: ", fgYellow, &"{serialized.version.major}.{serialized.version.minor}.{serialized.version.patch}", fgBlue, " (commit ", fgYellow, serialized.commit[0..8], fgBlue, ") on branch ", fgYellow, serialized.branch
                    stdout.styledWriteLine(fgBlue, "\t- Compilation date & time: ", fgYellow, fromUnix(serialized.compileDate).format("d/M/yyyy HH:mm:ss"))
                    stdout.styledWriteLine(fgBlue, "\t- Total binary size: ", fgYellow, formatSize(serialized.size))
                    stdout.styledWrite(fgBlue, &"\t- Constants segment: ")
                    if serialized.chunk.consts == chunk.consts:
                        styledEcho fgGreen, "OK"
                    else:
                        styledEcho fgRed, "Corrupted"
                    stdout.styledWrite(fgBlue, &"\t- Code segment: ")
                    if serialized.chunk.code == chunk.code:
                        styledEcho fgGreen, "OK"
                    else:
                        styledEcho fgRed, "Corrupted"
                    stdout.styledWrite(fgBlue, "\t- Line info segment: ")
                    if serialized.chunk.lines == chunk.lines:
                        styledEcho fgGreen, "OK"
                    else:
                        styledEcho fgRed, "Corrupted"
                    stdout.styledWrite(fgBlue, "\t- Functions segment: ")
                    if serialized.chunk.functions == chunk.functions:
                        styledEcho fgGreen, "OK"
                    else:
                        styledEcho fgRed, "Corrupted"
                    stdout.styledWrite(fgBlue, "\t- Modules segment: ")
                    if serialized.chunk.modules == chunk.modules:
                        styledEcho fgGreen, "OK"
                    else:
                        styledEcho fgRed, "Corrupted"
                if run:
                    vm.run(chunk, breakpoints, repl=false)
            else:
                discard
    except LexingError as exc:
        print(exc)
    except ParseError as exc:
        print(exc)
    except TypeCheckError as exc:
        print(exc)
    except CodeGenError as exc:
        var file = exc.file
        if file notin ["<string>", ""]:
            file = relativePath(file, getCurrentDir())
        stderr.styledWriteLine(fgRed, styleBright, "Error while generating code for ", fgYellow, file, fgDefault, &": {exc.msg}")
    except SerializationError as exc:
        var file = exc.file
        if file notin ["<string>", ""]:
            file = relativePath(file, getCurrentDir())
        stderr.styledWriteLine(fgRed, styleBright, "Error while (de-)serializing ", fgYellow, file, fgDefault, &": {exc.msg}")
    except IOError as exc:
        stderr.styledWriteLine(fgRed, styleBright, "Error while trying to read ", fgYellow, filename, fgDefault, &": {exc.msg}")
    except OSError as exc:
        stderr.styledWriteLine(fgRed, styleBright, "Error while trying to read ", fgYellow, filename, fgDefault, &": {exc.msg} ({osErrorMsg(osLastError())})",
                               fgRed, "[errno ", fgYellow, $osLastError(), fgRed, "]")


#[
proc repl(warnings: seq[WarningKind] = @[], showMismatches: bool = false) =
    var
        keep = true
        tokens: seq[Token]
        tree: ParseTree
        typeChecker = newTypeChecker()
        lexer = newLexer()
        parser = newParser()
        editor = getLineEditor()
        input: string
    lexer.fillSymbolTable()
    editor.bindEvent(jeQuit):
        stdout.styledWriteLine(fgGreen, "Goodbye!")
        keep = false
        input = ""
    editor.bindKey("ctrl+a"):
        editor.content.home()
    editor.bindKey("ctrl+e"):
        editor.content.`end`()
    while keep:
        try:
            input = editor.read()
            if input == "#clear":
                stdout.write("\x1Bc")
                continue
            elif input == "":
                continue
            tokens = lexer.lex(input, "stdin")
            if tokens.len() == 0:
                continue
            if debugLexer:
                styledEcho fgCyan, "Tokenizer output:"
                for i, token in tokens:
                    if i == tokens.high():
                        # Who cares about EOF?
                        break
                    styledEcho fgGreen, "\t", $token
                echo ""
            tree = parser.parse(tokens, "stdin", lexer.getLines(), input, persist=true)
            if tree.len() == 0:
                continue
            if debugParser:
                styledEcho fgCyan, "Parser output:"
                for node in tree:
                    styledEcho fgGreen, "\t", $node
                echo ""
            if debugTypeChecker:
                styledEcho fgCyan, "Typechecker output:"
                for typedNode in typeChecker.validate(parser.parse(lexer.lex(input, "<stdin>"), lexer.getFile(), lexer.getLines(), lexer.getSource()),
                                                      lexer.getFile(), lexer.getSource(), showMismatches=showMismatches, disabledWarnings=warnings):
                    if debugTypeChecker:
                        styledEcho fgGreen, &"\t{typedNode.node} -> {typeChecker.stringify(typedNode)}\n"
                echo ""
        except LexingError:
            print(LexingError(getCurrentException()))
        except ParseError:
            print(ParseError(getCurrentException()))
        except TypeCheckError:
            print(TypeCheckError(getCurrentException()))
    quit(0)
]#


when isMainModule:
    setControlCHook(proc () {.noconv.} = quit(0))
    var
        optParser = initOptParser(commandLineParams())
        file: string
        fromString: bool
        dump = true
        warnings: seq[WarningKind] = @[]
        showMismatches = false
        cachePath: string = ".buildcache"
        #mode: CompileMode = CompileMode.Debug
        run = true
        generateCode = true
        backend: PeonBackend
        output: string
        breakpoints: seq[uint64]
    for kind, key, value in optParser.getopt():
        case kind:
            of cmdArgument:
                file = key
            of cmdLongOption:
                case key:
                    #[
                    of "mode":
                        if value.toLowerAscii() == "release":
                            mode = CompileMode.Release
                        elif value.toLowerAscii() == "debug":
                            discard
                        else:
                            stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "invalid value for option 'mode' (valid options are: debug, release)")
                            quit()
                    ]#
                    of "help":
                        echo HELP_MESSAGE
                        quit()
                    of "version":
                        echo PEON_VERSION_STRING
                        quit()
                    of "string":
                        file = value
                        fromString = true
                    of "noGen":
                        generateCode = false
                    of "noDump":
                        dump = false
                    of "warnings":
                        if value.toLowerAscii() in ["yes", "on"]:
                            warnings = @[]
                        elif value.toLowerAscii() in ["no", "off"]:
                            for warning in WarningKind:
                                warnings.add(warning)
                        else:
                            stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "invalid value for option 'warnings' (valid options are: yes, on, no, off)")
                            quit()
                    of "showMismatches":
                        showMismatches = true
                    of "noWarn":
                        case value:
                            of "UserWarning":
                                warnings.add(UserWarning)
                            else:
                                stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "invalid warning name for option 'noWarn'")
                                quit()
                    of "listWarns":
                        echo "Currently supported warnings:"
                        for warning in WarningKind:
                            echo &" - {warning}"
                        quit(0)
                    of "debugTypeChecker":
                        debugTypeChecker = true
                    of "debugCompiler":
                        debugCompiler = true
                    of "debugSerializer":
                        debugSerializer = true
                    of "compile":
                        run = false
                    of "output":
                        output = value
                    of "backend":
                        case value:
                            of "bytecode":
                                backend = PeonBackend.Bytecode
                            of "c":
                                backend = PeonBackend.NativeC
                    of "debug-dump":
                        debugSerializer = true
                    of "debugLexer":
                        debugLexer = true
                    of "debugParser":
                        debugParser = true
                    of "cachePath":
                        cachePath = value
                    of "breakpoints":
                        when debugVM:
                            for point in value.strip(chars={' '}).split(","):
                                try:
                                    breakpoints.add(parseBiggestUInt(point))
                                except ValueError:
                                    stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, &"error: invalid breakpoint value '{point}'")
                                    quit()
                        when not debugVM:
                            stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "VM debugging is off, cannot set breakpoints (recompile with -d:debugVM to fix this)")
                            quit()
                    else:
                        stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, &"unknown long option '{key}'")
                        quit()
            of cmdShortOption:
                case key:
                    of "o":
                        output = value
                    of "h":
                        echo HELP_MESSAGE
                        quit()
                    of "v":
                        echo PEON_VERSION_STRING
                        quit()
                    of "s":
                        file = value
                        fromString = true
                    of "n":
                        dump = false
                    of "w":
                        if value.toLowerAscii() in ["yes", "on"]:
                            warnings = @[]
                        elif value.toLowerAscii() in ["no", "off"]:
                            for warning in WarningKind:
                                warnings.add(warning)
                        else:
                            stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, "invalid value for option 'w' (valid options are: yes, on, no, off)")
                            quit()
                    of "c":
                        run = false
                    else:
                        stderr.styledWriteLine(fgRed, styleBright, "Error: ", fgDefault, &"unknown short option '{key}'")
                        quit()
            else:
                echo "usage: peon [options] [filename.pn]"
                quit()
    if file == "":
        echo "Sorry, the REPL is broken :("
        # repl(warnings, showMismatches, backend, dump)
    else:
        runFile(file, fromString, dump, generateCode, breakpoints, warnings, showMismatches, run, backend, output, cachePath)
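For reference, this is the call the option parser above assembles for an invocation like `peon --compile -o:out example.pn`. A minimal sketch only: the file name and output value are made up, everything else is the corresponding default or flag-derived argument of `runFile` as declared above.

```nim
# Hypothetical driver call mirroring `peon --compile -o:out example.pn`:
# --compile sets run=false, -o:out sets output="out"; the rest are defaults
runFile("example.pn", fromString = false, dump = true, generate = true,
        breakpoints = @[], disabledWarnings = @[], mismatches = false,
        run = false, backend = PeonBackend.Bytecode, output = "out",
        cacheDir = ".buildcache")
```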
@ -1,4 +1,4 @@

# Copyright 2022 Mattia Giambirtone & All Contributors
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -12,12 +12,15 @@

# See the License for the specific language governing permissions and
# limitations under the License.

## Utilities to print formatted error messages to stderr
import frontend/compiler/compiler
## Utilities to format peon exceptions into human-readable error messages
## and print them
import frontend/compiler/typechecker
import frontend/parsing/parser
import frontend/parsing/lexer
import errors

export errors


import std/os
import std/terminal
@ -25,36 +28,37 @@ import std/strutils

import std/strformat


proc printError(file, line: string, lineNo: int, pos: tuple[start, stop: int], fn: Declaration, msg: string) =
    ## Internal helper to print a formatted error message
    ## to stderr
    stderr.styledWrite(fgRed, styleBright, "Error in ", fgYellow, &"{file}:{lineNo}:{pos.start}")
proc formatError*(outFile = stderr, file, line: string, lineNo: int, pos: tuple[start, stop: int], fn: Declaration, msg: string, includeSource = true) =
    ## Helper to write a formatted error message to the given file object
    outFile.styledWrite(fgRed, styleBright, "Error in ", fgYellow, &"{file}:{lineNo}:{pos.start}")
    if not fn.isNil() and fn.kind == funDecl:
        # Error occurred inside a (named) function
        stderr.styledWrite(fgRed, styleBright, " in function ", fgYellow, FunDecl(fn).name.token.lexeme)
    stderr.styledWriteLine(styleBright, fgDefault, ": ", msg)
    if line.len() > 0:
        stderr.styledWrite(fgRed, styleBright, "Source line: ", resetStyle, fgDefault, line[0..<pos.start])
        if pos.stop == line.len():
            stderr.styledWrite(fgRed, styleUnderscore, line[pos.start..<pos.stop])
            stderr.styledWriteLine(fgDefault, line[pos.stop..^1])
    outFile.styledWriteLine(styleBright, fgDefault, ": ", msg)
    if line.len() > 0 and includeSource:
        # Print the line where the error occurred and underline the exact node that caused
        # the error. Might be inaccurate, but definitely better than nothing
        outFile.styledWrite(fgRed, styleBright, "Source line: ", resetStyle, fgDefault, line[0..<pos.start])
        outFile.styledWrite(fgRed, styleUnderscore, line[pos.start..pos.stop])
        if pos.stop + 1 <= line.high():
            outFile.styledWriteLine(fgDefault, line[pos.stop + 1..^1])
        else:
            stderr.styledWrite(fgRed, styleUnderscore, line[pos.start..pos.stop])
            stderr.styledWriteLine(fgDefault, line[pos.stop + 1..^1])
    outFile.styledWriteLine(fgDefault, "")


proc print*(exc: CompileError) =
proc print*(exc: TypeCheckError, includeSource = true) =
    ## Prints a formatted error message
    ## for compilation errors to stderr
    ## for type checking errors to stderr
    var file = exc.file
    var contents = ""
    case exc.line:
        of -1: discard
        of 0: contents = exc.compiler.getSource().strip(chars={'\n'}).splitLines()[exc.line]
        else: contents = exc.compiler.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
    printError(file, contents, exc.line, exc.node.getRelativeBoundaries(), exc.function, exc.msg)
        of 0: contents = exc.instance.getSource().strip(chars={'\n'}).splitLines()[exc.line]
        else: contents = exc.instance.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
    formatError(stderr, file, contents, exc.line, exc.node.getRelativeBoundaries(), exc.function, exc.msg, includeSource)


proc print*(exc: ParseError) =
proc print*(exc: ParseError, includeSource = true) =
    ## Prints a formatted error message
    ## for parsing errors to stderr
    var file = exc.file
@ -65,10 +69,10 @@ proc print*(exc: ParseError) =

        contents = exc.parser.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
    else:
        contents = ""
    printError(file, contents, exc.line, exc.token.relPos, exc.parser.getCurrentFunction(), exc.msg)
    formatError(stderr, file, contents, exc.line, exc.token.relPos, nil, exc.msg, includeSource)


proc print*(exc: LexingError) =
proc print*(exc: LexingError, includeSource = true) =
    ## Prints a formatted error message
    ## for lexing errors to stderr
    var file = exc.file
@ -76,8 +80,8 @@ proc print*(exc: LexingError) =

        file = relativePath(exc.file, getCurrentDir())
    var contents = ""
    if exc.line != -1:
        contents = exc.lexer.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
        contents = exc.lexer.getSource().splitLines()[exc.line - 1]
    else:
        contents = ""
    printError(file, contents, exc.line, exc.pos, nil, exc.msg)
    formatError(stderr, file, contents, exc.line, exc.pos, nil, exc.msg, includeSource)
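Since `formatError` is now exported and takes the destination file explicitly, it can also be called outside the `print` overloads. A minimal sketch with fabricated inputs (the file name, source line, and position below are illustrative only):

```nim
# Illustrative call with made-up values; fn = nil means the error is not
# attributed to any function, and pos underlines the stray "@" character
formatError(stderr, file = "example.pn", line = "var x = @", lineNo = 1,
            pos = (start: 8, stop: 8), fn = nil, msg = "unexpected token")
```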
@ -1,47 +1,131 @@

import ../frontend/parsing/lexer
# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import frontend/parsing/lexer


import std/tables

export tables


var tokens* = {"{": TokenType.LeftBrace,
               "}": TokenType.RightBrace,
               "(": TokenType.LeftParen,
               ")": TokenType.RightParen,
               "[": TokenType.LeftBracket,
               "]": TokenType.RightBracket,
               ".": TokenType.Dot,
               ",": TokenType.Comma,
               ";": TokenType.Semicolon,
               "type": TokenType.Type,
               "enum": TokenType.Enum,
               "case": TokenType.Case,
               "operator": TokenType.Operator,
               "generator": TokenType.Generator,
               "fn": TokenType.Function,
               "coroutine": TokenType.Coroutine,
               "break": TokenType.Break,
               "continue": TokenType.Continue,
               "while": TokenType.While,
               "for": TokenType.For,
               "foreach": TokenType.Foreach,
               "if": TokenType.If,
               "else": TokenType.Else,
               "await": TokenType.Await,
               "assert": TokenType.Assert,
               "const": TokenType.Const,
               "let": TokenType.Let,
               "var": TokenType.Var,
               "import": TokenType.Import,
               "yield": TokenType.Yield,
               "return": TokenType.Return,
               "object": TokenType.Object,
               "export": TokenType.Export,
               "block": TokenType.Block,
               "switch": TokenType.Switch,
               "lent": TokenType.Lent,
               "true": TokenType.True,
               "false": TokenType.False,
               "inf": TokenType.Inf,
               "ptr": TokenType.Ptr,
               "nan": TokenType.Nan,
              }.toTable()


for sym in [">", "<", "=", "~", "/", "+", "-", "_", "*", "?", "@", ":", "==", "!=",
            ">=", "<=", "+=", "-=", "/=", "*=", "**=", "!", "%", "&", "|", "^",
            ">>", "<<"]:
    tokens[sym] = TokenType.Symbol


proc fillSymbolTable*(tokenizer: Lexer) =
    ## Initializes the Lexer's symbol
    ## table with builtin symbols and
    ## keywords

    # Specialized symbols for which we need a specific token type
    # for easier handling in the parser (it's nicer to use enum members
    # rather than strings whenever possible)
    tokenizer.symbols.addSymbol("{", TokenType.LeftBrace)
    tokenizer.symbols.addSymbol("}", TokenType.RightBrace)
    tokenizer.symbols.addSymbol("(", TokenType.LeftParen)
    tokenizer.symbols.addSymbol(")", TokenType.RightParen)
    tokenizer.symbols.addSymbol("[", TokenType.LeftBracket)
    tokenizer.symbols.addSymbol("]", TokenType.RightBracket)
    tokenizer.symbols.addSymbol(".", TokenType.Dot)
    tokenizer.symbols.addSymbol(",", TokenType.Comma)
    tokenizer.symbols.addSymbol(";", TokenType.Semicolon)

    # 1-byte symbols
    tokenizer.symbols.addSymbol("{", LeftBrace)
    tokenizer.symbols.addSymbol("}", RightBrace)
    tokenizer.symbols.addSymbol("(", LeftParen)
    tokenizer.symbols.addSymbol(")", RightParen)
    tokenizer.symbols.addSymbol("[", LeftBracket)
    tokenizer.symbols.addSymbol("]", RightBracket)
    tokenizer.symbols.addSymbol(".", Dot)
    tokenizer.symbols.addSymbol(",", Comma)
    tokenizer.symbols.addSymbol(";", Semicolon)
    # Keywords
    # Generic symbols avoid us the need to create a gazillion members of the
    # TokenType enum. These are also not handled directly in the parser, but
    # rather processed as classes of operators based on precedence, so using
    # strings is less of a concern
    for sym in [">", "<", "=", "~", "/", "+", "-", "_", "*", "?", "@", ":", "==", "!=",
                ">=", "<=", "+=", "-=", "/=", "*=", "**=", "!", "%", "&", "|", "^",
                ">>", "<<"]:
        tokenizer.symbols.addSymbol(sym, TokenType.Symbol)
    # Keywords. We differentiate keywords from symbols because they have priority
    # over the latter, and also because the lexer internally uses the symbol map to do
    # maximal matching and it's helpful not to increase the amount of substrings we
    # need to check (especially because keywords match exactly and uniquely, while symbols
    # can share substrings)
    tokenizer.symbols.addKeyword("type", TokenType.Type)
    tokenizer.symbols.addKeyword("enum", Enum)
    tokenizer.symbols.addKeyword("case", Case)
    tokenizer.symbols.addKeyword("operator", Operator)
    tokenizer.symbols.addKeyword("generator", Generator)
    tokenizer.symbols.addKeyword("enum", TokenType.Enum)
    tokenizer.symbols.addKeyword("case", TokenType.Case)
    tokenizer.symbols.addKeyword("operator", TokenType.Operator)
    tokenizer.symbols.addKeyword("generator", TokenType.Generator)
    tokenizer.symbols.addKeyword("fn", TokenType.Function)
    tokenizer.symbols.addKeyword("coroutine", Coroutine)
    tokenizer.symbols.addKeyword("coroutine", TokenType.Coroutine)
    tokenizer.symbols.addKeyword("break", TokenType.Break)
    tokenizer.symbols.addKeyword("continue", Continue)
    tokenizer.symbols.addKeyword("while", While)
    tokenizer.symbols.addKeyword("for", For)
    tokenizer.symbols.addKeyword("foreach", Foreach)
    tokenizer.symbols.addKeyword("if", If)
    tokenizer.symbols.addKeyword("else", Else)
    tokenizer.symbols.addKeyword("continue", TokenType.Continue)
    tokenizer.symbols.addKeyword("while", TokenType.While)
    tokenizer.symbols.addKeyword("for", TokenType.For)
    tokenizer.symbols.addKeyword("foreach", TokenType.Foreach)
    tokenizer.symbols.addKeyword("if", TokenType.If)
    tokenizer.symbols.addKeyword("else", TokenType.Else)
    tokenizer.symbols.addKeyword("await", TokenType.Await)
    tokenizer.symbols.addKeyword("raise", TokenType.Raise)
    tokenizer.symbols.addKeyword("assert", TokenType.Assert)
    tokenizer.symbols.addKeyword("const", Const)
    tokenizer.symbols.addKeyword("let", Let)
    tokenizer.symbols.addKeyword("const", TokenType.Const)
    tokenizer.symbols.addKeyword("let", TokenType.Let)
    tokenizer.symbols.addKeyword("var", TokenType.Var)
    tokenizer.symbols.addKeyword("import", Import)
    tokenizer.symbols.addKeyword("import", TokenType.Import)
    tokenizer.symbols.addKeyword("yield", TokenType.Yield)
    tokenizer.symbols.addKeyword("return", TokenType.Return)
    tokenizer.symbols.addKeyword("object", Object)
    tokenizer.symbols.addKeyword("export", Export)
    tokenizer.symbols.addKeyword("object", TokenType.Object)
    tokenizer.symbols.addKeyword("export", TokenType.Export)
    tokenizer.symbols.addKeyword("block", TokenType.Block)
    tokenizer.symbols.addKeyword("switch", TokenType.Switch)
    tokenizer.symbols.addKeyword("lent", TokenType.Lent)

@ -50,11 +134,9 @@ proc fillSymbolTable*(tokenizer: Lexer) =

    # but we don't need to care about that until
    # we're in the parsing/compilation steps so
    # it's fine
    tokenizer.symbols.addKeyword("true", True)
    tokenizer.symbols.addKeyword("false", False)
    tokenizer.symbols.addKeyword("true", TokenType.True)
    tokenizer.symbols.addKeyword("false", TokenType.False)
    tokenizer.symbols.addKeyword("ref", TokenType.Ref)
    tokenizer.symbols.addKeyword("ptr", TokenType.Ptr)
    for sym in [">", "<", "=", "~", "/", "+", "-", "_", "*", "?", "@", ":", "==", "!=",
                ">=", "<=", "+=", "-=", "/=", "*=", "**=", "!", "%", "&", "|", "^",
                ">>", "<<"]:
        tokenizer.symbols.addSymbol(sym, Symbol)
    tokenizer.symbols.addKeyword("nan", TokenType.Nan)
    tokenizer.symbols.addKeyword("inf", TokenType.Inf)
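The comment about maximal matching above is the crux of why keywords live in a separate map: symbols like `<`, `<<` and `<=` share prefixes, so the lexer has to prefer the longest candidate that matches. A standalone sketch of that idea (the table and `longestSymbol` below are illustrative, not the lexer's actual implementation):

```nim
import std/tables

# Tiny subset of the real symbol table; values shortened to ints for brevity
let symbols = {"<": 1, "<<": 2, "<=": 3}.toTable()

proc longestSymbol(src: string): string =
    ## Returns the longest key in `symbols` that prefixes `src`, or "" if none does
    for stop in countdown(src.len, 1):
        if src[0 ..< stop] in symbols:
            return src[0 ..< stop]

assert longestSymbol("<<4") == "<<"  # the two-character match wins over "<"
```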
@ -0,0 +1,286 @@

# Copyright 2024 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Peon's own custom test suite. Because it's much better to spend a month rolling your
## own solution rather than spending 2 hours learning testament. Yeah, I suffer from NIH
## syndrome, so?

import std/strformat
import std/strutils
import std/sequtils

import frontend/parsing/lexer
import util/symbols


type
    TestStatus* = enum
        ## Test status enumeration
        Init, Running, Success,
        Failed, Crashed,
        TimedOut, Skipped

    TestKind* = enum
        ## Test type enumeration
        Tokenizer, Parser, TypeChecker,
        Runtime

    TestRunner = proc (suite: TestSuite, test: Test)

    # Represents a test outcome. The exc field contains
    # the exception raised during the test, if any. The
    # error field indicates whether the test errored out
    # or not. If exc is non-null and error is false, this
    # means the error was expected behavior
    TestOutcome = tuple[error: bool, exc: ref Exception, line: int, location: tuple[start, stop: int]]

    Test* {.inheritable.} = ref object
        ## A generic test object
        skip*: bool             # Skip running this test if true
        name*: string           # Test name. Only useful for displaying purposes
        kind*: TestKind         # Test kind (tokenizer, parser, compiler, etc.)
        source*: string         # The source input of the test. Usually peon code
        status*: TestStatus     # The test's current state
        expected*: TestStatus   # The test's expected final state after run()
        outcome*: TestOutcome   # The test's outcome
        runnerFunc: TestRunner  # The test's internal runner function
        reason*: string         # A human readable reason why the test failed

    TokenizerTest* = ref object of Test
        ## A tokenization test. Allows to specify
        ## a desired error message and error location
        ## upon tokenization failure
        message: string
        location: tuple[start, stop: int]
        line: int
        lexer: Lexer
        tokens: seq[TokenType]

    TestSuite* = ref object
        ## A suite of tests
        tests*: seq[Test]


proc `$`(self: tuple[start, stop: int]): string =
    if self == (-1, -1):
        result = "none"
    else:
        result = &"(start={self.start}, stop={self.stop})"


proc `$`(self: TestOutcome): string =
    result &= &"Outcome(error={self.error}"
    if not self.exc.isNil():
        var name = ($self.exc.name).split(":")[0]
        result &= &", exc=(name='{name}', msg='{self.exc.msg}')"
    if self.line != -1:
        result &= &", line={self.line}"
    if self.location != (-1, -1):
        result &= &", location={self.location}"
    result &= ")"


proc `$`*(self: Test): string =
    case self.kind:
        of Tokenizer:
            var self = TokenizerTest(self)
            return &"TokenizerTest(name='{self.name}', status={self.status}, outcome={self.outcome}, source='{self.source.escape()}', location={self.location}, message='{self.message}')"
        else:
            # TODO
            return ""


proc setup(self: TokenizerTest) =
    self.lexer = newLexer()
    self.lexer.fillSymbolTable()


proc tokenizeSucceedsRunner(suite: TestSuite, test: Test) =
    ## Runs a tokenization test that is expected to succeed
    ## and checks that it returns the tokens we expect
    var test = TokenizerTest(test)
    test.setup()
    try:
        let tokens = test.lexer.lex(test.source, test.name)
        if tokens.len() != test.tokens.len():
            test.status = Failed
            test.reason = &"Number of provided tokens ({test.tokens.len()}) does not match number of returned tokens ({tokens.len()})"
            return
        var i = 0
        for (token, kind) in zip(tokens, test.tokens):
            if token.kind != kind:
                test.status = Failed
                test.reason = &"Token type mismatch at #{i}: expected {kind}, got {token.kind}"
                return
            inc(i)
    except LexingError:
        var exc = LexingError(getCurrentException())
        test.outcome.location = exc.pos
        test.outcome.line = exc.line
        test.status = Failed
        test.outcome.error = true
        test.outcome.exc = getCurrentException()
        return
    except CatchableError:
        test.status = Crashed
        test.outcome.error = true
        test.outcome.exc = getCurrentException()
        return
    test.status = Success


proc tokenizeFailsRunner(suite: TestSuite, test: Test) =
    ## Runs a tokenization test that is expected to fail
    ## and checks that it does so in the way we expect
    var test = TokenizerTest(test)
    test.setup()
    try:
        discard test.lexer.lex(test.source, test.name)
    except LexingError:
        var exc = LexingError(getCurrentException())
        test.outcome.location = exc.pos
        test.outcome.line = exc.line
        if exc.pos == test.location and exc.line == test.line and exc.msg == test.message:
            test.status = Success
        else:
            test.status = Failed
        test.outcome.error = true
        test.outcome.exc = getCurrentException()
        return
    except CatchableError:
        test.status = Crashed
        test.outcome.error = true
        test.outcome.exc = getCurrentException()
        return
    test.status = Failed


proc newTestSuite*: TestSuite =
    ## Creates a new test suite
    new(result)


proc addTest*(self: TestSuite, test: Test) =
    ## Adds a test to the test suite
    self.tests.add(test)


proc addTests*(self: TestSuite, tests: openarray[Test]) =
    ## Adds the given tests to the test suite
    for test in tests:
        self.addTest(test)


proc removeTest*(self: TestSuite, test: Test) =
    ## Removes the given test from the test suite
    self.tests.delete(self.tests.find(test))


proc removeTests*(self: TestSuite, tests: openarray[Test]) =
    ## Removes the given tests from the test suite
    for test in tests:
        self.removeTest(test)


proc newTokenizeTest(name, source: string, skip = false): TokenizerTest =
    ## Internal helper to initialize a tokenization test
    new(result)
    result.name = name
    result.kind = Tokenizer
    result.status = Init
    result.source = source
    result.skip = skip
    result.line = -1
    result.outcome.line = -1
    result.outcome.location = (-1, -1)
    result.location = (-1, -1)
    result.message = ""


proc testTokenizeSucceeds*(name, source: string, tokens: seq[TokenType], skip = false): Test =
    ## Creates a new tokenizer test that is expected to succeed.
    ## The type of each token returned by the tokenizer is matched
    ## against the given list of token types: the test only succeeds
    ## if no discrepancies are found
    var test = newTokenizeTest(name, source, skip)
    test.runnerFunc = tokenizeSucceedsRunner
    test.tokens = tokens
    result = Test(test)
    result.expected = Success


proc testTokenizeFails*(name, source: string, message: string, line: int, location: tuple[start, stop: int], skip = false): Test =
    ## Creates a new tokenizer test that is expected to fail with the
    ## given error message and at the given location
    var test = newTokenizeTest(name, source, skip)
    test.runnerFunc = tokenizeFailsRunner
    test.message = message
    test.location = location
    test.line = line
    result = Test(test)
    result.expected = Failed


proc run*(self: TestSuite) =
    ## Runs the test suite to completion,
    ## sequentially
    for test in self.tests:
        if test.skip:
            test.status = Skipped
            continue
        test.runnerFunc(self, test)


proc successful*(self: TestSuite): bool =
    ## Returns whether the test suite completed
    ## successfully or not. If called before run(),
    ## this function returns false. Skipped tests
    ## do not affect the outcome of this function
    result = true
    for test in self.tests:
        if test.status in [Skipped, Success]:
            continue
        result = false
        break


proc getExpectedException(self: TokenizerTest): ref Exception =
    ## Gets the exception that we expect to be
    ## raised by the test. Could be nil if we
    ## expect no errors
    if self.expected == Success:
        return nil
    return LexingError(msg: self.message, line: self.line, file: self.name, lexer: self.lexer, pos: self.location)


proc getExpectedOutcome(self: TokenizerTest): TestOutcome =
    ## Gets the expected outcome of a tokenization test
    if self.expected == Success:
        return (false, self.getExpectedException(), -1, (-1, -1))
    else:
        return (false, self.getExpectedException(), self.line, self.location)


proc getExpectedOutcome*(self: Test): TestOutcome =
    ## Returns the expected outcome of a test
    doAssert self.expected in [Success, Failed], "expected outcome is neither Success nor Failed: wtf?"
    case self.kind:
        of Tokenizer:
            return TokenizerTest(self).getExpectedOutcome()
        else:
            # TODO
            discard
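A minimal usage sketch of the suite API above, assuming the module is imported as util/testing the way the test driver further down does (the test name and source string are made up):

```nim
import util/testing
import frontend/parsing/lexer  # for TokenType

var suite = newTestSuite()
# "1 2" should lex to two integer literals followed by the EOF marker
suite.addTest(testTokenizeSucceeds("twoIntegers", "1 2",
              @[TokenType.Integer, TokenType.Integer, TokenType.EndOfFile]))
suite.run()
doAssert suite.successful()
```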
@ -0,0 +1,52 @@

type int64 = object {
    #pragma[magic: "int64"]
}

# In peon, all objects are "first class", meaning they can be passed around as
# values: just like you can pass around instances of the int64 type (1, 2, etc.),
# peon allows you to pass around the int64 type itself. This is awesome for expressiveness,
# but it creates a few ambiguities when trying to figure out whether "int64" means "a value
# of type int64" or "the type int64 itself": for this reason, generic declarations split their
# arguments into two parts, generic values and generic types. Generic types go in between
# angle brackets, while generic values go in between square brackets. This means that the type described
# below has one generic argument T that is the integer type itself, and another generic argument V that
# is a value of type int64. This fixes the ambiguity and keeps the generic instantiation syntax as simple
# as possible. This syntax is very useful in cases like the built-in array type: it allows the syntax for it
# to be just array[T, N], where T is the type of its elements and N is its size
type Test<T: int64>[V: int64] = object {
    ## This structure holds both the int64 type
    ## and a value of type int64
    typeObj: T;
    value: V;
}

type Test2<T: Test>[V: Test] = object {
    ## This structure holds both the Test
    ## type and a (concrete) instance of Test
    typeObj: T;
    value: V;
}

# Feel free to uncomment these and see how the typechecker reacts (hopefully it fails lol)


Test[int64, 1];  # Works: int64 is a type and 1 is a value of type int64
# Test[int64, int64];  # Error: expecting an expression of type int64, got typevar[int64] instead
# Test[1, int64];  # Error: expecting an expression of type typevar[int64], got int64 instead
Test2[Test, Test[int64, 1]];  # This also works. Nested generic instantiations go brrrr
# Test2[Test[int64, 1], Test];  # Error: expecting an expression of type typevar[Test<T: typevar[int64]>[V: int64]], got Test<T: typevar[int64]>[V: int64] instead


# P.S.: You might be wondering "what the hell is a typevar?". Good question!
# A typevar is a special built-in type that represents a... type. Yeah, not very
# useful hm? Think of it like this: when you declare an object Foo, the object you
# get by referencing it is of type typevar[Foo]: this means "Foo is a type". When you
# construct an instance, say x, of Foo, its type is just Foo. Typevars are mostly needed
# in places where you want to enforce that some value must be a type and not a value. A
# name of type typevar[int64 | int32], for example, means "I want this thing to either be
# the int32 type or the int64 type, and NOT an instance of them". If you've ever used Python,
# you can think of typevar as the "type" class, but on steroids

# P.P.S.: A typevar is a generic type, so usually it wouldn't be possible to use it by itself.
# For convenience purposes however, peon allows the use of a bare typevar by replacing it with
# typevar[any] (any is another special built-in type that means "anything that has a type")
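To make the P.P.S. concrete, a hypothetical peon declaration pair; the exact surface syntax is a sketch and is not taken from the peon sources:

```
# Hypothetical: annotating a name with a bare typevar...
var someType: typevar;
# ...should behave exactly like the expanded form
var someTypeExplicit: typevar[any];
```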
@ -0,0 +1,86 @@

import util/testing
import util/fmterr
import util/symbols
import frontend/parsing/lexer


import std/strformat


when isMainModule:
    var suite = newTestSuite()
    suite.addTests(
        [
            testTokenizeSucceeds("emptyFile", "", @[TokenType.EndOfFile]),
            testTokenizeSucceeds("newLine", "\n", @[TokenType.EndOfFile]),
            testTokenizeSucceeds("carriageReturn", "\r", @[TokenType.EndOfFile]),
            testTokenizeSucceeds("emptyString", "\"\"", @[TokenType.String, TokenType.EndOfFile]),
            testTokenizeSucceeds("escapedSingleQuote", "'\\''", @[TokenType.Char, TokenType.EndOfFile]),
            testTokenizeSucceeds("escapedDoubleQuote", """ "\"" """, @[TokenType.String, TokenType.EndOfFile]),
            testTokenizeSucceeds("bareUnicode", "🌎 😂 👩👩👦👦", @[TokenType.Symbol, TokenType.Symbol, TokenType.Symbol, TokenType.EndOfFile]),
            testTokenizeSucceeds("stroppedSingleUnicode", "`🌎` `😂` `👩👩👦👦`", @[TokenType.Identifier, TokenType.Identifier, TokenType.Identifier, TokenType.EndOfFile]),
            testTokenizeSucceeds("stroppedMultiUnicode", "`🌎🌎` `😂😂` `👩👩👦👦👩👩👦👦`", @[TokenType.Identifier, TokenType.Identifier, TokenType.Identifier, TokenType.EndOfFile]),
            testTokenizeSucceeds("stringWithEscapes", """ "\n\t\r\e\f" """, @[TokenType.String, TokenType.EndOfFile]),
            testTokenizeSucceeds("allIntegers", "1 0x1 0o1 0b1", @[TokenType.Integer, TokenType.Hex, TokenType.Octal, TokenType.Binary, TokenType.EndOfFile]),
            testTokenizeSucceeds("sizedNumbers", "1'u8 0x1'i8 0o1'i64 0b1'u32 2.0'f32 1e5'f64 1E5'f32 1.5e4'f64 1.5E4'f32",
                                 @[TokenType.Integer, TokenType.Hex, TokenType.Octal, TokenType.Binary,
                                   TokenType.Float, TokenType.Float, TokenType.Float, TokenType.Float, TokenType.Float,
                                   TokenType.EndOfFile]),
            testTokenizeSucceeds("allFloats", "1.0 1e5 1E5 1.5e4 1.5E4", @[TokenType.Float, TokenType.Float, TokenType.Float,
                                                                           TokenType.Float, TokenType.Float, TokenType.EndOfFile]),
            testTokenizeFails("invalidFloatEndsWithDot", "2.", "invalid float number literal", line=1, location=(0, 1)),
            testTokenizeFails("invalidFloatSpuriousChars", "2.f", "invalid float number literal", line=1, location=(0, 1)),
            testTokenizeFails("unterminatedChar", "'", "unexpected EOF while parsing character literal", line=1, location=(0, 0)),
            testTokenizeFails("emptyChar", "''", "character literal cannot be of length zero", line=1, location=(0, 1)),
            testTokenizeFails("charTooLong", "'ab'", "invalid character literal (length must be one!)", line=1, location=(0, 3)),
            testTokenizeFails("unterminatedString", "\"", "unexpected EOF while parsing string literal", line=1, location=(0, 0)),
            testTokenizeFails("unterminatedCharWithExtraContent", "'o;", "unexpected EOF while parsing character literal", line=1, location=(0, 2)),
            testTokenizeFails("unterminatedStringWithExtraContent", "\"o;", "unexpected EOF while parsing string literal", line=1, location=(0, 2)),
            testTokenizeFails("unterminatedCharWithNewline", "'\\n;", "unexpected EOF while parsing character literal", line=1, location=(0, 3)),
            testTokenizeFails("unterminatedStringWithNewline", "\"\\n;", "unexpected EOF while parsing string literal", line=1, location=(0, 3)),
            testTokenizeFails("illegalTabs", "\t", "tabs are not allowed in peon code, use spaces for indentation instead", line=1, location=(0, 0)),
        ]
    )
    var allTokens = ""
    var allTokensList = newSeqOfCap[TokenType](symbols.tokens.len())
    for lexeme in symbols.tokens.keys():
        allTokens.add(&"{lexeme} ")
        if lexeme == "_":
            # Due to how the lexer is designed, a bare underscore is
            # parsed as an identifier rather than a symbol
            allTokensList.add(TokenType.Identifier)
        else:
            allTokensList.add(symbols.tokens[lexeme])
    allTokensList.add(TokenType.EndOfFile)
    suite.addTest(testTokenizeSucceeds("allTokens", allTokens, allTokensList))
    const skippedChars = [';', '\'', '\n', '\\', '\t', '\e', '\a', '\r']
    var
        characters = ""
        charTokens = newSeqOfCap[TokenType](256)
    for value in 0..255:
        charTokens.add(TokenType.Char)
        if char(value) in skippedChars:
            # These cases are special and we handle them separately
            continue
        characters.add(&"'{char(value)}'")
    charTokens.add(TokenType.EndOfFile)
    characters.add("""';' '\'' '\n' '\\' '\t' '\e' '\a' '\r'""")
    suite.addTest(testTokenizeSucceeds("allCharacters", characters, charTokens))
    suite.run()
    echo "Tokenization test results: "
    for test in suite.tests:
        echo &"  - {test.name} -> {test.status}"
        if test.status in [Failed, Crashed]:
            echo &"    Details:"
            echo &"      - Outcome: {test.outcome}"
            echo &"      - Expected state: {test.expected}"
            echo &"      - Expected outcome: {test.getExpectedOutcome()}"
            echo &"\n    The test failed for the following reason: {test.reason}\n"
            if not test.outcome.exc.isNil():
                echo &"\n    Formatted error message follows\n"
                print(LexingError(test.outcome.exc))
                echo "\n    Formatted error message ends here\n"
    if suite.successful():
        echo "OK: All tokenizer tests were successful"
        quit(0)
    quit(-1)