Initial rework additions

This commit is contained in:
Mattia Giambirtone 2023-07-20 14:32:42 +02:00
parent fe568afc68
commit e6e9b3965c
Signed by: nocturn9x
GPG Key ID: 8270F9F467971E59
20 changed files with 7378 additions and 2 deletions

LICENSE Normal file

@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

README.md

@@ -1,3 +1,126 @@
# peon-rewrite
# The peon programming language
Work in progress for Peon 0.2.x
Peon is a modern, multi-paradigm, async-first programming language with a focus on correctness and speed.
[Go to the Manual](docs/manual.md)
## What's peon?
__Note__: For simplicity, this section is written in the present tense, even though part of what's described here is not implemented yet.
Peon is a multi-paradigm, statically-typed programming language inspired by C, Nim, Python, Rust and C++: it supports modern, high-level
features such as automatic type inference, parametrically polymorphic generic types, pure functions, closures, interfaces, single inheritance,
reference types, templates, coroutines, raw pointers and exceptions.
The memory management model is rather simple: a Mark and Sweep garbage collector is employed to reclaim unused memory, although more garbage
collection strategies (such as generational GC or deferred reference counting) are planned for the future.
Peon features a native cooperative concurrency model designed to take advantage of the inherent waiting of typical I/O workloads, without using more than one OS thread (wherever possible), allowing for much greater efficiency and a smaller memory footprint. The asynchronous model forces developers to write code that is both easy to reason about, thanks to the [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) model that is core to peon's async event loop implementation, and that works as expected every time (without dropping signals, exceptions, or task return values).
Other notable features are the ability to define (and overload) custom operators with ease by implementing them as language-level functions, [Universal function call syntax](https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax), [Name stropping](https://en.wikipedia.org/wiki/Stropping_(syntax)) and named scopes.
In peon, all objects are first-class (this includes functions, iterators, closures and coroutines).
## Disclaimers
**Disclaimer 1**: The project is still in its very early days: lots of stuff is not implemented, a work in progress or
otherwise outright broken. Feel free to report bugs!
**Disclaimer 2**: Currently, the `std` module has to be _always_ imported explicitly for even the most basic snippets to work. This is because intrinsic types and builtin operators are defined within it: if it is not imported, peon won't even know how to parse `2 + 2` (and even if it could, it would have no idea what the type of the expression would be). You can have a look at the [peon standard library](src/peon/stdlib) to see how the builtins are defined (be aware that they heavily rely on compiler black magic to work) and can even provide your own implementation if you're so inclined.
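For example, a minimal peon program would look something like the following (a sketch based on the manual's examples):
```
import std;

print(2 + 2);  # Without the import, peon can't even parse this expression
```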
### TODO List
In no particular order, here's a list of stuff that's done/to do (might be incomplete/out of date):
- User-defined types
- Function calls ✅
- Control flow (if-then-else, switch) ✅
- Looping (while) ✅
- Iteration (foreach)
- Type conversions
- Type casting
- Intrinsics ✅
- Type unions ✅
- Functions ✅
- Closures
- Managed references
- Unmanaged references
- Named scopes/blocks ✅
- Inheritance
- Interfaces
- Generics ✅
- Automatic types ✅
- Iterators/Generators
- Coroutines
- Pragmas ✅
- Attribute resolution ✅
- Universal Function Call Syntax
- Import system ✅
- Exceptions
- Templates (_not_ like C++ templates) ✅
- Optimizations (constant folding, branch and dead code elimination, inlining)
## Feature wishlist
Here's a random list of high-level features I would like peon to have and that I think are kinda neat (some may
have been implemented already):
- Reference types are not nullable by default (must use `#pragma[nullable]`)
- The `commutative` pragma, which lets you define just one implementation of an operator
and have it become commutative
- Easy C/Nim interop via FFI
- C/C++ backend
- Nim backend
- [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) (must-have!)
- Simple OOP (with multiple dispatch!)
- RTTI, with methods that dispatch at runtime based on the true (aka runtime) type of a value
- Limited compile-time evaluation (embed the Peon VM in the C/C++/Nim backend and use that to execute peon code at compile time)
## The name
The name for peon comes from [Productive2's](https://git.nocturn9x.space/prod2) genius cute brain, and is a result of shortening
the name of the fastest animal on earth: the **Pe**regrine Falc**on**. I guess I wanted this to mean peon will be blazing fast (I
certainly hope so!)
# Peon needs you.
No, but really. I need help. This project is huge and (IMHO) awesome, but there's a lot of non-trivial work to do and doing
it with other people is just plain more fun and rewarding. If you want to get involved, definitely try [contacting](https://nocturn9x.space/contact) me
or open an issue/PR!
# Credits
- Araq, for creating the amazing language that is [Nim](https://nim-lang.org) (as well as all of its contributors!)
- Guido van Rossum, aka the chad who created [Python](https://python.org) and its awesome community and resources
- The Nim community and contributors, for making Nim what it is today
- Bob Nystrom, for his amazing [book](https://craftinginterpreters.com) that inspired me
and taught me how to actually make a programming language (kinda, I'm still very dumb)
- [Njsmith](https://vorpus.org/), for his awesome articles on structured concurrency
- All the amazing people in the [r/ProgrammingLanguages](https://reddit.com/r/ProgrammingLanguages) subreddit and its [Discord](https://discord.gg/tuFCPmB7Un) server
- [Art](https://git.nocturn9x.space/art) <3
- Everyone who listened (and still listens) to me ramble about compilers, programming languages and the likes (and for giving me ideas and testing peon!)
- ... More? (I'd thank the contributors but it's just me :P)
- Me! I guess
## Ok, cool, how do I use it?
Great question! If this README somehow didn't turn you away already (thanks, by the way), then you may want to try peon
out for yourself. Fortunately, the process is quite straightforward:
- First, you're gonna have to install [Nim](https://nim-lang.org/), the language peon is written in. I highly recommend
using [choosenim](https://github.com/dom96/choosenim) to manage your Nim installations as it makes switching between them and updating them a breeze
- Then, clone this repository and compile peon in release mode with `nim c -d:release --passC:"-flto" -o:peon src/main`, which should produce a `peon` binary
ready for you to play with (if your C toolchain doesn't support LTO, you can just omit the `--passC` option, although that would be pretty unusual for a
modern toolchain)
- If you want to move the executable to a different directory (say, into your `PATH`), you should copy peon's standard
library (found in `/src/peon/stdlib`) into a known folder, edit the `moduleLookupPaths` variable inside `src/config.nim`
by adding said folder to it so that the peon compiler knows where to find modules when you `import std;` and then recompile
peon. Hopefully I will automate this soon, but as of right now the work is all manual
__Note__: On Linux, peon will also look into `~/.local/peon/stdlib` by default, so you can just create the `~/.local/peon` folder and copy `src/peon/stdlib` there
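On Linux, that boils down to something like this (assuming a POSIX shell and that you're in the repository root):
```
mkdir -p ~/.local/peon
cp -r src/peon/stdlib ~/.local/peon/
```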

docs/.vscode/settings.json vendored Normal file

@@ -0,0 +1,3 @@
{
"makefile.extensionOutputFolder": "./.vscode"
}

docs/bytecode.md Normal file

@@ -0,0 +1,113 @@
# Peon - Bytecode Specification
This document describes peon's bytecode as well as how it is (de-)serialized to/from files and
other file-like objects. Note that the segments in a bytecode dump appear in the order they are listed
in this document.
## Code Structure
A peon program is compiled into a tightly packed sequence of bytes that contains all the necessary information
the VM needs to execute said program. There is no dependence between the frontend and the backend outside of the
bytecode format (which is implemented in a separate serializer module), allowing for maximum modularity.
A peon bytecode file contains the following:
- Constants
- The program's code
- Debugging information (file and version metadata, module info. Optional)
## File Headers
A peon bytecode file starts with the header, which is structured as follows:
- The literal string `PEON_BYTECODE`
- A 3-byte version number (the major, minor and patch version numbers of the compiler that generated the file)
- The branch name of the repository the compiler was built from, prepended with its length as a 1 byte integer
- The commit hash (encoded as a 40-byte hex-encoded string) in the aforementioned branch from which the compiler was built (particularly useful in development builds)
- An 8-byte UNIX timestamp (with Epoch 0 starting at 1/1/1970 12:00 AM) representing the exact date and time of when the file was generated
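As an illustration, here is a minimal Nim sketch of a header reader (hypothetical names; it assumes the fields appear exactly in the order above and are read in the machine's native byte order, with the serializer module remaining the authoritative reference):
```
import std/streams

type
    BytecodeHeader = object
        version: tuple[major, minor, patch: uint8]
        branch: string
        commitHash: string
        timestamp: uint64

proc readHeader(s: Stream): BytecodeHeader =
    ## Reads the header fields in the order they appear on disk
    doAssert s.readStr(13) == "PEON_BYTECODE", "not a peon bytecode file"
    result.version = (major: s.readUint8(), minor: s.readUint8(), patch: s.readUint8())
    result.branch = s.readStr(int(s.readUint8()))  # Branch name, length-prefixed
    result.commitHash = s.readStr(40)              # Hex-encoded commit hash
    result.timestamp = s.readUint64()              # UNIX timestamp of creation
```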
## Debug information
The following segments contain extra information and metadata about the compiled bytecode to aid debugging, but they may be missing
in release builds.
### Line data segment
The line data segment contains information about each instruction in the code segment and associates them
1:1 with a line number in the original source file for easier debugging using run-length encoding. The segment's
size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data
in this segment can be decoded as explained in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L29), which is quoted
below:
```
[...]
## lines maps bytecode instructions to line numbers using Run
## Length Encoding. Instructions are encoded in groups whose structure
## follows the following schema:
## - The first integer represents the line number
## - The second integer represents the number of
## instructions on that line
## For example, if lines equals [1, 5], it means that there are 5 instructions
## at line 1, meaning that all instructions in code[0..4] belong to the same line.
## This is more efficient than using the naive approach, which would encode
## the same line number multiple times and waste considerable amounts of space.
[...]
```
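Decoding this is straightforward; the following is a small illustrative Nim snippet (not the VM's actual code) that expands the pairs back into one line number per instruction:
```
proc expandLines(lines: seq[int]): seq[int] =
    ## Expands run-length encoded line data:
    ## [1, 5] becomes [1, 1, 1, 1, 1]
    var i = 0
    while i < lines.len:
        for _ in 0..<lines[i + 1]:
            result.add(lines[i])
        inc(i, 2)

doAssert expandLines(@[1, 5, 2, 3]) == @[1, 1, 1, 1, 1, 2, 2, 2]
```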
### Functions segment
This segment contains details about each function in the original file. The segment's size is fixed and is encoded at the
beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The data in this segment can be decoded as explained
in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L39), which is quoted below:
```
[...]
## functions encodes the following information:
## - Function name
## - Argument count
## - Function boundaries
## The encoding for functions is the following:
## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
## - After that follows the argument count as a 1 byte integer
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
## its size as a 2-byte integer
[...]
```
### Modules segment
This segment contains details about the modules that make up the original source code which produced a given bytecode dump.
The data in this segment can be decoded as explained in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L49), which is quoted below:
```
[...]
## modules contains information about all the peon modules that the compiler has encountered,
## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
## Python, peon does not produce a bytecode file for each separate module it compiles: everything
## is contained within a single binary blob. While this simplifies the implementation and makes
## bytecode files entirely "self-hosted", it also means that the original module information is
## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
[...]
```
## Constant segment
The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded
variable initializers or constant expressions. It is similar to the `.rodata` section of Assembly files, although
the implementation is different. Constants are encoded as a linear sequence of bytes with no type information about
them whatsoever: it is the code that, at runtime, loads each constant (whose type is determined at compile time) onto
the stack accordingly. For example, a 32 bit integer constant would be encoded as a sequence of 4 bytes, which would
then be loaded by the appropriate `LoadInt32` instruction at runtime. The segment's size is fixed and is encoded at
the beginning as a sequence of 4 bytes (i.e. a single 32 bit integer). The constant segment may be empty, although in
real-world scenarios it likely won't be.
## Code segment
The code segment contains the linear sequence of bytecode instructions of a peon program to be fed directly to
peon's virtual machine. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes
(i.e. a single 24 bit integer). All the instructions are documented [here](../src/frontend/compiler/targets/bytecode/opcodes.nim)

docs/design.md Normal file

@@ -0,0 +1,32 @@
# Peon design scratchpad
This is just a random doc I made to keep track of all the design changes I have
in mind for Peon: with this being my first serious attempt at making a programming
language that's actually _useful_, I want to get the design right the first time
(no one wants to make JavaScript 2.0, right? _Right?_).
The basic idea is:
- Some peon code comes in (from a file or as command-line input, doesn't matter)
- It gets tokenized and parsed into a typeless AST
- The compiler processes the typeless AST into a typed one
- The typed AST is passed to an optional optimizer module, which spits
out another (potentially identical) typed AST representing the optimized
program. The optimizer is always run even when optimizations are disabled,
as it takes care of performing closure conversion and other cool stuff
- The typed AST is passed to a code generator module that is specific to every
backend/platform, which actually takes care of producing the code that will
then be executed
The current design is fairly modular and some parts of the codebase are more final
than others: for example, the lexer and parser are more or less complete and unlikely
to undergo massive changes in the future as opposed to the compiler which has been subject
to many major refactoring steps as the project went along, but I digress.
The typed AST format should ideally be serializable to binary files so that I can slot in
different optimizer/code generator modules written in different languages without the need
to use FFI. The format will serve a similar purpose to the IR used by gcc (GIMPLE), but instead
of being an RTL-like language it'll operate on a much higher level, since we don't really need to
support any programming language other than peon itself (while gcc has to be interoperable
with FORTRAN and other stuff).

docs/grammar.md Normal file

@@ -0,0 +1,179 @@
# Peon - Formal Grammar Specification
__Note__: This document is currently a draft and is therefore incomplete
## Rationale
The purpose of this document is to provide an unambiguous formal specification of peon's syntax for use in automated
compiler generators (known as "compiler compilers") and parsers.
Our grammar is inspired by (and extended from) the Lox language as described in Bob Nystrom's book "Crafting Interpreters",
available at https://craftinginterpreters.com, and follows the EBNF standard, but for clarity the relevant syntax will
be explained below.
## Disclaimer
----------------------------------------------
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119).
Literals in this document will often be surrounded by double quotes to make it obvious they're not part of a sentence. To
avoid ambiguity, this document will always specify explicitly if double quotes need to be considered as part of a term or not,
which means that if it is not otherwise stated they are to be considered part of said term. In addition to quotes, literals
may be formatted in monospace to make them stand out more in the document.
## EBNF Syntax & Formatting rules
----------------------------------------------
As a refresher to experienced users as well as to facilitate reading to newcomers, the variation of EBNF used in this document
is detailed below:
- The literal `"LF"` (without quotes) is a shorthand for "Line Feed". It symbolizes the end of a line and it's platform-independent
- A pair of 2 forward-slashes (character code 47) is used to mark comments. A comment lasts until
the end of a line is encountered. It is RECOMMENDED to use them to clarify each rule, or a group of rules,
to simplify human inspection of the specification
- The name of non-terminal productions MUST be in lowercase (such as `foo`), while for terminals it MUST be in uppercase (such as `FOO`)
- Whitespaces, tabs, newlines and form feeds (character codes 32, 9, 10 and 12 respectively) are not
relevant to the grammar and MUST be ignored by automated parsers and parser generators
- `"*"` (without quotes, character code 42) is used for repetition of a rule, meaning it MUST match 0 or more times
- `"?"` (without quotes, character code 63) means a rule can match 0 or 1 times
- `"+"` (character code 43) is used for repetition of a rule, meaning it MUST match 1 or more times
- `"|"` (without quotes, character code 123) is used to indicate alternatives and means a rule may match either the first or
the second rule. This operator can be chained to obtain something like `"foo" | "bar" | "baz"`, meaning that either
the literal strings foo, bar or baz are valid matches for the rule
- `"{x,y}"` (without quotes) is used for repetition, meaning a rule MUST match from x to y times (start to end, inclusive).
Omitting x means the rule MUST match at least 0 times and at most x times, while omitting y means the rule
MUST match exactly y times. Omitting both x and y is the same as using `*`
- Production rules are terminated with an ASCII semicolon (`COLON` without quotes, character code 59)
- Rules are listed in descending order: the last rule is the highest-precedence one. Think of it as "more complex rules
come first"
- An "arrow" (character code 8594) MUST be used to separate rule names from their definition.
A rule definition, then, looks something like this (without quotes): `"name → rule definition here; // optional comment"`
- Literal numbers can be expressed in their decimal form (i.e. with arabic numbers). Other supported formats are
hexadecimal using the prefix `0x`, octal using the prefix `0o`, and binary using the prefix `0b`. For example,
the literals `0x7F`, `0b1111111` and `0o177` all represent the decimal number `127` in hexadecimal, binary and
octal respectively
- The literal `"EOF"` (without quotes), represents the end of the input stream and is a shorthand for "End Of File"
- Ranges can be defined by separating the start and the end of the range with three dots (character code 46) and
are inclusive at both ends. Both the start and the end of the range are mandatory and it is RECOMMENDED that they
be separated by the three dots with a space for ease of reading. Ranges can define numerical sets like in `"0 ... 9"`
(without quotes), or lexicographical ones such as `"'a' ... 'z'"` (without quotes), in which case the range should be
interpreted as a sequence of the character codes between the start and end of the range. It is REQUIRED that the
first element in the range is less than or equal to the last one: backwards ranges are illegal.
In addition to this, although numerical ranges can use any combination of the supported number representation
(meaning `'0 ... 0x10'` is a valid range encompassing all decimal numbers from 0 to 16) it is RECOMMENDED that
the representation used is consistent across the start and end of the range. Finally, ranges can have a character
and a number as either start or end of them, in which case the character is to be interpreted as its character code in decimal
- For readability purposes, it is RECOMMENDED that the grammar text be left aligned and that spaces are used between
operators
- Literal strings MUST be delimited by matching pairs of double or single quotes (character codes 34 and 39) and SHOULD be separated
from any other term in the grammar by a space
- Characters inside strings can be escaped using backslashes. For example, to add a literal double quote inside a double-quoted string, one MUST
write `"\""` (without quotes), althoguh it is recommended to use single quotes in this case (i.e. `'"'` instead)
## EBNF Grammar
----------------------------------------------
Below you can find the EBNF specification of peon's grammar.
```
// Top-level code
program → declaration* EOF; // An entire program (Note: an empty program *is* a valid program)
// Declarations (rules that bind a name to an object in the current scope and produce no side effects)
// A program is composed by a list of declarations
declaration → funDecl | varDecl | coroDecl | statement;
// Function declarations
funDecl → "fn" function;
coroDecl → "coro" function;
// Constants still count as "variable" declarations in the grammar
varDecl → ("var" | "let" | "const") IDENTIFIER ( "=" expression )? COLON;
// Statements (rules that produce side effects, without binding a name. Well, mostly: import, foreach and others do, but they're exceptions to the rule)
statement → exprStmt | ifStmt | returnStmt | whileStmt | blockStmt; // The set of all statements
// Any expression followed by a semicolon is an expression statement
exprStmt → expression COLON;
// Returns from a function, illegal in top-level code. An empty return statement is illegal
// in non-void functions
returnStmt → "return" expression? COLON;
// Defers the evaluation of the given expression right before a function exits, illegal in top-level code.
// Semantically and functionally equivalent to wrapping a function in a big try block and executing the
// expression in the finally block, but less verbose
deferStmt → "defer" expression COLON;
// Breaks out of a loop or named block
breakStmt → "break" IDENTIFIER? COLON;
// Skips to the next iteration in a loop or jumps to the
// beginning of a named block
continueStmt → "continue" IDENTIFIER? COLON;
importStmt → ("from" IDENTIFIER)? "import" (IDENTIFIER ("as" IDENTIFIER)? ","?)+ COLON; // Imports one or more modules in the current scope. Creates a namespace
assertStmt → "assert" expression COLON;
yieldStmt → "yield" expression? COLON;
// Pauses the execution of the calling coroutine and calls the given coroutine. Execution continues when the callee returns
awaitStmt → "await" expression COLON;
// Exception handling
tryStmt → "try" "{" statement* "}" (except+ "finally" statement | "finally" statement | "else" statement | except+ "else" statement | except+ "else" statement "finally" statement);
// Blocks create a new scope that lasts until they're closed
blockStmt → "{" declaration* "}";
// Named blocks are useful for breaking out of deeply nested loops
namedBlock → "block" IDENTIFIER "{" declaration* "}";
// If statements are conditional jumps
ifStmt → "if" expression "{" statement* "}" ("else" "{" statement* "}")?;
// While loops run until their condition is true
whileStmt → "while" expression "{" statement* "}";
// For-each loops iterate over a collection type
foreachStmt → "foreach" "(" (IDENTIFIER ":" expression) ")" "{" statement* "}";
// Expressions (rules that produce a value and may have side effects)
// Assignment is the highest-level expression
expression → assignment;
assignment → (call ".")? IDENTIFIER ASSIGNTOKENS assignment | lambdaExpr;
lambdaExpr → "lambda" lambda; // Lambdas are anonymous functions, so they act as expressions
yieldExpr → "yield" expression?; // Empty yield equals yield nil
awaitExpr → "await" expression;
logic_or → logic_and ("or" logic_and)*;
logic_and → equality ("and" equality)*;
equality → comparison (("!=" | "==") comparison)*;
comparison → term ((">" | ">=" | "<" | "<=" | "as" | "is" | "of") term)*;
term → factor (("-" | "+") factor)*; // Precedence for + and - in operations
factor → unary (("/" | "*" | "**" | "^" | "&") unary)*; // All other binary operators have the same precedence
unary → ("!" | "-" | "~") unary | call;
slice → expression "[" expression (":" expression){0,2} "]";
call → primary ("(" arguments? ")" | "." IDENTIFIER)*;
// Below are some collection literals: lists, sets, dictionaries and tuples
listExpr → "[" arguments* "]";
// Note: "{}" is an empty dictionary, NOT an empty set
setExpr → "{" arguments? "}";
dictExpr → "{" (expression ":" expression ("," expression ":" expression)*)* "}"; // {key: value, ...}
tupleExpr → "(" arguments* ")";
primary → "nan" | "true" | "false" | "nil" | "inf" | NUMBER | STRING | IDENTIFIER | "(" expression ")" "." IDENTIFIER;
// Utility rules to avoid repetition
function → IDENTIFIER ("(" parameters? ")")? blockStmt;
lambda → ("(" parameters? ")")? blockStmt;
// ident: type [, ident2: type2, ...]
parameters → IDENTIFIER ":" IDENTIFIER ("," IDENTIFIER ":" IDENTIFIER)*;
arguments → expression ("," expression)*;
except → ("except" expression? statement)
// These are all the terminals (i.e. productions defined non-recursively)
COMMENT → "#" UNICODE* LF;
COLON → ";";
SINGLESTRING → QUOTE UNICODE* QUOTE;
DOUBLESTRING → DOUBLEQUOTE UNICODE* DOUBLEQUOTE;
SINGLEMULTI → QUOTE{3} UNICODE* QUOTE{3}; // Single quoted multi-line strings
DOUBLEMULTI → DOUBLEQUOTE{3} UNICODE* DOUBLEQUOTE{3}; // Double quoted multi-line string
DECIMAL → DIGIT+;
FLOAT → DIGIT+ ("." DIGIT+)? (("e" | "E") DIGIT+)?;
BIN → "0b" ("0" | "1")+;
OCT → "0o" ("0" ... "7")+;
HEX → "0x" ("0" ... "9" | "A" ... "F" | "a" ... "f")+;
NUMBER → DECIMAL | FLOAT | BIN | HEX | OCT;
STRING → ("r"|"b"|"f")? SINGLESTRING | DOUBLESTRING | SINGLEMULTI | DOUBLEMULTI;
IDENTIFIER → ALPHA (ALPHA | DIGIT)*; // Valid identifiers are only alphanumeric!
QUOTE → "'";
DOUBLEQUOTE → "\"";
ALPHA → "a" ... "z" | "A" ... "Z" | "_";
UNICODE → 0x00 ... 0x10FFFD; // This covers the whole unicode range
DIGIT → "0" ... "9";
ASSIGNTOKENS → "+=" | "-=" | "*=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>=" | "**=" | "//=" | "="
```

docs/manual.md Normal file

@@ -0,0 +1,301 @@
# Peon - Manual
Peon is a statically typed, garbage-collected, C-like programming language with
a focus on speed and correctness, but whose main feature is the ability to natively
perform highly efficient parallel I/O operations by implementing the [structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/)
paradigm.
__Note__: Peon is currently a WIP (Work In Progress), and much of the content of this manual is purely theoretical as
of now. If you want to help make this into a reality, feel free to contribute!
## Table of contents
- [Manual](#peon---manual)
- [Design Goals](#design-goals)
- [Examples](#peon-by-example)
- [Grammar](grammar.md)
- [Bytecode](bytecode.md)
## Design Goals
While peon is inspired by Bob Nystrom's [book](https://craftinginterpreters.com), where he describes a simple toy language
named Lox, the aspiration for it is to become a programming language that could actually be used in the real world. For that
to happen, we need:
- Exceptions (`try/except/finally`)
- An import system (with namespaces, like Python)
- Multithreading support (with a global VM lock when GC'ing)
- Built-in collections (list, tuple, set, etc.)
- Coroutines (w/ structured concurrency)
- Generators
- Generics
- C/Nim FFI
- A C backend (for native speed)
- A package manager
Peon ~~steals~~ borrows many ideas from Python, Nim (the language peon itself is written in), C and many others.
## Peon by Example
Here follow a few examples of peon code to make it clear what the end product should look like. Note that
not all examples represent working functionality and some of these examples might not be up to date either.
For more up-to-date code snippets, check the [tests](../tests/) directory.
### Variable declarations
```
var x = 5; # Inferred type is int64
var y = 3'u16; # Type is specified as uint16
x = 6; # Works: type matches
x = 3.0; # Error: Cannot assign float64 to x
var x = 3.14; # Error: cannot re-declare x
const z = 6.28; # Constant declaration
let a = "hi!"; # Cannot be reassigned/mutated
var b: int32 = 5; # Explicit type declaration (TODO)
```
__Note__: Peon supports [name stropping](https://en.wikipedia.org/wiki/Stropping_(syntax)), meaning
that almost any ASCII sequence of characters can be used as an identifier, including language
keywords, but stropped names need to be enclosed by matching pairs of backticks (`\``)
### Comments
```
# This is a single-line comment
# Peon has no specific syntax for multi-line comments.
fn id[T: any](x: T): T {
    ## Documentation comments start
    ## with two hashes. They are currently
    ## unused, but will be semantically
    ## relevant in the future. They can
    ## be used to document types, modules
    ## and functions
    return x;
}
```
### Functions
```
fn fib(n: int): int {
    if n < 3 {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}
fib(30);
```
### Type declarations (TODO)
```
type Foo = object { # Can also be "ref object" for reference types (managed automatically)
    fieldOne*: int # Asterisk means the field is public outside the current module
    fieldTwo*: int
}
```
### Enumeration types (TODO)
```
type SomeEnum = enum { # Can be mapped to an integer
    KindOne,
    KindTwo
}
```
### Operator overloading
```
operator `+`(a, b: Foo): Foo {
    return Foo(fieldOne: a.fieldOne + b.fieldOne, fieldTwo: a.fieldTwo + b.fieldTwo);
}
Foo(fieldOne: 1, fieldTwo: 3) + Foo(fieldOne: 2, fieldTwo: 3); # Foo(fieldOne: 3, fieldTwo: 6)
```
__Note__: Custom operators (e.g. `foo`) can also be defined. The backticks around the plus sign serve to mark it
as an identifier instead of a symbol (which is a requirement for function names, since operators are basically
functions in peon). In fact, even the built-in peon operators are implemented partially in peon (actually, just
their stubs are) and they are then specialized in the compiler to get rid of unnecessary function call overhead.
### Function calls
```
foo(1, 2 + 3, 3.14, bar(baz));
```
__Note__: Operators can be called as functions; if their name is a symbol, just wrap it in backticks like so:
```
`+`(1, 2) # Identical to 1 + 2
```
__Note__: Code like `a.b()` is (actually, will be) desugared to `b(a)` if there exists a function
`b` whose signature is compatible with the value of `a` (assuming `a` doesn't have a field named `b`,
in which case the attribute resolution takes precedence)
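For instance, a hypothetical sketch of the planned behavior:
```
fn greet(name: string) {
    print(name);
}

"peon".greet();  # Will be desugared to greet("peon")
```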
### Generics
```
fn genericSum[T: Number](a, b: T): T { # Note: "a, b: T" means that both a and b are of type T
    return a + b;
}
# This allows for a single implementation to be
# re-used multiple times without any code duplication
genericSum(1, 2);
genericSum(3.14, 0.1);
genericSum(1'u8, 250'u8);
```
__Note__: Peon generics are implemented according to a paradigm called [parametric polymorphism](https://en.wikipedia.org/wiki/Parametric_polymorphism). In contrast to the model employed by other languages such as C++, called [ad hoc polymorphism](https://en.wikipedia.org/wiki/Ad_hoc_polymorphism),
where each time a generic function is called with a new type signature it is instantiated and
typechecked (and then compiled), peon checks generics at declaration time and only once: this
not only saves precious compilation time, but it also allows the compiler to generate a single
implementation for the function (although this is not a requirement) and catches type errors right
when they occur even if the function is never called, rather than having to wait for the function
to be called and specialized. Unfortunately, this means that some of the things that are possible
in, say, C++ templates are just not possible with peon generics. As an example, take this code snippet:
```
fn add[T: any](a, b: T): T {
    return a + b;
}
```
While the intent of this code is clear and makes sense semantically speaking, peon will refuse
to compile it because it cannot prove that the `+` operator is defined on every type (in fact,
it's only defined for numbers): this is a feature. If peon allowed it, `any` could be used to
escape the safety of the type system (for example, calling `add` with `string`s, which may or
may not be what you want).
Since the goal for peon is to not constrain the developer into one specific programming paradigm,
it also implements a secondary, different, generic mechanism using the `auto` type. The above code
could be rewritten to work as follows:
```
fn add(a, b: auto): auto {
    return a + b;
}
```
When using automatic types, peon will behave similarly to C++ (think: templates) and only specialize,
typecheck and compile the function once it is called with a given type signature. For this reason,
automatic and parametrically polymorphic types cannot be used together in peon code.
Another noteworthy concept to keep in mind is that of type unions. For example, take this snippet:
```
fn foo(x: int32): int32 {
    return x;
}
fn foo(x: int): int {
    return x;
}
fn identity[T: int | int32](x: T): T {
    return foo(x);
}
```
This code will, again, fail to compile: this is because as far as peon is concerned, `foo` is not
defined for both `int` and `int32` _at the same time_. In order for that to work, `foo` would need
to be rewritten with `T: int32 | int` as its generic argument type in order to avoid the ambiguity
(or `identity` could be rewritten to use automatic types instead; both are viable options). Obviously,
the above snippet would fail to compile if `foo` were not defined for all the types specified in the
type constraint for `identity` as well (this is because, counterintuitively, matching a generic constraint
such as `int32 | int` does _not_ mean "either of these types", but rather "_both_ of these types at
once").
#### More generics
```
fn genericSth[T: someTyp, K: someTyp2](a: T, b: K) { # Note: no return type == void function
    # code...
}
genericSth(1, 3.0);
```
#### Even more generics
```
type Box*[T: Number] = object {
    num: T;
}
var boxFloat = Box[float](1.0);
var boxInt = Box[int](1);
```
__Note__: The `*` modifier to make a name visible outside the current module must be put
__before__ the generic constraints, so only `fn foo*[T](a: T) {}` is the correct syntax.
### Forward declarations
```
fn someF: int; # Semicolon and no body == forward declaration
print(someF()); # Prints 42
fn someF: int {
    return 42;
}
```
__Note__: A function that is forward-declared __must__ be implemented in the same module as
the forward declaration.
### Generators
```
generator count(n: int): int {
    while n > 0 {
        yield n;
        n -= 1;
    }
}

foreach n in count(10) {
    print(n);
}
```
### Coroutines
```
import concur;
import http;

coroutine req(url: string): string {
    return (await http.AsyncClient().get(url)).content;
}

coroutine main(urls: list[string]) {
    var pool = concur.pool(); # Creates a task pool: like a nursery in njsmith's article
    foreach url in urls {
        pool.spawn(req, url);
    }
    # The pool has internal machinery that makes the parent
    # task wait until all children exit! When this function
    # returns, ALL child tasks will have exited somehow.
    # Exceptions and return values propagate neatly, too.
}

concur.run(main, newList[string]("https://google.com", "https://debian.org"));
```

nim.cfg Normal file

@@ -0,0 +1 @@
path="src"

src/backend/bytecode/vm.nim Normal file

File diff suppressed because it is too large

src/config.nim Normal file

@@ -0,0 +1,81 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import std/strformat
import std/os
# These variables can be tweaked to debug and test various components of the toolchain
var debugLexer* = false # Print the tokenizer's output
var debugParser* = false # Print the AST generated by the parser
var debugCompiler* = false # Disassemble and/or print the code generated by the compiler
const debugVM* {.booldefine.} = false # Enable the runtime debugger in the bytecode VM
const debugGC* {.booldefine.} = false # Debug the Garbage Collector (extremely verbose)
const debugAlloc* {.booldefine.} = false # Trace object allocation (extremely verbose)
const debugMem* {.booldefine.} = false # Debug the memory allocator (extremely verbose)
var debugSerializer* = false # Validate the bytecode serializer's output
const debugStressGC* {.booldefine.} = false # Make the GC run a collection at every allocation (VERY SLOW!)
const debugMarkGC* {.booldefine.} = false # Trace the marking phase object by object (extremely verbose)
const PeonBytecodeMarker* = "PEON_BYTECODE" # Magic value at the beginning of bytecode files
const HeapGrowFactor* = 2 # The growth factor used by the GC to schedule the next collection
const FirstGC* = 1024 * 1024; # How many bytes to allocate before running the first GC
const enableVMChecks* {.booldefine.} = true; # Enables all types of compiler (nim-wise) checks in the VM
# List of paths where peon looks for modules, in order (empty path means current directory, which always takes precedence)
const moduleLookupPaths*: seq[string] = @["", "src/peon/stdlib", absolutePath(joinPath(".local", "peon", "stdlib"), getenv("HOME"))]
when HeapGrowFactor <= 1:
    {.fatal: "Heap growth factor must be > 1".}
const PeonVersion* = (major: 0, minor: 1, patch: 0)
const PeonRelease* = "alpha"
const PeonCommitHash* = staticExec("git rev-parse HEAD")
const PeonBranch* = staticExec("git symbolic-ref HEAD 2>/dev/null | cut -f 3 -d /")
const PeonVersionString* = &"Peon {PeonVersion.major}.{PeonVersion.minor}.{PeonVersion.patch} {PeonRelease} ({PeonBranch}, {CompileDate}, {CompileTime}, {PeonCommitHash[0..PeonCommitHash.high() mod 8]}) [Nim {NimVersion}] on {hostOS} ({hostCPU})"
const HelpMessage* = """The peon programming language, Copyright (C) 2023 Mattia Giambirtone & All Contributors
This program is free software, see the license distributed with this program or check
http://www.apache.org/licenses/LICENSE-2.0 for more info.
Basic Usage
-----------
$ peon file.pn Run the given Peon source file
$ peon file.pbc Run the given Peon bytecode file
Options
-------
-h, --help Show this help text and exit
-v, --version Print the current peon version and exit
-s, --string Execute the passed string as if it was a file
-n, --noDump Don't dump the result of compilation to a file.
Note that no dump is created when using -s/--string
-b, --breakpoints Run the debugger at specific bytecode offsets (comma-separated).
Only available with --target:bytecode and when compiled with VM
debugging on (-d:debugVM at build time)
-d, --disassemble Disassemble the output of compilation (only makes sense with --target:bytecode)
-m, --mode Set the compilation mode. Acceptable values are 'debug' and
'release'. Defaults to 'debug'
-c, --compile Compile the code, but do not execute it. Useful along with -d
-w, --warnings Turn warnings on or off (default: on). Acceptable values are
yes/on and no/off
--noWarn Disable a specific warning (for example, --noWarn:unusedVariable)
--showMismatches Show all mismatches when function dispatching fails (output is really verbose)
--target Select the compilation target (valid values are: 'c' and 'bytecode'). Defaults to
'bytecode'
-o, --output Rename the output file with this value (with --target:bytecode, a '.pbc' extension
is added if not already present)
--debug-dump Debug the bytecode serializer. Only makes sense with --target:bytecode
--debug-lexer Show the lexer's output
--debug-parser Show the parser's output
"""

src/errors.nim Normal file

@@ -0,0 +1,21 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
type
    PeonException* = ref object of CatchableError
        ## A Nim exception for a generic internal
        ## peon failure (not to be used directly)
        file*: string # The file where the error occurred
        line*: int # The line where the error occurred

File diff suppressed because it is too large Load Diff


@@ -0,0 +1,71 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## The code generator for translating peon to C code
import std/tables
import std/strformat
import std/algorithm
import std/parseutils
import std/strutils
import std/sequtils
import std/sets
import std/os
import frontend/compiler/compiler
import frontend/parsing/lexer
import frontend/parsing/parser
import frontend/parsing/ast
type
    CompilerFunc = object
        ## An internal compiler function called
        ## by pragmas
        kind: PragmaKind
        handler: proc (self: NativeCCompiler, pragma: Pragma, name: Name)

    NativeCCompiler* = ref object of Compiler
        ## The peon to C compiler
        # Compiler procedures called by pragmas
        compilerProcs: TableRef[string, CompilerFunc]
proc newNativeCCompiler*(replMode: bool = false): NativeCCompiler =
    ## Initializes a new, blank, NativeCCompiler
    ## object
    new(result)
    result.ast = @[]
    result.current = 0
    result.file = ""
    result.names = @[]
    result.depth = 0
    result.lines = @[]
    result.currentFunction = nil
    result.replMode = replMode
    result.currentModule = nil
    result.compilerProcs = newTable[string, CompilerFunc]()
    result.source = ""
    result.lexer = newLexer()
    result.lexer.fillSymbolTable()
    result.parser = newParser()
    result.isMainModule = false
    result.disabledWarnings = @[]
method literal*(self: Compiler, node: ASTNode, compile: bool = true): Type {.discardable.} =
    ## Compiles literal expressions


@@ -0,0 +1,849 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## An Abstract Syntax Tree (AST) structure for our recursive-descent
## top-down parser. For more info, check out docs/grammar.md
import std/strformat
import std/strutils
import token
export token
type
    NodeKind* = enum
        ## Enumeration of the AST
        ## node types, sorted by
        ## precedence

        # Declarations
        typeDecl = 0'u8
        funDecl,
        varDecl,
        # Statements
        ifStmt,
        returnStmt,
        breakStmt,
        continueStmt,
        whileStmt,
        forEachStmt,
        blockStmt,
        namedBlockStmt,
        raiseStmt,
        assertStmt,
        tryStmt,
        yieldStmt,
        awaitStmt,
        importStmt,
        exportStmt,
        deferStmt,
        # An expression followed by a semicolon
        exprStmt,
        # Expressions
        assignExpr,
        lambdaExpr,
        awaitExpr,
        yieldExpr,
        setItemExpr, # Set expressions like a.b = "c"
        binaryExpr,
        unaryExpr,
        sliceExpr,
        callExpr,
        getItemExpr, # Get expressions like a.b
        # Primary expressions
        groupingExpr, # Parenthesized expressions such as (true) and (3 + 4)
        trueExpr,
        falseExpr,
        strExpr,
        charExpr,
        intExpr,
        floatExpr,
        hexExpr,
        octExpr,
        binExpr,
        identExpr, # Identifier
        pragmaExpr,
        refExpr,
        ptrExpr,
        genericExpr,
        switchStmt,
        lentExpr

    # Here I would've rather used object variants, and in fact that's what was in
    # place before, but not being able to re-declare a field of the same type in
    # another case branch is kind of a deal breaker long-term, so until that is
    # fixed (check out https://github.com/nim-lang/RFCs/issues/368 for more info),
    # I'll stick to using inheritance instead
# Generic AST node types
ASTNode* = ref object of RootObj
## An AST node
kind*: NodeKind
# Regardless of the type of node, we keep the token in the AST node for internal usage.
# This is not shown when the node is printed, but makes it a heck of a lot easier to report
# errors accurately even deep in the compilation pipeline
token*: Token
file*: string
# This weird inheritance chain is needed for the parser to
# work properly
Declaration* = ref object of ASTNode
## A declaration
isPrivate*: bool
pragmas*: seq[Pragma]
generics*: seq[tuple[name: IdentExpr, cond: Expression]]
Statement* = ref object of Declaration
## A statement
Expression* = ref object of Statement
## An expression
LiteralExpr* = ref object of Expression
# Using a string for literals makes it much easier to handle numeric types, as
        # there are no overflow, underflow or float precision issues during parsing.
# Numbers are just serialized as strings and then converted back to numbers
# before being passed to the VM, which also keeps the door open in the future
# to implementing bignum arithmetic that can take advantage of natively supported
# machine types, meaning that if a numeric type fits into a 64 bit signed/unsigned
# int then it is stored in such a type to save space, otherwise it is just converted
# to a bigint. Bigfloats with arbitrary-precision arithmetic would also be nice,
# although arguably less useful (and probably significantly slower than bigints)
literal*: Token
IntExpr* = ref object of LiteralExpr
OctExpr* = ref object of LiteralExpr
HexExpr* = ref object of LiteralExpr
BinExpr* = ref object of LiteralExpr
FloatExpr* = ref object of LiteralExpr
StrExpr* = ref object of LiteralExpr
CharExpr* = ref object of LiteralExpr
TrueExpr* = ref object of LiteralExpr
FalseExpr* = ref object of LiteralExpr
IdentExpr* = ref object of Expression
name*: Token
depth*: int
GroupingExpr* = ref object of Expression
expression*: Expression
GetItemExpr* = ref object of Expression
obj*: Expression
name*: IdentExpr
SetItemExpr* = ref object of GetItemExpr
# Since a setItem expression is just
# a getItem one followed by an assignment,
# inheriting it from getItem makes sense
value*: Expression
CallExpr* = ref object of Expression
callee*: Expression # The object being called
arguments*: tuple[positionals: seq[Expression], keyword: seq[tuple[
name: IdentExpr, value: Expression]]]
closeParen*: Token # Needed for error reporting
GenericExpr* = ref object of Expression
ident*: IdentExpr
args*: seq[Expression]
UnaryExpr* = ref object of Expression
operator*: Token
a*: Expression
BinaryExpr* = ref object of UnaryExpr
# Binary expressions can be seen here as unary
# expressions with an extra operand so we just
# inherit from that and add a second operand
b*: Expression
YieldExpr* = ref object of Expression
expression*: Expression
AwaitExpr* = ref object of Expression
expression*: Expression
LambdaExpr* = ref object of Expression
body*: Statement
arguments*: seq[tuple[name: IdentExpr, valueType: Expression]]
defaults*: seq[Expression]
isGenerator*: bool
isAsync*: bool
isPure*: bool
returnType*: Expression
depth*: int
SliceExpr* = ref object of Expression
expression*: Expression
ends*: seq[Expression]
AssignExpr* = ref object of Expression
name*: IdentExpr
value*: Expression
ExprStmt* = ref object of Statement
expression*: Expression
ImportStmt* = ref object of Statement
moduleName*: IdentExpr
ExportStmt* = ref object of Statement
name*: IdentExpr
AssertStmt* = ref object of Statement
expression*: Expression
RaiseStmt* = ref object of Statement
exception*: Expression
BlockStmt* = ref object of Statement
code*: seq[Declaration]
NamedBlockStmt* = ref object of BlockStmt
name*: IdentExpr
ForStmt* = ref object of Statement
discard # Unused
ForEachStmt* = ref object of Statement
identifier*: IdentExpr
expression*: Expression
body*: Statement
WhileStmt* = ref object of Statement
condition*: Expression
body*: BlockStmt
AwaitStmt* = ref object of Statement
expression*: Expression
BreakStmt* = ref object of Statement
label*: IdentExpr
ContinueStmt* = ref object of Statement
label*: IdentExpr
ReturnStmt* = ref object of Statement
value*: Expression
IfStmt* = ref object of Statement
condition*: Expression
thenBranch*: Statement
elseBranch*: Statement
YieldStmt* = ref object of Statement
expression*: Expression
VarDecl* = ref object of Declaration
name*: IdentExpr
value*: Expression
isConst*: bool
isLet*: bool
valueType*: Expression
FunDecl* = ref object of Declaration
name*: IdentExpr
body*: Statement
arguments*: seq[tuple[name: IdentExpr, valueType: Expression]]
defaults*: seq[Expression]
isAsync*: bool
isGenerator*: bool
isPure*: bool
returnType*: Expression
depth*: int
TypeDecl* = ref object of Declaration
name*: IdentExpr
# Empty if type is an enum
fields*: seq[tuple[name: IdentExpr, valueType: Expression, isPrivate: bool]]
# Empty if type is a structure
members*: seq[TypeDecl]
isEnum*: bool
isRef*: bool
parent*: Expression
value*: Expression
Pragma* = ref object of Expression
name*: IdentExpr
args*: seq[LiteralExpr]
Var* = ref object of Expression
value*: Expression
Ref* = ref object of Expression
value*: Expression
Ptr* = ref object of Expression
value*: Expression
Lent* = ref object of Expression
value*: Expression
SwitchStmt* = ref object of Statement
switch*: Expression
branches*: seq[tuple[cond: Expression, body: BlockStmt]]
default*: BlockStmt
proc isConst*(self: ASTNode): bool =
    ## Returns true if the given
    ## AST node represents a value
    ## of constant type. All integers,
    ## floats, strings and singletons
    ## (true/false) count as constants
case self.kind:
of intExpr, hexExpr, binExpr, octExpr, strExpr, falseExpr, trueExpr,
floatExpr:
return true
else:
return false
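# For example, 3, 0xFF, 2.0, "hi" and true all count as constants under
# this definition, while an identifier like x or a call like f() does not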
## AST node constructors
proc newASTNode*(kind: NodeKind, token: Token): ASTNode =
## Initializes a new generic ASTNode object
new(result)
result.kind = kind
result.token = token
proc newPragma*(name: IdentExpr, args: seq[LiteralExpr]): Pragma =
new(result)
result.kind = pragmaExpr
result.args = args
result.name = name
result.token = name.token
proc newRefExpr*(expression: Expression, token: Token): Ref =
new(result)
result.kind = refExpr
result.value = expression
result.token = token
proc newPtrExpr*(expression: Expression, token: Token): Ptr =
new(result)
result.kind = ptrExpr
result.value = expression
result.token = token
proc newLentExpr*(expression: Expression, token: Token): Lent =
new(result)
result.kind = lentExpr
result.value = expression
result.token = token
proc newSwitchStmt*(switch: Expression, branches: seq[tuple[cond: Expression, body: BlockStmt]], default: BlockStmt, token: Token): SwitchStmt =
new(result)
result.kind = switchStmt
result.switch = switch
result.branches = branches
result.token = token
result.default = default
proc newIntExpr*(literal: Token): IntExpr =
result = IntExpr(kind: intExpr)
result.literal = literal
result.token = literal
proc newOctExpr*(literal: Token): OctExpr =
result = OctExpr(kind: octExpr)
result.literal = literal
result.token = literal
proc newHexExpr*(literal: Token): HexExpr =
result = HexExpr(kind: hexExpr)
result.literal = literal
result.token = literal
proc newBinExpr*(literal: Token): BinExpr =
result = BinExpr(kind: binExpr)
result.literal = literal
result.token = literal
proc newFloatExpr*(literal: Token): FloatExpr =
result = FloatExpr(kind: floatExpr)
result.literal = literal
result.token = literal
proc newTrueExpr*(token: Token): LiteralExpr = LiteralExpr(kind: trueExpr,
token: token, literal: token)
proc newFalseExpr*(token: Token): LiteralExpr = LiteralExpr(kind: falseExpr,
token: token, literal: token)
proc newStrExpr*(literal: Token): StrExpr =
result = StrExpr(kind: strExpr)
result.literal = literal
result.token = literal
proc newCharExpr*(literal: Token): CharExpr =
result = CharExpr(kind: charExpr)
result.literal = literal
result.token = literal
proc newIdentExpr*(name: Token, depth: int = 0): IdentExpr =
result = IdentExpr(kind: identExpr)
result.name = name
result.token = name
result.depth = depth
proc newGroupingExpr*(expression: Expression, token: Token): GroupingExpr =
result = GroupingExpr(kind: groupingExpr)
result.expression = expression
result.token = token
proc newLambdaExpr*(arguments: seq[tuple[name: IdentExpr, valueType: Expression]], defaults: seq[Expression],
body: Statement, isAsync, isGenerator: bool,
token: Token, depth: int, pragmas: seq[Pragma] = @[],
returnType: Expression, generics: seq[tuple[name: IdentExpr, cond: Expression]] = @[]): LambdaExpr =
result = LambdaExpr(kind: lambdaExpr)
result.body = body
result.arguments = arguments
result.defaults = defaults
result.isGenerator = isGenerator
result.isAsync = isAsync
result.token = token
result.returnType = returnType
result.isPure = false
result.pragmas = pragmas
result.generics = generics
result.depth = depth
proc newGetItemExpr*(obj: Expression, name: IdentExpr,
token: Token): GetItemExpr =
result = GetItemExpr(kind: getItemExpr)
result.obj = obj
result.name = name
result.token = token
proc newSetItemExpr*(obj: Expression, name: IdentExpr, value: Expression,
token: Token): SetItemExpr =
result = SetItemExpr(kind: setItemExpr)
result.obj = obj
result.name = name
result.value = value
result.token = token
proc newCallExpr*(callee: Expression, arguments: tuple[positionals: seq[
Expression], keyword: seq[tuple[name: IdentExpr, value: Expression]]],
token: Token): CallExpr =
result = CallExpr(kind: callExpr)
result.callee = callee
result.arguments = arguments
result.token = token
proc newGenericExpr*(ident: IdentExpr, args: seq[Expression]): GenericExpr =
result = GenericExpr(kind: genericExpr)
result.ident = ident
result.args = args
result.token = ident.token
proc newSliceExpr*(expression: Expression, ends: seq[Expression], token: Token): SliceExpr =
result = SliceExpr(kind: sliceExpr)
result.expression = expression
result.ends = ends
result.token = token
proc newUnaryExpr*(operator: Token, a: Expression): UnaryExpr =
result = UnaryExpr(kind: unaryExpr)
result.operator = operator
result.a = a
result.token = result.operator
proc newBinaryExpr*(a: Expression, operator: Token, b: Expression): BinaryExpr =
result = BinaryExpr(kind: binaryExpr)
result.operator = operator
result.a = a
result.b = b
result.token = operator
proc newYieldExpr*(expression: Expression, token: Token): YieldExpr =
result = YieldExpr(kind: yieldExpr)
result.expression = expression
result.token = token
proc newAssignExpr*(name: IdentExpr, value: Expression,
token: Token): AssignExpr =
result = AssignExpr(kind: assignExpr)
result.name = name
result.value = value
result.token = token
proc newAwaitExpr*(expression: Expression, token: Token): AwaitExpr =
result = AwaitExpr(kind: awaitExpr)
result.expression = expression
result.token = token
proc newExprStmt*(expression: Expression, token: Token): ExprStmt =
result = ExprStmt(kind: exprStmt)
result.expression = expression
result.token = token
proc newImportStmt*(moduleName: IdentExpr, token: Token): ImportStmt =
result = ImportStmt(kind: importStmt)
result.moduleName = moduleName
result.token = token
proc newExportStmt*(name: IdentExpr, token: Token): ExportStmt =
result = ExportStmt(kind: exportStmt)
result.name = name
result.token = token
proc newYieldStmt*(expression: Expression, token: Token): YieldStmt =
result = YieldStmt(kind: yieldStmt)
result.expression = expression
result.token = token
proc newAwaitStmt*(expression: Expression, token: Token): AwaitStmt =
result = AwaitStmt(kind: awaitStmt)
result.expression = expression
result.token = token
proc newAssertStmt*(expression: Expression, token: Token): AssertStmt =
result = AssertStmt(kind: assertStmt)
result.expression = expression
result.token = token
proc newRaiseStmt*(exception: Expression, token: Token): RaiseStmt =
result = RaiseStmt(kind: raiseStmt)
result.exception = exception
result.token = token
proc newBlockStmt*(code: seq[Declaration], token: Token): BlockStmt =
result = BlockStmt(kind: blockStmt)
result.code = code
result.token = token
proc newNamedBlockStmt*(code: seq[Declaration], name: IdentExpr, token: Token): NamedBlockStmt =
result = NamedBlockStmt(kind: namedBlockStmt)
result.code = code
result.token = token
result.name = name
proc newWhileStmt*(condition: Expression, body: BlockStmt,
token: Token): WhileStmt =
result = WhileStmt(kind: whileStmt)
result.condition = condition
result.body = body
result.token = token
proc newForEachStmt*(identifier: IdentExpr, expression: Expression,
body: Statement, token: Token): ForEachStmt =
result = ForEachStmt(kind: forEachStmt)
result.identifier = identifier
result.expression = expression
result.body = body
result.token = token
proc newBreakStmt*(token: Token, label: IdentExpr = nil): BreakStmt =
result = BreakStmt(kind: breakStmt)
result.token = token
result.label = label
proc newContinueStmt*(token: Token, label: IdentExpr = nil): ContinueStmt =
result = ContinueStmt(kind: continueStmt)
result.token = token
result.label = label
proc newReturnStmt*(value: Expression, token: Token): ReturnStmt =
result = ReturnStmt(kind: returnStmt)
result.value = value
result.token = token
proc newIfStmt*(condition: Expression, thenBranch, elseBranch: Statement,
token: Token): IfStmt =
result = IfStmt(kind: ifStmt)
result.condition = condition
result.thenBranch = thenBranch
result.elseBranch = elseBranch
result.token = token
proc newVarDecl*(name: IdentExpr, value: Expression, isConst: bool = false,
isPrivate: bool = true, token: Token, isLet: bool = false,
valueType: Expression, pragmas: seq[Pragma]): VarDecl =
result = VarDecl(kind: varDecl)
result.name = name
result.value = value
result.isConst = isConst
result.isPrivate = isPrivate
result.token = token
result.isLet = isLet
result.valueType = valueType
result.pragmas = pragmas
proc newFunDecl*(name: IdentExpr, arguments: seq[tuple[name: IdentExpr, valueType: Expression]], defaults: seq[Expression],
body: Statement, isAsync, isGenerator: bool,
isPrivate: bool, token: Token, depth: int,
pragmas: seq[Pragma] = @[], returnType: Expression,
generics: seq[tuple[name: IdentExpr, cond: Expression]] = @[]): FunDecl =
result = FunDecl(kind: funDecl)
result.name = name
result.arguments = arguments
result.defaults = defaults
result.body = body
result.isAsync = isAsync
result.isGenerator = isGenerator
result.isPrivate = isPrivate
result.token = token
result.pragmas = pragmas
result.returnType = returnType
result.isPure = false
result.generics = generics
result.depth = depth
proc newTypeDecl*(name: IdentExpr, fields: seq[tuple[name: IdentExpr, valueType: Expression, isPrivate: bool]],
defaults: seq[Expression], isPrivate: bool, token: Token, pragmas: seq[Pragma],
generics: seq[tuple[name: IdentExpr, cond: Expression]], parent: IdentExpr, isEnum: bool, isRef: bool): TypeDecl =
result = TypeDecl(kind: typeDecl)
result.name = name
result.fields = fields
result.isPrivate = isPrivate
result.token = token
result.pragmas = pragmas
result.generics = generics
result.parent = parent
result.isEnum = isEnum
result.isRef = isRef
result.members = @[]
proc `$`*(self: ASTNode): string =
if self.isNil():
return "nil"
case self.kind:
of intExpr, floatExpr, hexExpr, binExpr, octExpr, strExpr, trueExpr,
falseExpr:
if self.kind in {trueExpr, falseExpr}:
result &= &"Literal({($self.kind)[0..^5]})"
elif self.kind == strExpr:
result &= &"Literal({LiteralExpr(self).literal.lexeme[1..^2].escape()})"
else:
result &= &"Literal({LiteralExpr(self).literal.lexeme})"
of identExpr:
result &= &"Identifier('{IdentExpr(self).name.lexeme}')"
of groupingExpr:
result &= &"Grouping({GroupingExpr(self).expression})"
of getItemExpr:
var self = GetItemExpr(self)
result &= &"GetItem(obj={self.obj}, name={self.name})"
of setItemExpr:
var self = SetItemExpr(self)
result &= &"SetItem(obj={self.obj}, name={self.value}, value={self.value})"
of callExpr:
var self = CallExpr(self)
result &= &"""Call({self.callee}, arguments=(positionals=[{self.arguments.positionals.join(", ")}], keyword=[{self.arguments.keyword.join(", ")}]))"""
of unaryExpr:
var self = UnaryExpr(self)
result &= &"Unary(Operator('{self.operator.lexeme}'), {self.a})"
of binaryExpr:
var self = BinaryExpr(self)
result &= &"Binary({self.a}, Operator('{self.operator.lexeme}'), {self.b})"
of assignExpr:
var self = AssignExpr(self)
result &= &"Assign(name={self.name}, value={self.value})"
of exprStmt:
var self = ExprStmt(self)
result &= &"ExpressionStatement({self.expression})"
of breakStmt:
result = "Break()"
of importStmt:
var self = ImportStmt(self)
result &= &"Import({self.moduleName})"
of assertStmt:
var self = AssertStmt(self)
result &= &"Assert({self.expression})"
of raiseStmt:
var self = RaiseStmt(self)
result &= &"Raise({self.exception})"
of blockStmt:
var self = BlockStmt(self)
result &= &"""Block([{self.code.join(", ")}])"""
of namedBlockStmt:
var self = NamedBlockStmt(self)
result &= &"""Block(name={self.name}, [{self.code.join(", ")}])"""
of whileStmt:
var self = WhileStmt(self)
result &= &"While(condition={self.condition}, body={self.body})"
of forEachStmt:
var self = ForEachStmt(self)
result &= &"ForEach(identifier={self.identifier}, expression={self.expression}, body={self.body})"
of returnStmt:
var self = ReturnStmt(self)
result &= &"Return({self.value})"
of yieldExpr:
var self = YieldExpr(self)
result &= &"Yield({self.expression})"
of awaitExpr:
var self = AwaitExpr(self)
result &= &"Await({self.expression})"
of ifStmt:
var self = IfStmt(self)
if self.elseBranch == nil:
result &= &"If(condition={self.condition}, thenBranch={self.thenBranch}, elseBranch=nil)"
else:
result &= &"If(condition={self.condition}, thenBranch={self.thenBranch}, elseBranch={self.elseBranch})"
of yieldStmt:
var self = YieldStmt(self)
result &= &"YieldStmt({self.expression})"
of awaitStmt:
var self = AwaitStmt(self)
result &= &"AwaitStmt({self.expression})"
of varDecl:
var self = VarDecl(self)
result &= &"Var(name={self.name}, value={self.value}, const={self.isConst}, private={self.isPrivate}, type={self.valueType}, pragmas={self.pragmas})"
of funDecl:
var self = FunDecl(self)
result &= &"""FunDecl(name={self.name}, body={self.body}, type={self.returnType}, arguments=[{self.arguments.join(", ")}], defaults=[{self.defaults.join(", ")}], generics=[{self.generics.join(", ")}], async={self.isAsync}, generator={self.isGenerator}, private={self.isPrivate}, pragmas={self.pragmas})"""
of typeDecl:
var self = TypeDecl(self)
result &= &"""TypeDecl(name={self.name}, fields={self.fields}, members={self.members}, private={self.isPrivate}, pragmas={self.pragmas}, generics={self.generics}, parent={self.parent}, ref={self.isRef}, enum={self.isEnum}, value={self.value})"""
of lambdaExpr:
var self = LambdaExpr(self)
result &= &"""Lambda(body={self.body}, type={self.returnType}, arguments=[{self.arguments.join(", ")}], defaults=[{self.defaults.join(", ")}], generator={self.isGenerator}, async={self.isAsync}, pragmas={self.pragmas})"""
of sliceExpr:
var self = SliceExpr(self)
result &= &"""Slice({self.expression}, ends=[{self.ends.join(", ")}])"""
of pragmaExpr:
var self = Pragma(self)
result &= &"Pragma(name={self.name}, args={self.args})"
of refExpr:
result &= &"Ref({Ref(self).value})"
of ptrExpr:
result &= &"Ptr({Ptr(self).value})"
of lentExpr:
result &= &"Lent({Lent(self).value})"
of genericExpr:
var self = GenericExpr(self)
result &= &"Generic(ident={self.ident}, args={self.args})"
else:
discard
proc `==`*(self, other: IdentExpr): bool {.inline.} = self.token == other.token
proc getRelativeBoundaries*(self: ASTNode): tuple[start, stop: int] =
## Recursively computes the position of a node relative
## to its containing line
case self.kind:
of varDecl:
var self = VarDecl(self)
let start = self.token.relPos.start
var stop = self.name.token.relPos.stop
if not self.value.isNil():
stop = self.value.token.relPos.stop
if self.pragmas.len() > 0:
stop = getRelativeBoundaries(self.pragmas[^1]).stop
result = (start, stop)
of typeDecl:
result = (self.token.relPos.start, TypeDecl(self).name.getRelativeBoundaries().stop)
of breakStmt, returnStmt, continueStmt:
result = self.token.relPos
of importStmt:
result = (self.token.relPos.start, getRelativeBoundaries(ImportStmt(self).moduleName).stop)
of exprStmt:
result = getRelativeBoundaries(ExprStmt(self).expression)
of unaryExpr:
var self = UnaryExpr(self)
result = (self.operator.relPos.start, getRelativeBoundaries(self.a).stop)
of binaryExpr:
var self = BinaryExpr(self)
result = (getRelativeBoundaries(self.a).start, getRelativeBoundaries(self.b).stop)
of intExpr, binExpr, hexExpr, octExpr, strExpr, floatExpr:
var self = LiteralExpr(self)
result = self.literal.relPos
of identExpr:
var self = IdentExpr(self)
result = self.token.relPos
of assignExpr:
var self = AssignExpr(self)
result = (getRelativeBoundaries(self.name).start, getRelativeBoundaries(self.value).stop)
of callExpr:
var self = CallExpr(self)
result = (getRelativeBoundaries(self.callee).start, self.closeParen.relPos.stop)
of getItemExpr:
var self = GetItemExpr(self)
result = (getRelativeBoundaries(self.obj).start, getRelativeBoundaries(self.name).stop)
of pragmaExpr:
var self = Pragma(self)
let start = self.token.relPos.start
var stop = 0
if self.args.len() > 0:
stop = self.args[^1].token.relPos.stop + 1
else:
stop = self.token.relPos.stop + 1
# -8 so the error highlights the #pragma[ part as well
            result = (start - 8, stop)
of genericExpr:
var self = GenericExpr(self)
let ident = getRelativeBoundaries(self.ident)
var stop: int = ident.stop + 2
if self.args.len() > 0:
stop = getRelativeBoundaries(self.args[^1]).stop
result = (ident.start, stop)
of refExpr:
var self = Ref(self)
result = (self.token.relPos.start, self.value.getRelativeBoundaries().stop)
of ptrExpr:
var self = Ptr(self)
result = (self.token.relPos.start, self.value.getRelativeBoundaries().stop)
else:
result = (0, 0)
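# A minimal sketch of how these constructors compose (the Token values
# here are made up for illustration): build the AST for the expression
# 1 + 2 by hand, the same way the parser would, and print it
when isMainModule:
    let
        one = Token(kind: Integer, lexeme: "1", line: 1)
        plus = Token(kind: Symbol, lexeme: "+", line: 1)
        two = Token(kind: Integer, lexeme: "2", line: 1)
        sum = newBinaryExpr(newIntExpr(one), plus, newIntExpr(two))
    echo sum            # Binary(Literal(1), Operator('+'), Literal(2))
    echo sum.isConst()  # false: only the literals themselves are constant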

src/frontend/parsing/lexer.nim Normal file

@ -0,0 +1,669 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## A simple and modular tokenizer implementation with arbitrary lookahead
## using a customizable symbol table
import std/strutils
import std/parseutils
import std/strformat
import std/tables
import token
import errors
export token, errors
type
SymbolTable* = ref object
## A table of symbols used
## to lex a source file
        # Although we parse keywords as
        # identifiers rather than as symbols,
        # we keep them here for consistency
        # purposes
keywords: TableRef[string, TokenType]
symbols: TableRef[string, TokenType]
Lexer* = ref object
## A lexer object
symbols*: SymbolTable
source: string
tokens: seq[Token]
line: int
start: int
current: int
file: string
lines: seq[tuple[start, stop: int]]
lastLine: int
linePos: int
lineCurrent: int
spaces: int
LexingError* = ref object of PeonException
## A lexing exception
lexer*: Lexer
pos*: tuple[start, stop: int]
proc newSymbolTable: SymbolTable =
## Initializes a new symbol table
new(result)
result.keywords = newTable[string, TokenType]()
result.symbols = newTable[string, TokenType]()
proc addSymbol*(self: SymbolTable, lexeme: string, token: TokenType) =
## Adds a symbol to the symbol table. Overwrites
## any previous entries
self.symbols[lexeme] = token
proc removeSymbol*(self: SymbolTable, lexeme: string) =
## Removes a symbol from the symbol table
## (does nothing if it does not exist)
self.symbols.del(lexeme)
proc addKeyword*(self: SymbolTable, lexeme: string, token: TokenType) =
## Adds a keyword to the symbol table. Overwrites
## any previous entries
self.keywords[lexeme] = token
proc removeKeyword*(self: SymbolTable, lexeme: string) =
## Removes a keyword from the symbol table
## (does nothing if it does not exist)
self.keywords.del(lexeme)
proc existsSymbol*(self: SymbolTable, lexeme: string): bool {.inline.} =
## Returns true if a given symbol exists
## in the symbol table already
lexeme in self.symbols
proc existsKeyword*(self: SymbolTable, lexeme: string): bool {.inline.} =
## Returns true if a given keyword exists
## in the symbol table already
lexeme in self.keywords
proc getToken(self: Lexer, lexeme: string): Token =
## Gets the matching token object for a given
## string according to the symbol table or
## returns nil if there's no match
let table = self.symbols
var kind = table.symbols.getOrDefault(lexeme, table.keywords.getOrDefault(
lexeme, NoMatch))
if kind == NoMatch:
return nil
new(result)
result.kind = kind
result.lexeme = self.source[self.start..<self.current]
result.line = self.line
result.pos = (start: self.start, stop: self.current - 1)
result.relPos = (start: self.linePos - result.lexeme.high() - 1, stop: self.linePos - 1)
result.spaces = self.spaces
self.spaces = 0
proc getMaxSymbolSize(self: SymbolTable): int =
## Returns the maximum length of all the symbols
## currently in the table. Note that keywords are
## not symbols, they're identifiers (or at least
## are parsed the same way in Lexer.parseIdentifier)
for lexeme in self.symbols.keys():
if len(lexeme) > result:
result = len(lexeme)
proc getSymbols(self: SymbolTable, n: int): seq[string] =
## Returns all n-bytes symbols
## in the symbol table
for lexeme in self.symbols.keys():
if len(lexeme) == n:
result.add(lexeme)
# Wrappers around isDigit and isAlphanumeric for
# strings
proc isDigit(s: string): bool =
    ## Returns true for non-empty
    ## strings of digits
    if s.len() == 0:
        return false
    for c in s:
        if not c.isDigit():
            return false
    return true
proc isAlphaNumeric(s: string): bool =
    ## Returns true for non-empty
    ## alphanumeric strings
    if s.len() == 0:
        return false
    for c in s:
        if not c.isAlphaNumeric():
            return false
    return true
# Forward declaration
proc incLine(self: Lexer)
# Simple public getters used for error
# formatting and whatnot
proc getStart*(self: Lexer): int = self.start
proc getFile*(self: Lexer): string = self.file
proc getCurrent*(self: Lexer): int = self.current
proc getCurrentLinePos*(self: Lexer): tuple[start, stop: int] = (self.lastLine, self.linePos)
proc getLine*(self: Lexer): int = self.line
proc getLines*(self: Lexer): seq[tuple[start, stop: int]] = self.lines
proc getSource*(self: Lexer): string = self.source
proc getRelPos*(self: Lexer, line: int): tuple[start, stop: int] =
if self.tokens.len() == 0 or self.tokens[^1].kind != EndOfFile:
self.incLine()
return self.lines[line - 1]
proc newLexer*(self: Lexer = nil): Lexer =
## Initializes the lexer or resets
## the state of an existing one
    if self.isNil():
        new(result)
    else:
        result = self
result.source = ""
result.tokens = @[]
result.line = 1
result.start = 0
result.current = 0
result.file = ""
result.lines = @[]
result.lastLine = 0
result.linePos = 0
result.lineCurrent = 0
result.symbols = newSymbolTable()
result.spaces = 0
proc done(self: Lexer): bool =
## Returns true if we reached EOF
result = self.current >= self.source.len
proc incLine(self: Lexer) =
## Increments the lexer's line
## counter and updates internal
## line metadata
self.lines.add((self.lastLine, self.current))
self.lastLine = self.current
self.line += 1
self.linePos = 0
proc step(self: Lexer, n: int = 1): string =
## Steps n characters forward in the
## source file (default = 1). A string
## of at most n bytes is returned. If n
## exceeds EOF, the string will be shorter
while len(result) < n:
if self.done() or self.current > self.source.high():
break
else:
result.add(self.source[self.current])
inc(self.current)
inc(self.linePos)
proc peek(self: Lexer, distance: int = 0, length: int = 1): string =
## Returns a stream of characters of
## at most length bytes from the source
## file, starting at the given distance,
## without consuming it. The distance
## parameter may be negative to retrieve
    ## previously consumed characters. If the
## distance and/or the length are beyond
## EOF (even partially), the resulting string
## will be shorter than length bytes. The string
## may be empty
var i = distance
while len(result) < length:
if self.done() or self.current + i > self.source.high() or
self.current + i < 0:
break
else:
result.add(self.source[self.current + i])
inc(i)
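# For example, with source "hello" and current = 1:
#   self.peek()       -> "e"      self.peek(-1)     -> "h"
#   self.peek(0, 3)   -> "ell"    self.peek(2, 100) -> "lo" (clipped at EOF)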
proc error(self: Lexer, message: string) =
## Raises a lexing error with the
## appropriate metadata
raise LexingError(msg: message, line: self.line, file: self.file, lexer: self, pos: (self.lineCurrent, self.linePos - 1))
proc check(self: Lexer, s: string, distance: int = 0): bool =
## Behaves like self.match(), without consuming the
## token. False is returned if we're at EOF
## regardless of what the token to check is.
## The distance is passed directly to self.peek()
if self.done():
return false
return self.peek(distance, len(s)) == s
proc check(self: Lexer, args: openarray[string], distance: int = 0): bool =
## Calls self.check() in a loop with
## each character from the given set of
## strings and returns at the first match.
## Useful to check multiple tokens in a situation
## where only one of them may match at one time
for s in args:
if self.check(s, distance):
return true
return false
proc match(self: Lexer, s: string): bool =
## Returns true if the next len(s) bytes
## of the source file match the provided
## string. If the match is successful,
## len(s) bytes are consumed, otherwise
## false is returned
if not self.check(s):
return false
discard self.step(len(s))
return true
proc match(self: Lexer, args: openarray[string]): bool =
## Calls self.match() in a loop with
## each character from the given set of
## strings and returns at the first match.
## Useful to match multiple tokens in a situation
## where only one of them may match at one time
for s in args:
if self.match(s):
return true
return false
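# For example, with source "->>" and current = 0:
#   self.check("->")  -> true  (nothing is consumed)
#   self.match("->")  -> true  (consumes both characters)
#   self.match("->")  -> false (only ">" is left)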
proc createToken(self: Lexer, tokenType: TokenType) =
## Creates a token object and adds it to the token
## list. The lexeme and position of the token are
## inferred from the current state of the tokenizer
var tok: Token = new(Token)
tok.kind = tokenType
tok.lexeme = self.source[self.start..<self.current]
tok.line = self.line
tok.spaces = self.spaces
self.spaces = 0
tok.pos = (start: self.start, stop: self.current - 1)
tok.relPos = (start: self.linePos - tok.lexeme.high() - 1, stop: self.linePos - 1)
self.tokens.add(tok)
proc parseEscape(self: Lexer) =
## Boring escape sequence parsing. For more info check out
## https://en.wikipedia.org/wiki/Escape_sequences_in_C.
## As of now, \u and \U are not supported, but they'll
## likely be soon. Another notable limitation is that
## \xhhh and \nnn are limited to the size of a char
## (i.e. uint8, or 256 values)
case self.peek()[0]: # We use a char instead of a string because of how case statements handle ranges with strings
# (i.e. not well, given they crash the C code generator)
of 'a':
self.source[self.current] = cast[char](0x07)
of 'b':
            self.source[self.current] = cast[char](0x08)
of 'e':
self.source[self.current] = cast[char](0x1B)
of 'f':
self.source[self.current] = cast[char](0x0C)
of 'n':
            when defined(windows):
                # We natively convert LF to CRLF on Windows, and
                # gotta thank Microsoft for the extra boilerplate!
                self.source[self.current] = cast[char](0x0D)
                self.source.insert($cast[char](0x0A), self.current + 1)
            else:
                # Linux, macOS and friends all use a plain LF
                self.source[self.current] = cast[char](0x0A)
of 'r':
self.source[self.current] = cast[char](0x0D)
of 't':
self.source[self.current] = cast[char](0x09)
of 'v':
self.source[self.current] = cast[char](0x0B)
of '"':
self.source[self.current] = '"'
of '\'':
self.source[self.current] = '\''
of '\\':
self.source[self.current] = cast[char](0x5C)
of '0'..'9': # This is the reason we're using char instead of string. See https://github.com/nim-lang/Nim/issues/19678
var code = ""
var value = 0
var i = self.current
while i < self.source.high() and (let c = self.source[
i].toLowerAscii(); c in '0'..'7') and len(code) < 3:
code &= self.source[i]
i += 1
assert parseOct(code, value) == code.len()
if value > uint8.high().int:
self.error("escape sequence value too large (> 255)")
self.source[self.current] = cast[char](value)
of 'u', 'U':
self.error("unicode escape sequences are not supported (yet)")
of 'x':
var code = ""
var value = 0
var i = self.current
while i < self.source.high() and (let c = self.source[
i].toLowerAscii(); c in 'a'..'f' or c in '0'..'9'):
code &= self.source[i]
i += 1
assert parseHex(code, value) == code.len()
if value > uint8.high().int:
self.error("escape sequence value too large (> 255)")
self.source[self.current] = cast[char](value)
else:
self.error(&"invalid escape sequence '\\{self.peek()}'")
proc parseString(self: Lexer, delimiter: string, mode: string = "single") =
## Parses string literals. They can be expressed using matching pairs
## of either single or double quotes. Most C-style escape sequences are
## supported, moreover, a specific prefix may be prepended
## to the string to instruct the lexer on how to parse it:
## - b -> declares a byte string, where each character is
## interpreted as an integer instead of a character
## - r -> declares a raw string literal, where escape sequences
## are not parsed and stay as-is
## - f -> declares a format string, where variables may be
## interpolated using curly braces like f"Hello, {name}!".
## Braces may be escaped using a pair of them, so to represent
## a literal "{" in an f-string, one would use {{ instead
## Multi-line strings can be declared using matching triplets of
## either single or double quotes. They can span across multiple
## lines and escape sequences in them are not parsed, like in raw
## strings, so a multi-line string prefixed with the "r" modifier
## is redundant, although multi-line byte/format strings are supported
var slen = 0
while not self.check(delimiter) and not self.done():
if self.match("\n"):
if mode == "multi":
self.incLine()
else:
self.error("unexpected EOL while parsing string literal")
if mode in ["raw", "multi"]:
discard self.step()
elif self.match("\\"):
# This madness here serves to get rid of the slash, since \x is mapped
# to a one-byte sequence but the string '\x' is actually 2 bytes (or more,
# depending on the specific escape sequence)
self.source = self.source[0..<self.current] & self.source[
self.current + 1..^1]
self.parseEscape()
if mode == "format" and self.match("{"):
if self.match("{"):
self.source = self.source[0..<self.current] & self.source[
self.current + 1..^1]
continue
while not self.check(["}", "\""]):
discard self.step()
if self.check("\""):
self.error("unclosed '{' in format string")
elif mode == "format" and self.check("}"):
if not self.check("}", 1):
self.error("unmatched '}' in format string")
else:
self.source = self.source[0..<self.current] & self.source[
self.current + 1..^1]
discard self.step()
inc(slen)
if slen > 1 and delimiter == "'":
self.error("invalid character literal (length must be one!)")
if mode == "multi":
if not self.match(delimiter.repeat(3)):
self.error("unexpected EOL while parsing multi-line string literal")
elif self.done() and self.peek(-1) != delimiter:
self.error("unexpected EOF while parsing string literal")
else:
discard self.step()
if delimiter == "\"":
self.createToken(String)
else:
self.createToken(Char)
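# For reference, these are all literals accepted by the rules above
# (written as peon source, not Nim):
#   "hello\n"      -> regular string, escape sequences are processed
#   r"C:\temp"     -> raw string, backslashes stay as-is
#   f"Hi {name}!"  -> format string with interpolation
#   b"\x00\x01"    -> byte string
#   'a'            -> character literal (length must be one)
#   """..."""      -> multi-line string, escapes are not processed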
proc parseBinary(self: Lexer) =
## Parses binary numbers
while self.peek().isDigit():
if not self.check(["0", "1"]):
self.error(&"invalid digit '{self.peek()}' in binary literal")
discard self.step()
proc parseOctal(self: Lexer) =
## Parses octal numbers
while self.peek().isDigit():
if self.peek() notin "0".."7":
self.error(&"invalid digit '{self.peek()}' in octal literal")
discard self.step()
proc parseHex(self: Lexer) =
## Parses hexadecimal numbers
while self.peek().isAlphaNumeric():
if not self.peek().isDigit() and self.peek().toLowerAscii() notin "a".."f":
self.error(&"invalid hexadecimal literal")
discard self.step()
proc parseNumber(self: Lexer) =
## Parses numeric literals, which encompass
## integers and floating point numbers.
## Floats also support scientific notation
## (i.e. 3e14), while the fractional part
    ## must be separated from the integer one
    ## using a dot (which acts as the decimal separator).
## Float literals such as 32.5e3 are also supported.
## The "e" for the scientific notation of floats
## is case-insensitive. Binary number literals are
## expressed using the prefix 0b, hexadecimal
## numbers with the prefix 0x and octal numbers
## with the prefix 0o. Numeric literals support
## size specifiers, like so: 10'u8, 3.14'f32
var kind: TokenType
case self.peek():
of "b":
discard self.step()
kind = Binary
self.parseBinary()
of "x":
kind = Hex
discard self.step()
self.parseHex()
of "o":
kind = Octal
discard self.step()
self.parseOctal()
else:
kind = Integer
while isDigit(self.peek()) and not self.done():
discard self.step()
if self.check(["e", "E"]):
kind = Float
discard self.step()
while self.peek().isDigit() and not self.done():
discard self.step()
elif self.check("."):
# TODO: Is there a better way?
discard self.step()
if not isDigit(self.peek()):
self.error("invalid float number literal")
kind = Float
while isDigit(self.peek()) and not self.done():
discard self.step()
if self.check(["e", "E"]):
discard self.step()
while isDigit(self.peek()) and not self.done():
discard self.step()
if self.match("'"):
# Could be a size specifier, better catch it
while (self.peek().isAlphaNumeric() or self.check("_")) and
not self.done():
discard self.step()
self.createToken(kind)
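# For reference, these are all literals accepted here (peon source):
#   42        -> Integer        0b1010    -> Binary
#   0o777     -> Octal          0xFF      -> Hex
#   3.14      -> Float          32.5e3    -> Float
#   10'u8     -> Integer with a size specifier
#   3.14'f32  -> Float with a size specifier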
proc parseBackticks(self: Lexer) =
## Parses tokens surrounded
## by backticks. This may be used
## for name stropping as well as to
## reimplement existing operators
## (e.g. +, -, etc.) without the
## parser complaining about syntax
## errors
while not self.match("`") and not self.done():
if self.peek().isAlphaNumeric() or self.symbols.existsSymbol(self.peek()):
discard self.step()
continue
self.error(&"unexpected character: '{self.peek()}'")
self.createToken(Identifier)
# Strips the backticks
self.tokens[^1].lexeme = self.tokens[^1].lexeme[1..^2]
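# For example, `foo bar` is lexed as the single identifier "foo bar",
# while `+` yields an Identifier token that allows user code to
# redefine the + operator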
proc parseIdentifier(self: Lexer) =
## Parses keywords and identifiers.
    ## Note that multi-byte characters
    ## (aka UTF-8 runes) are not supported
    ## by design and *will* break things
while (self.peek().isAlphaNumeric() or self.check("_")) and not self.done():
discard self.step()
let name: string = self.source[self.start..<self.current]
if self.symbols.existsKeyword(name):
# It's a keyword!
self.createToken(self.symbols.keywords[name])
else:
# It's an identifier!
self.createToken(Identifier)
proc next(self: Lexer) =
## Scans a single token. This method is
## called iteratively until the source
## file reaches EOF
if self.done():
# We done boi
return
elif self.match(["\r", "\f", "\e"]):
# We skip characters we don't need
return
elif self.match(" "):
# Whitespaces
inc(self.spaces)
inc(self.start, 2)
elif self.match("\t"):
self.error("tabs are not allowed in peon code, use spaces for indentation instead")
elif self.match("\n"):
# New line
self.incLine()
# TODO: Broken
#[if not self.getToken("\n").isNil():
self.createToken(Semicolon)]#
elif self.match("`"):
# Stropped token
self.parseBackticks()
elif self.match(["\"", "'"]):
# String or character literal
var mode = "single"
if self.peek(-1) != "'" and self.check(self.peek(-1)) and self.check(
self.peek(-1), 1):
# Multiline strings start with 3 quotes
discard self.step(2)
mode = "multi"
self.parseString(self.peek(-1), mode)
    elif self.peek().isDigit():
        # Number literal
        discard self.step() # Needed because parseNumber reads the next
                            # character to tell the base of the number
        self.parseNumber()
elif self.peek().isAlphaNumeric() and self.check(["\"", "'"], 1):
# Prefixed string literal (i.e. f"Hi {name}!")
case self.step():
of "r":
self.parseString(self.step(), "raw")
of "b":
self.parseString(self.step(), "bytes")
of "f":
self.parseString(self.step(), "format")
else:
self.error(&"unknown string prefix '{self.peek(-1)}'")
elif self.peek().isAlphaNumeric() or self.check("_"):
# Keywords and identifiers
self.parseIdentifier()
elif self.match("#"):
if not self.match("pragma["):
# Inline comments
while not (self.match("\n") or self.done()):
discard self.step()
self.createToken(Comment)
self.incLine()
else:
self.createToken(Pragma)
else:
# If none of the above conditions matched, there's a few
# other options left:
# - The token is a built-in operator, or
# - it's an expression/statement delimiter, or
# - it's not a valid token at all
# We handle all of these cases here by trying to
# match the longest sequence of characters possible
# as either an operator or a statement/expression
# delimiter
var n = self.symbols.getMaxSymbolSize()
while n > 0:
for symbol in self.symbols.getSymbols(n):
if self.match(symbol):
# We've found the largest possible
# match!
self.tokens.add(self.getToken(symbol))
return
dec(n)
# We just assume what we have in front of us
# is a symbol
discard self.step()
self.createToken(Symbol)
proc lex*(self: Lexer, source, file: string): seq[Token] =
## Lexes a source file, converting a stream
## of characters into a series of tokens
var symbols = self.symbols
discard self.newLexer()
self.symbols = symbols
self.source = source
self.file = file
self.lines = @[]
self.lastLine = 0
self.linePos = 0
self.lineCurrent = 0
while not self.done():
self.next()
self.start = self.current
self.lineCurrent = self.linePos
self.tokens.add(Token(kind: EndOfFile, lexeme: "",
line: self.line, pos: (self.current, self.current),
relPos: (start: 0, stop: self.linePos - 1)))
self.incLine()
return self.tokens
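# A minimal usage sketch: register a few symbols and keywords by hand
# (the full set normally comes from util/symbols.fillSymbolTable, which
# imports this module and so cannot be used here) and tokenize a line
# of peon source
when isMainModule:
    var tokenizer = newLexer()
    tokenizer.symbols.addSymbol("=", Symbol)
    tokenizer.symbols.addSymbol("+", Symbol)
    tokenizer.symbols.addSymbol(";", Semicolon)
    tokenizer.symbols.addKeyword("var", TokenType.Var)
    for tok in tokenizer.lex("var x = 5 + 3;", "<string>"):
        echo tok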

File diff suppressed because it is too large

src/frontend/parsing/token.nim Normal file

@ -0,0 +1,83 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import std/strformat
import std/strutils
type
TokenType* {.pure.} = enum
## Token types enumeration
# Booleans
True, False,
# Control flow statements
If, Else,
# Looping statements
While, For,
# Keywords
Function, Break, Continue,
Var, Let, Const, Return,
Coroutine, Generator, Import,
Raise, Assert, Await, Foreach,
Yield, Type, Operator, Case,
Enum, From, Ptr, Ref, Object,
Export, Block, Switch, Lent
# Literal types
Integer, Float, String, Identifier,
Binary, Octal, Hex, Char
# Brackets, parentheses,
# operators and others
LeftParen, RightParen, # ()
LeftBrace, RightBrace, # {}
LeftBracket, RightBracket, # []
Dot, Semicolon, Comma, # . ; ,
# Miscellaneous
EndOfFile, # Marks the end of the token stream
NoMatch, # Used internally by the symbol table
Comment, # Useful for documentation comments, pragmas, etc.
Symbol, # A generic symbol
Pragma,
Token* = ref object
## A token object
kind*: TokenType # The type of the token
lexeme*: string # The lexeme associated to the token
line*: int # The line where the token appears
pos*: tuple[start, stop: int] # The absolute position in the source file
relPos*: tuple[start, stop: int] # The relative position in the source line
spaces*: int # Number of spaces before this token
proc `$`*(self: Token): string =
    ## Stringifies the token
if self != nil:
result = &"Token(kind={self.kind}, lexeme={self.lexeme.escape()}, line={self.line}, pos=({self.pos.start}, {self.pos.stop}), relpos=({self.relPos.start}, {self.relPos.stop}), spaces={self.spaces})"
else:
result = "nil"
proc `==`*(self, other: Token): bool =
## Returns self == other
return self.kind == other.kind and self.lexeme == other.lexeme
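# Note that equality deliberately ignores position metadata: two tokens
# with the same kind and lexeme compare equal no matter where they appear
# in the source file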

70
src/main.nim Normal file

@ -0,0 +1,70 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import util/fmterr
import util/symbols
import frontend/parsing/lexer
import frontend/parsing/parser
import frontend/compiler/compiler
import std/strformat
proc `$`(self: TypedNode): string =
if self.node.isConst():
var self = TypedExpr(self)
return &"{self.node}: {self.kind[]}"
case self.node.kind:
of varDecl, typeDecl, funDecl:
var self = TypedDecl(self)
result = &"{self.name[]}: {self.name.valueType[]}"
of identExpr, binaryExpr, unaryExpr:
var self = TypedExpr(self)
result &= &"{self.node}: {self.kind[]}"
else:
result = &"{self.node}: ? ({self.node.kind})"
proc main =
var
lexer = newLexer()
parser = newParser()
compiler = newPeonCompiler()
source: string
file = "test.pn"
lexer.fillSymbolTable()
while true:
stdout.write(">>> ")
stdout.flushFile()
try:
source = stdin.readLine()
for typedNode in compiler.compile(parser.parse(lexer.lex(source, file), file, lexer.getLines(), lexer.getSource()), lexer.getFile(), lexer.getSource(),
showMismatches=true):
echo &"{typedNode.node} -> {compiler.stringify(typedNode)}\n"
except IOError:
echo ""
break
except LexingError as exc:
print(exc)
except ParseError as exc:
print(exc)
except CompileError as exc:
print(exc)
when isMainModule:
setControlCHook(proc () {.noconv.} = echo ""; quit(0))
main()

83
src/util/fmterr.nim Normal file

@ -0,0 +1,83 @@
# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Utilities to print formatted error messages to stderr
import frontend/compiler/compiler
import frontend/parsing/parser
import frontend/parsing/lexer
import errors
import std/os
import std/terminal
import std/strutils
import std/strformat
proc printError(file, line: string, lineNo: int, pos: tuple[start, stop: int], fn: Declaration, msg: string) =
## Internal helper to print a formatted error message
## to stderr
stderr.styledWrite(fgRed, styleBright, "Error in ", fgYellow, &"{file}:{lineNo}:{pos.start}")
if not fn.isNil() and fn.kind == funDecl:
stderr.styledWrite(fgRed, styleBright, " in function ", fgYellow, FunDecl(fn).name.token.lexeme)
stderr.styledWriteLine(styleBright, fgDefault, ": ", msg)
if line.len() > 0:
stderr.styledWrite(fgRed, styleBright, "Source line: ", resetStyle, fgDefault, line[0..<pos.start])
if pos.stop == line.len():
stderr.styledWrite(fgRed, styleUnderscore, line[pos.start..<pos.stop])
stderr.styledWriteLine(fgDefault, line[pos.stop..^1])
else:
stderr.styledWrite(fgRed, styleUnderscore, line[pos.start..pos.stop])
stderr.styledWriteLine(fgDefault, line[pos.stop + 1..^1])
proc print*(exc: CompileError) =
## Prints a formatted error message
## for compilation errors to stderr
var file = exc.file
var contents = ""
case exc.line:
of -1: discard
of 0: contents = exc.compiler.getSource().strip(chars={'\n'}).splitLines()[exc.line]
else: contents = exc.compiler.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
printError(file, contents, exc.line, exc.node.getRelativeBoundaries(), exc.function, exc.msg)
proc print*(exc: ParseError) =
## Prints a formatted error message
## for parsing errors to stderr
var file = exc.file
if file notin ["<string>", ""]:
file = relativePath(exc.file, getCurrentDir())
var contents = ""
if exc.line != -1:
contents = exc.parser.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
else:
contents = ""
printError(file, contents, exc.line, exc.token.relPos, exc.parser.getCurrentFunction(), exc.msg)
proc print*(exc: LexingError) =
## Prints a formatted error message
## for lexing errors to stderr
var file = exc.file
if file notin ["<string>", ""]:
file = relativePath(exc.file, getCurrentDir())
var contents = ""
if exc.line != -1:
contents = exc.lexer.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
else:
contents = ""
printError(file, contents, exc.line, exc.pos, nil, exc.msg)

60
src/util/symbols.nim Normal file

@ -0,0 +1,60 @@
import ../frontend/parsing/lexer
proc fillSymbolTable*(tokenizer: Lexer) =
## Initializes the Lexer's symbol
## table with builtin symbols and
## keywords
# 1-byte symbols
tokenizer.symbols.addSymbol("{", LeftBrace)
tokenizer.symbols.addSymbol("}", RightBrace)
tokenizer.symbols.addSymbol("(", LeftParen)
tokenizer.symbols.addSymbol(")", RightParen)
tokenizer.symbols.addSymbol("[", LeftBracket)
tokenizer.symbols.addSymbol("]", RightBracket)
tokenizer.symbols.addSymbol(".", Dot)
tokenizer.symbols.addSymbol(",", Comma)
tokenizer.symbols.addSymbol(";", Semicolon)
# Keywords
tokenizer.symbols.addKeyword("type", TokenType.Type)
tokenizer.symbols.addKeyword("enum", Enum)
tokenizer.symbols.addKeyword("case", Case)
tokenizer.symbols.addKeyword("operator", Operator)
tokenizer.symbols.addKeyword("generator", Generator)
tokenizer.symbols.addKeyword("fn", TokenType.Function)
tokenizer.symbols.addKeyword("coroutine", Coroutine)
tokenizer.symbols.addKeyword("break", TokenType.Break)
tokenizer.symbols.addKeyword("continue", Continue)
tokenizer.symbols.addKeyword("while", While)
tokenizer.symbols.addKeyword("for", For)
tokenizer.symbols.addKeyword("foreach", Foreach)
tokenizer.symbols.addKeyword("if", If)
tokenizer.symbols.addKeyword("else", Else)
tokenizer.symbols.addKeyword("await", TokenType.Await)
tokenizer.symbols.addKeyword("raise", TokenType.Raise)
tokenizer.symbols.addKeyword("assert", TokenType.Assert)
tokenizer.symbols.addKeyword("const", Const)
tokenizer.symbols.addKeyword("let", Let)
tokenizer.symbols.addKeyword("var", TokenType.Var)
tokenizer.symbols.addKeyword("import", Import)
tokenizer.symbols.addKeyword("yield", TokenType.Yield)
tokenizer.symbols.addKeyword("return", TokenType.Return)
tokenizer.symbols.addKeyword("object", Object)
tokenizer.symbols.addKeyword("export", Export)
tokenizer.symbols.addKeyword("block", TokenType.Block)
tokenizer.symbols.addKeyword("switch", TokenType.Switch)
tokenizer.symbols.addKeyword("lent", TokenType.Lent)
# These are more like expressions with a reserved
# name that produce a value of a builtin type,
# but we don't need to care about that until
    # we're in the parsing/compilation steps, so
    # it's fine
tokenizer.symbols.addKeyword("true", True)
tokenizer.symbols.addKeyword("false", False)
tokenizer.symbols.addKeyword("ref", TokenType.Ref)
tokenizer.symbols.addKeyword("ptr", TokenType.Ptr)
for sym in [">", "<", "=", "~", "/", "+", "-", "_", "*", "?", "@", ":", "==", "!=",
">=", "<=", "+=", "-=", "/=", "*=", "**=", "!", "%", "&", "|", "^",
">>", "<<"]:
tokenizer.symbols.addSymbol(sym, Symbol)
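# The symbol table is fully customizable, so callers may register extra
# entries after the defaults, e.g. (hypothetical spaceship operator):
#   tokenizer.symbols.addSymbol("<=>", Symbol)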