Initial rework additions

parent fe568afc68
commit e6e9b3965c

@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
README.md (127)

@@ -1,3 +1,126 @@
# The peon programming language

Peon is a modern, multi-paradigm, async-first programming language with a focus on correctness and speed.

[Go to the Manual](docs/manual.md)


## What's peon?

__Note__: For simplicity, this section is written in the present tense even though part of what's described here is not implemented yet.


Peon is a multi-paradigm, statically-typed programming language inspired by C, Nim, Python, Rust and C++: it supports modern, high-level
features such as automatic type inference, parametrically polymorphic generic types, pure functions, closures, interfaces, single inheritance,
reference types, templates, coroutines, raw pointers and exceptions.

The memory management model is rather simple: a Mark and Sweep garbage collector is employed to reclaim unused memory, although more garbage
collection strategies (such as generational GC or deferred reference counting) are planned to be added in the future.

Peon features a native cooperative concurrency model designed to take advantage of the inherent waiting of typical I/O workloads, without using more than one OS thread (wherever possible), allowing for much greater efficiency and a smaller memory footprint. The asynchronous model forces developers to write code that is both easy to reason about, thanks to the [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) model that is core to peon's async event loop implementation, and that works as expected every time (without dropping signals, exceptions, or task return values).

Other notable features are the ability to define (and overload) custom operators with ease by implementing them as language-level functions, [Universal function call syntax](https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax), [Name stropping](https://en.wikipedia.org/wiki/Stropping_(syntax)) and named scopes.

In peon, all objects are first-class (this includes functions, iterators, closures and coroutines).

## Disclaimers

**Disclaimer 1**: The project is still in its very early days: lots of stuff is not implemented, is a work in progress, or is
otherwise outright broken. Feel free to report bugs!

**Disclaimer 2**: Currently, the `std` module _always_ has to be imported explicitly for even the most basic snippets to work. This is because intrinsic types and builtin operators are defined within it: if it is not imported, peon won't even know how to parse `2 + 2` (and even if it could, it would have no idea what the type of the expression would be). You can have a look at the [peon standard library](src/peon/stdlib) to see how the builtins are defined (be aware that they heavily rely on compiler black magic to work) and can even provide your own implementation if you're so inclined.


### TODO List

In no particular order, here's a list of stuff that's done/to do (it might be incomplete or out of date):
- User-defined types
- Function calls ✅
- Control flow (if-then-else, switch) ✅
- Looping (while) ✅
- Iteration (foreach)
- Type conversions
- Type casting
- Intrinsics ✅
- Type unions ✅
- Functions ✅
- Closures
- Managed references
- Unmanaged references
- Named scopes/blocks ✅
- Inheritance
- Interfaces
- Generics ✅
- Automatic types ✅
- Iterators/Generators
- Coroutines
- Pragmas ✅
- Attribute resolution ✅
- Universal Function Call Syntax
- Import system ✅
- Exceptions
- Templates (_not_ like C++ templates) ✅
- Optimizations (constant folding, branch and dead code elimination, inlining)


## Feature wishlist

Here's a random list of high-level features I would like peon to have and that I think are kinda neat (some may
have been implemented already):
- Reference types are not nullable by default (must use `#pragma[nullable]`)
- The `commutative` pragma, which allows defining just one implementation of an operator
  and having it become commutative
- Easy C/Nim interop via FFI
- C/C++ backend
- Nim backend
- [Structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) (must-have!)
- Simple OOP (with multiple dispatch!)
- RTTI, with methods that dispatch at runtime based on the true (aka runtime) type of a value
- Limited compile-time evaluation (embed the Peon VM in the C/C++/Nim backend and use that to execute peon code at compile time)


## The name

The name for peon comes from [Productive2's](https://git.nocturn9x.space/prod2) genius cute brain, and is the result of shortening
the name of the fastest animal on earth: the **Pe**regrine Falc**on**. I guess I wanted this to mean peon will be blazing fast (I
certainly hope so!)

# Peon needs you.

No, but really. I need help. This project is huge and (IMHO) awesome, but there's a lot of non-trivial work to do, and doing
it with other people is just plain more fun and rewarding. If you want to get involved, definitely try [contacting](https://nocturn9x.space/contact) me
or open an issue/PR!


# Credits

- Araq, for creating the amazing language that is [Nim](https://nim-lang.org) (as well as all of its contributors!)
- Guido van Rossum, aka the chad who created [Python](https://python.org) and its awesome community and resources
- The Nim community and contributors, for making Nim what it is today
- Bob Nystrom, for his amazing [book](https://craftinginterpreters.com) that inspired me
  and taught me how to actually make a programming language (kinda, I'm still very dumb)
- [Njsmith](https://vorpus.org/), for his awesome articles on structured concurrency
- All the amazing people in the [r/ProgrammingLanguages](https://reddit.com/r/ProgrammingLanguages) subreddit and its [Discord](https://discord.gg/tuFCPmB7Un) server
- [Art](https://git.nocturn9x.space/art) <3
- Everyone who listened (and still listens) to me ramble about compilers, programming languages and the like (and for giving me ideas and testing peon!)
- ... More? (I'd thank the contributors, but it's just me :P)
- Me! I guess


## Ok, cool, how do I use it?

Great question! If this README somehow didn't turn you away already (thanks, by the way), then you may want to try peon
out for yourself. Fortunately, the process is quite straightforward:

- First, you're gonna have to install [Nim](https://nim-lang.org/), the language peon is written in. I highly recommend
  using [choosenim](https://github.com/dom96/choosenim) to manage your Nim installations, as it makes switching between them and updating them a breeze
- Then, clone this repository and compile peon in release mode with `nim c -d:release --passC:"-flto" -o:peon src/main`, which should produce a `peon` binary
  ready for you to play with (if your C toolchain doesn't support LTO you can just omit the `--passC` option, although that would be pretty weird for
  a modern linker)
- If you want to move the executable to a different directory (say, into your `PATH`), you should copy peon's standard
  library (found in `/src/peon/stdlib`) into a known folder, edit the `moduleLookupPaths` variable inside `src/config.nim`
  by adding said folder to it so that the peon compiler knows where to find modules when you `import std;`, and then recompile
  peon (a sketch of the edit is shown below). Hopefully I will automate this soon, but as of right now the work is all manual


__Note__: On Linux, peon will also look into `~/.local/peon/stdlib` by default, so you can just create the `~/.local/peon` folder and copy `src/peon/stdlib` there
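
For reference, here is a minimal sketch of what that edit to `moduleLookupPaths` in `src/config.nim` could look like. The first three entries mirror the current defaults; the last one is a hypothetical folder of your choosing:

```
# src/config.nim (sketch): append the folder you copied the stdlib into.
# The empty string means "current directory" and always takes precedence.
import std/os

const moduleLookupPaths*: seq[string] = @[
  "",
  "src/peon/stdlib",
  absolutePath(joinPath(".local", "peon", "stdlib"), getenv("HOME")),
  "/usr/local/lib/peon/stdlib"  # hypothetical extra lookup path
]
```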

@@ -0,0 +1,3 @@

{
    "makefile.extensionOutputFolder": "./.vscode"
}

@@ -0,0 +1,113 @@

# Peon - Bytecode Specification

This document describes peon's bytecode as well as how it is (de-)serialized to/from files and
other file-like objects. Note that the segments in a bytecode dump appear in the order they are listed
in this document.

## Code Structure

A peon program is compiled into a tightly packed sequence of bytes that contain all the necessary information
the VM needs to execute said program. There is no dependence between the frontend and the backend outside of the
bytecode format (which is implemented in a separate serializer module) to allow for maximum modularity.

A peon bytecode file contains the following:

- Constants
- The program's code
- Debugging information (file and version metadata, module info. Optional)


## File Headers

A peon bytecode file starts with the header, which is structured as follows (a sketch of this layout follows the list):

- The literal string `PEON_BYTECODE`
- A 3-byte version number (the major, minor and patch version numbers of the compiler that generated the file)
- The branch name of the repository the compiler was built from, prepended with its length as a 1-byte integer
- The commit hash (encoded as a 40-byte hex string) in the aforementioned branch from which the compiler was built (particularly useful in development builds)
- An 8-byte UNIX timestamp (with epoch 0 starting at 00:00 UTC on January 1st, 1970) representing the exact date and time of when the file was generated
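
To make the layout concrete, here is a small, hypothetical Nim sketch that builds such a header (the `writeHeader` helper and the big-endian timestamp encoding are assumptions made for the example, not the actual serializer):

```
import std/times

# Sketch of the header layout described above (not the real serializer module)
proc addStr(buf: var seq[byte], s: string) =
  for c in s:
    buf.add(byte(c))

proc writeHeader(version: array[3, byte], branch, commitHash: string): seq[byte] =
  result.addStr("PEON_BYTECODE")           # magic marker
  result.add(version)                      # 3-byte version: major, minor, patch
  result.add(byte(branch.len))             # branch name is length-prefixed (1 byte)...
  result.addStr(branch)                    # ...and follows immediately
  result.addStr(commitHash)                # 40-byte hex-encoded commit hash
  let stamp = uint64(getTime().toUnix())   # 8-byte UNIX timestamp (byte order assumed big-endian)
  for i in countdown(7, 0):
    result.add(byte((stamp shr (i * 8)) and 0xff'u64))

echo writeHeader([0'u8, 1, 0], "master",
                 "0123456789abcdef0123456789abcdef01234567").len  # 71 bytes
```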

## Debug information

The following segments contain extra information and metadata about the compiled bytecode to aid debugging, but they may be missing
in release builds.

### Line data segment

The line data segment contains information about each instruction in the code segment and associates each of them 1:1 with a line
number in the original source file for easier debugging, using run-length encoding to keep the segment compact. The segment's
size is fixed and is encoded at the beginning as a sequence of 4 bytes (i.e. a single 32-bit integer). The data
in this segment can be decoded as explained in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L29), which is quoted
below:
```
[...]
## lines maps bytecode instructions to line numbers using Run
## Length Encoding. Instructions are encoded in groups whose structure
## follows the following schema:
## - The first integer represents the line number
## - The second integer represents the number of
##   instructions on that line
## For example, if lines equals [1, 5], it means that there are 5 instructions
## at line 1, meaning that all instructions in code[0..4] belong to the same line.
## This is more efficient than using the naive approach, which would encode
## the same line number multiple times and waste considerable amounts of space.
[...]
```
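
To make the run-length encoding concrete, here is a small Nim sketch that expands such a `lines` sequence into one line number per instruction (the `expandLineData` helper is hypothetical and only meant to illustrate the scheme quoted above):

```
# Expand RLE line data ([line, count, line, count, ...]) into one entry per instruction
proc expandLineData(lines: seq[int]): seq[int] =
  var i = 0
  while i < lines.len:
    let (line, count) = (lines[i], lines[i + 1])
    for _ in 0 ..< count:
      result.add(line)
    i += 2

# [1, 5] means "5 instructions on line 1", i.e. code[0..4] all map to line 1
echo expandLineData(@[1, 5, 2, 3])  # @[1, 1, 1, 1, 1, 2, 2, 2]
```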

### Functions segment

This segment contains details about each function in the original file. The segment's size is fixed and is encoded at the
beginning as a sequence of 4 bytes (i.e. a single 32-bit integer). The data in this segment can be decoded as explained
in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L39), which is quoted below:

```
[...]
## functions encodes the following information:
## - Function name
## - Argument count
## - Function boundaries
## The encoding for functions is the following:
## - First, the position into the bytecode where the function begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the function ends is encoded (as a 3 byte integer)
## - After that follows the argument count as a 1 byte integer
## - Lastly, the function's name (optional) is encoded in ASCII, prepended with
##   its size as a 2-byte integer
[...]
```
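
A hypothetical Nim sketch of decoding one such function record follows. The helper names and the big-endian byte order are assumptions; the modules segment described next uses the same scheme, minus the argument count:

```
# Decode one function record following the layout quoted above (sketch only)
type FunctionInfo = object
  start, stop: int  # function boundaries (two 3-byte integers)
  arity: int        # argument count (1 byte)
  name: string      # optional, prepended with its size as a 2-byte integer

proc readInt(data: openArray[byte], pos: var int, size: int): int =
  # Read `size` bytes as one unsigned integer (big-endian order is an assumption)
  for _ in 0 ..< size:
    result = result shl 8 or int(data[pos])
    inc pos

proc readFunction(data: openArray[byte], pos: var int): FunctionInfo =
  result.start = data.readInt(pos, 3)
  result.stop = data.readInt(pos, 3)
  result.arity = data.readInt(pos, 1)
  let nameLen = data.readInt(pos, 2)
  for _ in 0 ..< nameLen:
    result.name.add(char(data[pos]))
    inc pos

var pos = 0
let record = @[0'u8, 0, 0,  0, 0, 42,  2,  0, 3,  102, 105, 98]  # "fib", 2 args, code[0..42]
echo readFunction(record, pos).name  # fib
```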

### Modules segment

This segment contains details about the modules that make up the original source code which produced a given bytecode dump.
The data in this segment can be decoded as explained in [this file](../src/frontend/compiler/targets/bytecode/opcodes.nim#L49), which is quoted below:
```
[...]
## modules contains information about all the peon modules that the compiler has encountered,
## along with their start/end offset in the code. Unlike other bytecode-compiled languages like
## Python, peon does not produce a bytecode file for each separate module it compiles: everything
## is contained within a single binary blob. While this simplifies the implementation and makes
## bytecode files entirely "self-hosted", it also means that the original module information is
## lost: this segment serves to fix that. The segment's size is encoded at the beginning as a 4-byte
## sequence (i.e. a single 32-bit integer) and its encoding is similar to that of the functions segment:
## - First, the position into the bytecode where the module begins is encoded (as a 3 byte integer)
## - Second, the position into the bytecode where the module ends is encoded (as a 3 byte integer)
## - Lastly, the module's name is encoded in ASCII, prepended with its size as a 2-byte integer
[...]
```


## Constant segment

The constant segment contains all the read-only values that the code will need at runtime, such as hardcoded
variable initializers or constant expressions. It is similar to the `.rodata` section of Assembly files, although
the implementation is different. Constants are encoded as a linear sequence of bytes with no type information about
them whatsoever: it is the code that, at runtime, loads each constant (whose type is determined at compile time) onto
the stack accordingly. For example, a 32-bit integer constant would be encoded as a sequence of 4 bytes, which would
then be loaded by the appropriate `LoadInt32` instruction at runtime. The segment's size is fixed and is encoded at
the beginning as a sequence of 4 bytes (i.e. a single 32-bit integer). The constant segment may be empty, although in
real-world scenarios it likely won't be.
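
As a concrete illustration of the `LoadInt32` example above, here is a hypothetical Nim sketch of how a 32-bit integer constant could be written to and read back from such a byte sequence (the big-endian order is an assumption; the real serializer may differ):

```
# Encode an int32 constant into 4 raw bytes with no type information attached...
proc encodeInt32(value: int32): array[4, byte] =
  let raw = cast[uint32](value)
  for i in 0 ..< 4:
    result[i] = byte((raw shr ((3 - i) * 8)) and 0xff'u32)

# ...and decode it again, which is roughly what a LoadInt32-style instruction
# would do when pushing the constant onto the stack at runtime
proc decodeInt32(data: array[4, byte]): int32 =
  var raw: uint32
  for b in data:
    raw = raw shl 8 or uint32(b)
  result = cast[int32](raw)

echo decodeInt32(encodeInt32(123456789'i32))  # 123456789
```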

## Code segment

The code segment contains the linear sequence of bytecode instructions of a peon program to be fed directly to
peon's virtual machine. The segment's size is fixed and is encoded at the beginning as a sequence of 3 bytes
(i.e. a single 24-bit integer). All the instructions are documented [here](../src/frontend/compiler/targets/bytecode/opcodes.nim)

@@ -0,0 +1,32 @@

# Peon design scratchpad

This is just a random doc I made to keep track of all the design changes I have
in mind for Peon: with this being my first serious attempt at making a programming
language that's actually _useful_, I want to get the design right the first time
(no one wants to make JavaScript 2.0, right? _Right?_).


The basic idea is:
- Some peon code comes in (from a file or as command-line input, doesn't matter)
- It gets tokenized and parsed into a typeless AST
- The compiler processes the typeless AST into a typed one
- The typed AST is passed to an optional optimizer module, which spits
  out another (potentially identical) typed AST representing the optimized
  program. The optimizer is always run even when optimizations are disabled,
  as it takes care of performing closure conversion and other cool stuff
- The typed AST is passed to a code generator module that is specific to each
  backend/platform, which actually takes care of producing the code that will
  then be executed (see the sketch after this list)
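
A minimal Nim sketch of that pipeline, purely for illustration (every type and proc name here is hypothetical; the real modules live under `src/frontend`):

```
# Hypothetical shape of the pipeline described above, not the actual peon API
type
  TypelessAst = object  # raw parse tree, no type information
  TypedAst = object     # parse tree annotated with types

proc tokenizeAndParse(source: string): TypelessAst = discard
proc typecheck(ast: TypelessAst): TypedAst = discard
proc optimize(ast: TypedAst, enabled: bool): TypedAst =
  # Always runs: even with optimizations disabled it still performs
  # closure conversion and other mandatory rewrites
  result = ast
proc generateCode(ast: TypedAst, target: string): seq[byte] = discard

proc compile(source: string, target = "bytecode", optimizations = true): seq[byte] =
  let typeless = tokenizeAndParse(source)
  let typed = typecheck(typeless)
  let optimized = optimize(typed, optimizations)
  result = generateCode(optimized, target)

discard compile("import std; 2 + 2;")
```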

The current design is fairly modular and some parts of the codebase are more final
than others: for example, the lexer and parser are more or less complete and unlikely
to undergo massive changes in the future, as opposed to the compiler, which has been subject
to many major refactoring steps as the project went along. But I digress.

The typed AST format should ideally be serializable to binary files so that I can slot in
different optimizer/code generator modules written in different languages without the need
to use FFI. The format will serve a similar purpose to the IR used by gcc (GIMPLE), but instead
of being an RTL-like language it'll operate at a much higher level, since we don't really need to
support any programming language other than peon itself (while gcc has to be interoperable
with FORTRAN and other stuff).

@@ -0,0 +1,179 @@

# Peon - Formal Grammar Specification

__Note__: This document is currently a draft and is therefore incomplete

## Rationale
The purpose of this document is to provide an unambiguous formal specification of peon's syntax for use in automated
compiler generators (known as "compiler compilers") and parsers.

Our grammar is inspired by (and extended from) the Lox language as described in Bob Nystrom's book "Crafting Interpreters",
available at https://craftinginterpreters.com, and follows the EBNF standard, but for clarity the relevant syntax will
be explained below.

## Disclaimer
----------------------------------------------
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119).

Literals in this document will often be surrounded by double quotes to make it obvious they're not part of a sentence. To
avoid ambiguity, this document will always specify explicitly if double quotes need to be considered as part of a term or not,
which means that if it is not otherwise stated they are to be considered part of said term. In addition to quotes, literals
may be formatted in monospace to make them stand out more in the document.

## EBNF Syntax & Formatting rules
----------------------------------------------
As a refresher for experienced users as well as to facilitate reading for newcomers, the variation of EBNF used in this document
is detailed below:
- The literal `"LF"` (without quotes) is a shorthand for "Line Feed". It symbolizes the end of a line and it's platform-independent
- A pair of 2 forward slashes (character code 47) is used to mark comments. A comment lasts until the
  end of a line is encountered. It is RECOMMENDED to use them to clarify each rule, or a group of rules,
  to simplify human inspection of the specification
- The name of non-terminal productions MUST be in lowercase (such as `foo`), while for terminals it MUST be in uppercase (such as `FOO`)
- Whitespaces, tabs, newlines and form feeds (character codes 32, 9, 10 and 12 respectively) are not
  relevant to the grammar and MUST be ignored by automated parsers and parser generators
- `"*"` (without quotes, character code 42) is used for repetition of a rule, meaning it MUST match 0 or more times
- `"?"` (without quotes, character code 63) means a rule can match 0 or 1 times
- `"+"` (character code 43) is used for repetition of a rule, meaning it MUST match 1 or more times
- `"|"` (without quotes, character code 124) is used to indicate alternatives and means a rule may match either the first or
  the second rule. This operator can be chained to obtain something like `"foo" | "bar" | "baz"`, meaning that either
  the literal strings foo, bar or baz are valid matches for the rule
- `"{x,y}"` (without quotes) is used for repetition, meaning a rule MUST match from x to y times (start to end, inclusive).
  Omitting x means the rule MUST match at least 0 times and at most y times, while omitting y means the rule
  MUST match exactly x times. Omitting both x and y is the same as using `*`
- Production rules are terminated with an ASCII semicolon (`COLON` without quotes, character code 59)
- Rules are listed in descending order: the last rule is the highest-precedence one. Think of it as "more complex rules
  come first"
- An "arrow" (character code 8594) MUST be used to separate rule names from their definition.
  A rule definition, then, looks something like this (without quotes): `"name → rule definition here; // optional comment"`
- Literal numbers can be expressed in their decimal form (i.e. with arabic numerals). Other supported formats are
  hexadecimal using the prefix `0x`, octal using the prefix `0o`, and binary using the prefix `0b`. For example,
  the literals `0x7F`, `0b1111111` and `0o177` all represent the decimal number `127` in hexadecimal, binary and
  octal respectively
- The literal `"EOF"` (without quotes) represents the end of the input stream and is a shorthand for "End Of File"
- Ranges can be defined by separating the start and the end of the range with three dots (character code 46) and
  are inclusive at both ends. Both the start and the end of the range are mandatory, and it is RECOMMENDED that they
  be separated from the three dots by a space for ease of reading. Ranges can define numerical sets like in `"0 ... 9"`
  (without quotes), or lexicographical ones such as `"'a' ... 'z'"` (without quotes), in which case the range should be
  interpreted as a sequence of the character codes between the start and end of the range. It is REQUIRED that the first
  element in the range is less than or equal to the last one: backwards ranges are illegal.
  In addition to this, although numerical ranges can use any combination of the supported number representations
  (meaning `'0 ... 0x10'` is a valid range encompassing all decimal numbers from 0 to 16), it is RECOMMENDED that
  the representation used is consistent across the start and end of the range. Finally, ranges can have a character
  and a number as either start or end, in which case the character is to be interpreted as its character code in decimal
- For readability purposes, it is RECOMMENDED that the grammar text be left-aligned and that spaces are used between
  operators
- Literal strings MUST be delimited by matching pairs of double or single quotes (character codes 34 and 39) and SHOULD be separated
  from any other term in the grammar by a space
- Characters inside strings can be escaped using backslashes. For example, to add a literal double quote inside a double-quoted string, one MUST
  write `"\""` (without quotes), although it is recommended to use single quotes in this case (i.e. `'"'` instead)

## EBNF Grammar
----------------------------------------------
Below you can find the EBNF specification of peon's grammar.

```
// Top-level code
program → declaration* EOF; // An entire program (Note: an empty program *is* a valid program)

// Declarations (rules that bind a name to an object in the current scope and produce no side effects)

// A program is composed by a list of declarations
declaration → funDecl | varDecl | coroDecl | statement;
// Function declarations
funDecl → "fn" function;
coroDecl → "coro" function;
// Constants still count as "variable" declarations in the grammar
varDecl → ("var" | "let" | "const") IDENTIFIER ( "=" expression )? COLON;


// Statements (rules that produce side effects, without binding a name. Well, mostly: import, foreach and others do, but they're exceptions to the rule)
statement → exprStmt | ifStmt | returnStmt | whileStmt | blockStmt; // The set of all statements
// Any expression followed by a semicolon is an expression statement
exprStmt → expression COLON;
// Returns from a function, illegal in top-level code. An empty return statement is illegal
// in non-void functions
returnStmt → "return" expression? COLON;
// Defers the evaluation of the given expression right before a function exits, illegal in top-level code.
// Semantically and functionally equivalent to wrapping a function in a big try block and executing the
// expression in the finally block, but less verbose
deferStmt → "defer" expression COLON;
// Breaks out of a loop or named block
breakStmt → "break" IDENTIFIER? COLON;
// Skips to the next iteration in a loop or jumps to the
// beginning of a named block
continueStmt → "continue" IDENTIFIER? COLON;
importStmt → ("from" IDENTIFIER)? "import" (IDENTIFIER ("as" IDENTIFIER)? ","?)+ COLON; // Imports one or more modules in the current scope. Creates a namespace
assertStmt → "assert" expression COLON;
yieldStmt → "yield" expression? COLON;
// Pauses the execution of the calling coroutine and calls the given coroutine. Execution continues when the callee returns
awaitStmt → "await" expression COLON;
// Exception handling
tryStmt → "try" "{" statement* "}" (except+ "finally" statement | "finally" statement | "else" statement | except+ "else" statement | except+ "else" statement "finally" statement);
// Blocks create a new scope that lasts until they're closed
blockStmt → "{" declaration* "}";
// Named blocks are useful for breaking out of deeply nested loops
namedBlock → "block" IDENTIFIER "{" declaration* "}";
// If statements are conditional jumps
ifStmt → "if" expression "{" statement* "}" ("else" "{" statement* "}")?;
// While loops run as long as their condition is true
whileStmt → "while" expression "{" statement* "}";
// For-each loops iterate over a collection type
foreachStmt → "foreach" "(" (IDENTIFIER ":" expression) ")" "{" statement* "}";


// Expressions (rules that produce a value and may have side effects)

// Assignment is the highest-level expression
expression → assignment;
assignment → (call ".")? IDENTIFIER ASSIGNTOKENS assignment | lambdaExpr;
lambdaExpr → "lambda" lambda; // Lambdas are anonymous functions, so they act as expressions
yieldExpr → "yield" expression?; // Empty yield equals yield nil
awaitExpr → "await" expression;
logic_or → logic_and ("or" logic_and)*;
logic_and → equality ("and" equality)*;
equality → comparison (("!=" | "==") comparison)*;
comparison → term ((">" | ">=" | "<" | "<=" | "as" | "is" | "of") term)*;
term → factor (("-" | "+") factor)*; // Precedence for + and - in operations
factor → unary (("/" | "*" | "**" | "^" | "&") unary)*; // All other binary operators have the same precedence
unary → ("!" | "-" | "~") unary | call;
slice → expression "[" expression (":" expression){0,2} "]";
call → primary ("(" arguments? ")" | "." IDENTIFIER)*;
// Below are some collection literals: lists, sets, dictionaries and tuples
listExpr → "[" arguments* "]";
// Note: "{}" is an empty dictionary, NOT an empty set
setExpr → "{" arguments? "}";
dictExpr → "{" (expression ":" expression ("," expression ":" expression)*)* "}"; // {key: value, ...}
tupleExpr → "(" arguments* ")";
primary → "nan" | "true" | "false" | "nil" | "inf" | NUMBER | STRING | IDENTIFIER | "(" expression ")" "." IDENTIFIER;

// Utility rules to avoid repetition
function → IDENTIFIER ("(" parameters? ")")? blockStmt;
lambda → ("(" parameters? ")")? blockStmt;
// ident: type [, ident2: type2, ...]
parameters → IDENTIFIER ":" IDENTIFIER ("," IDENTIFIER)*;
arguments → expression ("," expression)*;
except → ("except" expression? statement);


// These are all the terminals (i.e. productions defined non-recursively)
COMMENT → "#" UNICODE* LF;
COLON → ";";
SINGLESTRING → QUOTE UNICODE* QUOTE;
DOUBLESTRING → DOUBLEQUOTE UNICODE* DOUBLEQUOTE;
SINGLEMULTI → QUOTE{3} UNICODE* QUOTE{3}; // Single quoted multi-line strings
DOUBLEMULTI → DOUBLEQUOTE{3} UNICODE* DOUBLEQUOTE{3}; // Double quoted multi-line string
DECIMAL → DIGIT+;
FLOAT → DIGIT+ ("." DIGIT+)? (("e" | "E") DIGIT+)?;
BIN → "0b" ("0" | "1")+;
OCT → "0o" ("0" ... "7")+;
HEX → "0x" ("0" ... "9" | "A" ... "F" | "a" ... "f")+;
NUMBER → DECIMAL | FLOAT | BIN | HEX | OCT;
STRING → ("r" | "b" | "f")? (SINGLESTRING | DOUBLESTRING | SINGLEMULTI | DOUBLEMULTI);
IDENTIFIER → IDENTCHARS (IDENTCHARS | DIGIT)*; // Valid identifiers are only alphanumeric!
QUOTE → "'";
DOUBLEQUOTE → "\"";
IDENTCHARS → "a" ... "z" | "A" ... "Z" | "_";
UNICODE → 0x00 ... 0x10FFFD; // This covers the whole unicode range
DIGIT → "0" ... "9";
ASSIGNTOKENS → "+=" | "-=" | "*=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>=" | "**=" | "//=" | "=";
```

@@ -0,0 +1,301 @@

# Peon - Manual

Peon is a statically typed, garbage-collected, C-like programming language with
a focus on speed and correctness, but whose main feature is the ability to natively
perform highly efficient parallel I/O operations by implementing the [structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/)
paradigm.

__Note__: Peon is currently a WIP (Work In Progress), and much of the content of this manual is purely theoretical as
of now. If you want to help make this into a reality, feel free to contribute!


## Table of contents

- [Manual](#peon---manual)
- [Design Goals](#design-goals)
- [Examples](#peon-by-example)
- [Grammar](grammar.md)
- [Bytecode](bytecode.md)

## Design Goals

While peon is inspired by Bob Nystrom's [book](https://craftinginterpreters.com), where he describes a simple toy language
named Lox, the aspiration for it is to become a programming language that could actually be used in the real world. For that
to happen, we need:

- Exceptions (`try/except/finally`)
- An import system (with namespaces, like Python)
- Multithreading support (with a global VM lock when GC'ing)
- Built-in collections (list, tuple, set, etc.)
- Coroutines (w/ structured concurrency)
- Generators
- Generics
- C/Nim FFI
- A C backend (for native speed)
- A package manager

Peon ~~steals~~ borrows many ideas from Python, Nim (the language peon itself is written in), C and many others.

## Peon by Example

Here follow a few examples of peon code to make it clear what the end product should look like. Note that
not all examples represent working functionality and some of these examples might not be up to date either.
For somewhat more up-to-date code snippets, check the [tests](../tests/) directory.

### Variable declarations

```
var x = 5;         # Inferred type is int64
var y = 3'u16;     # Type is specified as uint16
x = 6;             # Works: type matches
x = 3.0;           # Error: Cannot assign float64 to x
var x = 3.14;      # Error: cannot re-declare x
const z = 6.28;    # Constant declaration
let a = "hi!";     # Cannot be reassigned/mutated
var b: int32 = 5;  # Explicit type declaration (TODO)
```

__Note__: Peon supports [name stropping](https://en.wikipedia.org/wiki/Stropping_(syntax)), meaning
that almost any ASCII sequence of characters can be used as an identifier, including language
keywords, but stropped names need to be enclosed by matching pairs of backticks (`` ` ``)

### Comments

```
# This is a single-line comment
# Peon has no specific syntax for multi-line comments.

fn id[T: any](x: T): T {
    ## Documentation comments start
    ## with two hash signs. They are currently
    ## unused, but will be semantically
    ## relevant in the future. They can
    ## be used to document types, modules
    ## and functions
    return x;
}
```

### Functions

```
fn fib(n: int): int {
    if n < 3 {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}

fib(30);
```

### Type declarations (TODO)

```
type Foo = object {  # Can also be "ref object" for reference types (managed automatically)
    fieldOne*: int   # Asterisk means the field is public outside the current module
    fieldTwo*: int
}
```

### Enumeration types (TODO)

```
type SomeEnum = enum {  # Can be mapped to an integer
    KindOne,
    KindTwo
}
```

### Operator overloading

```
operator `+`(a, b: Foo): Foo {
    return Foo(fieldOne: a.fieldOne + b.fieldOne, fieldTwo: a.fieldTwo + b.fieldTwo);
}

Foo(fieldOne: 1, fieldTwo: 3) + Foo(fieldOne: 2, fieldTwo: 3);  # Foo(fieldOne: 3, fieldTwo: 6)
```

__Note__: Custom operators (e.g. `foo`) can also be defined. The backticks around the plus sign serve to mark it
as an identifier instead of a symbol (which is a requirement for function names, since operators are basically
functions in peon). In fact, even the built-in peon operators are implemented partially in peon (actually, just
their stubs are) and they are then specialized in the compiler to get rid of unnecessary function call overhead.

### Function calls

```
foo(1, 2 + 3, 3.14, bar(baz));
```

__Note__: Operators can be called as functions; if their name is a symbol, just wrap it in backticks like so:
```
`+`(1, 2)  # Identical to 1 + 2
```

__Note__: Code the likes of `a.b()` is (actually, will be) desugared to `b(a)` if there exists a function
`b` whose signature is compatible with the value of `a` (assuming `a` doesn't have a field named `b`,
in which case the attribute resolution takes precedence)


### Generics

```
fn genericSum[T: Number](a, b: T): T {  # Note: "a, b: T" means that both a and b are of type T
    return a + b;
}

# This allows for a single implementation to be
# re-used multiple times without any code duplication
genericSum(1, 2);
genericSum(3.14, 0.1);
genericSum(1'u8, 250'u8);
```

__Note__: Peon generics are implemented according to a paradigm called [parametric polymorphism](https://en.wikipedia.org/wiki/Parametric_polymorphism). In contrast to the model employed by other languages such as C++, called [ad hoc polymorphism](https://en.wikipedia.org/wiki/Ad_hoc_polymorphism),
where each time a generic function is called with a new type signature it is instantiated and
typechecked (and then compiled), peon checks generics at declaration time and only once: this
not only saves precious compilation time, but it also allows the compiler to generate a single
implementation for the function (although this is not a requirement) and catches type errors right
when they occur, even if the function is never called, rather than having to wait for the function
to be called and specialized. Unfortunately, this means that some of the things that are possible
in, say, C++ templates are just not possible with peon generics. As an example, take this code snippet:

```
fn add[T: any](a, b: T): T {
    return a + b;
}
```

While the intent of this code is clear and makes sense semantically speaking, peon will refuse
to compile it because it cannot prove that the `+` operator is defined on every type (in fact,
it's only defined for numbers): this is a feature. If peon allowed it, `any` could be used to
escape the safety of the type system (for example, calling `add` with `string`s, which may or
may not be what you want).

Since the goal for peon is to not constrain the developer into one specific programming paradigm,
it also implements a secondary, different generic mechanism using the `auto` type. The above code
could be rewritten to work as follows:

```
fn add(a, b: auto): auto {
    return a + b;
}
```

When using automatic types, peon will behave similarly to C++ (think: templates) and only specialize,
typecheck and compile the function once it is called with a given type signature. For this reason,
automatic and parametrically polymorphic types cannot be used together in peon code.

Another noteworthy concept to keep in mind is that of type unions. For example, take this snippet:

```
fn foo(x: int32): int32 {
    return x;
}


fn foo(x: int): int {
    return x;
}


fn identity[T: int | int32](x: T): T {
    return foo(x);
}
```

This code will, again, fail to compile: this is because, as far as peon is concerned, `foo` is not
defined for both `int` and `int32` _at the same time_. In order for that to work, `foo` would need
to be rewritten with `T: int32 | int` as its generic argument type in order to avoid the ambiguity
(or `identity` could be rewritten to use automatic types instead; both are viable options). Obviously,
the above snippet would also fail to compile if `foo` were not defined for all the types specified in the
type constraint for `identity` (this is because, counterintuitively, matching a generic constraint
such as `int32 | int` does _not_ mean "either of these types", but rather "_both_ of these types at
once").


#### More generics

```
fn genericSth[T: someTyp, K: someTyp2](a: T, b: K) {  # Note: no return type == void function
    # code...
}

genericSth(1, 3.0);
```


#### Even more generics

```
type Box*[T: Number] = object {
    num: T;
}

var boxFloat = Box[float](1.0);
var boxInt = Box[int](1);
```

__Note__: The `*` modifier to make a name visible outside the current module must be put
__before__ the generic constraints, so only `fn foo*[T](a: T) {}` is the correct syntax.


### Forward declarations

```
fn someF: int;  # Semicolon and no body == forward declaration

print(someF());  # Prints 42

fn someF: int {
    return 42;
}
```

__Note__: A function that is forward-declared __must__ be implemented in the same module as
the forward declaration.

### Generators

```
generator count(n: int): int {
    while n > 0 {
        yield n;
        n -= 1;
    }
}

foreach n in count(10) {
    print(n);
}
```

### Coroutines

```
import concur;
import http;


coroutine req(url: string): string {
    return (await http.AsyncClient().get(url)).content;
}


coroutine main(urls: list[string]) {
    var pool = concur.pool();  # Creates a task pool: like a nursery in njsmith's article
    foreach url in urls {
        pool.spawn(req, url);
    }
    # The pool has internal machinery that makes the parent
    # task wait until all children exit! When this function
    # returns, ALL child tasks will have exited somehow.
    # Exceptions and return values propagate neatly, too.
}


concur.run(main, newList[string]("https://google.com", "https://debian.org"));
```
File diff suppressed because it is too large

@@ -0,0 +1,81 @@

# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import std/strformat
import std/os


# These variables can be tweaked to debug and test various components of the toolchain
var debugLexer* = false       # Print the tokenizer's output
var debugParser* = false      # Print the AST generated by the parser
var debugCompiler* = false    # Disassemble and/or print the code generated by the compiler
const debugVM* {.booldefine.} = false         # Enable the runtime debugger in the bytecode VM
const debugGC* {.booldefine.} = false         # Debug the Garbage Collector (extremely verbose)
const debugAlloc* {.booldefine.} = false      # Trace object allocation (extremely verbose)
const debugMem* {.booldefine.} = false        # Debug the memory allocator (extremely verbose)
var debugSerializer* = false  # Validate the bytecode serializer's output
const debugStressGC* {.booldefine.} = false   # Make the GC run a collection at every allocation (VERY SLOW!)
const debugMarkGC* {.booldefine.} = false     # Trace the marking phase object by object (extremely verbose)
const PeonBytecodeMarker* = "PEON_BYTECODE"   # Magic value at the beginning of bytecode files
const HeapGrowFactor* = 2                     # The growth factor used by the GC to schedule the next collection
const FirstGC* = 1024 * 1024;                 # How many bytes to allocate before running the first GC
const enableVMChecks* {.booldefine.} = true;  # Enables all types of compiler (nim-wise) checks in the VM
# List of paths where peon looks for modules, in order (empty path means current directory, which always takes precedence)
const moduleLookupPaths*: seq[string] = @["", "src/peon/stdlib", absolutePath(joinPath(".local", "peon", "stdlib"), getenv("HOME"))]
when HeapGrowFactor <= 1:
    {.fatal: "Heap growth factor must be > 1".}
const PeonVersion* = (major: 0, minor: 1, patch: 0)
const PeonRelease* = "alpha"
const PeonCommitHash* = staticExec("git rev-parse HEAD")
const PeonBranch* = staticExec("git symbolic-ref HEAD 2>/dev/null | cut -f 3 -d /")
const PeonVersionString* = &"Peon {PeonVersion.major}.{PeonVersion.minor}.{PeonVersion.patch} {PeonRelease} ({PeonBranch}, {CompileDate}, {CompileTime}, {PeonCommitHash[0..PeonCommitHash.high() mod 8]}) [Nim {NimVersion}] on {hostOS} ({hostCPU})"
const HelpMessage* = """The peon programming language, Copyright (C) 2023 Mattia Giambirtone & All Contributors

This program is free software, see the license distributed with this program or check
http://www.apache.org/licenses/LICENSE-2.0 for more info.

Basic Usage
-----------

$ peon file.pn    Run the given Peon source file
$ peon file.pbc   Run the given Peon bytecode file


Options
-------

-h, --help          Show this help text and exit
-v, --version       Print the current peon version and exit
-s, --string        Execute the passed string as if it was a file
-n, --noDump        Don't dump the result of compilation to a file.
                    Note that no dump is created when using -s/--string
-b, --breakpoints   Run the debugger at specific bytecode offsets (comma-separated).
                    Only available with --target:bytecode and when compiled with VM
                    debugging on (-d:debugVM at build time)
-d, --disassemble   Disassemble the output of compilation (only makes sense with --target:bytecode)
-m, --mode          Set the compilation mode. Acceptable values are 'debug' and
                    'release'. Defaults to 'debug'
-c, --compile       Compile the code, but do not execute it. Useful along with -d
-w, --warnings      Turn warnings on or off (default: on). Acceptable values are
                    yes/on and no/off
--noWarn            Disable a specific warning (for example, --noWarn:unusedVariable)
--showMismatches    Show all mismatches when function dispatching fails (output is really verbose)
--target            Select the compilation target (valid values are: 'c' and 'bytecode'). Defaults to
                    'bytecode'
-o, --output        Rename the output file with this value (with --target:bytecode, a '.pbc' extension
                    is added if not already present)
--debug-dump        Debug the bytecode serializer. Only makes sense with --target:bytecode
--debug-lexer       Show the lexer's output
--debug-parser      Show the parser's output
"""
@ -0,0 +1,21 @@
|
|||
# Copyright 2022 Mattia Giambirtone & All Contributors
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
|
||||
type
|
||||
PeonException* = ref object of CatchableError
|
||||
## A Nim exception for a generic internal
|
||||
## peon failure (not to be used directly)
|
||||
file*: string # The file where the error occurred
|
||||
line*: int # The line where the error occurred
|
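# Since PeonException is not meant to be raised directly, concrete failures subclass it
# instead; for example, the lexer declares its own error type along these lines:
#
#   LexingError* = ref object of PeonException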
File diff suppressed because it is too large
|
@ -0,0 +1,71 @@
|
|||
# Copyright 2022 Mattia Giambirtone & All Contributors
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
## The code generator for translating peon to C code
|
||||
import std/tables
|
||||
import std/strformat
|
||||
import std/algorithm
|
||||
import std/parseutils
|
||||
import std/strutils
|
||||
import std/sequtils
|
||||
import std/sets
|
||||
import std/os
|
||||
|
||||
|
||||
import frontend/compiler/compiler
|
||||
import frontend/parsing/lexer
|
||||
import frontend/parsing/parser
|
||||
import frontend/parsing/ast
|
||||
|
||||
|
||||
type
|
||||
CompilerFunc = object
|
||||
## An internal compiler function called
|
||||
## by pragmas
|
||||
kind: PragmaKind
|
||||
handler: proc (self: NativeCCompiler, pragma: Pragma, name: Name)
|
||||
|
||||
NativeCCompiler* = ref object of Compiler
|
||||
## The peon to C compiler
|
||||
|
||||
# Compiler procedures called by pragmas
|
||||
compilerProcs: TableRef[string, CompilerFunc]
|
||||
|
||||
|
||||
proc newNativeCCompiler*(replMode: bool = false): NativeCCompiler =
|
||||
## Initializes a new, blank, NativeCCompiler
|
||||
## object
|
||||
new(result)
|
||||
result.ast = @[]
|
||||
result.current = 0
|
||||
result.file = ""
|
||||
result.names = @[]
|
||||
result.depth = 0
|
||||
result.lines = @[]
|
||||
result.currentFunction = nil
|
||||
result.replMode = replMode
|
||||
result.currentModule = nil
|
||||
result.compilerProcs = newTable[string, CompilerFunc]()
|
||||
result.source = ""
|
||||
result.lexer = newLexer()
|
||||
result.lexer.fillSymbolTable()
|
||||
result.parser = newParser()
|
||||
result.isMainModule = false
|
||||
result.disabledWarnings = @[]
|
||||
|
||||
|
||||
method literal*(self: Compiler, node: ASTNode, compile: bool = true): Type {.discardable.} =
|
||||
## Compiles literal expressions
|
||||
|
||||
|
|
@ -0,0 +1,849 @@
|
|||
# Copyright 2022 Mattia Giambirtone & All Contributors
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
## An Abstract Syntax Tree (AST) structure for our recursive-descent
|
||||
## top-down parser. For more info, check out docs/grammar.md
|
||||
|
||||
|
||||
import std/strformat
|
||||
import std/strutils
|
||||
|
||||
|
||||
import token
|
||||
export token
|
||||
|
||||
|
||||
type
|
||||
NodeKind* = enum
|
||||
## Enumeration of the AST
|
||||
## node types, sorted by
|
||||
## precedence
|
||||
|
||||
# Declarations
|
||||
typeDecl = 0'u8
|
||||
funDecl,
|
||||
varDecl,
|
||||
# Statements
|
||||
ifStmt,
|
||||
returnStmt,
|
||||
breakStmt,
|
||||
continueStmt,
|
||||
whileStmt,
|
||||
forEachStmt,
|
||||
blockStmt,
|
||||
namedBlockStmt,
|
||||
raiseStmt,
|
||||
assertStmt,
|
||||
tryStmt,
|
||||
yieldStmt,
|
||||
awaitStmt,
|
||||
importStmt,
|
||||
exportStmt,
|
||||
deferStmt,
|
||||
# An expression followed by a semicolon
|
||||
exprStmt,
|
||||
# Expressions
|
||||
assignExpr,
|
||||
lambdaExpr,
|
||||
awaitExpr,
|
||||
yieldExpr,
|
||||
setItemExpr, # Set expressions like a.b = "c"
|
||||
binaryExpr,
|
||||
unaryExpr,
|
||||
sliceExpr,
|
||||
callExpr,
|
||||
getItemExpr, # Get expressions like a.b
|
||||
# Primary expressions
|
||||
groupingExpr, # Parenthesized expressions such as (true) and (3 + 4)
|
||||
trueExpr,
|
||||
falseExpr,
|
||||
strExpr,
|
||||
charExpr,
|
||||
intExpr,
|
||||
floatExpr,
|
||||
hexExpr,
|
||||
octExpr,
|
||||
binExpr,
|
||||
identExpr, # Identifier
|
||||
pragmaExpr,
|
||||
refExpr,
|
||||
ptrExpr,
|
||||
genericExpr,
|
||||
switchStmt,
|
||||
lentExpr
|
||||
|
||||
# Here I would've rather used object variants, and in fact that's what was in
|
||||
# place before, but not being able to re-declare a field of the same type in
|
||||
# another case branch is kind of a deal breaker long-term, so until that is
|
||||
# fixed (check out https://github.com/nim-lang/RFCs/issues/368 for more info),
|
||||
# I'll stick to using inheritance instead
|
||||
|
||||
|
||||
# Generic AST node types
|
||||
ASTNode* = ref object of RootObj
|
||||
## An AST node
|
||||
kind*: NodeKind
|
||||
# Regardless of the type of node, we keep the token in the AST node for internal usage.
|
||||
# This is not shown when the node is printed, but makes it a heck of a lot easier to report
|
||||
# errors accurately even deep in the compilation pipeline
|
||||
token*: Token
|
||||
file*: string
|
||||
# This weird inheritance chain is needed for the parser to
|
||||
# work properly
|
||||
Declaration* = ref object of ASTNode
|
||||
## A declaration
|
||||
isPrivate*: bool
|
||||
pragmas*: seq[Pragma]
|
||||
generics*: seq[tuple[name: IdentExpr, cond: Expression]]
|
||||
|
||||
Statement* = ref object of Declaration
|
||||
## A statement
|
||||
Expression* = ref object of Statement
|
||||
## An expression
|
||||
LiteralExpr* = ref object of Expression
|
||||
# Using a string for literals makes it much easier to handle numeric types, as
|
||||
# there are no overflow, underflow or float precision issues during parsing.
|
||||
# Numbers are just serialized as strings and then converted back to numbers
|
||||
# before being passed to the VM, which also keeps the door open in the future
|
||||
# to implementing bignum arithmetic that can take advantage of natively supported
|
||||
# machine types, meaning that if a numeric type fits into a 64 bit signed/unsigned
|
||||
# int then it is stored in such a type to save space, otherwise it is just converted
|
||||
# to a bigint. Bigfloats with arbitrary-precision arithmetic would also be nice,
|
||||
# although arguably less useful (and probably significantly slower than bigints)
|
||||
literal*: Token
|
||||
|
||||
IntExpr* = ref object of LiteralExpr
|
||||
OctExpr* = ref object of LiteralExpr
|
||||
HexExpr* = ref object of LiteralExpr
|
||||
BinExpr* = ref object of LiteralExpr
|
||||
FloatExpr* = ref object of LiteralExpr
|
||||
StrExpr* = ref object of LiteralExpr
|
||||
CharExpr* = ref object of LiteralExpr
|
||||
|
||||
TrueExpr* = ref object of LiteralExpr
|
||||
FalseExpr* = ref object of LiteralExpr
|
||||
|
||||
IdentExpr* = ref object of Expression
|
||||
name*: Token
|
||||
depth*: int
|
||||
|
||||
GroupingExpr* = ref object of Expression
|
||||
expression*: Expression
|
||||
|
||||
GetItemExpr* = ref object of Expression
|
||||
obj*: Expression
|
||||
name*: IdentExpr
|
||||
|
||||
SetItemExpr* = ref object of GetItemExpr
|
||||
# Since a setItem expression is just
|
||||
# a getItem one followed by an assignment,
|
||||
# inheriting it from getItem makes sense
|
||||
value*: Expression
|
||||
|
||||
CallExpr* = ref object of Expression
|
||||
callee*: Expression # The object being called
|
||||
arguments*: tuple[positionals: seq[Expression], keyword: seq[tuple[
|
||||
name: IdentExpr, value: Expression]]]
|
||||
closeParen*: Token # Needed for error reporting
|
||||
|
||||
GenericExpr* = ref object of Expression
|
||||
ident*: IdentExpr
|
||||
args*: seq[Expression]
|
||||
|
||||
|
||||
UnaryExpr* = ref object of Expression
|
||||
operator*: Token
|
||||
a*: Expression
|
||||
|
||||
BinaryExpr* = ref object of UnaryExpr
|
||||
# Binary expressions can be seen here as unary
|
||||
# expressions with an extra operand so we just
|
||||
# inherit from that and add a second operand
|
||||
b*: Expression
|
||||
|
||||
YieldExpr* = ref object of Expression
|
||||
expression*: Expression
|
||||
|
||||
AwaitExpr* = ref object of Expression
|
||||
expression*: Expression
|
||||
|
||||
LambdaExpr* = ref object of Expression
|
||||
body*: Statement
|
||||
arguments*: seq[tuple[name: IdentExpr, valueType: Expression]]
|
||||
defaults*: seq[Expression]
|
||||
isGenerator*: bool
|
||||
isAsync*: bool
|
||||
isPure*: bool
|
||||
returnType*: Expression
|
||||
depth*: int
|
||||
|
||||
SliceExpr* = ref object of Expression
|
||||
expression*: Expression
|
||||
ends*: seq[Expression]
|
||||
|
||||
AssignExpr* = ref object of Expression
|
||||
name*: IdentExpr
|
||||
value*: Expression
|
||||
|
||||
ExprStmt* = ref object of Statement
|
||||
expression*: Expression
|
||||
|
||||
ImportStmt* = ref object of Statement
|
||||
moduleName*: IdentExpr
|
||||
|
||||
ExportStmt* = ref object of Statement
|
||||
name*: IdentExpr
|
||||
|
||||
AssertStmt* = ref object of Statement
|
||||
expression*: Expression
|
||||
|
||||
RaiseStmt* = ref object of Statement
|
||||
exception*: Expression
|
||||
|
||||
BlockStmt* = ref object of Statement
|
||||
code*: seq[Declaration]
|
||||
|
||||
NamedBlockStmt* = ref object of BlockStmt
|
||||
name*: IdentExpr
|
||||
|
||||
ForStmt* = ref object of Statement
|
||||
discard # Unused
|
||||
|
||||
ForEachStmt* = ref object of Statement
|
||||
identifier*: IdentExpr
|
||||
expression*: Expression
|
||||
body*: Statement
|
||||
|
||||
WhileStmt* = ref object of Statement
|
||||
condition*: Expression
|
||||
body*: BlockStmt
|
||||
|
||||
AwaitStmt* = ref object of Statement
|
||||
expression*: Expression
|
||||
|
||||
BreakStmt* = ref object of Statement
|
||||
label*: IdentExpr
|
||||
|
||||
ContinueStmt* = ref object of Statement
|
||||
label*: IdentExpr
|
||||
|
||||
ReturnStmt* = ref object of Statement
|
||||
value*: Expression
|
||||
|
||||
IfStmt* = ref object of Statement
|
||||
condition*: Expression
|
||||
thenBranch*: Statement
|
||||
elseBranch*: Statement
|
||||
|
||||
YieldStmt* = ref object of Statement
|
||||
expression*: Expression
|
||||
|
||||
VarDecl* = ref object of Declaration
|
||||
name*: IdentExpr
|
||||
value*: Expression
|
||||
isConst*: bool
|
||||
isLet*: bool
|
||||
valueType*: Expression
|
||||
|
||||
FunDecl* = ref object of Declaration
|
||||
name*: IdentExpr
|
||||
body*: Statement
|
||||
arguments*: seq[tuple[name: IdentExpr, valueType: Expression]]
|
||||
defaults*: seq[Expression]
|
||||
isAsync*: bool
|
||||
isGenerator*: bool
|
||||
isPure*: bool
|
||||
returnType*: Expression
|
||||
depth*: int
|
||||
|
||||
TypeDecl* = ref object of Declaration
|
||||
name*: IdentExpr
|
||||
# Empty if type is an enum
|
||||
fields*: seq[tuple[name: IdentExpr, valueType: Expression, isPrivate: bool]]
|
||||
# Empty if type is a structure
|
||||
members*: seq[TypeDecl]
|
||||
isEnum*: bool
|
||||
isRef*: bool
|
||||
parent*: Expression
|
||||
value*: Expression
|
||||
|
||||
Pragma* = ref object of Expression
|
||||
name*: IdentExpr
|
||||
args*: seq[LiteralExpr]
|
||||
|
||||
Var* = ref object of Expression
|
||||
value*: Expression
|
||||
|
||||
Ref* = ref object of Expression
|
||||
value*: Expression
|
||||
|
||||
Ptr* = ref object of Expression
|
||||
value*: Expression
|
||||
|
||||
Lent* = ref object of Expression
|
||||
value*: Expression
|
||||
|
||||
SwitchStmt* = ref object of Statement
|
||||
switch*: Expression
|
||||
branches*: seq[tuple[cond: Expression, body: BlockStmt]]
|
||||
default*: BlockStmt
|
||||
|
||||
|
||||
proc isConst*(self: ASTNode): bool =
|
||||
## Returns true if the given
|
||||
## AST node represents a value
|
||||
## of constant type. All numbers,
|
||||
## strings and singletons count as
|
||||
## constants
|
||||
case self.kind:
|
||||
of intExpr, hexExpr, binExpr, octExpr, strExpr, falseExpr, trueExpr,
|
||||
floatExpr:
|
||||
return true
|
||||
else:
|
||||
return false
|
||||
|
||||
|
||||
## AST node constructors
|
||||
proc newASTNode*(kind: NodeKind, token: Token): ASTNode =
|
||||
## Initializes a new generic ASTNode object
|
||||
new(result)
|
||||
result.kind = kind
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newPragma*(name: IdentExpr, args: seq[LiteralExpr]): Pragma =
|
||||
new(result)
|
||||
result.kind = pragmaExpr
|
||||
result.args = args
|
||||
result.name = name
|
||||
result.token = name.token
|
||||
|
||||
|
||||
proc newRefExpr*(expression: Expression, token: Token): Ref =
|
||||
new(result)
|
||||
result.kind = refExpr
|
||||
result.value = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newPtrExpr*(expression: Expression, token: Token): Ptr =
|
||||
new(result)
|
||||
result.kind = ptrExpr
|
||||
result.value = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newLentExpr*(expression: Expression, token: Token): Lent =
|
||||
new(result)
|
||||
result.kind = lentExpr
|
||||
result.value = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newSwitchStmt*(switch: Expression, branches: seq[tuple[cond: Expression, body: BlockStmt]], default: BlockStmt, token: Token): SwitchStmt =
|
||||
new(result)
|
||||
result.kind = switchStmt
|
||||
result.switch = switch
|
||||
result.branches = branches
|
||||
result.token = token
|
||||
result.default = default
|
||||
|
||||
|
||||
proc newIntExpr*(literal: Token): IntExpr =
|
||||
result = IntExpr(kind: intExpr)
|
||||
result.literal = literal
|
||||
result.token = literal
|
||||
|
||||
|
||||
proc newOctExpr*(literal: Token): OctExpr =
|
||||
result = OctExpr(kind: octExpr)
|
||||
result.literal = literal
|
||||
result.token = literal
|
||||
|
||||
|
||||
proc newHexExpr*(literal: Token): HexExpr =
|
||||
result = HexExpr(kind: hexExpr)
|
||||
result.literal = literal
|
||||
result.token = literal
|
||||
|
||||
|
||||
proc newBinExpr*(literal: Token): BinExpr =
|
||||
result = BinExpr(kind: binExpr)
|
||||
result.literal = literal
|
||||
result.token = literal
|
||||
|
||||
|
||||
proc newFloatExpr*(literal: Token): FloatExpr =
|
||||
result = FloatExpr(kind: floatExpr)
|
||||
result.literal = literal
|
||||
result.token = literal
|
||||
|
||||
|
||||
proc newTrueExpr*(token: Token): LiteralExpr = LiteralExpr(kind: trueExpr,
|
||||
token: token, literal: token)
|
||||
proc newFalseExpr*(token: Token): LiteralExpr = LiteralExpr(kind: falseExpr,
|
||||
token: token, literal: token)
|
||||
|
||||
|
||||
proc newStrExpr*(literal: Token): StrExpr =
|
||||
result = StrExpr(kind: strExpr)
|
||||
result.literal = literal
|
||||
result.token = literal
|
||||
|
||||
|
||||
proc newCharExpr*(literal: Token): CharExpr =
|
||||
result = CharExpr(kind: charExpr)
|
||||
result.literal = literal
|
||||
result.token = literal
|
||||
|
||||
|
||||
proc newIdentExpr*(name: Token, depth: int = 0): IdentExpr =
|
||||
result = IdentExpr(kind: identExpr)
|
||||
result.name = name
|
||||
result.token = name
|
||||
result.depth = depth
|
||||
|
||||
|
||||
proc newGroupingExpr*(expression: Expression, token: Token): GroupingExpr =
|
||||
result = GroupingExpr(kind: groupingExpr)
|
||||
result.expression = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newLambdaExpr*(arguments: seq[tuple[name: IdentExpr, valueType: Expression]], defaults: seq[Expression],
|
||||
body: Statement, isAsync, isGenerator: bool,
|
||||
token: Token, depth: int, pragmas: seq[Pragma] = @[],
|
||||
returnType: Expression, generics: seq[tuple[name: IdentExpr, cond: Expression]] = @[]): LambdaExpr =
|
||||
result = LambdaExpr(kind: lambdaExpr)
|
||||
result.body = body
|
||||
result.arguments = arguments
|
||||
result.defaults = defaults
|
||||
result.isGenerator = isGenerator
|
||||
result.isAsync = isAsync
|
||||
result.token = token
|
||||
result.returnType = returnType
|
||||
result.isPure = false
|
||||
result.pragmas = pragmas
|
||||
result.generics = generics
|
||||
result.depth = depth
|
||||
|
||||
|
||||
proc newGetItemExpr*(obj: Expression, name: IdentExpr,
|
||||
token: Token): GetItemExpr =
|
||||
result = GetItemExpr(kind: getItemExpr)
|
||||
result.obj = obj
|
||||
result.name = name
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newSetItemExpr*(obj: Expression, name: IdentExpr, value: Expression,
|
||||
token: Token): SetItemExpr =
|
||||
result = SetItemExpr(kind: setItemExpr)
|
||||
result.obj = obj
|
||||
result.name = name
|
||||
result.value = value
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newCallExpr*(callee: Expression, arguments: tuple[positionals: seq[
|
||||
Expression], keyword: seq[tuple[name: IdentExpr, value: Expression]]],
|
||||
token: Token): CallExpr =
|
||||
result = CallExpr(kind: callExpr)
|
||||
result.callee = callee
|
||||
result.arguments = arguments
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newGenericExpr*(ident: IdentExpr, args: seq[Expression]): GenericExpr =
|
||||
result = GenericExpr(kind: genericExpr)
|
||||
result.ident = ident
|
||||
result.args = args
|
||||
result.token = ident.token
|
||||
|
||||
|
||||
proc newSliceExpr*(expression: Expression, ends: seq[Expression], token: Token): SliceExpr =
|
||||
result = SliceExpr(kind: sliceExpr)
|
||||
result.expression = expression
|
||||
result.ends = ends
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newUnaryExpr*(operator: Token, a: Expression): UnaryExpr =
|
||||
result = UnaryExpr(kind: unaryExpr)
|
||||
result.operator = operator
|
||||
result.a = a
|
||||
result.token = result.operator
|
||||
|
||||
|
||||
proc newBinaryExpr*(a: Expression, operator: Token, b: Expression): BinaryExpr =
|
||||
result = BinaryExpr(kind: binaryExpr)
|
||||
result.operator = operator
|
||||
result.a = a
|
||||
result.b = b
|
||||
result.token = operator
|
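# A small usage sketch (not part of the module): hand-building the AST for "1 + 2".
# In practice the tokens come from the lexer; the fields set here (kind, lexeme, line)
# mirror what the tokenizer fills in.
#
#   let one  = newIntExpr(Token(kind: Integer, lexeme: "1", line: 1))
#   let two  = newIntExpr(Token(kind: Integer, lexeme: "2", line: 1))
#   let plus = Token(kind: Symbol, lexeme: "+", line: 1)
#   echo newBinaryExpr(one, plus, two)  # Binary(Literal(1), Operator('+'), Literal(2))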
||||
|
||||
|
||||
proc newYieldExpr*(expression: Expression, token: Token): YieldExpr =
|
||||
result = YieldExpr(kind: yieldExpr)
|
||||
result.expression = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newAssignExpr*(name: IdentExpr, value: Expression,
|
||||
token: Token): AssignExpr =
|
||||
result = AssignExpr(kind: assignExpr)
|
||||
result.name = name
|
||||
result.value = value
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newAwaitExpr*(expression: Expression, token: Token): AwaitExpr =
|
||||
result = AwaitExpr(kind: awaitExpr)
|
||||
result.expression = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newExprStmt*(expression: Expression, token: Token): ExprStmt =
|
||||
result = ExprStmt(kind: exprStmt)
|
||||
result.expression = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newImportStmt*(moduleName: IdentExpr, token: Token): ImportStmt =
|
||||
result = ImportStmt(kind: importStmt)
|
||||
result.moduleName = moduleName
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newExportStmt*(name: IdentExpr, token: Token): ExportStmt =
|
||||
result = ExportStmt(kind: exportStmt)
|
||||
result.name = name
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newYieldStmt*(expression: Expression, token: Token): YieldStmt =
|
||||
result = YieldStmt(kind: yieldStmt)
|
||||
result.expression = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newAwaitStmt*(expression: Expression, token: Token): AwaitStmt =
|
||||
result = AwaitStmt(kind: awaitStmt)
|
||||
result.expression = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newAssertStmt*(expression: Expression, token: Token): AssertStmt =
|
||||
result = AssertStmt(kind: assertStmt)
|
||||
result.expression = expression
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newRaiseStmt*(exception: Expression, token: Token): RaiseStmt =
|
||||
result = RaiseStmt(kind: raiseStmt)
|
||||
result.exception = exception
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newBlockStmt*(code: seq[Declaration], token: Token): BlockStmt =
|
||||
result = BlockStmt(kind: blockStmt)
|
||||
result.code = code
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newNamedBlockStmt*(code: seq[Declaration], name: IdentExpr, token: Token): NamedBlockStmt =
|
||||
result = NamedBlockStmt(kind: namedBlockStmt)
|
||||
result.code = code
|
||||
result.token = token
|
||||
result.name = name
|
||||
|
||||
|
||||
proc newWhileStmt*(condition: Expression, body: BlockStmt,
|
||||
token: Token): WhileStmt =
|
||||
result = WhileStmt(kind: whileStmt)
|
||||
result.condition = condition
|
||||
result.body = body
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newForEachStmt*(identifier: IdentExpr, expression: Expression,
|
||||
body: Statement, token: Token): ForEachStmt =
|
||||
result = ForEachStmt(kind: forEachStmt)
|
||||
result.identifier = identifier
|
||||
result.expression = expression
|
||||
result.body = body
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newBreakStmt*(token: Token, label: IdentExpr = nil): BreakStmt =
|
||||
result = BreakStmt(kind: breakStmt)
|
||||
result.token = token
|
||||
result.label = label
|
||||
|
||||
|
||||
proc newContinueStmt*(token: Token, label: IdentExpr = nil): ContinueStmt =
|
||||
result = ContinueStmt(kind: continueStmt)
|
||||
result.token = token
|
||||
result.label = label
|
||||
|
||||
|
||||
proc newReturnStmt*(value: Expression, token: Token): ReturnStmt =
|
||||
result = ReturnStmt(kind: returnStmt)
|
||||
result.value = value
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newIfStmt*(condition: Expression, thenBranch, elseBranch: Statement,
|
||||
token: Token): IfStmt =
|
||||
result = IfStmt(kind: ifStmt)
|
||||
result.condition = condition
|
||||
result.thenBranch = thenBranch
|
||||
result.elseBranch = elseBranch
|
||||
result.token = token
|
||||
|
||||
|
||||
proc newVarDecl*(name: IdentExpr, value: Expression, isConst: bool = false,
|
||||
isPrivate: bool = true, token: Token, isLet: bool = false,
|
||||
valueType: Expression, pragmas: seq[Pragma]): VarDecl =
|
||||
result = VarDecl(kind: varDecl)
|
||||
result.name = name
|
||||
result.value = value
|
||||
result.isConst = isConst
|
||||
result.isPrivate = isPrivate
|
||||
result.token = token
|
||||
result.isLet = isLet
|
||||
result.valueType = valueType
|
||||
result.pragmas = pragmas
|
||||
|
||||
|
||||
proc newFunDecl*(name: IdentExpr, arguments: seq[tuple[name: IdentExpr, valueType: Expression]], defaults: seq[Expression],
|
||||
body: Statement, isAsync, isGenerator: bool,
|
||||
isPrivate: bool, token: Token, depth: int,
|
||||
pragmas: seq[Pragma] = @[], returnType: Expression,
|
||||
generics: seq[tuple[name: IdentExpr, cond: Expression]] = @[]): FunDecl =
|
||||
result = FunDecl(kind: funDecl)
|
||||
result.name = name
|
||||
result.arguments = arguments
|
||||
result.defaults = defaults
|
||||
result.body = body
|
||||
result.isAsync = isAsync
|
||||
result.isGenerator = isGenerator
|
||||
result.isPrivate = isPrivate
|
||||
result.token = token
|
||||
result.pragmas = pragmas
|
||||
result.returnType = returnType
|
||||
result.isPure = false
|
||||
result.generics = generics
|
||||
result.depth = depth
|
||||
|
||||
|
||||
proc newTypeDecl*(name: IdentExpr, fields: seq[tuple[name: IdentExpr, valueType: Expression, isPrivate: bool]],
|
||||
defaults: seq[Expression], isPrivate: bool, token: Token, pragmas: seq[Pragma],
|
||||
generics: seq[tuple[name: IdentExpr, cond: Expression]], parent: IdentExpr, isEnum: bool, isRef: bool): TypeDecl =
|
||||
result = TypeDecl(kind: typeDecl)
|
||||
result.name = name
|
||||
result.fields = fields
|
||||
result.isPrivate = isPrivate
|
||||
result.token = token
|
||||
result.pragmas = pragmas
|
||||
result.generics = generics
|
||||
result.parent = parent
|
||||
result.isEnum = isEnum
|
||||
result.isRef = isRef
|
||||
result.members = @[]
|
||||
|
||||
|
||||
proc `$`*(self: ASTNode): string =
|
||||
if self.isNil():
|
||||
return "nil"
|
||||
case self.kind:
|
||||
of intExpr, floatExpr, hexExpr, binExpr, octExpr, strExpr, trueExpr,
|
||||
falseExpr:
|
||||
if self.kind in {trueExpr, falseExpr}:
|
||||
result &= &"Literal({($self.kind)[0..^5]})"
|
||||
elif self.kind == strExpr:
|
||||
result &= &"Literal({LiteralExpr(self).literal.lexeme[1..^2].escape()})"
|
||||
else:
|
||||
result &= &"Literal({LiteralExpr(self).literal.lexeme})"
|
||||
of identExpr:
|
||||
result &= &"Identifier('{IdentExpr(self).name.lexeme}')"
|
||||
of groupingExpr:
|
||||
result &= &"Grouping({GroupingExpr(self).expression})"
|
||||
of getItemExpr:
|
||||
var self = GetItemExpr(self)
|
||||
result &= &"GetItem(obj={self.obj}, name={self.name})"
|
||||
of setItemExpr:
|
||||
var self = SetItemExpr(self)
|
||||
result &= &"SetItem(obj={self.obj}, name={self.value}, value={self.value})"
|
||||
of callExpr:
|
||||
var self = CallExpr(self)
|
||||
result &= &"""Call({self.callee}, arguments=(positionals=[{self.arguments.positionals.join(", ")}], keyword=[{self.arguments.keyword.join(", ")}]))"""
|
||||
of unaryExpr:
|
||||
var self = UnaryExpr(self)
|
||||
result &= &"Unary(Operator('{self.operator.lexeme}'), {self.a})"
|
||||
of binaryExpr:
|
||||
var self = BinaryExpr(self)
|
||||
result &= &"Binary({self.a}, Operator('{self.operator.lexeme}'), {self.b})"
|
||||
of assignExpr:
|
||||
var self = AssignExpr(self)
|
||||
result &= &"Assign(name={self.name}, value={self.value})"
|
||||
of exprStmt:
|
||||
var self = ExprStmt(self)
|
||||
result &= &"ExpressionStatement({self.expression})"
|
||||
of breakStmt:
|
||||
result = "Break()"
|
||||
of importStmt:
|
||||
var self = ImportStmt(self)
|
||||
result &= &"Import({self.moduleName})"
|
||||
of assertStmt:
|
||||
var self = AssertStmt(self)
|
||||
result &= &"Assert({self.expression})"
|
||||
of raiseStmt:
|
||||
var self = RaiseStmt(self)
|
||||
result &= &"Raise({self.exception})"
|
||||
of blockStmt:
|
||||
var self = BlockStmt(self)
|
||||
result &= &"""Block([{self.code.join(", ")}])"""
|
||||
of namedBlockStmt:
|
||||
var self = NamedBlockStmt(self)
|
||||
result &= &"""Block(name={self.name}, [{self.code.join(", ")}])"""
|
||||
of whileStmt:
|
||||
var self = WhileStmt(self)
|
||||
result &= &"While(condition={self.condition}, body={self.body})"
|
||||
of forEachStmt:
|
||||
var self = ForEachStmt(self)
|
||||
result &= &"ForEach(identifier={self.identifier}, expression={self.expression}, body={self.body})"
|
||||
of returnStmt:
|
||||
var self = ReturnStmt(self)
|
||||
result &= &"Return({self.value})"
|
||||
of yieldExpr:
|
||||
var self = YieldExpr(self)
|
||||
result &= &"Yield({self.expression})"
|
||||
of awaitExpr:
|
||||
var self = AwaitExpr(self)
|
||||
result &= &"Await({self.expression})"
|
||||
of ifStmt:
|
||||
var self = IfStmt(self)
|
||||
if self.elseBranch == nil:
|
||||
result &= &"If(condition={self.condition}, thenBranch={self.thenBranch}, elseBranch=nil)"
|
||||
else:
|
||||
result &= &"If(condition={self.condition}, thenBranch={self.thenBranch}, elseBranch={self.elseBranch})"
|
||||
of yieldStmt:
|
||||
var self = YieldStmt(self)
|
||||
result &= &"YieldStmt({self.expression})"
|
||||
of awaitStmt:
|
||||
var self = AwaitStmt(self)
|
||||
result &= &"AwaitStmt({self.expression})"
|
||||
of varDecl:
|
||||
var self = VarDecl(self)
|
||||
result &= &"Var(name={self.name}, value={self.value}, const={self.isConst}, private={self.isPrivate}, type={self.valueType}, pragmas={self.pragmas})"
|
||||
of funDecl:
|
||||
var self = FunDecl(self)
|
||||
result &= &"""FunDecl(name={self.name}, body={self.body}, type={self.returnType}, arguments=[{self.arguments.join(", ")}], defaults=[{self.defaults.join(", ")}], generics=[{self.generics.join(", ")}], async={self.isAsync}, generator={self.isGenerator}, private={self.isPrivate}, pragmas={self.pragmas})"""
|
||||
of typeDecl:
|
||||
var self = TypeDecl(self)
|
||||
result &= &"""TypeDecl(name={self.name}, fields={self.fields}, members={self.members}, private={self.isPrivate}, pragmas={self.pragmas}, generics={self.generics}, parent={self.parent}, ref={self.isRef}, enum={self.isEnum}, value={self.value})"""
|
||||
of lambdaExpr:
|
||||
var self = LambdaExpr(self)
|
||||
result &= &"""Lambda(body={self.body}, type={self.returnType}, arguments=[{self.arguments.join(", ")}], defaults=[{self.defaults.join(", ")}], generator={self.isGenerator}, async={self.isAsync}, pragmas={self.pragmas})"""
|
||||
of sliceExpr:
|
||||
var self = SliceExpr(self)
|
||||
result &= &"""Slice({self.expression}, ends=[{self.ends.join(", ")}])"""
|
||||
of pragmaExpr:
|
||||
var self = Pragma(self)
|
||||
result &= &"Pragma(name={self.name}, args={self.args})"
|
||||
of refExpr:
|
||||
result &= &"Ref({Ref(self).value})"
|
||||
of ptrExpr:
|
||||
result &= &"Ptr({Ptr(self).value})"
|
||||
of lentExpr:
|
||||
result &= &"Lent({Lent(self).value})"
|
||||
of genericExpr:
|
||||
var self = GenericExpr(self)
|
||||
result &= &"Generic(ident={self.ident}, args={self.args})"
|
||||
else:
|
||||
discard
|
||||
|
||||
|
||||
proc `==`*(self, other: IdentExpr): bool {.inline.} = self.token == other.token
|
||||
|
||||
|
||||
proc getRelativeBoundaries*(self: ASTNode): tuple[start, stop: int] =
|
||||
## Recursively computes the position of a node relative
|
||||
## to its containing line
|
||||
case self.kind:
|
||||
of varDecl:
|
||||
var self = VarDecl(self)
|
||||
let start = self.token.relPos.start
|
||||
var stop = self.name.token.relPos.stop
|
||||
if not self.value.isNil():
|
||||
stop = self.value.token.relPos.stop
|
||||
if self.pragmas.len() > 0:
|
||||
stop = getRelativeBoundaries(self.pragmas[^1]).stop
|
||||
result = (start, stop)
|
||||
of typeDecl:
|
||||
result = (self.token.relPos.start, TypeDecl(self).name.getRelativeBoundaries().stop)
|
||||
of breakStmt, returnStmt, continueStmt:
|
||||
result = self.token.relPos
|
||||
of importStmt:
|
||||
result = (self.token.relPos.start, getRelativeBoundaries(ImportStmt(self).moduleName).stop)
|
||||
of exprStmt:
|
||||
result = getRelativeBoundaries(ExprStmt(self).expression)
|
||||
of unaryExpr:
|
||||
var self = UnaryExpr(self)
|
||||
result = (self.operator.relPos.start, getRelativeBoundaries(self.a).stop)
|
||||
of binaryExpr:
|
||||
var self = BinaryExpr(self)
|
||||
result = (getRelativeBoundaries(self.a).start, getRelativeBoundaries(self.b).stop)
|
||||
of intExpr, binExpr, hexExpr, octExpr, strExpr, floatExpr:
|
||||
var self = LiteralExpr(self)
|
||||
result = self.literal.relPos
|
||||
of identExpr:
|
||||
var self = IdentExpr(self)
|
||||
result = self.token.relPos
|
||||
of assignExpr:
|
||||
var self = AssignExpr(self)
|
||||
result = (getRelativeBoundaries(self.name).start, getRelativeBoundaries(self.value).stop)
|
||||
of callExpr:
|
||||
var self = CallExpr(self)
|
||||
result = (getRelativeBoundaries(self.callee).start, self.closeParen.relPos.stop)
|
||||
of getItemExpr:
|
||||
var self = GetItemExpr(self)
|
||||
result = (getRelativeBoundaries(self.obj).start, getRelativeBoundaries(self.name).stop)
|
||||
of pragmaExpr:
|
||||
var self = Pragma(self)
|
||||
let start = self.token.relPos.start
|
||||
var stop = 0
|
||||
if self.args.len() > 0:
|
||||
stop = self.args[^1].token.relPos.stop + 1
|
||||
else:
|
||||
stop = self.token.relPos.stop + 1
|
||||
# -8 so the error highlights the #pragma[ part as well
|
||||
result = (self.token.relPos.start - 8, stop)
|
||||
of genericExpr:
|
||||
var self = GenericExpr(self)
|
||||
let ident = getRelativeBoundaries(self.ident)
|
||||
var stop: int = ident.stop + 2
|
||||
if self.args.len() > 0:
|
||||
stop = getRelativeBoundaries(self.args[^1]).stop
|
||||
result = (ident.start, stop)
|
||||
of refExpr:
|
||||
var self = Ref(self)
|
||||
result = (self.token.relPos.start, self.value.getRelativeBoundaries().stop)
|
||||
of ptrExpr:
|
||||
var self = Ptr(self)
|
||||
result = (self.token.relPos.start, self.value.getRelativeBoundaries().stop)
|
||||
else:
|
||||
result = (0, 0)
|
||||
|
|
@ -0,0 +1,669 @@
|
|||
# Copyright 2022 Mattia Giambirtone & All Contributors
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
## A simple and modular tokenizer implementation with arbitrary lookahead
|
||||
## using a customizable symbol table
|
||||
|
||||
import std/strutils
|
||||
import std/parseutils
|
||||
import std/strformat
|
||||
import std/tables
|
||||
|
||||
|
||||
import token
|
||||
import errors
|
||||
|
||||
|
||||
export token, errors
|
||||
|
||||
|
||||
type
|
||||
SymbolTable* = ref object
|
||||
## A table of symbols used
|
||||
## to lex a source file
|
||||
|
||||
# Although we don't parse keywords
|
||||
# as symbols (they are parsed as identifiers),
|
||||
# we keep them here for consistency
|
||||
# purposes
|
||||
keywords: TableRef[string, TokenType]
|
||||
symbols: TableRef[string, TokenType]
|
||||
Lexer* = ref object
|
||||
## A lexer object
|
||||
symbols*: SymbolTable
|
||||
source: string
|
||||
tokens: seq[Token]
|
||||
line: int
|
||||
start: int
|
||||
current: int
|
||||
file: string
|
||||
lines: seq[tuple[start, stop: int]]
|
||||
lastLine: int
|
||||
linePos: int
|
||||
lineCurrent: int
|
||||
spaces: int
|
||||
LexingError* = ref object of PeonException
|
||||
## A lexing exception
|
||||
lexer*: Lexer
|
||||
pos*: tuple[start, stop: int]
|
||||
|
||||
|
||||
proc newSymbolTable: SymbolTable =
|
||||
## Initializes a new symbol table
|
||||
new(result)
|
||||
result.keywords = newTable[string, TokenType]()
|
||||
result.symbols = newTable[string, TokenType]()
|
||||
|
||||
|
||||
proc addSymbol*(self: SymbolTable, lexeme: string, token: TokenType) =
|
||||
## Adds a symbol to the symbol table. Overwrites
|
||||
## any previous entries
|
||||
self.symbols[lexeme] = token
|
||||
|
||||
|
||||
proc removeSymbol*(self: SymbolTable, lexeme: string) =
|
||||
## Removes a symbol from the symbol table
|
||||
## (does nothing if it does not exist)
|
||||
self.symbols.del(lexeme)
|
||||
|
||||
|
||||
proc addKeyword*(self: SymbolTable, lexeme: string, token: TokenType) =
|
||||
## Adds a keyword to the symbol table. Overwrites
|
||||
## any previous entries
|
||||
self.keywords[lexeme] = token
|
||||
|
||||
|
||||
proc removeKeyword*(self: SymbolTable, lexeme: string) =
|
||||
## Removes a keyword from the symbol table
|
||||
## (does nothing if it does not exist)
|
||||
self.keywords.del(lexeme)
|
||||
|
||||
|
||||
proc existsSymbol*(self: SymbolTable, lexeme: string): bool {.inline.} =
|
||||
## Returns true if a given symbol exists
|
||||
## in the symbol table already
|
||||
lexeme in self.symbols
|
||||
|
||||
|
||||
proc existsKeyword*(self: SymbolTable, lexeme: string): bool {.inline.} =
|
||||
## Returns true if a given keyword exists
|
||||
## in the symbol table already
|
||||
lexeme in self.keywords
|
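# Sketch of how a table is typically populated (the real operator/keyword mappings live
# in the fillSymbolTable() helper used elsewhere in the codebase):
#
#   var lexer = newLexer()
#   lexer.symbols.addKeyword("while", While)  # reserved words get their own token kinds
#   lexer.symbols.addSymbol("+", Symbol)      # illustrative; "+" may map to a dedicated kind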
||||
|
||||
|
||||
proc getToken(self: Lexer, lexeme: string): Token =
|
||||
## Gets the matching token object for a given
|
||||
## string according to the symbol table or
|
||||
## returns nil if there's no match
|
||||
let table = self.symbols
|
||||
var kind = table.symbols.getOrDefault(lexeme, table.keywords.getOrDefault(
|
||||
lexeme, NoMatch))
|
||||
if kind == NoMatch:
|
||||
return nil
|
||||
new(result)
|
||||
result.kind = kind
|
||||
result.lexeme = self.source[self.start..<self.current]
|
||||
result.line = self.line
|
||||
result.pos = (start: self.start, stop: self.current - 1)
|
||||
result.relPos = (start: self.linePos - result.lexeme.high() - 1, stop: self.linePos - 1)
|
||||
result.spaces = self.spaces
|
||||
self.spaces = 0
|
||||
|
||||
|
||||
proc getMaxSymbolSize(self: SymbolTable): int =
|
||||
## Returns the maximum length of all the symbols
|
||||
## currently in the table. Note that keywords are
|
||||
## not symbols, they're identifiers (or at least
|
||||
## are parsed the same way in Lexer.parseIdentifier)
|
||||
for lexeme in self.symbols.keys():
|
||||
if len(lexeme) > result:
|
||||
result = len(lexeme)
|
||||
|
||||
|
||||
proc getSymbols(self: SymbolTable, n: int): seq[string] =
|
||||
## Returns all n-bytes symbols
|
||||
## in the symbol table
|
||||
for lexeme in self.symbols.keys():
|
||||
if len(lexeme) == n:
|
||||
result.add(lexeme)
|
||||
|
||||
# Wrappers around isDigit and isAlphanumeric for
|
||||
# strings
|
||||
proc isDigit(s: string): bool =
|
||||
for c in s:
|
||||
if not c.isDigit():
|
||||
return false
|
||||
return true
|
||||
|
||||
|
||||
proc isAlphaNumeric(s: string): bool =
|
||||
for c in s:
|
||||
if not c.isAlphaNumeric():
|
||||
return false
|
||||
return true
|
||||
|
||||
# Forward declaration
|
||||
proc incLine(self: Lexer)
|
||||
|
||||
# Simple public getters used for error
|
||||
# formatting and whatnot
|
||||
proc getStart*(self: Lexer): int = self.start
|
||||
proc getFile*(self: Lexer): string = self.file
|
||||
proc getCurrent*(self: Lexer): int = self.current
|
||||
proc getCurrentLinePos*(self: Lexer): tuple[start, stop: int] = (self.lastLine, self.linePos)
|
||||
proc getLine*(self: Lexer): int = self.line
|
||||
proc getLines*(self: Lexer): seq[tuple[start, stop: int]] = self.lines
|
||||
proc getSource*(self: Lexer): string = self.source
|
||||
proc getRelPos*(self: Lexer, line: int): tuple[start, stop: int] =
|
||||
if self.tokens.len() == 0 or self.tokens[^1].kind != EndOfFile:
|
||||
self.incLine()
|
||||
return self.lines[line - 1]
|
||||
|
||||
|
||||
proc newLexer*(self: Lexer = nil): Lexer =
|
||||
## Initializes the lexer or resets
|
||||
## the state of an existing one
|
||||
new(result)
|
||||
if self != nil:
|
||||
result = self
|
||||
result.source = ""
|
||||
result.tokens = @[]
|
||||
result.line = 1
|
||||
result.start = 0
|
||||
result.current = 0
|
||||
result.file = ""
|
||||
result.lines = @[]
|
||||
result.lastLine = 0
|
||||
result.linePos = 0
|
||||
result.lineCurrent = 0
|
||||
result.symbols = newSymbolTable()
|
||||
result.spaces = 0
|
||||
|
||||
|
||||
proc done(self: Lexer): bool =
|
||||
## Returns true if we reached EOF
|
||||
result = self.current >= self.source.len
|
||||
|
||||
|
||||
proc incLine(self: Lexer) =
|
||||
## Increments the lexer's line
|
||||
## counter and updates internal
|
||||
## line metadata
|
||||
self.lines.add((self.lastLine, self.current))
|
||||
self.lastLine = self.current
|
||||
self.line += 1
|
||||
self.linePos = 0
|
||||
|
||||
|
||||
proc step(self: Lexer, n: int = 1): string =
|
||||
## Steps n characters forward in the
|
||||
## source file (default = 1). A string
|
||||
## of at most n bytes is returned. If n
|
||||
## exceeds EOF, the string will be shorter
|
||||
while len(result) < n:
|
||||
if self.done() or self.current > self.source.high():
|
||||
break
|
||||
else:
|
||||
result.add(self.source[self.current])
|
||||
inc(self.current)
|
||||
inc(self.linePos)
|
||||
|
||||
|
||||
proc peek(self: Lexer, distance: int = 0, length: int = 1): string =
|
||||
## Returns a stream of characters of
|
||||
## at most length bytes from the source
|
||||
## file, starting at the given distance,
|
||||
## without consuming it. The distance
|
||||
## parameter may be negative to retrieve
|
||||
## previously consumed tokens. If the
|
||||
## distance and/or the length are beyond
|
||||
## EOF (even partially), the resulting string
|
||||
## will be shorter than length bytes. The string
|
||||
## may be empty
|
||||
var i = distance
|
||||
while len(result) < length:
|
||||
if self.done() or self.current + i > self.source.high() or
|
||||
self.current + i < 0:
|
||||
break
|
||||
else:
|
||||
result.add(self.source[self.current + i])
|
||||
inc(i)
|
||||
|
||||
|
||||
proc error(self: Lexer, message: string) =
|
||||
## Raises a lexing error with the
|
||||
## appropriate metadata
|
||||
raise LexingError(msg: message, line: self.line, file: self.file, lexer: self, pos: (self.lineCurrent, self.linePos - 1))
|
||||
|
||||
|
||||
proc check(self: Lexer, s: string, distance: int = 0): bool =
|
||||
## Behaves like self.match(), without consuming the
|
||||
## token. False is returned if we're at EOF
|
||||
## regardless of what the token to check is.
|
||||
## The distance is passed directly to self.peek()
|
||||
if self.done():
|
||||
return false
|
||||
return self.peek(distance, len(s)) == s
|
||||
|
||||
|
||||
proc check(self: Lexer, args: openarray[string], distance: int = 0): bool =
|
||||
## Calls self.check() in a loop with
|
||||
## each string from the given set of
|
||||
## strings and returns at the first match.
|
||||
## Useful to check multiple tokens in a situation
|
||||
## where only one of them may match at one time
|
||||
for s in args:
|
||||
if self.check(s, distance):
|
||||
return true
|
||||
return false
|
||||
|
||||
|
||||
proc match(self: Lexer, s: string): bool =
|
||||
## Returns true if the next len(s) bytes
|
||||
## of the source file match the provided
|
||||
## string. If the match is successful,
|
||||
## len(s) bytes are consumed, otherwise
|
||||
## false is returned
|
||||
if not self.check(s):
|
||||
return false
|
||||
discard self.step(len(s))
|
||||
return true
|
||||
|
||||
|
||||
proc match(self: Lexer, args: openarray[string]): bool =
|
||||
## Calls self.match() in a loop with
|
||||
## each string from the given set of
|
||||
## strings and returns at the first match.
|
||||
## Useful to match multiple tokens in a situation
|
||||
## where only one of them may match at one time
|
||||
for s in args:
|
||||
if self.match(s):
|
||||
return true
|
||||
return false
|
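# How the lookahead primitives above compose (illustrative, assuming the lexer currently
# sits on the text "**="):
#
#   self.check("*")    # true, nothing consumed
#   self.match("**")   # true, consumes two characters
#   self.peek(-1)      # "*"  (peek backwards at what was just consumed)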
||||
|
||||
|
||||
proc createToken(self: Lexer, tokenType: TokenType) =
|
||||
## Creates a token object and adds it to the token
|
||||
## list. The lexeme and position of the token are
|
||||
## inferred from the current state of the tokenizer
|
||||
var tok: Token = new(Token)
|
||||
tok.kind = tokenType
|
||||
tok.lexeme = self.source[self.start..<self.current]
|
||||
tok.line = self.line
|
||||
tok.spaces = self.spaces
|
||||
self.spaces = 0
|
||||
tok.pos = (start: self.start, stop: self.current - 1)
|
||||
tok.relPos = (start: self.linePos - tok.lexeme.high() - 1, stop: self.linePos - 1)
|
||||
self.tokens.add(tok)
|
||||
|
||||
|
||||
proc parseEscape(self: Lexer) =
|
||||
## Boring escape sequence parsing. For more info check out
|
||||
## https://en.wikipedia.org/wiki/Escape_sequences_in_C.
|
||||
## As of now, \u and \U are not supported, but they'll
|
||||
## likely be soon. Another notable limitation is that
|
||||
## \xhhh and \nnn are limited to the size of a char
|
||||
## (i.e. uint8, or 256 values)
|
||||
case self.peek()[0]: # We use a char instead of a string because of how case statements handle ranges with strings
|
||||
# (i.e. not well, given they crash the C code generator)
|
||||
of 'a':
|
||||
self.source[self.current] = cast[char](0x07)
|
||||
of 'b':
|
||||
self.source[self.current] = cast[char](0x08) # ASCII backspace
|
||||
of 'e':
|
||||
self.source[self.current] = cast[char](0x1B)
|
||||
of 'f':
|
||||
self.source[self.current] = cast[char](0x0C)
|
||||
of 'n':
|
||||
when defined(windows):
|
||||
# We natively convert LF to CRLF on Windows, and
|
||||
# gotta thank Microsoft for the extra boilerplate!
|
||||
self.source[self.current] = cast[char](0x0D)
|
||||
self.source.insert($chr(0x0A), self.current + 1) # insert the LF right after the CR
|
||||
when defined(darwin):
|
||||
# Thanks apple, lol
|
||||
self.source[self.current] = cast[char](0x0A)
|
||||
when defined(linux):
|
||||
self.source[self.current] = cast[char](0x0A)
|
||||
of 'r':
|
||||
self.source[self.current] = cast[char](0x0D)
|
||||
of 't':
|
||||
self.source[self.current] = cast[char](0x09)
|
||||
of 'v':
|
||||
self.source[self.current] = cast[char](0x0B)
|
||||
of '"':
|
||||
self.source[self.current] = '"'
|
||||
of '\'':
|
||||
self.source[self.current] = '\''
|
||||
of '\\':
|
||||
self.source[self.current] = cast[char](0x5C)
|
||||
of '0'..'9': # This is the reason we're using char instead of string. See https://github.com/nim-lang/Nim/issues/19678
|
||||
var code = ""
|
||||
var value = 0
|
||||
var i = self.current
|
||||
while i < self.source.high() and (let c = self.source[
|
||||
i].toLowerAscii(); c in '0'..'7') and len(code) < 3:
|
||||
code &= self.source[i]
|
||||
i += 1
|
||||
assert parseOct(code, value) == code.len()
|
||||
if value > uint8.high().int:
|
||||
self.error("escape sequence value too large (> 255)")
|
||||
self.source[self.current] = cast[char](value)
|
||||
of 'u', 'U':
|
||||
self.error("unicode escape sequences are not supported (yet)")
|
||||
of 'x':
|
||||
var code = ""
|
||||
var value = 0
|
||||
var i = self.current
|
||||
while i < self.source.high() and (let c = self.source[
|
||||
i].toLowerAscii(); c in 'a'..'f' or c in '0'..'9'):
|
||||
code &= self.source[i]
|
||||
i += 1
|
||||
assert parseHex(code, value) == code.len()
|
||||
if value > uint8.high().int:
|
||||
self.error("escape sequence value too large (> 255)")
|
||||
self.source[self.current] = cast[char](value)
|
||||
else:
|
||||
self.error(&"invalid escape sequence '\\{self.peek()}'")
|
||||
|
||||
|
||||
proc parseString(self: Lexer, delimiter: string, mode: string = "single") =
|
||||
## Parses string literals. They can be expressed using matching pairs
|
||||
## of either single or double quotes. Most C-style escape sequences are
|
||||
## supported, moreover, a specific prefix may be prepended
|
||||
## to the string to instruct the lexer on how to parse it:
|
||||
## - b -> declares a byte string, where each character is
|
||||
## interpreted as an integer instead of a character
|
||||
## - r -> declares a raw string literal, where escape sequences
|
||||
## are not parsed and stay as-is
|
||||
## - f -> declares a format string, where variables may be
|
||||
## interpolated using curly braces like f"Hello, {name}!".
|
||||
## Braces may be escaped using a pair of them, so to represent
|
||||
## a literal "{" in an f-string, one would use {{ instead
|
||||
## Multi-line strings can be declared using matching triplets of
|
||||
## either single or double quotes. They can span across multiple
|
||||
## lines and escape sequences in them are not parsed, like in raw
|
||||
## strings, so a multi-line string prefixed with the "r" modifier
|
||||
## is redundant, although multi-line byte/format strings are supported
|
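## As a quick reference, the accepted forms look like this ({name} below is just an
## illustrative variable):
##   "hi\n"          string literal, escape sequences parsed
##   'a'             character literal (must be exactly one character long)
##   r"raw\stuff"    raw string, escapes kept as-is
##   b"bytes"        byte string (characters read as integers)
##   f"Hi {name}!"   format string with interpolation
##   """multi
##   line"""         multi-line string, escapes not parsed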
||||
var slen = 0
|
||||
while not self.check(delimiter) and not self.done():
|
||||
if self.match("\n"):
|
||||
if mode == "multi":
|
||||
self.incLine()
|
||||
else:
|
||||
self.error("unexpected EOL while parsing string literal")
|
||||
if mode in ["raw", "multi"]:
|
||||
discard self.step()
|
||||
elif self.match("\\"):
|
||||
# This madness here serves to get rid of the slash, since \x is mapped
|
||||
# to a one-byte sequence but the string '\x' is actually 2 bytes (or more,
|
||||
# depending on the specific escape sequence)
|
||||
self.source = self.source[0..<self.current] & self.source[
|
||||
self.current + 1..^1]
|
||||
self.parseEscape()
|
||||
if mode == "format" and self.match("{"):
|
||||
if self.match("{"):
|
||||
self.source = self.source[0..<self.current] & self.source[
|
||||
self.current + 1..^1]
|
||||
continue
|
||||
while not self.check(["}", "\""]):
|
||||
discard self.step()
|
||||
if self.check("\""):
|
||||
self.error("unclosed '{' in format string")
|
||||
elif mode == "format" and self.check("}"):
|
||||
if not self.check("}", 1):
|
||||
self.error("unmatched '}' in format string")
|
||||
else:
|
||||
self.source = self.source[0..<self.current] & self.source[
|
||||
self.current + 1..^1]
|
||||
discard self.step()
|
||||
inc(slen)
|
||||
if slen > 1 and delimiter == "'":
|
||||
self.error("invalid character literal (length must be one!)")
|
||||
if mode == "multi":
|
||||
if not self.match(delimiter.repeat(3)):
|
||||
self.error("unexpected EOL while parsing multi-line string literal")
|
||||
elif self.done() and self.peek(-1) != delimiter:
|
||||
self.error("unexpected EOF while parsing string literal")
|
||||
else:
|
||||
discard self.step()
|
||||
if delimiter == "\"":
|
||||
self.createToken(String)
|
||||
else:
|
||||
self.createToken(Char)
|
||||
|
||||
|
||||
proc parseBinary(self: Lexer) =
|
||||
## Parses binary numbers
|
||||
while self.peek().isDigit():
|
||||
if not self.check(["0", "1"]):
|
||||
self.error(&"invalid digit '{self.peek()}' in binary literal")
|
||||
discard self.step()
|
||||
|
||||
|
||||
proc parseOctal(self: Lexer) =
|
||||
## Parses octal numbers
|
||||
while self.peek().isDigit():
|
||||
if self.peek() notin "0".."7":
|
||||
self.error(&"invalid digit '{self.peek()}' in octal literal")
|
||||
discard self.step()
|
||||
|
||||
|
||||
proc parseHex(self: Lexer) =
|
||||
## Parses hexadecimal numbers
|
||||
while self.peek().isAlphaNumeric():
|
||||
if not self.peek().isDigit() and self.peek().toLowerAscii() notin "a".."f":
|
||||
self.error(&"invalid hexadecimal literal")
|
||||
discard self.step()
|
||||
|
||||
|
||||
proc parseNumber(self: Lexer) =
|
||||
## Parses numeric literals, which encompass
|
||||
## integers and floating point numbers.
|
||||
## Floats also support scientific notation
|
||||
## (e.g. 3e14), while the fractional part
|
||||
## must be separated from the integer part
|
||||
## using a dot (which acts as the decimal separator).
|
||||
## Float literals such as 32.5e3 are also supported.
|
||||
## The "e" for the scientific notation of floats
|
||||
## is case-insensitive. Binary number literals are
|
||||
## expressed using the prefix 0b, hexadecimal
|
||||
## numbers with the prefix 0x and octal numbers
|
||||
## with the prefix 0o. Numeric literals support
|
||||
## size specifiers, like so: 10'u8, 3.14'f32
|
||||
var kind: TokenType
|
||||
case self.peek():
|
||||
of "b":
|
||||
discard self.step()
|
||||
kind = Binary
|
||||
self.parseBinary()
|
||||
of "x":
|
||||
kind = Hex
|
||||
discard self.step()
|
||||
self.parseHex()
|
||||
of "o":
|
||||
kind = Octal
|
||||
discard self.step()
|
||||
self.parseOctal()
|
||||
else:
|
||||
kind = Integer
|
||||
while isDigit(self.peek()) and not self.done():
|
||||
discard self.step()
|
||||
if self.check(["e", "E"]):
|
||||
kind = Float
|
||||
discard self.step()
|
||||
while self.peek().isDigit() and not self.done():
|
||||
discard self.step()
|
||||
elif self.check("."):
|
||||
# TODO: Is there a better way?
|
||||
discard self.step()
|
||||
if not isDigit(self.peek()):
|
||||
self.error("invalid float number literal")
|
||||
kind = Float
|
||||
while isDigit(self.peek()) and not self.done():
|
||||
discard self.step()
|
||||
if self.check(["e", "E"]):
|
||||
discard self.step()
|
||||
while isDigit(self.peek()) and not self.done():
|
||||
discard self.step()
|
||||
if self.match("'"):
|
||||
# Could be a size specifier, better catch it
|
||||
while (self.peek().isAlphaNumeric() or self.check("_")) and
|
||||
not self.done():
|
||||
discard self.step()
|
||||
self.createToken(kind)
|
||||
|
||||
|
||||
proc parseBackticks(self: Lexer) =
|
||||
## Parses tokens surrounded
|
||||
## by backticks. This may be used
|
||||
## for name stropping as well as to
|
||||
## reimplement existing operators
|
||||
## (e.g. +, -, etc.) without the
|
||||
## parser complaining about syntax
|
||||
## errors
|
||||
while not self.match("`") and not self.done():
|
||||
if self.peek().isAlphaNumeric() or self.symbols.existsSymbol(self.peek()):
|
||||
discard self.step()
|
||||
continue
|
||||
self.error(&"unexpected character: '{self.peek()}'")
|
||||
self.createToken(Identifier)
|
||||
# Strips the backticks
|
||||
self.tokens[^1].lexeme = self.tokens[^1].lexeme[1..^2]
|
||||
|
||||
|
||||
proc parseIdentifier(self: Lexer) =
|
||||
## Parses keywords and identifiers.
|
||||
## Note that multi-byte characters
|
||||
## (aka UTF runes) are not supported
|
||||
## by design and *will* break things
|
||||
while (self.peek().isAlphaNumeric() or self.check("_")) and not self.done():
|
||||
discard self.step()
|
||||
let name: string = self.source[self.start..<self.current]
|
||||
if self.symbols.existsKeyword(name):
|
||||
# It's a keyword!
|
||||
self.createToken(self.symbols.keywords[name])
|
||||
else:
|
||||
# It's an identifier!
|
||||
self.createToken(Identifier)
|
||||
|
||||
|
||||
proc next(self: Lexer) =
    ## Scans a single token. This method is
    ## called iteratively until the source
    ## file reaches EOF
    if self.done():
        # Nothing left to scan
        return
    elif self.match(["\r", "\f", "\e"]):
        # We skip characters we don't need
        return
    elif self.match(" "):
        # Whitespace
        inc(self.spaces)
        inc(self.start, 2)
    elif self.match("\t"):
        self.error("tabs are not allowed in peon code, use spaces for indentation instead")
    elif self.match("\n"):
        # New line
        self.incLine()
        # TODO: Broken
        #[if not self.getToken("\n").isNil():
            self.createToken(Semicolon)]#
    elif self.match("`"):
        # Stropped token
        self.parseBackticks()
    elif self.match(["\"", "'"]):
        # String or character literal
        var mode = "single"
        if self.peek(-1) != "'" and self.check(self.peek(-1)) and self.check(self.peek(-1), 1):
            # Multiline strings start with 3 quotes
            discard self.step(2)
            mode = "multi"
        self.parseString(self.peek(-1), mode)
    elif self.peek().isDigit():
        # Number literal
        discard self.step()  # Needed because parseNumber reads the next
                             # character to tell the base of the number
        self.parseNumber()
    elif self.peek().isAlphaNumeric() and self.check(["\"", "'"], 1):
        # Prefixed string literal (i.e. f"Hi {name}!")
        case self.step():
            of "r":
                self.parseString(self.step(), "raw")
            of "b":
                self.parseString(self.step(), "bytes")
            of "f":
                self.parseString(self.step(), "format")
            else:
                self.error(&"unknown string prefix '{self.peek(-1)}'")
    elif self.peek().isAlphaNumeric() or self.check("_"):
        # Keywords and identifiers
        self.parseIdentifier()
    elif self.match("#"):
        if not self.match("pragma["):
            # Inline comments
            while not (self.match("\n") or self.done()):
                discard self.step()
            self.createToken(Comment)
            self.incLine()
        else:
            self.createToken(Pragma)
    else:
        # If none of the above conditions matched, there are a few
        # other options left:
        # - The token is a built-in operator, or
        # - it's an expression/statement delimiter, or
        # - it's not a valid token at all
        # We handle all of these cases here by trying to
        # match the longest sequence of characters possible
        # as either an operator or a statement/expression
        # delimiter
        var n = self.symbols.getMaxSymbolSize()
        while n > 0:
            for symbol in self.symbols.getSymbols(n):
                if self.match(symbol):
                    # We've found the largest possible match!
                    self.tokens.add(self.getToken(symbol))
                    return
            dec(n)
        # We just assume what we have in front of us
        # is a symbol
        discard self.step()
        self.createToken(Symbol)


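
when isMainModule:
    # Illustrative sketch, not part of the original commit: a standalone,
    # hypothetical helper that mirrors the longest-match strategy used in the
    # final branch of next() above. Longer operators are tried before shorter
    # ones, so ">>" is preferred over two separate ">" tokens. It does not use
    # the lexer's actual SymbolTable API.
    proc longestMatch(src: string, symbols: openArray[string], maxLen: int): string =
        var n = maxLen
        while n > 0:
            for symbol in symbols:
                if symbol.len == n and src.len >= n and src[0 ..< n] == symbol:
                    return symbol
            dec(n)
        # Nothing matched: fall back to a single character, like the
        # generic Symbol token above
        return src[0 .. 0]

    doAssert longestMatch(">>= 2", [">", ">>", ">="], 2) == ">>"
    doAssert longestMatch("@foo", [">", ">>", ">="], 2) == "@"

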
proc lex*(self: Lexer, source, file: string): seq[Token] =
    ## Lexes a source file, converting a stream
    ## of characters into a series of tokens
    var symbols = self.symbols
    discard self.newLexer()
    self.symbols = symbols
    self.source = source
    self.file = file
    self.lines = @[]
    self.lastLine = 0
    self.linePos = 0
    self.lineCurrent = 0
    while not self.done():
        self.next()
        self.start = self.current
        self.lineCurrent = self.linePos
    self.tokens.add(Token(kind: EndOfFile, lexeme: "",
                          line: self.line, pos: (self.current, self.current),
                          relPos: (start: 0, stop: self.linePos - 1)))
    self.incLine()
    return self.tokens
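

when isMainModule:
    # Minimal usage sketch, not part of the original commit. The sample input
    # and file name are made up. Without fillSymbolTable() (defined in
    # util/symbols, added later in this commit) keywords come out as plain
    # identifiers and operators as generic Symbol tokens, but the scanning
    # loop, number literals and backtick stropping are all exercised.
    var tokenizer = newLexer()
    for token in tokenizer.lex("x 3.14 `foo`", "<string>"):
        echo $token.kind & " -> " & token.lexeme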

File diff suppressed because it is too large

@@ -0,0 +1,83 @@

# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import std/strformat
import std/strutils


type
    TokenType* {.pure.} = enum
        ## Token types enumeration

        # Booleans
        True, False,

        # Control flow statements
        If, Else,

        # Looping statements
        While, For,

        # Keywords
        Function, Break, Continue,
        Var, Let, Const, Return,
        Coroutine, Generator, Import,
        Raise, Assert, Await, Foreach,
        Yield, Type, Operator, Case,
        Enum, From, Ptr, Ref, Object,
        Export, Block, Switch, Lent,

        # Literal types
        Integer, Float, String, Identifier,
        Binary, Octal, Hex, Char,

        # Brackets, parentheses,
        # operators and others
        LeftParen, RightParen,      # ()
        LeftBrace, RightBrace,      # {}
        LeftBracket, RightBracket,  # []
        Dot, Semicolon, Comma,      # . ; ,

        # Miscellaneous
        EndOfFile,  # Marks the end of the token stream
        NoMatch,    # Used internally by the symbol table
        Comment,    # Useful for documentation comments, pragmas, etc.
        Symbol,     # A generic symbol
        Pragma

    Token* = ref object
        ## A token object
        kind*: TokenType                  # The type of the token
        lexeme*: string                   # The lexeme associated to the token
        line*: int                        # The line where the token appears
        pos*: tuple[start, stop: int]     # The absolute position in the source file
        relPos*: tuple[start, stop: int]  # The relative position in the source line
        spaces*: int                      # Number of spaces before this token


proc `$`*(self: Token): string =
    ## Stringifies the token
    if self != nil:
        result = &"Token(kind={self.kind}, lexeme={self.lexeme.escape()}, line={self.line}, pos=({self.pos.start}, {self.pos.stop}), relpos=({self.relPos.start}, {self.relPos.stop}), spaces={self.spaces})"
    else:
        result = "nil"


proc `==`*(self, other: Token): bool =
    ## Returns self == other
    return self.kind == other.kind and self.lexeme == other.lexeme
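

when isMainModule:
    # Hypothetical example, not part of the original commit: build a token by
    # hand and exercise the $ and == overloads defined above. The field values
    # are made up.
    let tok = Token(kind: Identifier, lexeme: "x", line: 1,
                    pos: (start: 0, stop: 0), relPos: (start: 0, stop: 0),
                    spaces: 0)
    echo tok   # printed via the $ overload
    # == only compares kind and lexeme, so position info is irrelevant here
    echo tok == Token(kind: Identifier, lexeme: "x")   # -> true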

@@ -0,0 +1,70 @@

# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import util/fmterr
import util/symbols
import frontend/parsing/lexer
import frontend/parsing/parser
import frontend/compiler/compiler


import std/strformat


proc `$`(self: TypedNode): string =
    if self.node.isConst():
        var self = TypedExpr(self)
        return &"{self.node}: {self.kind[]}"
    case self.node.kind:
        of varDecl, typeDecl, funDecl:
            var self = TypedDecl(self)
            result = &"{self.name[]}: {self.name.valueType[]}"
        of identExpr, binaryExpr, unaryExpr:
            var self = TypedExpr(self)
            result &= &"{self.node}: {self.kind[]}"
        else:
            result = &"{self.node}: ? ({self.node.kind})"


proc main =
    var
        lexer = newLexer()
        parser = newParser()
        compiler = newPeonCompiler()
        source: string
        file = "test.pn"
    lexer.fillSymbolTable()
    while true:
        stdout.write(">>> ")
        stdout.flushFile()
        try:
            source = stdin.readLine()
            for typedNode in compiler.compile(parser.parse(lexer.lex(source, file), file, lexer.getLines(), lexer.getSource()),
                                              lexer.getFile(), lexer.getSource(), showMismatches=true):
                echo &"{typedNode.node} -> {compiler.stringify(typedNode)}\n"
        except IOError:
            echo ""
            break
        except LexingError as exc:
            print(exc)
        except ParseError as exc:
            print(exc)
        except CompileError as exc:
            print(exc)


when isMainModule:
    setControlCHook(proc () {.noconv.} = echo ""; quit(0))
    main()

@@ -0,0 +1,83 @@

# Copyright 2022 Mattia Giambirtone & All Contributors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Utilities to print formatted error messages to stderr

import frontend/compiler/compiler
import frontend/parsing/parser
import frontend/parsing/lexer
import errors


import std/os
import std/terminal
import std/strutils
import std/strformat


proc printError(file, line: string, lineNo: int, pos: tuple[start, stop: int], fn: Declaration, msg: string) =
    ## Internal helper to print a formatted error message
    ## to stderr
    stderr.styledWrite(fgRed, styleBright, "Error in ", fgYellow, &"{file}:{lineNo}:{pos.start}")
    if not fn.isNil() and fn.kind == funDecl:
        stderr.styledWrite(fgRed, styleBright, " in function ", fgYellow, FunDecl(fn).name.token.lexeme)
    stderr.styledWriteLine(styleBright, fgDefault, ": ", msg)
    if line.len() > 0:
        stderr.styledWrite(fgRed, styleBright, "Source line: ", resetStyle, fgDefault, line[0..<pos.start])
        if pos.stop == line.len():
            stderr.styledWrite(fgRed, styleUnderscore, line[pos.start..<pos.stop])
            stderr.styledWriteLine(fgDefault, line[pos.stop..^1])
        else:
            stderr.styledWrite(fgRed, styleUnderscore, line[pos.start..pos.stop])
            stderr.styledWriteLine(fgDefault, line[pos.stop + 1..^1])


proc print*(exc: CompileError) =
    ## Prints a formatted error message
    ## for compilation errors to stderr
    var file = exc.file
    var contents = ""
    case exc.line:
        of -1: discard
        of 0: contents = exc.compiler.getSource().strip(chars={'\n'}).splitLines()[exc.line]
        else: contents = exc.compiler.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
    printError(file, contents, exc.line, exc.node.getRelativeBoundaries(), exc.function, exc.msg)


proc print*(exc: ParseError) =
    ## Prints a formatted error message
    ## for parsing errors to stderr
    var file = exc.file
    if file notin ["<string>", ""]:
        file = relativePath(exc.file, getCurrentDir())
    var contents = ""
    if exc.line != -1:
        contents = exc.parser.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
    else:
        contents = ""
    printError(file, contents, exc.line, exc.token.relPos, exc.parser.getCurrentFunction(), exc.msg)


proc print*(exc: LexingError) =
    ## Prints a formatted error message
    ## for lexing errors to stderr
    var file = exc.file
    if file notin ["<string>", ""]:
        file = relativePath(exc.file, getCurrentDir())
    var contents = ""
    if exc.line != -1:
        contents = exc.lexer.getSource().strip(chars={'\n'}).splitLines()[exc.line - 1]
    else:
        contents = ""
    printError(file, contents, exc.line, exc.pos, nil, exc.msg)
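

when isMainModule:
    # Illustrative sketch, not part of the original commit: render a fake
    # error by calling the internal helper directly, just to show the format
    # it produces. The file name, source line and span below are made up.
    printError("test.pn", "var x = 1.0 +", 1, (start: 8, stop: 10), nil,
               "invalid float number literal")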

@@ -0,0 +1,60 @@

import ../frontend/parsing/lexer


proc fillSymbolTable*(tokenizer: Lexer) =
    ## Initializes the Lexer's symbol
    ## table with builtin symbols and
    ## keywords

    # 1-byte symbols
    tokenizer.symbols.addSymbol("{", LeftBrace)
    tokenizer.symbols.addSymbol("}", RightBrace)
    tokenizer.symbols.addSymbol("(", LeftParen)
    tokenizer.symbols.addSymbol(")", RightParen)
    tokenizer.symbols.addSymbol("[", LeftBracket)
    tokenizer.symbols.addSymbol("]", RightBracket)
    tokenizer.symbols.addSymbol(".", Dot)
    tokenizer.symbols.addSymbol(",", Comma)
    tokenizer.symbols.addSymbol(";", Semicolon)
    # Keywords
    tokenizer.symbols.addKeyword("type", TokenType.Type)
    tokenizer.symbols.addKeyword("enum", Enum)
    tokenizer.symbols.addKeyword("case", Case)
    tokenizer.symbols.addKeyword("operator", Operator)
    tokenizer.symbols.addKeyword("generator", Generator)
    tokenizer.symbols.addKeyword("fn", TokenType.Function)
    tokenizer.symbols.addKeyword("coroutine", Coroutine)
    tokenizer.symbols.addKeyword("break", TokenType.Break)
    tokenizer.symbols.addKeyword("continue", Continue)
    tokenizer.symbols.addKeyword("while", While)
    tokenizer.symbols.addKeyword("for", For)
    tokenizer.symbols.addKeyword("foreach", Foreach)
    tokenizer.symbols.addKeyword("if", If)
    tokenizer.symbols.addKeyword("else", Else)
    tokenizer.symbols.addKeyword("await", TokenType.Await)
    tokenizer.symbols.addKeyword("raise", TokenType.Raise)
    tokenizer.symbols.addKeyword("assert", TokenType.Assert)
    tokenizer.symbols.addKeyword("const", Const)
    tokenizer.symbols.addKeyword("let", Let)
    tokenizer.symbols.addKeyword("var", TokenType.Var)
    tokenizer.symbols.addKeyword("import", Import)
    tokenizer.symbols.addKeyword("yield", TokenType.Yield)
    tokenizer.symbols.addKeyword("return", TokenType.Return)
    tokenizer.symbols.addKeyword("object", Object)
    tokenizer.symbols.addKeyword("export", Export)
    tokenizer.symbols.addKeyword("block", TokenType.Block)
    tokenizer.symbols.addKeyword("switch", TokenType.Switch)
    tokenizer.symbols.addKeyword("lent", TokenType.Lent)
    # These are more like expressions with a reserved
    # name that produce a value of a builtin type,
    # but we don't need to care about that until
    # we're in the parsing/compilation steps so
    # it's fine
    tokenizer.symbols.addKeyword("true", True)
    tokenizer.symbols.addKeyword("false", False)
    tokenizer.symbols.addKeyword("ref", TokenType.Ref)
    tokenizer.symbols.addKeyword("ptr", TokenType.Ptr)
    # Operators (single- and multi-character) are all
    # registered as generic Symbol tokens
    for sym in [">", "<", "=", "~", "/", "+", "-", "_", "*", "?", "@", ":", "==", "!=",
                ">=", "<=", "+=", "-=", "/=", "*=", "**=", "!", "%", "&", "|", "^",
                ">>", "<<"]:
        tokenizer.symbols.addSymbol(sym, Symbol)
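

when isMainModule:
    # Minimal usage sketch, not part of the original commit: fill the symbol
    # table and lex a short snippet (the snippet and file name are made up).
    # With the entries registered above, "var" now comes out as a keyword
    # token and "=", "+" and ";" as their registered symbol tokens.
    var tokenizer = newLexer()
    tokenizer.fillSymbolTable()
    for token in tokenizer.lex("var x = y + 1;", "<string>"):
        echo $token.kind & " -> " & token.lexeme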