About a month ago and after a long hiatus, I published EndBASIC 0.3 and the adrenaline rush that came with it got my wheels spinning again full-steam ahead. So here I am today, ready to announce the 0.4 release. But… “what could have possibly changed in just a month of someone’s free time”, you wonder? Enough, actually!
EndBASIC 0.4 is the release that fulfills my original goal of being able to run a “guess the number” game. The “only” change needed after 0.3 to make this possible was the addition of random numbers, but these in turn required adding function calls to back
RND() and also supporting floating point numbers because that’s what this function is supposed to return.
All in all, there are four major features in this new release:
- Built-in functions, including random number generation and string manipulation.
- Rudimentary double-precision floating point numbers.
- Three built-in demo programs to get you started with something tangible out of the box.
You can dive right in by visiting https://repl.endbasic.dev/ for an interactive session (doesn’t really work on Android, sorry), following the project at https://www.endbasic.dev/, or watching the demo below.
WARNING: 🚨 But don’t leave yet! Keep reading for the backstory behind the new features.
FOR loops are an interesting creature in any programming language: they are not strictly necessary for turing-complete computations because they can be trivially rewritten as
WHILE loops. However,
FOR loops are very convenient, and in the eyes of beginners, I bet that BASIC’s
FOR loops are much easier to understand than
WHILE loops. No matter what, there is no question that I had to add them to the language.
FOR loops to the language was easy. I might have spent an hour total on this work, which involved modifying the parser to recognize the new syntax, extending the AST to represent the new loops, and updating the execution engine to process the new construct in the AST. But along the way, several design questions popped up and they surfaced interesting trade-offs.
Let’s take a look at how a trivial
FOR loop looks like and what its mechanic conversion to a
WHILE loop is:
In the code snippets above, we have a
FOR loop that repeats a
i by 1 so
i ends up having all integer values in the range
[1..10]. (Note that the end value is inclusive.) The transformation to a
WHILE loop is trivial: we set the iterator to the initial value, test if the value is less than the end value, and increment it at the end of each iteration.
So far so good. But BASIC also allows a
STEP clause in the
FOR loop, which indicates the amount with which to increment the iterator. The default is 1 but we can set it to any other number we like:
In this example we set the step to 2, and because of that, the loop’s iterator will cover the values
[1,3,5,7,9] before terminating.
Based on these examples, we now know that to represent a
FOR loop in the AST, we have to capture the name of the iterator, the start value, the end value, and the step value. But how do we store these three numerical values in the AST? “Oh, that’s obvious”, I thought. “They are integer expressions so we record them as such!” Not so fast, my friend. Accepting these values as expressions causes some trouble.
You see: I only showed you
FOR loops counting upwards. But can these loops count downwards too? They can, of course, and they look like this:
Note the negative step value, which is set to -2. This loop will count from 10 to 1 so the values the iterator will see are
There are two interesting facts here. The first is that the step’s sign tells us whether the loop counts upwards or downwards. The second is that the terminating condition for “downwards” loops is the opposite of the one we saw before: we need to loop while the iterator is larger than or equal to the end value, not smaller or equal.
Which leads us to trouble in the AST. If we track the start, end, and step values as expressions, as I originally attempted, we have no upfront idea about what condition to test for when we are executing the code. At each iteration, we have to evaluate the step expression to see if it is positive or negative, then build a conditional expression to test the iterator value, and then evaluate it. And because we do this at each iteration, we could have a loop that moves up and down if its body happened to touch the variables involved in the step expression.
Yikes. And also strange. This sounded too complicated and fragile. And indeed, my suspicion was correct. I checked the old documentation and found that the
STEP clause in QuickBASIC 4.5 takes an integer literal, not an expression. Given this limitation, the interpreter can tell, at parse time, whether the loop will count upwards or downwards, which is very useful in representing the loop in the AST.
I made this same simplification and ended up with an AST representation that stores the terminating condition in it, not just the end value. This allows for faster execution at runtime as we only need to evaluate expressions and not think about which expressions those are.
I could have also gone all the way and not tracked
FOR loops in the AST at all. As shown above, these can be trivially converted to
WHILE loops so an alternate representation in the AST could have been just that. I felt that this approach lost fidelity because you can’t tell anymore how the source code looked like by just inspecting the AST, so I preferred to go for an explicit
FOR representation. But both would be acceptable options.
Rudimentary floating point support
Floating point numbers aren’t something I was planning on adding to the language, but as I said in the opening, they were a requirement to implement
RND(). Sure, I could have changed the signature of this function to return integers instead, but I suppose floating point numbers will come in handy in the future too. (I recall
COS() being very useful for graphics and stuff back in the day… so… we’ll see.)
Adding the new double type (
# annotation) to the language was trivial. It took me maybe 5 minutes of actual typing to get them to work after getting rid of a spurious
#[derive(Eq)] that I had sprinkled throughout the AST types. However, extending the unit and integration tests to cope with the addition of the new type was… painful, and lengthened this task from those 5 minutes to over an hour. But don’t get me wrong: this was not a waste of time.
One of the things that writing these tests quickly revealed is that integers and doubles aren’t magically compatible. Duh. If I wanted to support mixing them in expressions, I’d have to add type promotion rules… and I really don’t like automatic type promotion. Yes, yes, I know BASIC does it so I should implement it too, but let’s see how far I can get with the clutch I added: two functions (
ITOD) to explicitly cast between these two types.
Another interesting detail that Rust and the tests revealed is what happens on overflow. The code to handle integer expressions has to worry about overflow conditions and uses Rust’s primitives to do so. Copy/pasting that code to handle doubles failed to compile because those overflow-safe primitives don’t exist for doubles. And the reason is simple: doubles don’t overflow and divisions by zero don’t fail. Obvious too, you can tell me, but my experience with floating point numbers is almost zero. So how do they behave? The tests I copy/pasted from the integer code provided the answers: magic “numbers”. Overflows turn into infinities and divisions by zero turn into NaNs.
Which is fine for now, but highlights that what I have implemented to support doubles in EndBASIC is extremely rudimentary. Yes, you can perform computations on these numbers but that’s about it: forget about checking for corner cases as the right constants are not exposed to the language. As I’m not an expert, I’m certain to get this wrong, but if you are interested in lending a hand please let me know!
Supporting function calls is something I dreaded adding for a while because this meant touching the expression parser to support them. But it had to be done.
The expression parser in EndBASIC is an implementation of the Shunting-yard algorithm by Edsger Dijkstra. You can find plenty of pseudo-code variants of this algorithm online, but most deal with printing stuff out to the screen instead of building an AST, and most also fail to cover complex cases. Function calls, and especially function calls with variadic arguments, are one of those complex cases that aren’t fully detailed.
It took me a little while to figure out how to add these to the parser, and along the way I uncovered a couple of existing embarrassing bugs. But with this done, it was then a matter of representing functions in the code. Note that these functions are builtin for now: they are written in Rust and cannot be defined from EndBASIC code. Yet.
The interesting thing in this area is that, before this work, EndBASIC already had a concept of “builtin commands”, which are a generic way to express callable statements like
INPUT without tying them to the parser. This is by design: I want the core of the interpreter to be free from side-effects so that it can be safely embedded in other binaries, and the statements to write to the console definitely have side-effects.
Adding functions meant replicating the existing logic for built-in commands. And because copy/pastes are ugly, this quickly turned into a yak-shaving problem: functions return a value while commands do not, and functions can be used in different contexts than commands. They are almost the same, but not really. Yet… could we handle them in the same way?
I spent a long time prototyping different ways to unify functions and commands under a single
Callable trait… but in the end abandoned them all. No matter what I tried, the benefits in the design were eclipsed by the hacks that had to be put in place to accept one class of callable or another under different contexts (statements vs. expressions). I still feel “dirty” by this duality so I might come back to clean it up later.
But now that functions are supported in the core engine, adding new functions is trivial. And to prove that they work, I added common string manipulation functions such as
MID() on top of the single function I originally wanted to add:
RND(). To witness:
Ready PRINT RND() 0.13684954101612082 Ready PRINT DTOI(RND() * 1000.0) 385 Ready PRINT LEN("Hello") 5
The latest goodie in this release is something I didn’t originally think of for 0.4. In fact, my original plans were to add arrays instead, but that’s a can of worms that I’ll save for later.
As stated in the opening of this post, my goal at the beginning of EndBASIC was to support writing a “guess the number” game. And write it I did. But… how could I give it to you to showcase the features of the language? Sure, I could add the source program to the Git repository… but anyone installing EndBASIC via
cargo install would not have access to the file. And anyone accessing EndBASIC through the web interface (the most common case, I presume) would have no way of putting that code into the interpreter.
The easiest solution that came to mind was to bundle the code in the interpreter itself via Rust’s
include_bytes! macro. It’s not the greatest solution because I’m bloating the binary and the WASM pack with unnecessary code, but it doesn’t matter much really. We are talking about a few KBs only and this is for fun right now, so who cares. But it’s indeed ugly that these files are currently in the language core and not in the CLI crate.
With that in mind, I extended the
LOAD commands with support for magic files that correspond to demos. These file names are all prefixed with
DEMO: (note the colon character), which is a name that cannot be represented in most regular file systems. And with that done, I added three demos to the interpreter: the game I’ve been talking about, a useless “Hello World” type of program, and an interactive tour:
Ready DIR Modified Size Name 2020-12-23 03:12 2107 DEMO:GUESS.BAS 2020-12-22 14:20 651 DEMO:HELLO.BAS 2020-12-24 01:52 6590 DEMO:TOUR.BAS 3 file(s), 9348 bytes
A lot of things could come next. Arrays is one of them. Better screen handling is another. User-defined functions and procedures are yet another. A
PLAY statement for music would be the pipe dream one.
But I think it’s time to pause for a bit and polish what I have so far. EndBASIC is now about 12k LoC. Roughly half of that is test code, so that leaves the interpreter at 6k lines. After removing comments and docstrings, we are left with only about 5k lines for the interpreter. That’s… much smaller than I expected to see, honestly. But the code has grown quite a few warts that make it hard to iterate on.
One of these warts are the tests, yes. I blogged about using the builder pattern for tests a few weeks ago, and the results of that experiment with the console manipulation tests have been pretty good. I think it’s time to go back to the other modules and adopt these ideas too to remove the annoying and abundant
The other wart is something that has been bothering me for a while too: I truly want to keep the language core separate from any constructs that might have side-effects. I want to ensure that the language parser and execution engine are completely decoupled from the console and any external resources. For the most part, they are already, but the
endbasic-core crate has too many dependencies that don’t make this fact clear. For one, the latest release has added a new dependency to generate random numbers, and that’s something that the language itself doesn’t need. So, I want to break apart the two components and have a tiny language parser with a separate “standard library” crate.
So, stay tuned. In the meantime, and to reiterate the opening of the post:
And with that, Merry Xmas if that’s your thing. Consider this your Santa 🎅 gift!