Since its inception two years ago, the EndBASIC interpreter has been using an AST-based execution
engine. And during all this time, people have mocked the language for not allowing 10 GOTO 10
.
Well, fear not: the upcoming 0.10 release has full support for GOTO
and GOSUB
, features that
were made possible by moving to a bytecode-based interpreter. Let’s take a peek at what the
problems were and how I addressed them.
A blog on operating systems, programming languages, testing, build systems, my own software projects and even personal productivity. Specifics include FreeBSD, Linux, Rust, Bazel and EndBASIC.
Before: AST-based execution
An Abstract Syntax Tree (AST) represents the structure of the program as a connected acyclic undirected graph—aka… a tree. Each node represents a construct that appears in the program text and each edge indicates how those constructs are nested. ASTs are the output of a language parser.
ASTs are useful to efficiently walk over a program’s source code at a high level without having to worry about lexing or parsing, and they are useful to implement things like compilers: for each construct in the AST, such as an arithmetic operation or an if statement, the compiler can inspect the various parts of the construct and “directly” emit machine code for them.
But ASTs can also be used to execute a program. Here, take a look at what the main execution loop in EndBASIC used to look like:
async fn exec(&mut self, stmts: &[Statement]) -> Result<()> {
for stmt in stmts {
match stmt {
Statement::Assignment(vref, expr) => {
self.assign(vref, expr).await?
}
Statement::BuiltinCall(name, args) => {
let cmd = match self.symbols.get(&VarRef::new(name, VarType::Auto))? {
Some(Symbol::Command(cmd)) => cmd.clone(),
Some(_) => return new_syntax_error(format!("{} is not a command", name)),
None => return new_syntax_error(format!("Unknown builtin {}", name)),
};
cmd.exec(args, self)
.await
.map_err(|e| Error::from_call_error(cmd.metadata(), e))?;
}
Statement::If(branches) => {
self.do_if(branches).await?;
}
Statement::While(condition, body) => {
self.do_while(condition, body).await?;
}
// ... other statement types elided ...
}
}
}
In the above code snippet, note how walking the AST (represented in stmts
as a sequence of
Statement
values) for execution is easy: there is a loop that “sees” every high-level construct
and can then simulate what should be done for each of them.
To add some more color to the above, here is how an IF
statement’s execution looked like. Note
that we explicitly test for each conditional branch that appears in the program and, once we find
the branch that is true (if any) we recursively call into the exec
function to evaluate its
contents and then terminate execution:
async fn do_if(&mut self, branches: &[(Expr, Vec<Statement>)]) -> Result<()> {
for (expr, stmts) in branches {
match expr.eval(&mut self.symbols).await? {
Value::Boolean(true) => {
self.exec(stmts).await?;
break; // Stop executing branches.
}
Value::Boolean(false) => (),
_ => return new_syntax_error("IF/ELSEIF require a boolean condition"),
};
}
Ok(())
}
Similarly, here is how WHILE
loop execution looked like. Note again that we have a native loop
that iterates over the body of the interpreted loop and exits when the condition is false:
async fn do_while(&mut self, condition: &Expr, body: &[Statement]) -> Result<()> {
loop {
match condition.eval(&mut self.symbols).await? {
Value::Boolean(true) => self.exec(body).await?,
Value::Boolean(false) => break,
_ => return new_syntax_error("WHILE requires a boolean condition"),
}
}
Ok(())
}
Looks pretty simple and reasonable, right?
In-between: Hacking GOTO support
Indeed it does. The AST-based executor presented above is simple. But it comes with significant
problems—the most salient one being that implementing the much desired GOTO
is very difficult.
Consider an EndBASIC program like this:
IF 1 < 4 THEN
WHILE TRUE
GOTO @exit
WEND
END IF
@exit: PRINT "hi"
If we parse the above into an AST and try to execute it with the algorithm presented earlier, we end up with a native call stack that looks like this:
exec <-- @exit is visible at this level.
| -> do_if
| | -> exec
| | | -> do_while
| | | | -> exec
| | | | | -> do_goto <-- How do we jump to @exit?
Which is a problem. There is no way the do_goto
native function can jump to @exit
right away.
The only thing we can do is unwind the native stack until we reach the nesting level where @exit
is defined and continue execution there. And because we can do this and adding GOTO
as a
feature was not negotiable at this point, I implemented this exact solution as a first cut in
commit
8ef803e.
Here are the crux of the changes to the AST-based execution loop:
async fn exec(&mut self, stmts: &[Statement]) -> Result<()> {
let mut i = 0;
while i < stmts.len() || self.pending_goto.is_some() {
// If there is a "pending GOTO", see if we can find the target label
// at this level by scanning over all statements.
if let Some(target) = self.pending_goto.as_ref() {
i = 0;
while i < stmts.len() {
if let Statement::Label(label) = &stmts[i] {
if target == label {
i += 1;
self.pending_goto = None;
break;
}
}
i += 1;
}
// The label wasn't found here. Unwind this stack frame and the
// calling exec() will try this again.
if i == stmts.len() {
return Ok(());
}
}
// Now execute statements at this level as before ...
match &stmts[i] {
Statement::Goto(label) => {
self.pending_goto = Some(label.clone());
// Stack unwinding will happen at the next iteration, where we
// will first try to find the target at the current level.
}
Statement::Label(label) => {
// Nothing to do (other than sanity checks). The labels as
// targets were handled above.
}
// ... other statement types elided ...
}
i += 1;
}
}
Does the job. Unfortunately, this is not pretty because the pending_goto
looks like a clutch,
and this is not efficient because what should be a simple instruction jump (a thing that
microprocessors can do just fine) becomes a convoluted stack unwinding process plus a target lookup
at every level.
What’s worse is that the above isn’t feature-complete either because we are only unwinding the stack (going up the call chain) but never go down. Think about this other program:
IF 1 < 4 THEN
WHILE TRUE
GOTO @other
WEND
ELSE
@other: PRINT "hi"
END IF
This snippet differs from the above in that the @other
target of the GOTO
cannot be found by
simply unwinding the stack: the executor would need to unwind the stack, yes, but it would also need
to go into every possible branch of the program to look for the target, which would either be even
more inefficient or it would require a lot of extra bookkeeping to track where the targets live.
And those are not the only problems. What if, instead of a GOTO
, we want to implement a
GOSUB
—a call to an arbitrary location that can later return execution to
where the call happened? “Impossible.”
After: compiler plus bytecode execution
All of the problems above go away if we “flatten” the AST by replacing all nesting with (conditional) jumps and then execute the code with a program counter and a call stack. If that sounds like writing a compiler and a virtual machine, it’s because this is precisely what this means. But it needn’t be as complicated as it sounds. In fact, I procrastinated on this for over a year and first implemented the hack above because I feared the change but, in the end, coming up with a prototype was a matter of a couple of hours.
Let’s go back to our “complex” code statement from earlier. This EndBASIC program:
IF 1 < 4 THEN
WHILE TRUE
GOTO @other
WEND
ELSE
@other: PRINT "hi"
END IF
… is equivalent to this after flattening the AST:
IF 1 < 4 THEN: GOTO @if_else: END IF
@while_start:
IF NOT TRUE: GOTO @while_end: END IF
GOTO @other
GOTO @while_start
@while_end:
GOTO @if_end
@if_else:
@other: PRINT "hi"
@if_end:
All nesting is gone! The only thing we are left with are conditional jumps (the IF
one-liners),
unconditional jumps, and their corresponding targets. We have essentially gone through a
compilation phase. And if we replace the string labels with line numbers (akin to addresses), just
like an assembler would do, we are left with this:
1 IF 1 < 4 THEN: GOTO 6: END IF
2 IF NOT TRUE: GOTO 6: END IF
3 GOTO 6
4 GOTO 2
5 GOTO 7
6 PRINT "hi"
… which looks like an awful lot like native machine code, doesn’t it?
And this is what commit c0cb9a3 did. In this change, I added a rudimentary version of a compiler that flattens the AST and a trivial bytecode-based virtual machine that executes every instruction. The execution loop thus becomes:
let mut pc = 0;
while pc < instrs.len() {
let instr = &instrs[pc];
match instr {
Instruction::Assignment(span) => {
self.assign(span).await?;
pc += 1;
}
Instruction::BuiltinCall(span) => {
self.call_builtin(span).await?;
pc += 1;
}
Instruction::Jump(span) => {
pc = span.addr;
}
Instruction::JumpIfNotTrue(span) => {
match span.cond.eval(&mut self.symbols).await? {
Value::Boolean(true) => pc += 1,
Value::Boolean(false) => pc = span.addr,
_ => {
return new_syntax_error(span.cond.start_pos(), span.error_msg);
}
}
}
// ... other instruction types elided ...
}
}
And with this, you can imagine that implementing GOTO
becomes trivial—as does implementing
GOSUB
and
ON ERROR
,
which are other features that are already available at HEAD
.
Now, the above is far from being true bytecode: for example, the instruction types that are
currently defined are still too high level. Something like the BuiltinCall
instruction type does
an incredible amount of work because this is evaluating (in native code) the various arguments to
the call and calling into the native function. To turn this into true bytecode, the arguments
themselves—which are arbitrary expressions—would need to be evaluated via bytecode instructions
with a supporting stack or register bank. And to be real bytecode, the instructions should have a
compact binary representation.
Having a true bytecode and a memory map sounds super-fun to build. It may even be necessary when
adding support for custom-defined functions. And it would pave the way to having PEEK
and POKE
.
But we didn’t need any of this to solve the GOTO
problem. So, for now, this is all that EndBASIC
0.10 will ship with.
Stay tuned for the forthcoming release announcement!