One thing that bothers me quite a bit about various language implementations is that the core of their interpreter isn’t clearly separate from their standard library. This makes it hard to embed those interpreters into other programs because it’s not obvious how to limit their side-effects and because the interpreter dependency can be heavy.
In my mind, embedding an interpreter into a program should come with the default guarantee that the interpreted code will have zero side-effects. The features of the interpreter should all be opt-in rather than opt-out so that it is painfully obvious which behaviors are exposed to the interpreted code.
The one case where a lot of languages seem to fail this division is in one of the most fundamental primitives: print.
EndBASIC’s design strictly adheres to these principles. From the very beginning, I’ve tried to keep a clear split between the language parser (lexing, AST construction, expression evaluation, and program execution) and the commands that form the standard library. Going back to the previous example, the language parser has no special knowledge of PRINT.
With the EndBASIC 0.5 release, I’ve gone one step further and drawn a firm line between the language parser and the standard library by putting them into separate crates: endbasic-core and endbasic-std. This separation makes it easy to keep the dependencies of the core to a minimum and also helps ensure that the APIs exposed by the core are sufficient to support this level of decoupling.
As of this writing, the EndBASIC core interpreter weighs around 2,500 lines (excluding tests) and a simple Rust program that loads the interpreter is about 480kb after stripping symbols. It’s not the tiniest interpreter you will find (Lua is 247kb), but it’s a pretty small library by today’s standards if you ask me, especially considering I’ve put next to no effort into optimizing it for size.
In this post, I want to take you through the basics of embedding the interpreter into a Rust program. To do so, we will go through a couple of examples that illustrate how to bridge the gap between interpreted EndBASIC code and native Rust code.
WARNING: Please be aware that the Rust API is still in flux and I’m making no backwards compatibility promises until EndBASIC 1.0, and I can’t predict when that’ll be.
The basics with an empty machine
The EndBASIC interpreter is defined as an abstract machine represented by the Machine type in the endbasic-core crate. In its simplest form, we can run a piece of EndBASIC code by doing:
// The script we want to run.
let script = r#"first_var = 123: second_var = "foo""#;
// Create an empty machine (no builtin commands nor functions) and run the script.
let mut machine = endbasic_core::exec::Machine::default();
block_on(machine.exec(&mut script.as_bytes())).expect("Execution failed");
And, later on, we can observe the side-effects that our sample script program had on the machine (and on the machine itself; nowhere else):
Easy, isn’t it? We started by creating an empty Machine object that contains zero builtin commands or functions and then gave it some code to run via exec. Because the machine isn’t hooked to any native callables, it’s impossible for the parsed code to escape the interpreter: all the script can do is modify symbols within the interpreter (and maybe enter an infinite loop).
Of special interest here is the need for block_on. Because EndBASIC provides first-class integration with the web, all of the interpreter is async-friendly, which means that it’s on you to explicitly block for the results of exec. This is a good thing because there are different crates to do this (e.g. futures-lite), so you can choose the one that best fits your needs.
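To make the role of block_on more concrete, here is a minimal sketch of a single-threaded executor written with only the Rust standard library. This is an illustration under stated assumptions, not how futures-lite implements its own block_on, and run_script is a hypothetical stand-in for an async entry point such as exec: it only counts statements instead of running them.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future signals it can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives `fut` to completion on the current thread.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the waker unparks us; a spurious wakeup just polls again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for an async interpreter entry point (NOT the
/// EndBASIC API): counts the colon-separated statements in `script`.
pub async fn run_script(script: &str) -> Result<usize, String> {
    if script.trim().is_empty() {
        return Err("Empty script".to_owned());
    }
    Ok(script.split(':').count())
}
```

Calling block_on(run_script("a = 1: b = 2")) turns the future into a plain Result on the calling thread, which is the same shape as the block_on(machine.exec(...)) call above. In a real program you would most likely reach for an existing executor, such as the block_on function in futures-lite, rather than maintain your own.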
Such a simple machine might not seem very useful at first sight, but two obvious use cases come to mind:
A configuration parser. Configuration is code so you might as well use a real programming language for it. However, you definitely don’t want configuration to have side-effects of any kind, and the EndBASIC core interpreter is able to guarantee that. The config.rs example builds on this idea.
A domain-specific language (DSL). While the machine starts empty, you can trivially hook in your own commands and functions to connect the interpreter with native primitives written in Rust. With that, you can easily define a custom DSL for whatever purpose you need. Maybe you want to have a tiny language to control some peripheral attached to your Raspberry Pi over GPIO? The dsl.rs example tries to elaborate on this idea.
All in all, I think there is an inherent beauty in calling into an interpreted language from a compiled one while having easy interop between the two and being able to precisely define what the interpreted code is allowed to do. EndBASIC gives you those features.
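The opt-in idea behind the DSL use case can be sketched with a toy machine that starts with no commands and only runs what the host registers. Everything here is hypothetical and for illustration only: ToyMachine, add_command, and exec_line are made-up names and do not reflect the endbasic-core API, and the "language" is just one whitespace-separated command per line.

```rust
use std::collections::HashMap;

/// A native callable exposed to scripts: takes arguments, returns its output.
type Command = Box<dyn Fn(&[&str]) -> Result<String, String>>;

/// Hypothetical toy machine; NOT the endbasic-core `Machine`.
#[derive(Default)]
pub struct ToyMachine {
    commands: HashMap<String, Command>,
}

impl ToyMachine {
    /// Exposes `name` to scripts. Nothing is callable until it is registered.
    pub fn add_command<F>(&mut self, name: &str, f: F)
    where
        F: Fn(&[&str]) -> Result<String, String> + 'static,
    {
        self.commands.insert(name.to_uppercase(), Box::new(f));
    }

    /// Runs one whitespace-separated line such as `GPIO_WRITE 4 1`.
    pub fn exec_line(&self, line: &str) -> Result<String, String> {
        let mut words = line.split_whitespace();
        let name = words.next().ok_or_else(|| "Empty line".to_owned())?;
        let args: Vec<&str> = words.collect();
        match self.commands.get(&name.to_uppercase()) {
            Some(command) => command(args.as_slice()),
            // An unregistered name can never reach native code.
            None => Err(format!("Unknown command {}", name)),
        }
    }
}
```

With this shape, a default ToyMachine rejects every line, including something like GPIO_WRITE 4 1. Only after the host calls add_command("GPIO_WRITE", ...) with a closure of its choosing does that one command become reachable, so the set of registered names is the complete list of what a script can do.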
Integrating the standard library
In the examples above, we saw the usage of the endbasic-core crate alone, which means we did not have access to any standard library features. But we can also get access to those by pulling in the heavier endbasic-std crate. The simplest way:
// The script we want to run.
let script = r#"PRINT LEFT("Hello, world!", 5)"#;
// Instantiate a machine with the commands and functions that are intended for
// scripting (avoiding interactive commands such as HELP) and run the script.
let console = ...;
let mut machine = endbasic_std::scripting_machine(console);
block_on(machine.exec(&mut script.as_bytes().as_ref())).expect("Execution failed");
And if you ran this, you’d see Hello on the console.
But… which console? Note that I left console uninitialized above. Like any good library, endbasic-std doesn’t assume it can alter the standard streams (stderr) of your program, and it doesn’t do so. Instead, the machine has to be connected to a Console abstraction, and it’s up to you to choose which one. The standard library ships with an optional TerminalConsole implementation that talks to the real terminal via the crossterm crate, but you can supply your own implementation if you need something lighter-weight.
More interestingly though: where does the Console abstraction live? Remember that I said that the core ought to be agnostic to any commands, which means that it doesn’t even provide PRINT. As a result, the Console trait lives in endbasic-std. This makes for some awkward tradeoffs because the state that you would assume should live in the Machine (the stdout stream, for example) must instead be kept within the commands themselves. In other words: each command instance is attached to the specific state objects it can affect (no more, no less), which I think is a good tradeoff even if the internal APIs look strange at times.
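The idea of attaching state to commands instead of to the machine can be sketched as follows. This is a simplified illustration, not the real code: the Console trait below is a hypothetical one-method version (the actual trait in endbasic-std is different), and MemoryConsole and PrintCommand are made-up names.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical, simplified console abstraction; NOT the endbasic-std trait.
pub trait Console {
    fn print(&mut self, text: &str);
}

/// Console that captures output in memory instead of touching stdout.
#[derive(Default)]
pub struct MemoryConsole {
    pub lines: Vec<String>,
}

impl Console for MemoryConsole {
    fn print(&mut self, text: &str) {
        self.lines.push(text.to_owned());
    }
}

/// Hypothetical PRINT-like command. It holds a handle to the only state it is
/// allowed to affect, so the machine itself needs no notion of a console.
pub struct PrintCommand {
    console: Rc<RefCell<dyn Console>>,
}

impl PrintCommand {
    pub fn new(console: Rc<RefCell<dyn Console>>) -> Self {
        Self { console }
    }

    /// Joins the arguments with spaces and writes them to the console.
    pub fn exec(&self, args: &[&str]) {
        self.console.borrow_mut().print(&args.join(" "));
    }
}
```

Because the host keeps its own Rc handle to the MemoryConsole, it can inspect the captured lines after a command runs, and a command that was never handed a console has no way to produce output at all.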
The script-runner.rs example contains a functional version of this idea. You’ll notice it’s very simple, but that’s because there is not much more to it! That’s all the code you need to run a full-featured interpreter.
For a slightly more interesting example, however, look at how the web interface customizes the integration points between the standard library and the “operating system” so that the former can transparently interact with the WASM runtime.
With this, you should now be able to play around with the native side of the EndBASIC interpreter and connect it to other programs. I’m not suggesting that you go overboard and hook BASIC up to all the super-modern stuff you write. Or… am I? A simple-to-understand language, originally designed (in 1964!) for ease of use, might actually be a good idea in some scenarios. And my specific implementation comes with only a small build and runtime cost! 😊