One thing that bothers me quite a bit about various language implementations is that the core of their interpreter isn’t clearly separate from their standard library. This makes it hard to embed those interpreters into other programs because it’s not obvious how to limit their side-effects and because the interpreter dependency can be heavy.

In my mind, embedding an interpreter into a program should come with the default guarantee that the interpreted code will have zero side-effects. The features of the interpreter should all be opt-in rather than opt-out so that it is painfully obvious which behaviors are exposed to the interpreted code.

The one case where a lot of languages seem to fail this division is in one of the most fundamental primitives: print. This primitive writes messages to the console, of course… but if we are embedding a language interpreter, we may not want the interpreted code to have the ability to touch our console. Therefore, we must have total control over the side-effects of the interpreter.

EndBASIC’s design strictly adheres to these principles. From the very beginning, I’ve tried to keep a clear split between the language parser (lexing, AST construction, expression evaluation, and program execution) and the commands that form the standard library. Going back to the previous example, the language parser has no special knowledge of PRINT. All the parser knows when it sees PRINT is that there is a call to some command that may or may not be defined. It’s up to the caller of the parser to install that symbol, and if it isn’t installed into the interpreter, then the interpreted programs will have no means of touching the console. (Lua is similar in this regard.)
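To make the opt-in idea concrete, here is a minimal sketch of a toy machine that knows nothing about PRINT until the host installs it. This is not the endbasic-core API: ToyMachine, Command, add_command, exec_line, and run_demo are hypothetical names invented for illustration, and the "parser" only splits a line into a command name and its arguments.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical signature of a host-provided command: it receives the raw
/// argument text that follows the command name.
pub type Command = Box<dyn FnMut(&str) -> Result<(), String>>;

/// Toy interpreter core (not EndBASIC) that ships with no commands at all.
#[derive(Default)]
pub struct ToyMachine {
    commands: HashMap<String, Command>,
}

impl ToyMachine {
    /// Installs a command; nothing is callable until the host does this.
    pub fn add_command(&mut self, name: &str, command: Command) {
        self.commands.insert(name.to_uppercase(), command);
    }

    /// Runs one statement of the form `NAME args`, failing on unknown names.
    pub fn exec_line(&mut self, line: &str) -> Result<(), String> {
        let line = line.trim();
        let (name, args) = match line.find(char::is_whitespace) {
            Some(pos) => (&line[..pos], line[pos..].trim_start()),
            None => (line, ""),
        };
        match self.commands.get_mut(&name.to_uppercase()) {
            Some(command) => command(args),
            None => Err(format!("Unknown command {}", name)),
        }
    }
}

/// Runs PRINT before and after the host opts in. Returns the result of the
/// first attempt and the text captured by the host-owned buffer.
pub fn run_demo() -> (Result<(), String>, Vec<String>) {
    let output: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let mut machine = ToyMachine::default();

    // The empty machine has no PRINT, so the script cannot reach any console.
    let before = machine.exec_line("PRINT hello");

    // The host decides that PRINT exists and where its output goes.
    let sink = output.clone();
    machine.add_command(
        "PRINT",
        Box::new(move |args: &str| {
            sink.borrow_mut().push(args.to_owned());
            Ok(())
        }),
    );
    machine.exec_line("PRINT hello").expect("PRINT is now installed");

    let captured = output.borrow().clone();
    (before, captured)
}
```

In this sketch, the first call fails with an "unknown command" error and the second one writes into a buffer owned by the host, never to the real console.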

With the EndBASIC 0.5 release, I’ve gone one step further and drawn the firm line that separates the language parser and the standard library by putting them into separate crates: endbasic-core and endbasic-std. This separation makes it easy to keep the dependencies of the core to a minimum and also helps ensure that the APIs exposed by the core are sufficient to support this level of decoupling.

As of this writing, the EndBASIC core interpreter weighs around 2,500 lines (excluding tests) and a simple Rust program that loads the interpreter is about 480 KB after stripping symbols. It’s not the tiniest interpreter you will find (Lua is 247 KB), but it’s a pretty small library by today’s standards if you ask me, especially considering I’ve put next to no effort into optimizing it for size.

In this post, I want to take you through the basics of embedding the interpreter into a Rust program. To do so, we will go through a couple of examples that illustrate how to bridge the gap between interpreted EndBASIC code and native Rust code.

WARNING: Please be aware that the Rust API is still in flux and I’m making no backwards compatibility promises until EndBASIC 1.0, and I can’t predict when that’ll be.

The basics with an empty machine

The EndBASIC interpreter is defined as an abstract machine represented by the Machine type in the endbasic-core crate. In its simplest form, we can run a piece of EndBASIC code by doing:

// The script we want to run.
let script = r#"first_var = 123: second_var = "foo""#;

// Create an empty machine (no builtin commands nor functions) and run the script.
let mut machine = endbasic_core::exec::Machine::default();
block_on(machine.exec(&mut script.as_bytes())).expect("Execution failed");

And, later on, we can observe the side-effects that our sample script had on the machine (and only on the machine; nowhere else):

assert_eq!(123, machine.get_var_as_int("first_var").unwrap());
assert_eq!("foo", machine.get_var_as_string("second_var").unwrap());

Easy, isn’t it? We started by creating an empty Machine object that contains zero builtin commands or functions and then gave it some code to run via exec. Because the machine isn’t hooked to any native callables, it’s impossible for the parsed code to escape the interpreter: all the script can do is modify symbols within the interpreter (and maybe enter an infinite loop).

Of special interest here is the need for block_on. Because EndBASIC provides first-class integration with the web, all of the interpreter is async-friendly… which means that it’s on you to explicitly block for the results of exec. This is a good thing because there are different crates that provide this (e.g. futures or futures-lite), so you can choose the one that best fits your needs.
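If you are curious about what a function like block_on does, here is a minimal sketch built only on the Rust standard library. It is not how futures or futures-lite implement theirs, and CountDown is a hypothetical future that merely stands in for the one returned by exec.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes up the thread that is blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls `future` on the current thread until it completes.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until the future signals that it can make progress.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical future (a stand-in for the one returned by `exec`) that
/// reports "not ready" a fixed number of times before completing.
pub struct CountDown(pub u32);

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            Poll::Ready("done")
        } else {
            self.0 -= 1;
            // Ask to be polled again; a real future would do this on I/O.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

Calling block_on(CountDown(3)) polls the future four times and then returns its output. In a real program you would pick the executor from one of the crates mentioned above instead of writing your own.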

Such a simple machine might not seem very useful at first sight, but two obvious use cases come to mind:

  1. A configuration parser. Configuration is code so you might as well use a real programming language for it. However, you definitely don’t want configuration to have side-effects of any kind, and the EndBASIC core interpreter is able to guarantee that. The example builds upon this idea.

  2. A domain-specific language (DSL). While the machine starts empty, you can trivially hook in your own commands and functions to connect the interpreter with native primitives written in Rust. With that, you can easily define a custom DSL for whatever purpose you need. Maybe you want to have a tiny language to control some peripheral attached to your Raspberry Pi over GPIO? The example tries to elaborate on this idea.
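For the configuration use case, the host side boils down to reading variables back after the script finishes and validating them. Here is a rough sketch of that step, assuming the variables have already been extracted from the machine into a plain map. Value, Config, and config_from_vars are hypothetical names for this illustration and are not part of EndBASIC; with the real interpreter you would query the machine through getters such as get_var_as_int instead.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// Hypothetical stand-in for the values a script can leave behind.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Integer(i32),
    Text(String),
}

/// Typed view of the settings the host program cares about.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub port: u16,
    pub greeting: String,
}

/// Builds a `Config` from the variables left behind by a script, rejecting
/// missing required names, wrong types, and out-of-range values.
pub fn config_from_vars(vars: &HashMap<String, Value>) -> Result<Config, String> {
    let port = match vars.get("port") {
        Some(Value::Integer(i)) => {
            u16::try_from(*i).map_err(|_| format!("port {} is out of range", i))?
        }
        Some(other) => return Err(format!("port must be an integer, got {:?}", other)),
        None => return Err("port is not defined".to_owned()),
    };
    let greeting = match vars.get("greeting") {
        Some(Value::Text(s)) => s.clone(),
        Some(other) => return Err(format!("greeting must be a string, got {:?}", other)),
        // Optional setting: fall back to a default when the script omits it.
        None => "Hello".to_owned(),
    };
    Ok(Config { port, greeting })
}
```

Because the script can only define variables, the worst a bad configuration can do here is fail validation.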

All in all, I think there is an inherent beauty in calling into an interpreted language from a compiled one while having easy interop between the two and being able to precisely define what the interpreted code is allowed to do. EndBASIC gives you those features.

Integrating the standard library

In the examples above, we saw the usage of the endbasic-core crate alone, which means we did not have access to any standard library features. But we can also get access to those by pulling in the heavier endbasic-std crate.

The simplest way:

// The script we want to run.
let script = r#"PRINT LEFT("Hello, world!", 5)"#;

// Instantiate a machine with the commands and functions that are intended for
// scripting (avoiding interactive commands such as HELP) and run the script.
let console = ...;
let mut machine = endbasic_std::scripting_machine(console);
block_on(machine.exec(&mut script.as_bytes().as_ref())).expect("Execution failed");

And if you ran this, you’d see Hello on the console.

But… which console? Note that I left console uninitialized above. Like any good library, endbasic-std doesn’t assume it can alter the standard streams (stdin, stdout, or stderr) of your program, and it doesn’t do so. Instead, the machine has to be connected to a Console abstraction, and it’s up to you to choose which one. The standard library ships with an optional TerminalConsole implementation that talks to the real terminal via the crossterm crate, but you can supply your own implementation if you need something lighter-weight.

More interestingly though: where does the Console abstraction live? Remember that I said that the core ought to be agnostic to any commands, which means that it doesn’t even provide PRINT. This in turn means that the core doesn’t need the concept of a “console”. Ergo the Console trait lives in endbasic-std. This makes for some awkward tradeoffs because the state that you would assume should live in the Machine (the stdout stream, for example) must instead be kept within the commands themselves. In other words: each command instance is attached to the specific state objects it can affect (no more, no less), which I think is a good tradeoff even if the internal APIs look strange at times.
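The shape of that tradeoff can be sketched in a few lines. Note that the trait below is a deliberately tiny, hypothetical Console with a single method and not the actual endbasic-std trait; CapturingConsole, PrintCommand, and capture_hello are likewise made up for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical, minimal console abstraction (not the endbasic-std trait).
pub trait Console {
    fn print(&mut self, text: &str);
}

/// Console that records output in memory instead of touching stdout.
#[derive(Default)]
pub struct CapturingConsole {
    pub lines: Vec<String>,
}

impl Console for CapturingConsole {
    fn print(&mut self, text: &str) {
        self.lines.push(text.to_owned());
    }
}

/// A PRINT-like command that holds a handle to the only state it may touch.
pub struct PrintCommand {
    console: Rc<RefCell<dyn Console>>,
}

impl PrintCommand {
    pub fn new(console: Rc<RefCell<dyn Console>>) -> Self {
        PrintCommand { console }
    }

    /// Writes the arguments to whichever console the host supplied.
    pub fn exec(&self, args: &str) {
        self.console.borrow_mut().print(args);
    }
}

/// Wires the command to an in-memory console and returns what it captured.
pub fn capture_hello() -> Vec<String> {
    let console = Rc::new(RefCell::new(CapturingConsole::default()));
    // The command gets a shared handle; the host keeps its own to inspect.
    let handle: Rc<RefCell<dyn Console>> = console.clone();
    let print = PrintCommand::new(handle);
    print.exec("Hello");
    let lines = console.borrow().lines.clone();
    lines
}
```

The machine never sees the console in this arrangement: only the command that was handed the console can write to it, and the host can swap in a different implementation without touching the interpreter core.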

The example contains a functional version of this idea. You’ll notice it’s very simple, but that’s because there is not much more to it! That’s all the code you need to run a full-featured interpreter.

For a slightly more interesting example, however, look at how the web interface customizes the integration points between the standard library and the “operating system” so that the former can transparently interact with the WASM runtime.

With this, you should now be able to play around with the native side of the EndBASIC interpreter and connect it to other programs. I’m not suggesting that you go overboard and hook BASIC up into all the super-modern stuff you write. Or… am I? A simple-to-understand language, originally designed (in 1964!) for ease of use, might actually be a good idea in some scenarios. And my specific implementation comes with only a small build and runtime cost! 😊