Initial versions of the EndBASIC Service, and therefore initial versions of EndTRACKER, used dynamic dispatch to support abstract definitions of system services such as the database they talk to and the clock they use. This looked like a bunch of Arc<dyn Foo> objects passed around and was done to support extremely fast unit testing.
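As a rough illustration of that pattern, here is a minimal sketch of a clock hidden behind a trait object and swapped for a fake in tests. All names below (Clock, SystemClock, FakeClock, SessionChecker) are hypothetical stand-ins made up for this example, not the actual EndBASIC Service definitions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};

/// Abstract clock. Hypothetical stand-in for a real service trait.
trait Clock {
    /// Returns the current time as seconds since the Unix epoch.
    fn now_secs(&self) -> u64;
}

/// Production implementation backed by the system clock.
struct SystemClock;

impl Clock for SystemClock {
    fn now_secs(&self) -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock is before the Unix epoch")
            .as_secs()
    }
}

/// Test implementation whose time only moves when told to.
struct FakeClock {
    now: AtomicU64,
}

impl FakeClock {
    fn new(now: u64) -> Self {
        Self { now: AtomicU64::new(now) }
    }

    fn advance(&self, secs: u64) {
        self.now.fetch_add(secs, Ordering::SeqCst);
    }
}

impl Clock for FakeClock {
    fn now_secs(&self) -> u64 {
        self.now.load(Ordering::SeqCst)
    }
}

/// Business logic that only knows about the trait object.
#[derive(Clone)]
struct SessionChecker {
    clock: Arc<dyn Clock + Send + Sync>,
}

impl SessionChecker {
    fn is_expired(&self, expires_at: u64) -> bool {
        self.clock.now_secs() >= expires_at
    }
}

fn main() {
    // Production wiring: the concrete type is erased at this point.
    let prod = SessionChecker { clock: Arc::new(SystemClock) };
    println!("expired: {}", prod.is_expired(0));

    // Test wiring: same struct, different clock, no type parameters.
    let fake = Arc::new(FakeClock::new(100));
    let test = SessionChecker { clock: fake.clone() };
    fake.advance(60);
    println!("expired: {}", test.is_expired(150));
}
```

The point of the sketch is that SessionChecker has no type parameters, so tests can inject a deterministic clock without touching any signature.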

When I generalized the core logic of these services into the III-IV framework, I decided to experiment with a switch to static dispatch. The rationale was that using static dispatch better aligns with the design of well-regarded crates in the Rust ecosystem, and also because I wanted to avoid unnecessary runtime costs in the foundational pieces of my web services.

Let me tell you that this decision was a huge mistake and that the experiment has utterly failed. Using static dispatch has been a constant source of frustration due to the difficulty in passing types around and reasoning about trait bounds. The situation had gotten so bad that I dreaded adding new functionality to my services whenever a change to a statically-typed struct was needed, because that meant adding yet another type parameter and plumbing it through tens of source files.

In light of these difficulties, which eventually turned into blockers to implementing new features, I chose to go back to dynamic dispatch. The goal was to gain ergonomics at the expense of a supposedly-negligible runtime cost. Let me tell you about the problems I faced, the refactoring journey, and some measurements I gathered after the rewrite.

Initial impressions

The adoption of static dispatch in III-IV started pretty simple and it did the job well. Even though it took me days of fighting with the Rust type system, I eventually got it to work. The production binary was statically bound to the PostgreSQL database backend and the unit tests were bound to SQLite, all while respecting the type safety offered by sqlx and without having virtual function calls anywhere.

Let’s take a peek at what the sample key/value store core pieces looked like by going through the architectural layers described in MVC but for non-UI apps.

At the bottom layer, the database, there was a transaction trait to supply the operations required by the business logic layer:

#[async_trait]
trait Tx: BareTx {
    async fn get_keys(&mut self) -> DbResult<BTreeSet<Key>>;
    
    // ... more database operations ...
}

This trait was separately implemented for PostgreSQL and SQLite by providing separate PostgresTx and SqliteTx specific types, and both the specific transaction type and the database backing it were chosen at build time where the database connection was established.

Moving up to the business-logic layer, the Driver was parameterized on the domain-specific Tx so that it could have access to those operations (and only those):

#[derive(Clone)]
struct Driver<D>
where
    D: Db + Clone + Send + Sync + 'static,
    D::Tx: Tx + Send + Sync + 'static,
{
    db: D,
}

impl<D> Driver<D>
where
    D: Db + Clone + Send + Sync + 'static,
    D::Tx: Tx + Send + Sync + 'static,
{
    async fn get_keys(self) -> DriverResult<BTreeSet<Key>> {
        let mut tx = self.db.begin().await?;
        let keys = tx.get_keys().await?;
        tx.commit().await?;
        Ok(keys)
    }
}

// ... more impl blocks for different driver operations ...

Note how, in the above, the db.begin() method call returns an instance of a Tx right away, ensuring that callers always issue database operations as part of a transaction. This had been a deliberate decision from the very beginning to prevent issuing standalone database calls that could compromise the correctness of the data, because there was no scenario in which a transaction was not necessary.

Finally, the upper REST layer took a Driver as the engine to run the API requests through and, as a consequence, the REST handlers all had to be parameterized like the underlying Driver:

async fn handler<D>(
    State(driver): State<Driver<D>>,
) -> Result<impl IntoResponse, RestError>
where
    D: Db + Clone + Send + Sync + 'static,
    D::Tx: Tx + Send + Sync + 'static,
{
    let keys = driver.get_keys().await?;
    Ok(Json(keys))
}

This is where things started to look finicky because the REST layer now had to spell out the internals of the Driver… but it didn’t look so bad at the beginning. This, combined with the sunk cost fallacy (it had taken me days to devise how to make the above work) and the idea of avoiding an unnecessary abstraction at runtime, made me plough ahead with this implementation.
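To make the leak concrete, here is a stripped-down sketch with a single type parameter. The Db trait, FakeDb, and RestState below are simplified, hypothetical stand-ins and not the III-IV definitions. Rust does not treat the bounds declared on a struct as implied elsewhere, so every impl block, wrapper type, and function that mentions Driver<D> has to restate them.

```rust
/// Hypothetical, simplified database trait (not the III-IV definition).
trait Db {
    fn backend_name(&self) -> &'static str;
}

/// Trivial backend used only to exercise the sketch.
#[derive(Clone)]
struct FakeDb;

impl Db for FakeDb {
    fn backend_name(&self) -> &'static str {
        "fake"
    }
}

/// Business-logic layer: the bounds are declared here...
struct Driver<D>
where
    D: Db + Clone + Send + Sync + 'static,
{
    db: D,
}

/// ...and must be repeated on every impl block...
impl<D> Driver<D>
where
    D: Db + Clone + Send + Sync + 'static,
{
    fn describe(&self) -> String {
        format!("driver over {}", self.db.backend_name())
    }
}

/// ...and on every type that merely stores a Driver...
struct RestState<D>
where
    D: Db + Clone + Send + Sync + 'static,
{
    driver: Driver<D>,
}

/// ...and on every function that receives one.
fn handler<D>(state: &RestState<D>) -> String
where
    D: Db + Clone + Send + Sync + 'static,
{
    state.driver.describe()
}

fn main() {
    let state = RestState { driver: Driver { db: FakeDb } };
    println!("{}", handler(&state));
}
```

Deleting the where clause from RestState or handler makes the compiler reject the code because Driver<D> can no longer prove its own bounds, which is why the repetition cannot simply be dropped.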

The problems

It soon wasn’t all roses. What you saw above was an extremely simplified view of how things ended up looking in a real service with more than just the database dependency. Without further ado, let me present to you the monstrosity that I ended up with in EndTRACKER. Here is the Driver definition for the data plane microservice:

#[derive(Derivative)]
#[derivative(Clone(bound = ""))]
struct Driver<A, C, D, G, QD>
where
    A: AbusePolicy<D::Tx> + Clone + Send + Sync + 'static,
    C: Clock + Clone + Send + Sync + 'static,
    D: Db + Clone + Send + Sync + 'static,
    D::Tx: DataTx + From<D::SqlxTx> + Send + Sync + 'static,
    G: GeoLocator + Clone + Send + Sync + 'static,
    QD: Db + Clone + Send + Sync + 'static,
    QD::Tx: ClientTx<T = BatchTask> + From<QD::SqlxTx> + Send + Sync + 'static,
{
    db: D,
    clock: C,
    // ... more fields ...
}

But that’s not all, oh no. This chunk also infected the REST layer, which in theory should not care about the specifics of the driver layer. Here, look at this RestState type, which is a wrapper over the data that the REST API handlers need to operate:

#[derive(Derivative)]
#[derivative(Clone(bound = ""))]
struct RestState<A, C, D, G, QD>
where
    A: AbusePolicy<D::Tx> + Clone + Send + Sync + 'static,
    C: Clock + Clone + Send + Sync + 'static,
    D: Db + Clone + Send + Sync + 'static,
    D::Tx: DataTx + From<D::SqlxTx> + Send + Sync + 'static,
    G: GeoLocator + Clone + Send + Sync + 'static,
    QD: Db + Clone + Send + Sync + 'static,
    QD::Tx: ClientTx<T = BatchTask> + From<QD::SqlxTx> + Send + Sync + 'static,
{
    driver: Driver<A, C, D, G, QD>,
    // ... more fields ...
}

There are several problems with the above:

  1. It is Super Ugly (TM). There is no other way to put it. As much as I like Rust, things like this are painful and scary—but not as painful as deranged modern C++.

  2. These where declarations were repeated 44 times in 37 different files (that is, almost all files). This polluted source files with details they don’t care about. Any small change to the Driver required updating all these repeated chunks in sync. I’m not even sure why Rust requires the duplication and why it’s sometimes OK for the trait bounds to diverge among the various impl blocks, but the duplication is necessary.

  3. It poisoned the REST layer. As mentioned above, the REST layer wants to pass around a RestState object that contains the Driver and other data fields that are only necessary at that level. Yet… to achieve this the REST layer had to replicate all of the internal details of the Driver.

  4. I had to use derivative to remove unnecessary (?) Clone trait bounds. The need to have a cloneable Driver and RestState comes from how the axum HTTP framework dispatches route execution, and figuring this out took quite a while. Furthermore… the way this “works” is still obscure to me.

  5. It became impossible to compose transaction types. This is a problem with my design and not an inherent issue with static dispatch, but the use of static dispatch guided me towards this design. Note that, in the above, there are two database instances: D and QD, each with a different associated Tx type. While I wrote some contortions to support sharing the same underlying database connection between them, I never got to replicating those to also share an open transaction. The complexity was already at unmanageable levels to push this design any further. But I needed a solution to this problem.
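On the fourth point, one general reason such bounds show up is that the standard #[derive(Clone)] adds a Clone bound to every type parameter, even when the parameter only appears behind something like an Arc. The sketch below shows that general behavior with hypothetical types (Backend, Handle). It is not a reconstruction of the EndTRACKER case, and the hand-written impl is only an approximation of what an attribute like Clone(bound = "") spares you from writing.

```rust
use std::sync::Arc;

/// A backend that deliberately does not implement Clone.
struct Backend {
    name: String,
}

/// With #[derive(Clone)], the generated impl would read
/// `impl<T: Clone> Clone for Handle<T>`, so Handle<Backend> would not be
/// cloneable even though cloning an Arc<T> never clones the T inside.
struct Handle<T> {
    inner: Arc<T>,
}

/// Hand-written impl without the extra bound on T.
impl<T> Clone for Handle<T> {
    fn clone(&self) -> Self {
        Self { inner: Arc::clone(&self.inner) }
    }
}

fn main() {
    let a = Handle { inner: Arc::new(Backend { name: "sqlite".to_owned() }) };
    let b = a.clone();
    // Both handles point at the same Backend; only the reference count grew.
    println!("{} has {} owners", b.inner.name, Arc::strong_count(&a.inner));
}
```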

All in all, the use of static dispatch was slowing me down in building new features as these constructs made me dread modifying certain aspects of the code. And what’s worse: certain initial design choices, like the inability to issue standalone database calls outside of a transaction, started showing up as true inefficiencies in production. Far from minimizing runtime costs, which was the original goal, the design had made them worse. Fixing these issues required a redesign so it was time for a change.

Switching to dynamic dispatch

The goal with the redesign was to drop all static type parameters and replace them with dyn trait objects. In this way, the Driver would encapsulate these details in just one place and all other code would not have to care about the specific field definitions within this type.

It is easier said than done, but the results speak for themselves. Here is what the new Driver for the simplified key/value store looks like:

#[derive(Clone)]
struct Driver {
    db: Arc<dyn Db + Send + Sync>,
}

impl Driver {
    async fn get_keys(self) -> DriverResult<BTreeSet<Key>> {
        let mut tx = self.db.begin().await?;
        let keys = db::get_keys(tx.ex()).await?;
        tx.commit().await?;
        Ok(keys)
    }
}

And, similarly, this is what one of the REST API handlers looks like:

async fn handler(
    State(driver): State<Driver>,
) -> Result<impl IntoResponse, RestError> {
    let keys = driver.get_keys().await?;
    Ok(Json(keys))
}

That’s it. This is the way to declare the Driver over a generic database, the way to write a business-logic operation on top of this database, and the way to write a REST API handler that calls into this operation. The Arcs and the Send + Sync annotations are somewhat ugly but they are nowhere near as ugly as the previous disaster. In this version, there is no noise.

What’s more: as part of the redesign, I could throw away the “everything behind a transaction” idea and allow the caller to choose the best execution mode for its needs. Note the tx.ex() call above, which obtains an “executor” that can be used to talk to the database. This specific call obtains an executor from a transaction, but an equivalent db.ex() method also exists to obtain a standalone executor. Describing how this works is out of the scope of this post though.

Show me the metrics

The pervasiveness of static dispatch in the Rust ecosystem helps leverage “zero-cost abstractions”, but it does come with a cost. Namely: programming time cost. It is great to have a choice, and it is great that many general-purpose Rust crates use static dispatch so that you don’t have to pay unnecessary taxes… but it was not the right choice for me. I might have done things really wrong in my original design and these measurements may not hold for other projects, but let’s look at some numbers anyway.

  • Code size. This is what the refactoring looks like according to a git diff --stat:

    Project      Files changed   Lines added   Lines deleted   Diff
    iii-iv       32              1691          2115            -242 (-3%)
    endtracker   74              3751          4963            -1212 (-11%)

    The changes to the III-IV framework are small because the use of static typing within the framework itself wasn’t pervasive: after all, the framework was just exposing the building blocks and not using them on its own. But the 11% code reduction in EndTRACKER alone is very significant.

  • Binary size. Looking at the sizes of the main EndTRACKER binary and the supporting unit testing binaries, both under the release and debug configurations, we observe:

    Binary          Mode      Before (MB)   After (MB)   Diff
    main binary     release   23.30         26.47        +3.17
    common tests    release   16.75         17.11        +0.36
    batch tests     release   25.45         24.96        -0.49
    control tests   release   19.10         18.77        -0.33
    data tests      release   20.90         20.89        -0.01
    main binary     debug     261.14        280.62       +19.48
    common tests    debug     169.27        171.62       +2.35
    batch tests     debug     239.72        238.07       -1.65
    control tests   debug     192.06        192.21       +0.15
    data tests      debug     208.40        207.93       -0.47

    I was expecting a slight increase in binary size with the move to dynamic dispatch because the compiler and linker don’t have as many opportunities for inlining and optimizing code. While the results seem to be all over the place, they seem to agree with my expectations: the binary sizes are larger when using dynamic dispatch. Some test binaries are smaller indeed, but this is most likely due to how the tests changed and not necessarily because of the switch from static to dynamic dispatch.

  • Compilation time. I measured an incremental build after modifying a core type in the EndTRACKER codebase to change its internal layout, starting from a cargo clean slate and using the mold linker. With static dispatch, the incremental build times of the binary and tests were somewhere between 12 and 13 seconds, and with dynamic dispatch they dropped to just below 12 seconds. The difference is minimal, and the codebase isn’t large enough to obtain a good signal out of this metric.

    To be honest, I was hoping for a much larger improvement in incremental compilation times. My reasoning was that dealing with the type constraints that existed before must have been expensive, so removing them should reduce compiler execution times. My measurements did not prove this true, unfortunately. Or if they did, the improvements are negligible in this small codebase.

  • Refactoring effort. I spent a couple of days figuring out what the best abstraction was and then I spent many hours during a recent long flight doing all of the mostly-mechanical changes to the EndTRACKER codebase. As usual, updating the tests was the most painful part of all—but also the one that gave me confidence to deploy a new build to production with ease.

  • Runtime cost. This one… well, I haven’t been able to measure it. None of my web services are CPU-bound so the cost of the virtual function dispatch is negligible.
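For anyone who does want a number for the dispatch cost, a micro-benchmark along the following lines is one possible starting point. It is only a sketch: the Counter trait and both run_* functions are made up for illustration, the optimizer may still devirtualize or inline parts of it, and a tight loop like this says little about an I/O-bound web service.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical trait with a trivially cheap method, so that the call
/// overhead dominates the measurement.
trait Counter {
    fn bump(&self, x: u64) -> u64;
}

struct AddOne;

impl Counter for AddOne {
    fn bump(&self, x: u64) -> u64 {
        x.wrapping_add(1)
    }
}

/// Static dispatch: monomorphized for each concrete C.
fn run_static<C: Counter>(c: &C, iters: u64) -> u64 {
    let mut acc = 0;
    for _ in 0..iters {
        acc = c.bump(black_box(acc));
    }
    acc
}

/// Dynamic dispatch: each call goes through the trait object's vtable.
fn run_dynamic(c: &dyn Counter, iters: u64) -> u64 {
    let mut acc = 0;
    for _ in 0..iters {
        acc = c.bump(black_box(acc));
    }
    acc
}

fn main() {
    let iters = 10_000_000;

    let start = Instant::now();
    let a = run_static(&AddOne, iters);
    let static_time = start.elapsed();

    let boxed: Box<dyn Counter> = Box::new(AddOne);
    let start = Instant::now();
    let b = run_dynamic(black_box(boxed.as_ref()), iters);
    let dynamic_time = start.elapsed();

    println!("static:  {} in {:?}", a, static_time);
    println!("dynamic: {} in {:?}", b, dynamic_time);
}
```

Build it in release mode for the comparison to mean anything, and expect the absolute numbers to vary between machines and compiler versions.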

To summarize: not much seems to have changed with this rewrite. Binaries are slightly larger indeed, but not by a lot. However… the benefits in productivity are massive already.

Productivity benefits

One of the benefits of this rewrite is that I’ve been able to finally resolve a long-standing deficiency in test coverage, which I briefly mentioned in the conclusion of the Unit testing a web service post. This deficiency was that the test suites for the driver and the REST layer ran against SQLite unconditionally and I did not have a way to run them against a real PostgreSQL instance. Well, I have an answer now. All it took after the switch to dynamic dispatch was to introduce a helper function like this one:

async fn connect_to_test_db() -> Arc<dyn Db + Send + Sync> {
    let db: Arc<dyn Db + Send + Sync> = {
        let name = get_optional_var::<String>("TEST", "DB")
            .expect("TEST_DB must be a string");
        match name.as_deref() {
            Some("postgres") => Arc::from(postgres::testutils::setup().await),
            Some("sqlite") | None => Arc::from(sqlite::testutils::setup().await),
            Some(name) => panic!("Invalid TEST_DB {}", name),
        }
    };

    super::init_schema(&mut db.ex()).await.unwrap();

    db
}

… use it to connect to the database in all tests, and configure a GitHub Actions job with TEST_DB=postgres to run the test suites against the same database engine that runs in production.
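The selection logic in that helper can be isolated into a dependency-free sketch. Everything below is hypothetical: the Db trait and the FakePostgres and FakeSqlite types are placeholders for the real backends, and select_test_db takes the variable's value as an argument, and returns an error instead of panicking, purely so that it is easy to exercise.

```rust
use std::sync::Arc;

/// Hypothetical, simplified database trait (not the III-IV definition).
trait Db {
    fn backend_name(&self) -> &'static str;
}

/// Placeholders for the real PostgreSQL and SQLite test backends.
struct FakePostgres;
struct FakeSqlite;

impl Db for FakePostgres {
    fn backend_name(&self) -> &'static str {
        "postgres"
    }
}

impl Db for FakeSqlite {
    fn backend_name(&self) -> &'static str {
        "sqlite"
    }
}

/// Picks a backend at runtime. Both arms produce the same trait object
/// type, which is what lets a single test binary cover either backend.
fn select_test_db(name: Option<&str>) -> Result<Arc<dyn Db + Send + Sync>, String> {
    let db: Arc<dyn Db + Send + Sync> = match name {
        Some("postgres") => Arc::new(FakePostgres),
        Some("sqlite") | None => Arc::new(FakeSqlite),
        Some(other) => return Err(format!("Invalid TEST_DB {}", other)),
    };
    Ok(db)
}

fn main() {
    // Read the selector from the environment, defaulting to SQLite.
    let name = std::env::var("TEST_DB").ok();
    let db = select_test_db(name.as_deref()).expect("unsupported TEST_DB");
    println!("running tests against {}", db.backend_name());
}
```

With static dispatch, the equivalent choice has to be made at compile time, because each backend yields a differently-typed Driver.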

Another benefit is that I have finally unstuck something I’ve been working on, on and off, for months and that I had been procrastinating on due to its difficulty. That is: I’ve been trying to generalize the account creation and session management pieces of the EndBASIC Service into III-IV so that I can reuse those in EndTRACKER. This was made really difficult due to static dispatch, but now it’s a piece of cake. Which means I should be able to add user accounts in EndTRACKER really soon now and maybe finally open it up to the public.

All in all, I’m satisfied with the change. The code is much simpler now and I do not foresee the small costs at runtime or in binary size being problematic at all for my specific use cases.