One of the three key tenets of Object Oriented Programming (OOP) is encapsulation: an object holds state that, when observed from the outside, is always internally consistent. To illustrate this, suppose you have a class to represent a rectangle¹, and that this class tracks the rectangle’s dimensions as well as its area:

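#include <utility>

class Rectangle {
    // The precomputed area needs a name different from the area() accessor
    // below: a data member and a member function cannot share a name.
    int width, height, cached_area;

public:
    Rectangle(int w, int h) : width(w), height(h), cached_area(w * h) {}

    std::pair<int, int> dimensions() const {
        return std::make_pair(width, height);
    }

    int area() const {
        return cached_area;
    }

    void resize(int w, int h) {
        width = w;
        height = h;
        cached_area = w * h;
    }
};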

The Rectangle class above has two fields to represent its dimensions (width and height) and a derived field to represent its area, which we store in the object itself because it happens to be very expensive to compute (narrator: it really is not).

If we use this class via its public interface shown above (in a non-threaded environment), we can guarantee that area is always up-to-date with width and height. When the object is first constructed, all fields are in a consistent state, and if we call resize() at a later stage, we know that all fields are also consistent right before and right after the method call.

But note that this is not true within the class implementation: there is a point in time in the constructor and in the resize() method where the precomputed value of area is stale. And that’s OK from an encapsulation perspective: we only need to worry about internal consistency at the public boundaries of the class.

Let’s add error handling

Suppose we want to update our Rectangle class to ensure that width and height are positive and not zero. Easy-peasy:

class Rectangle {
    int width, height, cached_area;

public:
    Rectangle(int w, int h) {
        if (w <= 0 || h <= 0) {
            throw std::runtime_error("Dimensions must be positive and non-zero");
        }

        width = w;
        height = h;
        cached_area = w * h;
    }

    // ...
};

All good here from an encapsulation perspective. If we try to instantiate a Rectangle with invalid dimensions and use it, like this:

Rectangle r(some_width, some_height);
std::cout << "The area is: " << r.area() << '\n';

… the constructor will throw an exception, the object will never have existed, and no code after that line in the same scope will run. There is no scenario in which an invalid r is accessible from the code.

Let’s ban exceptions

As it turns out, there are many codebases out there that ban the use of exceptions in C++ (Google’s being a famous one). And if we don’t have exceptions, it’s impossible to return errors from a constructor.

Which raises the question: how do we handle the situation above? How can we detect errors during construction and prevent invalid objects from ever being created? The common answer seems to be to add a separate initialization method (say init), like this:

class Rectangle {
    int width, height, cached_area;

public:
    Rectangle() : width(0), height(0), cached_area(0) {}

    bool init(int w, int h) {
        if (w <= 0 || h <= 0) {
            return false;
        }

        width = w;
        height = h;
        cached_area = w * h;
        return true;
    }

    // ...
};

And, in turn, this means that the class now has to be used like this:

Rectangle r;
if (!r.init(some_width, some_height)) {
    // Invalid dimensions; handle error!
}
std::cout << "The area is: " << r.area() << '\n';

The horror. Note the abomination we have introduced here. The caller now must go through two separate steps to create the object: one is Rectangle r, which calls the constructor, and another is the call to r.init(). Critically, the internal state of the object is now invalid across these two steps, and this inconsistent state is observable through the public interface.

Oops, we have violated the encapsulation principle. Our program can now have instances of a Rectangle that are invalid (akin to null values), and code can reference them just fine. This is bad. Really, really bad. As I said a few months ago in reply to a tweet praising RAII:

@jmmv on August 4th, 2021 · Replying to @awesomekling

Making invalid states irrepresentable is probably the best thing one can do to improve reliability when designing a new piece of code. It's a difficult mentality shift though!


Unfortunately, by separating construction from initialization like we did above, we have gone against this principle. Yes, we had to do it because of the constraints of our environment, but it’d be best if we didn’t have to violate a core tenet of OOP.

A possible solution

There are ways to mitigate, but not fully resolve, this problem. My preferred way is to use a static factory method to perform the object construction and initialization along with the required error handling. Consider the following:

class Rectangle {
    int width, height, cached_area;

    Rectangle(int w, int h) : width(w), height(h), cached_area(w * h) {
        assert(width > 0 && height > 0);
    }

public:
    static std::optional<Rectangle> create(int w, int h) {
        if (w <= 0 || h <= 0) {
            return std::nullopt;
        }
        return Rectangle(w, h);
    }

    // ...
};

In this version, we have restored the original constructor, which performs no validation beyond a debug-time assert. Objects are now always constructed in a complete state, but it’s again possible for those objects to contain invalid state: the caller of the constructor could supply invalid values. However, the constructor is now private, so this is sound from an encapsulation perspective; remember that we must only maintain consistency and validity at public interface boundaries.

All great, but… with a private constructor, we cannot create instances of our class! To solve this, we can introduce a static factory method that either returns a new Rectangle if the parameters are valid, or an empty std::optional if the inputs are invalid. With this approach, the public interface of Rectangle is now properly encapsulated again: all objects returned by create() are guaranteed to be valid, and if the parameters are invalid, no object is created at all. (We still have to deal with null-ness though…)
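For completeness, this is roughly what the whole thing looks like from the caller's side. It is a sketch that assumes C++17 for std::optional; describe_area is a hypothetical helper and cached_area an illustrative field name:

```cpp
#include <cassert>
#include <optional>
#include <string>

class Rectangle {
    int width, height, cached_area;

    // Private: only create() can reach this, and it validates first.
    Rectangle(int w, int h) : width(w), height(h), cached_area(w * h) {
        assert(width > 0 && height > 0);
    }

public:
    static std::optional<Rectangle> create(int w, int h) {
        if (w <= 0 || h <= 0) {
            return std::nullopt;
        }
        return Rectangle(w, h);
    }

    int area() const { return cached_area; }
};

// Hypothetical helper: the caller has to unwrap the optional to reach area().
std::string describe_area(int w, int h) {
    const std::optional<Rectangle> r = Rectangle::create(w, h);
    if (!r) {
        return "Invalid dimensions";
    }
    return "The area is: " + std::to_string(r->area());
}
```

The caller can still forget to check the optional before dereferencing it, which is the null-ness caveat mentioned above, but there is no longer a Rectangle with bogus dimensions anywhere in the program.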

Unfortunately, this code is not equivalent to the version that used exceptions. Note that our factory method returns a different object type and is in full control of the object’s creation. This has a problem when dealing with inheritance, because a subclass of Rectangle cannot use the factory method to perform partial initialization. But this is a minor issue that can be worked around.

Also note that the use of std::optional is just one possibility here. Alternatives that also work include std::unique_ptr, boost::optional, or absl::StatusOr, to name a few.
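As a rough sketch of the std::unique_ptr variant (same caveats: cached_area is an illustrative field name), note one wrinkle: std::make_unique cannot call a private constructor, so the factory has to use new directly:

```cpp
#include <cassert>
#include <memory>

class Rectangle {
    int width, height, cached_area;

    Rectangle(int w, int h) : width(w), height(h), cached_area(w * h) {
        assert(width > 0 && height > 0);
    }

public:
    // Returns nullptr on invalid input instead of an empty optional.
    static std::unique_ptr<Rectangle> create(int w, int h) {
        if (w <= 0 || h <= 0) {
            return nullptr;
        }
        // std::make_unique has no access to the private constructor, so
        // allocate here and hand ownership to the unique_ptr right away.
        return std::unique_ptr<Rectangle>(new Rectangle(w, h));
    }

    int area() const { return cached_area; }
};
```

The trade-off compared to std::optional is that every Rectangle now lives on the heap.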

Takeaways

Here are the key points I’d like you to remember from this post:

The single best thing you can do to increase the reliability of a program is to make invalid states impossible to represent.
Keep constructors “dumb”: all they should do is assign fields. This applies irrespective of whether you use exceptions.
Avoid init-like methods if at all possible. If you have actual logic during object construction, put that code in a static factory method (with the bonus of a better-named “constructor”), or use dependency injection to push that logic to the caller.
As a corollary to the above, init-like methods that return void are useless. Do not separate construction from initialization just because it’s a widespread practice in your codebase. Keep all initialization in the constructor unless it’s impossible to do so.
If you must provide an init-like method to handle errors, do the smallest possible amount of work within it. Any fields that can be initialized in the constructor should be initialized there (because this allows you to make them const, for example). Reserve the init method for the few fields that are subject to error checking.
Consider adding an is_initialized boolean to the class, and assert that it is true in all methods (except in init, where you would assert that it is false). This adds overhead, both to the code and runtime, but it will help you detect the cases where you end up using a partially-initialized object—which I guarantee will happen.
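If you are stuck with an init-like method, the is_initialized suggestion could look something like the sketch below. It is only one way to do it; the default member initializers and the cached_area name are my choices for the illustration. Keep in mind that assert compiles to nothing when NDEBUG is defined, so these checks only fire in debug builds:

```cpp
#include <cassert>

class Rectangle {
    int width = 0, height = 0, cached_area = 0;
    bool is_initialized = false;

public:
    bool init(int w, int h) {
        // init() must not run again on an object that is already valid.
        assert(!is_initialized);
        if (w <= 0 || h <= 0) {
            return false;  // Object stays uninitialized.
        }
        width = w;
        height = h;
        cached_area = w * h;
        is_initialized = true;
        return true;
    }

    int area() const {
        // Catches use of an object whose init() was skipped or failed.
        assert(is_initialized);
        return cached_area;
    }
};
```

Calling area() on a default-constructed Rectangle now aborts in debug builds instead of quietly returning zero.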

One final thought to conclude: the reason I really enjoy writing in Rust is that the language forces you to care about these correctness properties (via the Result<T,E> type in the context of this post). If you don’t know Rust but regularly code in C++, I’d strongly recommend that you learn Rust as well: you’ll change the way you think about structuring your data types and algorithms, and will spot problematic patterns in C++ with ease.

Edit (2021-11-25): Updated the static factory method example to return std::optional instead of std::unique_ptr to avoid heap allocations. I had originally used the latter because that’s what I see most in the codebase I currently work in, and I rarely encounter the former (in C++). Also added a note saying that the Rectangle example is, of course, trivial.

¹ The use of a trivial Rectangle class is for illustration purposes only. Of course this example is trivial and there are ways around it. In the real world, think about classes that perform “real work” and interact with external systems, such as a wrapper around a database connection.