A recent tweet that caught my attention read: “principal engineers should be on-call”. Of course they should be! I’m “surprised” they aren’t everywhere, but I can imagine some reasons to justify their situation. Let’s change that in this thread. 🧵 👇

@krisnova on July 10th, 2021

principal engineers should be on call

there - i said it

Before we start, look: I’m one of those “principal engineers” and, while I don’t enjoy on-call much, I am on-call. Yes, I complain a lot about on-call—which is natural because there is always room for improvement. But I do not question being on-call.

Note also that “being on-call” here doesn’t necessarily mean carrying a pager. These principles cover any kind of rotation that the team has, like “build cop” or “ticket duty”. Any process you can think of as toil should be shared by the whole team.

Why? Here are my 10 good reasons to be part of the on-call rotation as a principal engineer. And, by the way, these also apply to managers and any other high-level role in the engineering ladder:

  1. ⚡ Potential for change. The on-call experience needs an owner; it most likely suffers from operational issues already. As a principal engineer, you have more power than others to effect change and fund projects to solve issues, but you must know what those are!

  2. ๐Ÿ‘ฉโ€๐Ÿซ Lead by example. Just because of your title, more junior engineers will mimic what you do, not what you say. Always do the right thing when on-call even if you don’t really feel like it. Over-communicate. Model the behaviors you want to see.

  3. ๐Ÿ™‹โ€โ™‚๏ธ Normalize asking questions. A good way for people to feel confident about asking questions is to witness their senior peers asking questions too, even the “dumbest” ones. It’s easy to spark healthy discussions by asking questions you know the answer to.

  4. ๐Ÿ‘ Show that on-call response matters. If the service’s health is important (and if it isn’t, why do you have a job), you must be in it. On-call isn’t toil to push to junior people. It’s a shared responsibility, and improving it benefits the product overall.

  5. ⌚ Show that everyone’s time is valuable. I’ve sometimes heard that principal engineers should have fewer distractions to concentrate on the bigger problems. No: everyone has problems to solve; the problems are just different and the people deserve equal treatment.

  6. 💻 Remain familiar with the system. It’s easy to think about a system in theoretical terms but, in practice, things can be quite different. Interact with the system frequently, under pressure. You’ll likely find problems you couldn’t even have imagined.

  7. 👥 Stay close to your customers. What you hear from support and product management can be very different from what your customers actually say. Observe and fix their issues first hand. Talk to them.

  8. 🔥 Spot systemic issues. Yes, you can probably notice “the big ones” from production meetings and on-call reports, but these are not the kinds of issues that burn people. Sharp corners everywhere do, and I talked about that in the Service health thread.

  9. 💪 Improve training materials. If you remain engaged in on-call response, and especially if you change teams as a principal engineer, you’ll spot training gaps. Your experience may let you gloss over them, but not everyone will have that privilege. Fix the learning path.

  10. ๐Ÿ™ Gain the respect of your team. And finally: if you don’t meet your team where the daily problems are, they won’t trust you nor any of your plans to “improve” things.

To summarize: if principal engineers (and managers) are on-call, it’s more likely that they’ll be aware of recurring problems and it’s more likely that those problems will be fixed. Otherwise, the experience will deteriorate and, eventually, the service/product will suffer.

And because of your level and the expectations that come with it, it’s on you to make all of this happen.