Having a fast and responsive app is orthogonal to “knowing your big Os”. Unfortunately, most tech companies over-emphasize algorithms in interviews and downplay systems knowledge, and I believe that’s one reason behind sluggish apps and bloated systems.
I’ve seen this play out repeatedly. Interviewers ask a LeetCode-style coding question, which is then followed by the ritual of discussing time and memory complexity. Candidates ace the answers. But then… their “real” code suffers from subtle yet impactful performance problems.
Obsessing over big O complexity rarely pays off in most apps. Sure, it’s important to think about your algorithmic choices, but there are many more details to worry about that have a direct impact on app performance and responsiveness. Let’s look at a bunch of them!
📜 Is your complex O(n) algorithm faster than the trivial O(n²)? Theoretical complexity matters, but know your expected value of “n”. The linear algorithm may be slower than the quadratic one for your specific scenarios, and harder to prove correct to boot.
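As a sketch of that crossover, here are two hypothetical duplicate-detection routines in Python: the O(n²) one does nothing but compare elements, while the O(n) one pays for set allocation and hashing on every element. For small inputs, measure before assuming the “better” complexity wins:

```python
import timeit

def has_dup_quadratic(xs):
    # O(n^2): compare every pair. Trivial to verify correct.
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                return True
    return False

def has_dup_linear(xs):
    # O(n): hash set, but pays allocation + hashing overhead per element.
    seen = set()
    for x in xs:
        if x in seen:
            return True
        seen.add(x)
    return False

small = list(range(8))  # no duplicates: worst case for both
t_quad = timeit.timeit(lambda: has_dup_quadratic(small), number=100_000)
t_lin = timeit.timeit(lambda: has_dup_linear(small), number=100_000)
print(f"n=8: quadratic {t_quad:.3f}s vs linear {t_lin:.3f}s")
```

Which one wins at n=8 depends on your interpreter and hardware; the point is that only the measurement settles it.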
📜 Are you aware of how interfaces can bite you? E.g. in Java, the generic List.get method runs in O(1) time for ArrayList but in O(n) time for LinkedList, and computing dict hash keys is likely not O(1). These costs are not obvious at the call site.
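Python has the same trap. This sketch uses list and collections.deque, which share the same indexing syntax but have very different costs behind it:

```python
from collections import deque

data_list = list(range(100_000))
data_deque = deque(data_list)

# Both support the exact same indexing syntax...
assert data_list[50_000] == data_deque[50_000]

# ...but list indexing is O(1) array arithmetic, while deque indexing
# is O(n): the deque walks its linked blocks from the nearest end.
# Nothing at the call site hints at this difference.
```

The fix is to know the concrete type behind the interface, or to pick APIs (like iteration) whose cost is uniform across implementations.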
📜 Do you grasp how much work fits in short periods of time? 1ps, 1ns, 1us, 1ms, 1s… they all sound small but their relative differences are huge. Map them to 1 second, 16 minutes, 277 hours, 380 months, and 317 centuries respectively. Not the same, huh?
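You can derive that mapping yourself by scaling 1 picosecond up to 1 second (a factor of 10¹²) and converting the results to human units; a quick sketch:

```python
# Map 1 picosecond to 1 second: multiply every duration by 1e12.
SCALE = 1e12

ns_scaled = 1e-9 * SCALE  # 1,000 seconds
us_scaled = 1e-6 * SCALE  # 1,000,000 seconds
ms_scaled = 1e-3 * SCALE  # 1,000,000,000 seconds
s_scaled = 1.0 * SCALE    # 1,000,000,000,000 seconds

print(f"1 ns -> {ns_scaled / 60:.0f} minutes")
print(f"1 us -> {us_scaled / 3600:.0f} hours")
print(f"1 ms -> {ms_scaled / (86_400 * 30):.0f} months")
print(f"1 s  -> {s_scaled / (86_400 * 365.25 * 100):.0f} centuries")
```

The exact figures wobble a bit depending on how you round a “month”, but the orders of magnitude are what matter.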
💾 Where does your data live? Different memory/storage mediums have vastly different access times. Know the relative differences in speed between register accesses (ps), L1/L2/L3 (few ns), RAM (many ns), SSD (us), HDD (few ms), and network (many ms).
💾 How do you access storage drives? Large sequential I/O operations are faster than lots of random small ones. This is true of HDDs (seek time is ~3-10ms), but also of SSDs, because every operation carries fixed per-request overhead, so many small ops cost more than a few large ones.
💾 What about the file system on top of those drives? ext4, ZFS, NTFS… they are all vastly different, and the types of I/O patterns that work well in one may be slow in another.
💾 Are you accessing an SSD or an old spinning hard disk? Hard disks still exist. It might be safe to assume SSDs for certain apps, but not always. E.g. users may store large photo libraries in HDDs: can you handle I/O from those efficiently?
💾 Are you doing pure data copies via the CPU? There are features like DMA to offload data transfers to I/O devices, and there are features to minimize userspace<->kernel copies. But, in any case, memory bandwidth isn’t free and is rarely thought about.
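Even within one process, copies are easy to trigger by accident. A small Python sketch with memoryview shows the difference between slicing (which copies) and viewing (which doesn’t):

```python
buf = bytearray(b"hello, zero-copy world")

# Converting a slice to bytes copies the data out of the buffer...
copied = bytes(buf[7:16])

# ...while a memoryview slice references the same underlying memory.
view = memoryview(buf)[7:16]

# Mutate the buffer in place (same length, so no reallocation).
buf[7:16] = b"ZERO-COPY"

print(copied.decode())       # the stale copy: unchanged
print(bytes(view).decode())  # the view: sees the in-place update
```

The same principle scales up: sendfile(2), mmap, and DMA all exist to avoid shoveling bytes through the CPU when nothing needs to inspect them.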
🌐 Do you know that high latency usually limits max bandwidth? Bandwidth and latency are two different metrics, but high round-trip times cap the throughput a connection can actually achieve. Most networked apps will probably feel fine on a low-bandwidth connection but will feel terrible on a high-latency one. Simulate these scenarios.
🌐 How do you handle high network latency? Does your code block, slowing down everything else? Do you really have to block the UI? Can you schedule other work to happen in the meantime?
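One way to avoid blocking is to overlap the waits. This asyncio sketch fakes three slow requests with asyncio.sleep and runs them concurrently, so the total wait is roughly one request’s latency, not the sum:

```python
import asyncio
import time

async def fake_request(delay):
    # Stand-in for a network call; asyncio.sleep yields to the event
    # loop, letting other work run while we "wait on the network".
    await asyncio.sleep(delay)
    return delay

async def main():
    start = time.monotonic()
    # Three 0.1s "requests" overlap instead of running back to back.
    await asyncio.gather(*(fake_request(0.1) for _ in range(3)))
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"3 overlapped requests took {elapsed:.2f}s instead of ~0.30s")
```

The same idea applies with threads or callbacks; the key is that waiting on the network should never serialize unrelated work.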
🌐 How many network requests do you need per interaction? Each request adds a multi-ms penalty and every request increases the chances of hitting high tail latency. If a service you depend on gives you p50=10ms and p99=1s, odds are you’ll hit at least one ~1s delay if you send 100 requests.
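That claim is just probability: if each request independently has a 1% chance of landing in the p99 tail, the chance that at least one of n requests does grows quickly with n. A two-line sketch:

```python
def p_tail_hit(n, p=0.01):
    # Probability that at least one of n independent requests is
    # slower than the p99 threshold (each has probability p = 1%).
    return 1 - (1 - p) ** n

for n in (1, 10, 100):
    print(f"{n:>3} requests -> {p_tail_hit(n):.0%} chance of a p99-slow response")
```

With 100 requests per interaction, a “rare” 1-second tail becomes the common case.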
🌐 How do you handle retries on failed requests? Any retry will tank performance, and how you expose or hide what’s going on can make a world of difference. Perception matters.
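If you do retry, at least don’t retry in lockstep. Here is a minimal sketch of exponential backoff with full jitter (the function name and parameters are illustrative, not from any particular library):

```python
import random

def backoff_delays(base=0.1, cap=5.0, attempts=5):
    # Exponential backoff with full jitter: the delay ceiling doubles
    # per attempt (capped), and the actual delay is random within it,
    # so a crowd of clients doesn't hammer a recovering service in
    # synchronized waves.
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(random.uniform(0, ceiling))
    return delays

print(backoff_delays())
```

Pair this with an attempt limit and user-visible feedback so the app degrades gracefully instead of silently stalling.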
On data handling
❄️ Are you using your database’s query facilities or are you fetching all data and then filtering on your own? Databases can perform elaborate queries. Favor running those on the database: the server is closer to the data and you’ll minimize network transfers.
❄️ Are your database queries degrading to full table scans? Maybe you don’t have the right indexes or partitioning schemes to support the common queries that your app needs. Analyze and profile query execution. You probably don’t need to jump to NoSQL.
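SQLite ships with Python, so you can see this analysis in action in a few lines. The schema below is made up for illustration; EXPLAIN QUERY PLAN reports whether a query scans the whole table or uses an index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable detail string
    # such as "SCAN users" or "SEARCH users USING INDEX ...".
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT id FROM users WHERE email = 'a@example.com'"

plan_before = plan(query)
print(plan_before)  # a full table scan

conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(query)
print(plan_after)   # now an index search
```

Every mainstream database has an equivalent (EXPLAIN, EXPLAIN ANALYZE); make checking it a habit for your hot queries.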
🪵 How many log messages do you emit? Logging is not free. It has a cost on performance (I’ve seen servers spend ~20% of their CPU just for background logging, all the time), and noisy logs also have an operational cost.
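Part of that cost is avoidable: with Python’s logging module, %-style arguments are only formatted if the record is actually emitted, while an f-string pays the formatting price up front. A sketch with a counter (the expensive_summary helper is hypothetical) makes the difference visible:

```python
import logging

logging.basicConfig(level=logging.INFO)  # DEBUG records are dropped
log = logging.getLogger("app")

calls = 0

def expensive_summary():
    # Hypothetical stand-in for serializing a big chunk of state.
    global calls
    calls += 1
    return "...giant serialized state..."

# Eager: the f-string builds the message even though DEBUG is off.
log.debug(f"state: {expensive_summary()}")

class Lazy:
    def __str__(self):
        return expensive_summary()

# Lazy: logging short-circuits on the level check before formatting,
# so Lazy.__str__ never runs here.
log.debug("state: %s", Lazy())

print(calls)  # only the eager variant paid the cost
```

Of course, lazy formatting doesn’t fix noisy logs; cutting the volume does.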
On CPU and memory
🖥️ How do you lay out large amounts of data in memory? What are the access patterns? Cache locality matters. Remember the relative differences in access times between the L1, L2 and L3 caches vs. main memory (days vs. years if scaled up).
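Layout shows up even in Python, though for partly different reasons. This sketch contrasts an “array of structs” (one dict per record, scattered across the heap) with a “struct of arrays” (flat parallel lists); in C or Rust the gap would come from cache lines, in Python mostly from pointer-chasing and dict overhead, but the layout lesson is the same:

```python
import timeit

N = 100_000

# Array of structs: one dict per point, objects scattered on the heap.
aos = [{"x": float(i), "y": float(i)} for i in range(N)]

# Struct of arrays: two flat, homogeneous sequences.
xs = [float(i) for i in range(N)]
ys = [float(i) for i in range(N)]

t_aos = timeit.timeit(lambda: sum(p["x"] for p in aos), number=20)
t_soa = timeit.timeit(lambda: sum(xs), number=20)
print(f"array-of-structs {t_aos:.3f}s vs struct-of-arrays {t_soa:.3f}s")
```

For serious numeric work, flat contiguous buffers (array, numpy) take this further by removing per-element object overhead entirely.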
🖥️ How data-intensive are your tight loops? It’s easy to think about CPU speed, RAM usage and I/O times… but memory bandwidth is a fourth dimension that almost nobody thinks about, and it is becoming scarcer with faster CPUs and I/O.
🧵 Do you take advantage of multiple cores? Single cores aren’t getting faster fast enough anymore, but computers usually have cores to spare. Can you use them at all? Effectively?
🧵 Do you use multiple threads to parallelize CPU-intensive operations, or do you use them to handle blocking I/O? In the former case, you want as many threads as cores. In the latter, switching to an event loop model may work better than a large thread pool.
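A small sketch of the I/O-bound case with Python’s concurrent.futures: 32 blocking calls (simulated with time.sleep) across 32 threads finish in roughly one call’s latency, far more threads than cores and that’s fine, because they spend their time sleeping, not computing:

```python
import concurrent.futures
import os
import time

def blocking_io(_):
    # Stand-in for a blocking network or disk call.
    time.sleep(0.05)
    return 1

with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    start = time.monotonic()
    results = list(pool.map(blocking_io, range(32)))
    io_elapsed = time.monotonic() - start

print(f"32 blocking calls across 32 threads: {io_elapsed:.2f}s")
print(f"cores available: {os.cpu_count()}")
```

For CPU-bound work you would instead size the pool around os.cpu_count(), and in CPython reach for processes (or an extension that releases the GIL) rather than threads; past a few hundred blocked threads, an event loop scales better still.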
🧵 Do you perform expensive computation on the thread that’s responsible for the app’s UI? This will introduce pauses and render your app unusable for periods of time, which will make it feel slow. Perception matters.
✏️ Are you rendering directly onto the screen? Flushing your drawing operations right away? Read up on double buffering, but beware of the latency you introduce by delaying the final display.
✏️ Are you leveraging your graphics card’s features, or are you drawing on the CPU? Contrary to popular belief, fancy desktop effects are not expensive if done on the GPU—and, actually, doing them on the GPU frees CPU resources.
On development time
🗣️ How much performance are you leaving on the table by “coding faster” in an interpreted language? Computers might be “fast enough” now, but interpreted languages tend to be slower than compiled ones and may make the overall system sluggish.
🪨 How big is your compiled app? Multi-MB apps are a problem when downloaded over the network, but they are also a problem on disk because they are slow to install and uninstall. Big apps may run fast, but they start slow. Perception matters.
⌚ How long does your code take to compile? If it takes too long, the team suffers and features and bug fixes will be delayed. Different choices in how you write code and how you modularize it can bring big differences in productivity.
⌚ How long do the tests take to run? If they take too long, they won’t be run on every commit, introducing friction for everyone else down the road and delaying product launches.
That’s all I could think of for now! I’m not a performance expert at all, but I do like systems and enjoy thinking about their holistic behavior. If you have any more tips, feel free to add them to the thread and I may incorporate them into the list!