Why Spend Time on New Tech?
2024-10-22
Five days of logging in Python had me asking around about the true costs of communication hidden among an app's layers.
Table of Contents
- If Your App Fails In the Layers, Does It Make A Sound?
- The Balancing Act: Who's Paying For All This?
- Another Take From Johnny Magrippis
Using nearly-free, tried and tested tech with known bugs is preferable to sinking a bunch of time into new, unproven tech with unknown bugs. — Swyx
Ask any software developer who's tried to pitch a new framework in a workplace where the "old" framework works just fine: You won’t get far if you’re proposing to try something both new AND uncertain.
But sometimes it pays to look at it with fresh eyes.
The cost of uncertainty may be obvious and upfront, but looking at things through an "if it ain't broke, don't fix it" lens can hide incremental costs that make a bigger difference in a shorter time than you might expect. Or at least more than I expected.
I re-lived this lesson this past month, spending 5 days doing in Python what it would've taken me less than half a day to do in Rust.
But first... the disclaimers:
- I like Python and owe a lot to it. I shed my fear of numbers (and how to carry them) thanks to the language.
- NumPy's still the best library going today if we're talking about vectors.
- Burning 5 days of logging was my responsibility first and foremost. I'm not as familiar with Python's `threading` library (I am now, though!) as I am with `asyncio`.
But my inexperience with Python's `threading` API still doesn't account for why the existing, experienced Python team couldn't work out why parts of their app weren't behaving the way they wanted, or why I (or any fresh pair of eyes) was exactly who they were looking for to spot the fix.
If Your App Fails In the Layers, Does It Make A Sound?
Because Python is so multi-layered when it comes to inter-process communication, the existing team had trouble spotting where the process was silently failing to clean itself up. They'd known there was a problem for months - they couldn't get the app to exit an evaluation loop when the user ran specific commands that were meant to be "one and done" (making for a slightly annoying user experience in one part of the app that we now wanted to smooth out once and for all) - but no errors were jumping out at us pointing to the core problem.
This was a threading problem at heart, not a problem with the feature built on top of the threads - the very feature we wanted to get working for our end users.
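To make that concrete, here's a minimal sketch of the failure mode as I understood it - the names and structure are made up for illustration, not the project's real code. A worker thread sits in a blocking loop waiting for the next command, nothing ever tells it to stop, so a "one and done" command finishes but the process quietly never exits:

```python
import queue
import threading

def evaluation_loop(commands: queue.Queue) -> None:
    # The loop blocks forever waiting for the next command. Nothing in here
    # raises when the user is finished, so the failure is completely silent.
    while True:
        cmd = commands.get()
        cmd()

commands = queue.Queue()
worker = threading.Thread(target=evaluation_loop, args=(commands,))
worker.start()

commands.put(lambda: print("one-and-done command"))

# The command runs and the user is done, but the non-daemon worker thread
# keeps the process alive: join() below never returns, and nothing is logged.
worker.join()
```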
Ultimately, this came down to the so-called law of leaky abstractions. We had to propagate a cleanup method across 5 different layers of Python abstraction, just to properly close the websocket connection, before the app's features worked as intended.
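And here's a rough sketch of the shape of the fix, again with made-up layer names (the real app had five layers, not three): every layer has to explicitly forward a `close()` call down to whatever actually owns the websocket connection, because if any layer in the chain forgets, the resource at the bottom never hears about the shutdown.

```python
import threading

class Connection:
    """Bottom layer: owns the websocket-like resource and its stop signal."""
    def __init__(self) -> None:
        self._stop = threading.Event()

    def wait_closed(self) -> None:
        self._stop.wait()          # whatever loop is running blocks on this

    def close(self) -> None:
        self._stop.set()           # the only thing that ever unblocks it

class Session:
    """Middle layer: wraps the connection."""
    def __init__(self) -> None:
        self.connection = Connection()

    def close(self) -> None:
        self.connection.close()    # cleanup has to be forwarded explicitly

class App:
    """Top layer: what the command handler actually talks to."""
    def __init__(self) -> None:
        self.session = Session()

    def close(self) -> None:
        self.session.close()       # ...and forwarded again, one layer down

app = App()
app.close()  # only now does the bottom layer's stop event actually get set
```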
'If it ain't broke, don't fix it' had unfortunately grown into 'if it ain't failing loudly, you won't hear it'. But we needed our app's main event loop to hear it if we were going to deliver an even better user experience on the frontend.
The Balancing Act: Who's Paying For All This?
My concern here isn't whether we got the app working or not (we did), but rather the five days spent fixing it.
Was the time we sunk into development down to us as a team, or could I get real cheeky here and deflect some blame onto the Python environment?
In a language like Rust, this would have been a much louder issue from day 1 - identified and fixed at the outset, without any inter-process layers involved. Just the core standard library, your dev team, and you're done.
But this isn't a Rust vs Python post in the wider scheme of things.
My question is: How often does this happen nowadays? It's an obvious question to ask, but I'm asking because I don't know, and I'm genuinely trying to picture the answer.
What does that mean for the large services out there written in Python (or Java, or COBOL), passing the legacy bill on to their users?
The project I was committing to was open-source (with a non-zero userbase before you ask!). It'd already seen success with Python threading, so it was too late (and would've been silly) to suggest going back and re-writing it from the ground up. It was much simpler for me to just get up to speed with how the library works, but what if I'd been billing by the project or by the hour for this?
This balancing act in tech (the upfront costs of re-tooling vs the long-term costs of communication) is something that's been covered better and more comprehensively in a Swyx blogpost - "Collapsing Layers" - that still feels relevant to me today.
What I took out of Swyx's post is this:
- Cheaper hardware means we can run more code, both locally and in the cloud. The result is a 50-year trend towards more and more abstraction (API layers on top of API layers, code on top of more code - us newer developers standing on the shoulders of giants) without worrying about whether it'll slow down our apps in any obvious way.
- We still haven't hit the hard limit of a 1-nanometer process in hardware, and even though packing more cores into smaller spaces is delivering smaller gains at greater expense, we're happy to race towards that hard limit all the same. Another example of how, collectively, we prefer to pursue certainty over uncertainty.
- It would take a huge one-time event to even think about breaking out of a 50-year trend. No one wants to come off as "the smartest guy in the room", telling their own dev team: "you know how everyone has been headed in this direction for the last half century? Let's be the ones to NOT do it that way!"
- Asking software developers (and the software shops that pay them) to spend MORE time learning what's under the hood in exchange for LESS abstraction sounds ridiculous. Especially if you're proposing it on company time.
- But is it that ridiculous? When you look at cost from a different perspective (the cost of event handling and dispatching through different layers) you have room for creative and divergent thinking when budgeting for new tools. See the toy sketch after this list.
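To put a toy number on that last point (my own sketch, nothing from Swyx's post - the handler names are made up): even in pure Python you can see that routing the same work through a few extra dispatch layers has a measurable cost per call.

```python
import timeit

def handle(event: int) -> int:
    # The actual work: trivially cheap on its own.
    return event + 1

def layer3(event: int) -> int:
    return handle(event)

def layer2(event: int) -> int:
    return layer3(event)

def layer1(event: int) -> int:
    return layer2(event)

direct = timeit.timeit("handle(42)", globals=globals(), number=1_000_000)
layered = timeit.timeit("layer1(42)", globals=globals(), number=1_000_000)

# Each extra layer of indirection adds overhead per event; multiply that by
# millions of events and the "free" abstraction starts to have a bill.
print(f"direct:  {direct:.3f}s")
print(f"layered: {layered:.3f}s")
```

It's a silly micro-benchmark, and in most apps the network or the disk will dwarf it, but it's the same trade-off in miniature: every layer you don't think about still gets paid for somewhere.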
Another Take From Johnny Magrippis
Johnny Magrippis is a Principal Software Engineer and educator - his YouTube Coding and Chill video playlist is here - and his live coding sessions helped me with web apps in SvelteKit this year.
Because we're big fans of writing software with Svelte, our back-and-forth was more focused on when (or if) to pitch re-tooling in SvelteKit and when to stick with React. The same "old" tool vs new tool balance:
I need to remind myself the [Collapsing Layers] article was written almost 5 years ago! It feels super relevant.
One thing it reminds me of is my interview process with a health startup. They eventually closed a half-billion series C during the short time I was there (which would have happened even if I hadn't accepted of course!). Their big thing was the health chatbot that would theoretically diagnose your symptoms like a virtual GP, and definitely book appointments to see an actual GP, or have video consultations!
They had also just released a version of the virtual GP in Rwanda, so when I asked one of my classic questions to a Product Manager - 'What's the last thing you did you're personally proud of?' - I was surprised to hear: 'We re-platformed to React!'
Instead of silently thinking they were out of their minds to be proud of something with invisible business value (if any) when there were so many recent milestones to be proud of, I asked 'Why?'
They said 'React is much easier to hire for!'
This is a conversation that I recall often, and that I take into account whenever I have to consult companies regarding a greenfield project, or when it comes to whether they should re-platform from whatever they have to something else: The end-user doesn't care if your app is in React or Svelte or Rails. But they may care if you've got 20 more engineers working on it starting next month, if they manage to develop it faster!