Shared Physics (https://sharedphysics.com/)

A Year in Reading, 2025

https://sharedphysics.com/a-year-in-reading-2026/ (Sun, 01 Mar 2026 19:00:07 GMT)

Well here we are. Another year has gone by. Each year, it goes by quicker and quicker. Each year, it grabs a hold of me and whispers, “you’re on a one-directional ride, buddy!” I’m now a quarter of the way through — no, scratch that, almost halfway through, actuarially — and the ride isn’t slowing down.

Hi, my name is Roman and this is my year in reading. Which is only partially about reading and more of an annual letter by way of books.

Santa Cruz, California

My life this year was a bit of a tumult. I left my role leading product development at myLaurel Health after their fundraise, and joined Within Health as their VP of Engineering. It marked a culmination of my 15-year journey from studying philosophy to doing technical work, by way of a very meandering walk through journalism, design, marketing, and product. It’s been a very diagonal career.

This year also involved a lot of travel. Congratulations to everyone who has gotten married! Please have your weddings closer to where I live next time! (That’s a joke. I don’t mind a good destination wedding, even if it's a destination only for me.) California during late summer was a particular highlight — I enjoyed Santa Cruz and the redwoods around San Francisco. But while travel is amazing, it didn’t lend itself to the unbroken spans of time I need for writing or project work. Though I wrote more this year than I did last year, there was even more that I didn't write, that I wanted to write but didn't have the time to get around to.

Long-time readers may also wonder if I finally ran a marathon last year. And the answer is… sort of, yes!

I’ve been training every year with mixed success: my first year, covid happened. My second year, we had a wedding that overlapped with race day. My third year, my child was born. My fourth year, I humbly did just a half but got a personal record. And now, in year five of training, I had my most thorough training plan yet. I ran 3-4 times per week and logged up to 24 miles on my longest training runs. I was feeling great. My best friend flew out to run with me and it was going to be our joint first marathon. And then…

… and then I ran half of it. At mile 15, an injury flared up so bad that I had to walk the next 10 miles. I ran again for the last mile to cross the finish line for an awful time of more than 6 hours total. I hated it, and it was much harder to drag myself through a slow, crippled walk for a miserable ten miles than it was to do any of the training runs. I was passed by everyone except those more injured than me, to finish fifth-from-last in my age group. More than my leg, it was my ego that felt injured. But I eventually crossed the finish line; the thought of a DNF after years of preparation was worse than calling for a ride home and nursing my leg.

So that was my year, and perhaps a metaphor for life: you have to just keep going, injuries and all.

Running the Boulderthon in Colorado, Flatirons in the Distance

Oh, but the books! My favorite books this year were:

  • Mundo Mendo, Book One by Luis Mendo (2025), which came in as a dark horse favorite in the last few days of the year.
  • Ducks by Kate Beaton (2022), an incredibly moving graphic novel and well worth all the accolades it received when it first came out.
  • Monsters: a Fan’s Dilemma by Claire Dederer (2023), which I marked up nearly every page on and kept sharing screenshots of annotations to the annoyance of many of my friends in group texts.
  • Legends and Lattes by Travis Baldree (2022), which was such a perfect antidote to the hell of news and politics that I immediately went out and got copies for a bunch of friends. And then they read them and shared them forward too.
  • Apple in China by Patrick McGee (2025), which was an incredibly detailed and interesting book about what it means to "manufacture technology" and manage supply chains. I learned a ton, even though it wasn't directly applicable to my work.

One interesting observation was that this was the first year when I read more books digitally than in print. Last year, I wrote about noticing my print vs. digital reading patterns, and acting on those 'noticings' – I read faster digitally, though it was often lighter fare. I leaned into that this year, and my higher book count reflects that.

Getting rained on while camping in North Carolina

Cozycore vs. Existential Horror

I read more fiction in 2025 than in any prior year, and one of the themes of that fiction was “Cozycore vs Existential Horror”. I think the two themes bookended each other nicely. The cozycore bits were a necessary retreat from the stresses of work, life, and politics. The existential dread bits were an embrace of those stresses, perhaps a way of unconsciously bracing myself for a year of ‘of course it can all get worse, look see here!’ (I am not a horror fan, for the record).

Cozycore is many things to many people, but for me it was writing that had a warmth and net positivity — something that felt good to curl up with, the equivalent of a hot chocolate and a fire on a snowy day. Cozycore books tended to have very low stakes and were more focused on character development than on a plot propelled forward by conflict. For example: “The Long Way to a Small, Angry Planet” was ostensibly about a trip to open a wormhole near a resource-rich but aggressive planet, but ninety percent of the book was just about a crew of aliens getting to know each other. Similarly, “Legends and Lattes” was about an orc who decided that she didn’t want to do the dungeons and dragons bit anymore and instead wanted to open up a coffee shop. The plot was about building the shop, hiring staff, figuring out how to make the business work, and a small bit of conflict with the local heavies.

(Other cozycore books worth mentioning: “Tokyo These Days”, the "Hilda" comics, and "A Court of Thorns and Roses". These were skipping stones of calm through the year.)

Between them were books of existential horror and dread. I flew through four books in Charlie Stross’s “The Laundry Files”, a send-up of office work against a backdrop of Lovecraftian supernatural horror written in different thriller pastiches, where the stakes were increasingly upped with each book. I followed fast on that with “There Is No Antimemetics Department” by QNTM (not to be confused with “Antimemetics: Why Some Ideas Resist Spreading” by Nadia Asparouhova, also read this year, which drew on QNTM’s story as source material). This was genuinely horrifying and I couldn’t put it down. (“Where the Axe is Buried”, “Somna”, and “The Apartment in Bab-El-Louk” rounded out that ensemble.)

Existential horror seeped into my non-fiction reading too – or at least, I was more attuned to it than I had been previously. For example, “Ducks” by Kate Beaton is a memoir-slash-graphic novel about two years working in the Canadian oil sands… and also about joblessness, sexism, adultery, casual misogyny, workplace deaths, and being raped. (The “ducks” of the title are the ducks that land in the tar sands and die, a metaphor for people seeking better lives but getting stuck.)

And “Monsters: A Fan's Dilemma” by Claire Dederer was about asking how we deal with art created by people who are monsters, who are problematic. How do you watch Roman Polanski or Woody Allen movies? How do you forgive writers and painters who are personally cruel, or abandon their families and children to focus on making great art? What set of cosmic scales do you use to weigh the quality of art against the monstrosity of the artists' lives? It was a wonderful read in the sense that it was a rare but successful work of Montaigne-style essay'ing: “an attempt”, grappling with questions to which there are no definitive or singular answers. Most things marketed as “essays” are really opinion pieces or editorials, not actual “attempts”. So I found it quite lovely and full of interesting thoughts on every page. I left behind many highlights.

I also read "Things in Nature Merely Grow" by Yiyun Li, which has gotten quite a bit of "best of 2025" press already. I came to the book by way of a morbid curiosity — a nonfiction “memoir” (though that’s not the right word) from a mother who lost both of her sons to suicide. (If that isn't ‘existential horror’, I'm not sure what is.) How do you wrestle with something like that? How do you even begin? I struggled with it at first — the writing started with all the emotional heft of a discarded husk of corn, the sort of mechanical writing that you might do on autopilot when your only remaining skill is writing and you are deploying it as a final attempt to prevent a complete personal collapse. It was hard going.

But as I continued, the mechanics of the writing — the repetition of thoughts, memories, and fragments — began to feel like waves lapping a shore. It was meditative, intentionally looping back on itself in a process that was increasingly more alive, more aware, more emotional. I still didn’t like it, but I don’t believe that you’re supposed to particularly like a book from a mother who lost both sons, writing to and about the second loss. I can’t tell if it was good, but the more I read, the more I felt a begrudging respect for the writing, and then by the end — simply respect, with no qualifiers.

Driving home during a blizzard, Colorado

Artists’ Books

“It will be a miracle if your shop succeeds. But you cannot have the miracle unless you open.”
- Peter Miller, “Shopkeeping”

Another theme — broader and more persistent across my years of reading — has been a fondness for artists’ books. These are books that exist outside of economic reason. They aren’t commissioned. They often aren’t profitable. To paraphrase Fobazi Ettarh’s idea of "vocational awe", these are books that exist because someone in the world believed that it was work worth doing, and felt compelled to do the work regardless of any risk or cost or calculus.

Some of these books are mass-produced, some of them are bespoke, some of them make money, and some are created at a great personal cost. They can be big or small, and between their covers they are about any matter that the writer happened to concern themselves with. Sometimes, these books are a continuation of a long conversation with cultural predecessors; other times, they are in conversation only with themselves. They are not always books — they are often books, or book-shaped – but can be music, art, software, or any other thing. In all forms, they reflect a vision the artist had, a vision they went out of their way to make real.

So these book-shaped things are often joyous, magical things: they are imbued with a lot of heart and soul.

I also make artist book-shaped things from time to time, driven by some belief that if I leave behind a memento of having lived, my life becomes more than ‘tears in the rain’. These projects are my way of grappling with the dueling feeling of a fear of death and a celebration of life. To create is to be alive, to have been alive! (An armchair psychologist might remark that I was set off on this path when I stared too closely at an ornate little clay pot in the London Museum as a kid, grappling for half an hour with the idea that someone — a specific person — had made this thing and it has outlived them for nearly seven thousand years, a life and work reverberating across space and time.)

So with that, I celebrated the existence of Luis Mendo’s “Mundo Mendo, Book One”. Mendo is an illustrator living in Japan, and this is a collection of his illustrations, stories, and comics over the course of the previous year, lovingly packaged into a bunkobon-sized book. Mendo is an incredible illustrator — his lines and colors have a vibrancy and spring to them, and the book channels a certain Pan-Am 70s “Enjoy the flight!”-style energy, as if to say "this is fun, this is worth being a part of".

There is no narrative per se, but there is a breeziness of observation and reflection that exists to be enjoyable, to be useful, and sometimes simply because existing is reason in itself. The thread is Mendo's life. He interviews himself as a shapeshifting cartoon. He reflects on shirts and the material quality of things. He visits bookstores, writes about creative blocks, about how he has his studio set up, about moving out of Tokyo, about raising his kids. He looks at birds and meets with friends.

In a meta turn, Mendo describes the importance of such self-directed, non-commercial work: “When I think about how I arrived at what I am today as an artist, I realize personal work has always been the motor, the secret sauce and the reason for my humble success: doing work for me, without thinking about audience, reach, compensation or any other external element.”

And in doing so, the book is brimming with the sort of warmth and joy that comes from saying “I have this skill and I want to use it to make something beautiful in the world! Something that no one asked me to do, something I want to do just because.”

I am glad that this little thing exists. As Peter Miller writes about shops, “The world can get anything it wants. It cannot get a shop.” Left unsaid: a good shop — and for us readers, a good book — thoughtfully curated, thoughtfully composed, is given not gotten.

Miller continues:

There is no literal need for shops. Certainly no need for ties, nor extra-virgin olive oil, nor a baguette; no need for notebooks from Italy and dot grid notepads from Tokyo; no need for Danish carafes and Swedish tea light holders; no need for hand-printed cards from Paris; no need to restock pencil leads and better pens; no need to compare sketchbooks, nor study Perriand, nor review the work of designer Paul Rand and his wife Ann; no need for the street signs of Paris, the graphics of Barcelona, the stone walls of Vals, no need. But then, of course, "O, reason not the need." We shall as soon never cut flowers again, nor look to the night sky, nor walk in cool, shallow waters. Nor marvel, at the gifts of it all.

A book such as “Mundo Mendo” is a gift that has been given and savored, for which reason is not the reason.

It is in this spirit that I turned to support Luis Mendo’s continued writing and drawing after coming across it, and why I have done the same for Craig Mod’s “Special Projects” (something I talked about last year, with his “Things Become Other Things”). This is why I buy every risograph and weird little project that Robin Sloan prints up, why I back books on Kickstarter with alarming frequency, why I support small and local presses, and why I love nothing more than to buy an art print from a local artist at a coffee shop. I want more of this stuff to exist, damn it!

Such are my joys in this world: a celebration of someone wanting to put a thing out into the world — and having the skills to do so — that without them would simply not exist.

Flying over Philadelphia

Letting Go

Every year, my “year in reading” post coincides with packing or unpacking all of my books into boxes. I’ve moved five times in the last five years, and we’re getting ready to move yet again, six for six. (Writing this is a necessary distraction from packing.) At this point my wife is taking my jokes of “let’s do ten moves in ten years” depressingly seriously. So: goodbye, colorful Colorado! Hello, Garden State of New Jersey!

When I was in high school, I told everyone that I would never come back to New Jersey. I hated high school, I hated the suburbs, I rebelled against everything within my line of sight. I chose which college to go to based on how far it was from my childhood home, and every year I tried to get a little further and further away.

It was with this in mind that I came across a passage by Danny Meyer that gave me pause:

A pattern I’ve noticed in chefs is that many spend tremendous energy when they’re young working to build a life away from where and how they grew up, in order to free themselves and define who they are on their own terms. It takes a lot of confidence and emotional security for people—and especially chefs, whose cooking can so clearly reveal their roots—to feel they have accomplished enough in the outside world to “come home” in a culinary or an actual sense.

I can relate to spending tremendous energy trying to get away and recognizing that I need a lot of confidence and emotional security to “come home”. But is this a homecoming? I don’t know. I have grown, and I have a family of my own now. And as my little girl gets older, I want her to get to know family and be around friends, most of whom still live in the tri-state area. So whatever insecurities I have regarding moving back need to be put aside; that is something I think all parents grapple with at some point — to what degree do we pass on our traumas and emotional insecurities to our children?

Anyway, our hand was forced on this decision – our lease was not being renewed. Given “housing costs and affordability” (did you have that on your “books I read” blog post bingo card?), we decided that our best option was to move back: closer to fam, back to friends, back to traffic and turnpikes and the east coast hustle.

So my books are going into boxes again.

Every year, I tell myself that moving is an opportunity to thoughtfully hold everything I own and to Marie Kondo the heck out of it: does it bring me joy? Should it remain in my life? And every year, I push off packing until the last minute to minimize the time spent living out of boxes. And as our last weeks approach, I throw all my books, office trinkets, and capital-T Things into cardboard, slap some tape on them, and truck them to the next place.

This year, I started early. I tried selling some of my books on Amazon (earned a whole two dollars after all the shipping costs and fees and charges), some to local used bookstores (a few hundred dollars for some four boxes of books and records), and some just plain ole donating to Goodwill.

Letting go of books and records is hard for me, especially those which I bought ages ago but never got around to reading. They were enthusiastically aspirational purchases and letting go of them is the psychological equivalent of saying “I am no longer this person.” I don’t find it easy to admit that I am not the person I once was, nor the person who I wish myself to be. But there have been many such past me’s, and I know there will be many more to come – the exercise is necessary, the space for new me's needs to be created, the old me's need to be let go.

At the same time, there are some books — past selves — that I won’t let go of. Some books are memories, haptic or visual triggers, mementos. Severing yourself entirely from the past is just an illusion; the present is the way it is because the past was the way it was.

And other books are my anti-library, a collection of the unread which, as Umberto Eco argues, is more valuable than the read ones.

Then there are the books I have held on to for years, waiting to be gifted. Wandering amongst my parents’ books was one of my favorite things to do as a child. I would pull down a book, read parts to see if it interested me, look at the covers, and imagine what they could be about. I did not understand half of them, hated a quarter, and loved a few. Later, some of them would be books I would be assigned in college, or come across in bookstores as ‘classics’. I’d find some mentioned in an appendix, or rediscover them for myself when going down a rabbit hole. There is and continues to be a pleasure in connecting those dots!

But I do not want my parents’ books. I have my own library. And the books I am holding on to, with the intention of gifting them to someone or passing them down, my daughter will likely one day not want either. They are privately meaningful to me, intimate, in a way that they will not be to anyone else. No one wants to be burdened with a load of sentimental stuff neatly wrapped in guilt about discarding or selling it. I would rather not pass that obligation to anyone else.

And so another book goes in the moving box, and another goes into the donation box. May the latter find nice homes! With each book, I feel a different emotion: a flicker of a past excitement, a memory of discovery, a reminder of an idea I wanted to learn about. I feel those things, and then let go of another pound of paper and ink.

(Other things I have let go of this year: a job. Our house. My car. Old art supplies. Shirts. Barefoot running shoes. Pretense to starting a healthcare data business. A bread maker. An ice cream maker. My garage workbench. A CNC machine.)

Scenes from a moonlit run

The Complete List of Books

Well, we’ve gotten to the end of an introspective year, with very few business books in sight. Twas a year for feeling, not for business'ing.

Here is my complete list of books, in an interactive list. I hope you find something to enjoy.


Learning Without the Lessons (In Pursuit of Uncertainty)

https://sharedphysics.com/no-lessons-to-be-learned/ (Fri, 23 Jan 2026 13:15:01 GMT)

I have a two year old, which means I’ve been reading a lot of stories with some morality or lesson baked into them. And the thing that bothers me is that the lessons are often wrong.

Consider the story of the tortoise and the hare.

You know the one: a tortoise and a hare have a race. Hare is way ahead, decides to take a break. Falls asleep. Tortoise eventually catches up and crosses the finish line first.

Here’s the full story:

A Hare was making fun of the Tortoise one day for being so slow.

"Do you ever get anywhere?" he asked with a mocking laugh.

"Yes," replied the Tortoise, "and I get there sooner than you think. I'll run you a race and prove it."

The Hare was much amused at the idea of running a race with the Tortoise, but for the fun of the thing he agreed. So the Fox, who had consented to act as judge, marked the distance and started the runners off.

The Hare was soon far out of sight, and to make the Tortoise feel very deeply how ridiculous it was for him to try a race with a Hare, he lay down beside the course to take a nap until the Tortoise should catch up.

The Tortoise meanwhile kept going slowly but steadily, and, after a time, passed the place where the Hare was sleeping. But the Hare slept on very peacefully; and when at last he did wake up, the Tortoise was near the goal. The Hare now ran his swiftest, but he could not overtake the Tortoise in time.

They tell you the lesson is “slow and steady wins the race,” which is a great metaphor for grind and persistence, a useful adage, and so on.

But that’s the lesson for the tortoise. And in ninety-nine out of a hundred other versions of that story, where the hare doesn’t stop and just finishes the race, the tortoise loses. Because the tortoise is slow and the hare is fast. No amount of going slowly and steadily is going to help the tortoise win. The tortoise wins not because of its virtue but because of the unforced mistake of the hare.

I can come up with seven different lessons off the top of my head. How about:

  • “Don’t rest in the middle of a race.”
  • “Give your best no matter who you’re competing against.”
  • “Wait for your competitor to make a mistake, then exploit it.”
  • “Avoid making unforced errors.”
  • “Celebrate only after winning.”
  • “Don't abandon your competitive advantage.”
  • “Only challenge stronger competitors who have exploitable weaknesses.”

In fact, the original Aesop version has it as “The race does not always go to the swiftest,” which is very different. (I’ve also seen a religious reading that pulls out “Pride comes before the fall.”)

There's much more for the hare to learn than for the tortoise. And “Slow and steady wins the race” is not only the least statistically likely lesson, it’s also probably the more boring one.

So the lesson changes depending on whose perspective you take. If you're the hare, you learn a different lesson from that interaction than if you're the tortoise. Moreover, the lesson that is learned may be extremely context specific.

Let’s run the fable past its original ending: the tortoise wins the first race and learns that “slow and steady wins the race.” The hare learns “don't abandon your competitive advantage.” (Don’t rest in the middle of a race).

At some point, the hare and the tortoise have a rematch. The hare trounces the tortoise. They’re not even in the same league. The hare does not go slow and steady, the hare goes fast and pushes as hard as it can and then gets to the finish line and takes a nap. The hare goes on to race against and beat other animals. It focuses on short and middle distances, optimizing for fast-twitch muscles and avoiding underperformance in long races. Eventually the hare retires as the undisputed greatest. Meanwhile, the tortoise continues to race but never finishes in the top quartile again. The tortoise justifies it by saying it is a marathon, not a sprint, that it is doing it for personal development, but it never wins another footrace.

Eventually the tortoise learns that it is playing the wrong game. It is not optimized for foot races, though it originally might have gotten the opposite impression from a highly luck-dependent N-of-1. It moves into swimming long distances, which is a more natural fit.

Eventually, in their old ages, the tortoise and the hare let bygones be bygones and have a rematch. They race by the beach. The tortoise climbs into the water and swims; the hare hops across rocks in the shallows. They reach the end neck and neck.


A friend of mine recently dropped his company’s production database by accident. Instead of running the drop command against his local environment, he ran it against prod. The command had no failsafe and took only seconds to execute. It turned out the company had never done a restore-from-backup exercise, and it took three hours to restore services.

He happened to be in an EU time zone working for an American company focused on education tools, meaning that his entire ordeal was early enough in the morning that it never affected end users and the data loss was negligible.

After the retrospective, they implemented a new multi-step process. You have to manually confirm which environment you are working in. There’s a big new “Danger” and confirmation step when you run the command, can’t miss it. They also set up code red drills to practice recovering from emergencies like that.
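The post doesn't detail the team's actual implementation, but the general shape of that safeguard — refusing a destructive command against a dangerous environment unless the operator explicitly retypes an environment-specific confirmation — can be sketched roughly like this. All names here (`guarded_drop`, `DANGEROUS_ENVS`) are hypothetical illustrations, not the process from the retrospective:

```python
# A minimal sketch of an environment-confirmation guard for destructive
# commands. The idea: muscle memory alone can't reach prod, because the
# required confirmation phrase includes the environment's own name.

DANGEROUS_ENVS = {"prod", "production"}  # hypothetical environment names


def guarded_drop(env: str, confirmation: str, drop_fn) -> None:
    """Run drop_fn() only after an explicit, environment-specific confirmation.

    For dangerous environments the operator must supply 'drop <env>' exactly;
    anything else (an empty string, a phrase copied from a local session)
    is rejected before the destructive call is ever made.
    """
    if env in DANGEROUS_ENVS and confirmation != f"drop {env}":
        raise RuntimeError(
            f"Refusing to drop '{env}': type 'drop {env}' to confirm."
        )
    drop_fn()  # only reached for safe envs or a correct confirmation
```

The guard is deliberately asymmetric: local and dev environments stay frictionless, while the dangerous path demands a phrase that cannot be typed on autopilot.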

I can imagine the manager now: “And what lesson did we learn from all this?”

Well, it depends. How about, “Mistakes that don’t affect users are forgivable.” Or: “We should actually practice those best practices, not just wink-wink-nudge-nudge agree that yes, we should really run those code red drills, but it's so unlikely to happen here that we've deferred them indefinitely.” Or if you’re a security guy, “Implement two-key critical processes!” Or if you’re my friend, “Always double check where you are working.”

There are fifty lessons here, each dependent on perspective, each with different utility and reproducibility. But one of the cardinal sins of software engineering is to generalize from a single instance. Was this an exception or was this an inevitability just waiting in the wings if given enough engineers and time?

Sometimes things that are very unlikely to happen, do happen. And sometimes things that are very likely to happen, don’t.


This reminds me of the story about bomber planes during WW2. A group of bombers would fly out on a mission, but not all of them would return. The ones that did had bullet holes concentrated in the wings. The military's instinct was to reinforce the wings on all their bombers, but this seemed wrong — survivability wasn't improving.

Statistician Abraham Wald, working in a secret research group at Columbia University, realized the accepted wisdom was backwards: this was survivorship bias. They needed to consider the planes that didn’t come back, which led to the insight that damage to the wings didn’t matter much, but damage to the engine and cockpit was fatal. When those areas were reinforced, survival rates improved.

Survivorship bias! What a nice statistical term to describe lessons learned and broadly generalized from a particularly uncommon outcome.


“Reinforcing the wings” reminds me of another famous story, of Icarus and his father Daedalus.

Most versions of the story have them escaping from an island by having Daedalus make wings for them out of molted feathers, beeswax, and threads from old sheets. Daedalus tells Icarus “not to fly too low or the water would soak the feathers and not to fly too close to the sun or the heat would melt the wax.” Icarus memorably ignores these instructions, flies too high, melts the beeswax, and plummets into the water where he drowns.

This is where the idiom around “those who fly too close to the sun…” comes from, but if Icarus had flown too low, the idiom we would have been left with would have been “don't fly too close to the water.”

From all this, Daedalus did not learn anything new – he already knew the risks, which is why he warned Icarus. And whatever Icarus learned had only the briefest utility as he plunged into the sea and died.

So perhaps the lesson should have been, "Kids won't listen to you, so don't set them up to fail by designing escape plans that require compliance in high-risk situations". Or, “It’s better to fly at night.” Or, “Do a better job on the wings.” Or, “Don’t try to be too clever; build a boat when you're trying to get off an island.”


Perhaps learning lessons from a story – or from history – is not the point.

History – and most stories – are context-specific. The number of things that have to go right for something to have happened the way it did is often immeasurable. That's context. History and its stories are shaped by that context and cannot be separated from it. It’s hard to point at a story and say with any definitiveness whether the way things went was due to luck, circumstance, willpower, or inevitability. It’s also impossible to recreate the initial conditions and subsequent actions that led to the outcome. After all, we can’t rerun history with different variables to tease out which ones mattered.

At best, you can describe what happened. But even that is reductive — we simplify the retelling of things by choosing what to include and what to leave out. In simplifying the world, we make it possible to tell stories about it. But choosing the simplifications is an act of bias, and a decade later someone will come along with a revisionist view of the world, choosing different details for the retelling, and subsequently drawing different lessons from it.

The lessons we draw from these stories are just pithy descriptions of the punchline. “Slow and steady wins the race” describes what happened one day when a tortoise challenged an overconfident hare to a run. “Don’t fly too close to the sun” does the same for Icarus’s brief and exuberant escape. These lessons are also fragile — had the outcome of the story been different (all other things similar), the lesson would have been different too. And so the lesson — the thing generalized as an instruction to “do this, not that” — isn’t even connected to the rest of the story.


Here’s a game I like to play: whenever I read stories about exceptional people or exceptional situations, I look for counterfactuals to whatever takeaway the writer is pushing. For every example of privileged success, there is someone who came from nothing. For every example of victory from following a tight playbook, there is someone just as effective who did not know there was a playbook, let alone followed it.

Let’s say we’re reading about renaissance painters. Leonardo was the illegitimate son of a notary who apprenticed in one of the best art workshops of the renaissance. His workshop contemporaries included Lorenzo di Credi, who painted competent religious works and is mainly remembered as “the guy who trained with Leonardo.” There was also Perugino, who became successful and respectable and was then totally eclipsed by his student, Raphael. Raphael himself came from a life of privilege, ran in Vatican circles, and died young with a reputation as one of the greats. Raphael’s father, Giovanni Santi, was a court painter with all the same connections but ended up as a footnote in reference to his son. Or take Caravaggio—orphaned at eleven, likely illiterate, arrested multiple times for brawling, killed a man and fled Rome with a price on his head. He also died famous, revolutionized painting, and changed art history.

What do we learn from these stories, these different lives with converging endpoints and similar lives with diverging endpoints? Is there some grand truth to pull out of biography that we can apply broadly and with certainty?

No. There is no grand truth, no repeatable instruction for how to become a Leonardo or to raise a Raphael. (If there were, it would be widely exploited.) They are just stories and cases. A successful artist can look like this, but they can also look like that. Even in their exceptional outcomes, they differ: Leonardo was a polymath; Raphael was a painter and architect; Caravaggio was a painter. What was true for one of them was not true for another; some variables may be better predictors of future success, but none guarantee outcomes. Sometimes what truly matters is luck-dependent, like meeting the right collaborator or being born into the right family or era. We can certainly continue to dig deeper — we can pathologize, we can probe into more specific biographical details, turning points, and the communities they ran in. But for each fact there is a counterfact represented by one of their contemporaries. Their lives are their own unique cases. Their lives are not the lives of others, and never will be. Their lives are not your life.


So there are no lessons to be learned, no instructions or certainty to pull from history and from stories. Or rather, learning lessons is not the point. Narratives and stories have their place. They entertain and they communicate valuable details — context — about the world.

But when it comes to learning, the goal is not to take a story — a simplified world — and to generalize it to represent an instruction that when followed will ensure repeatable outcomes.

The alternative to trying to pull certainty from a story is to instead look at stories as cases for pattern recognition rather than actionable lessons. A single story can be illustrative of many different concepts, and a single idea can manifest in the world in many different ways. It is to hear the story and remark, “Wow. I’m filing this one under ‘overconfidence’, or ‘varieties of success’, or ‘what fake it till you make it looks like’.”

Eventually, you build up a repository of stories to reference. You can look at things happening and say, “This situation looks a bit like these other things I read about. In some of those stories, people did this and got one result; in others, they did something else.” Such pattern recognition is your compass for navigating the uncertainty and complexity of the world: not because it tells you what to do, but because you've seen a range of what's possible.

Reading for lessons is how we see a race between a tortoise and a hare and conclude: “I should work steadily and persistently,” when a better reading may be that “this is a case where overconfidence led to unforced error, where playing to your strengths matters, where cause and effect seem to have been flipped, and where a single lucky outcome has created a misleading narrative. I can work with that.”

The world is complex. The world is a mess of detail that rarely looks the same. It changes, it accumulates. The world contains multitudes. Of course it’s our prerogative to try to impose order and certainty upon it, seeking comfort in some absolute instead of grappling with the inherent relativity and reflexivity. But the world is too big to wrap our hands around, too complex for any single framework to capture or describe. (And efforts to impose order on the world bring about their own complications and unintended reactions). Let’s not confuse what we want — lessons, certainty, a playbook to follow — with the reality that we move through.

Things can take many forms. Two people, looking from different points in life, will see different things in your story, and perhaps the same thing in different stories.

Oh, and this essay, this collection of anecdotes, also has no lesson. It’s a remark, to be filed away, and hopefully one day be useful.

]]>
<![CDATA[Are You Treating ChatGPT Better Than Your Coworkers?]]>Here’s a 🌶️ spicy take: I think practice at human/LLM interactions can lead to better human/human interactions.


Over the last few months I've watched colleagues spend considerable time crafting and sharing LLM prompts for different projects—complete with context, examples, and success

]]>
https://sharedphysics.com/on-human-llm-interactions/6839cd17e885df3fd0c2636aSat, 31 May 2025 00:21:28 GMTHere’s a 🌶️ spicy take: I think practice at human/LLM interactions can lead to better human/human interactions.


Over the last few months I've watched colleagues spend considerable time crafting and sharing LLM prompts for different projects—complete with context, examples, and success criteria. Then I'd see them turn around and message their team: "Hey, can you update the deck with a snappy new slide? Need it back in an hour for a client meeting."

I get it. Prompt engineering is cool and new and exciting and there's a lot of experimentation and expertise to share. A lot of that sharing aligns with my own experiences: good prompts provide necessary context, detailed requirements, step-by-step breakdowns, operational personas, clear success criteria, and specific deliverables. Good prompts err on the side of lengthiness even for small deliverables.

And on the flipside, folks have quickly understood that lazy prompting leads to lazy deliverables. You get hallucinated slop if you put in low-detail prompts.

But…

"I need this thing by EOD and I'm stuck in meetings all day. Can you just figure it out?"

… all of those things are also good for people.

I mean, compare human/LLM interactions to human/human interactions. My deep research (read: having worked in an office, having talked with other people) suggests that people are generally awful at providing requirements to other people. They don't have to be, but they choose to be. Projects and requests routinely get tossed over a wall with the expectation that the recipient will just figure things out. Yes, people are flexible and have agency to figure things out, but it leads to plenty of time wasted on figuring out necessary context, vision, and goals. Subsequently, bad deliverables result in considerable rework, re-rework, and re-re-rework. I’ve seen this in Product/Engineering/Design standoffs, Executive/Manager confusion, and the routine back and forth between Sales/Marketing.

A friend described it perfectly: "It feels like requesters can't be bothered to figure out what they want and put in the work to make that clear, and then they're unhappy with every output."

Why do we communicate better with our AI overlords?

Five things come to mind:

  • Speed changes everything
    Human/LLM interactions are fast. You get output back in seconds. Human/Human interactions are slow, especially in the day to day of business'ing. It takes time for our wetware to communicate, process, fit work into our schedules, do the work, and then deliver the work. This ranges from hours to days and weeks.

  • Costs restructure behaviors
    Human/LLM interactions are relatively cheap, even when you're paying money for them. $20/month or even $200/month is insanely cheap compared to an hour of a professional's time, which starts at just under $20/hour for intern-level work. You might not have to 'pay' your colleague to do work for you, but that cost is built into their role, the work they're assigned, and their capacity to take on and prioritize new projects.

  • Tight feedback loops lead to better habits
    The speed and cost of interactions greatly impacts cycle speed (doing, receiving, reflecting, trying again) and subsequently accelerates skills development and learning. Specifically, the skill of "communicating requirements to someone".

    Human/LLM interactions are controlled experiments. Same interface, predictable responses, clear cause-and-effect between input quality and output quality. You quickly learn what works even if you're not intentional about the learning.

    Human/human interactions are slow and contain plenty of external variables (mood, energy, differing personalities, baggage) that make it harder to draw repeatable and generalizable lessons about what effective and ineffective interactions look like.

  • There are no safe assumptions
    Human/human interactions bundle a metric ton of assumptions into each interaction. Yes, your co-worker probably does know something about the company you're both working for and the products you sell or the client you're talking about. But those assumptions are safe only at a superficial level.

    Humans are not mind readers; every interaction is lossy between what's in your head, what you've said, what they've heard, and how they interpreted that. That's why interviewing and shadowing are hugely effective information-seeking techniques on projects even between longtime collaborators. Your colleagues can make reasonable assumptions about what you're looking for, but those relationships take time (weeks, months) to develop. And yes, humans can apply additional reasoning and information-seeking behaviors to tackle a problem independently, but (a) not always and (b) not all humans show this pattern of behavior.

    With LLMs, it is not safe to make any assumption. So a good prompt builds in all the necessary information for the LLM to do its work. And when you get slop back from bad inputs, it's your fault – not the LLM's – for the quality of the output. How should the LLM have known that some detail was critical for your sales deck, or that you had a very specific color scheme in mind when you described it as "should look good"?

  • Power dynamics inform relationships
    You can't pull rank on an LLM, which means that you – yes, you – are always the accountable and responsible party in an interaction. Whereas with people, power relationships underpin most collaboration dynamics. Very few collaborators are equals in any meaningful sense. Executives can't just tell an LLM to "figure out the rest" and expect magic. There's no "that's their job to understand me" with an LLM. There's no "well you're the tech guy, that's your role to ask questions". What you get out of an LLM is broadly equivalent in quality to what you put in.

    You also can't make up expectations for what the LLM can or can't do. LLMs don't "learn" and "upskill", so telling them to be better at things that they're not good at is a fool's errand. You need to understand their limitations and work within that. There's no external blame to assign for bad outputs, given that we're all using the same LLMs.

We've normalized giving LLMs better direction than we give our coworkers

Human/human and human/LLM interactions are both about effectively communicating information through extremely lossy mediums (text and sound). However, human/LLM interactions make it extremely clear where bad results are consequences of your inputs and unrealistic expectations and not the other party. Combined with their speed and cost, most people who turn to LLMs quickly grok a new pattern of effective communication (prompt engineering): detailed, contextual, specific, and iterative.

Here's an example of something I've recently seen provided to an LLM:

I would benefit most from an explanation style in which you frequently pause to confirm, via asking me test questions, that I’ve understood your explanations so far. Particularly helpful are test questions related to simple, explicit examples. When you pause and ask me a test question, do not continue the explanation until I have answered the questions to your satisfaction. I.e. do not keep generating the explanation, actually wait for me to respond first. Thanks!

Here's another:

Use precise terminology; avoid generic phrasing. Favor concise language with a high insight-to-word ratio. Write for a C-suite audience—efficient, nuanced, and analytically clear. Avoid lists unless they serve a clear analytical function. Use boldface to emphasize domain-specific or technical terminology. Minimize assumptions. [...] You are an AI expert like Noam Shazeer, a writer in the plain, high precision style of Paul Graham, and a teacher with the conceptual clarity of Richard Feynman.

Here's the version that would have been sent in a human/human interaction:

Can you explain this topic to me? I don't get it, and Justin's explanation was really long winded. Do better.

And as I see more and more people sharing examples of how they prompt LLMs on certain projects, I can't help but think: gee whiz, why couldn't you provide me that level of competence and completeness in your asks and requests?

I've read people describing working with LLMs as having a "super-powered copilot" but needing to treat them like "a junior assistant/intern". Folks invest time and effort to provide clear, well-structured, contextually complete prompts for LLMs to work off of. Meanwhile, slack messages and emails remain cryptic haikus of half-baked requests.

Maybe it's worth throwing down a gauntlet on this: make human collaboration and requirements communication more like prompt engineering. Take the time to figure out what you want and describe it well. You'd do it for an AI. Do it for a human as well.

Next time you're about to fire off a vague slack message or throw a half-baked idea brief over the wall, ask yourself: would I get slop out if I texted it to ChatGPT? You'll probably get better outputs from your team. And heck, your coworkers might wonder why you're suddenly so clear and helpful.

🫳
🎤

]]>
<![CDATA[Appendix to "Identifying Signals of Expertise"]]>In "Identifying Signals of Expertise", I ended up cutting almost 2000 words to keep things succinct and focused. But that stuff was useful! If you're looking to upskill your interviewing skills further and dive deeper into identifying signals of expertise, read on; I've included

]]>
https://sharedphysics.com/appendix-to-identifying-signals-of-expertise/683dc8a9e885df3fd0c265e6Thu, 29 May 2025 22:00:00 GMTIn "Identifying Signals of Expertise", I ended up cutting almost 2000 words to keep things succinct and focused. But that stuff was useful! If you're looking to upskill your interviewing skills further and dive deeper into identifying signals of expertise, read on; I've included three sections that were previously cut:

  • Additional Considerations for an Expertise-Oriented Interviewing Toolkit
  • Theoretical Foundations of Expertise
  • An Example Technical Exchange In Practice

Original Article:

Identifying Signals of Expertise
One of the most useful questions I’ve used for evaluating expertise during a hiring interview is: Tell me about a time when you did something you thought was right, and later it turned out to be a mistake. That kicks off a series of additional questions and followups: * What was

Appendix 1: Additional Considerations for an Expertise-Oriented Interviewing Toolkit

There are many ways to probe for expertise, but all of them share a few common "gotchas" to avoid:

1. Avoid hypothetical questions and generic answers

With this sort of framework, we're conducting a behavioral interview. Instead of asking hypotheticals ('What would you do if...'), you ask about specific past experiences ('Tell me about a time when...').

The premise is simple: past behavior predicts future behavior. Similarly, hypothetical questions produce hypothetical answers — often idealized or aspirational versions of what someone would like to do rather than what they actually do in practice. So always ask for specific, lived experiences.

Consider this exchange:

Question:
How would you work with a difficult client who is demanding unreasonable changes?

The Hypothetical Answer:
I would try to understand why they're asking for those requirements, and work with them to see how we can solve their problem with existing capabilities. If we can't do that, I'd work with sales to price out a new statement of work, then partner with product and engineering to make sure we build the right features to spec.

Is the candidate answering a question or reciting an HBR article on how to provide generic assessments? The answer ignores the real-world complexity the candidate would have to navigate, such as:

  • The client doesn't want to pay for a new SOW. You're in a whale-oriented enterprise market and they have financial weight in pushing your team around. Executive leadership needs to get involved in this call, it's not actually in your hands.
  • The product and engineering team is underwater trying to deliver on five other features. They're telling you this is going into the backlog... but it might never get done because it's not a reusable feature for any other client. The PM told you flat out: "this is a bad feature, convince them out of it." How do you manage a difficult team member from a different department, who isn't wrong in what they're saying?
  • Your boss is driving you to talk about ten other things you're doing but the client doesn't want to hear about that. They're also telling you to make the client happy, and the client doesn't want to hear about how it's a bad idea. They've told you that three other vendors solve this problem in this way and it's on their security checklist. What do you do when you're caught in the middle with no good options?

When I hear a hypothetical answer with no request for clarification or further details, what I really hear is a disregard of how things actually work — someone who has a model of the world in their head and is going to make other things conform to that model, rather than be flexible in figuring out how to adapt to the situations at hand. I've seen that sort of candidate brought in before and it resulted in a pattern of buying time, abdicating responsibility, and lots of private venting about how they're misunderstood or everyone is wrong.

2. Flip hypothetical questions, press on generic answers

So avoid hypothetical questions. Instead, flip them to emphasize specific examples:

Tell me about a time when you had to deal with a difficult client or colleague. Why was it difficult? What led up to that difficulty? What were the organizational dynamics? How was it resolved?

Those questions — asked as a drip of followups — give you a much truer signal of what someone has actually done in a situation like that, and subsequently what they're likely to do again.

Similarly, apply the same probing-for-details approach on any generic (non-specific) answer that is provided.

3. Work backwards to identify unique types of expertise

I've used these techniques for differentiated hiring when building high-performance engineering teams. It's been a critical piece of my interviewing toolkit, especially when making initial hires to a new team.

But sometimes you need to design different questions because you're looking to evaluate a specific type of expertise. My experience has been that you have to be able to clearly define and articulate the qualities that you're looking for and what the application of those qualities looks like in your operating context. Then, you can work backwards to identify scenarios or contexts that may have elicited either the quality you're looking for, their opposite, or their absence. This approach can help you identify the right questions to ask for specific types of expertise.

4. Edge cases and red flags

This question pokes at many critical pieces of expertise but doesn't catch every edge case that comes up in conversation:

  • A candidate that struggles to answer
    If someone is struggling to answer the question, I'll use a personal anecdote as an example. This reciprocity often unblocks them and makes sharing embarrassing stories feel safer.
  • Early career candidates
    Early-career candidates may not have good examples to talk through. In these cases, pivot the conversation to examples from other contexts: academic projects, internships, even personal projects.
  • Selection bias against the careful and thoughtful
    Some people rarely make big mistakes because they're extremely careful. If you're getting that signal, double-click into it: why is someone so cautious? Probe whether this reflects true thoughtfulness or risk aversion. Ask about potential drawbacks of this thoughtfulness (for example, is it traded off against speed?), and ask about times they operated at the edge of their comfort zone.
  • Cultures of stigmatization
    Some candidates come from cultures that stigmatize mistakes and may be especially reluctant to show weakness. Recognize that behavioral change is tough and consider whether your team can support their transition to a learning culture.
  • Safety-critical roles
    This approach doesn't work for roles where risk-taking is dangerous (healthcare, aviation). Refer to the "work backwards" piece above to figure out the right model of expertise and behavior you want to evaluate for, and craft a new and more appropriate behavioral question, i.e., perhaps around risk mitigation or process adherence.
  • Lies and fabrications
    Reality has fractal-like detail — you can always zoom in further. Keep probing specifics. Liars hit walls quickly; truth-tellers reveal increasing complexity. Double-clicking into the specifics helps filter out most fabulists and unqualified candidates.
  • No good examples
    If someone genuinely can't think of a significant mistake, that itself is revealing. Either they operate far within their comfort zone, lack self-awareness, or work in environments with no autonomy. All are important signals.

Appendix 2: Theoretical Foundations

These interviewing questions draw from established models of how expertise develops. They work because they map directly onto how experts seek out information, act on it, evaluate their outcomes, and change their behaviors in response:

Learning Loops (OODA, PDCA)

OODA (observe, orient, decide, act) loops and PDCA (plan, do, check, act) cycles describe how experts refine their judgment through repeated cycles of action and reflection. The question walks candidates through exactly such a cycle, revealing how sophisticated their learning, information seeking, and adaptation processes are.

Situationist Model of Expertise

Situationism suggests expertise is largely situational — behavior depends on context, environment, and support structures. Strong candidates demonstrate how they actively sought information to understand their environment before acting, while weaker ones reveal rigid thinking that ignores context and deflects accountability.

Recognition Primed Decision Making (RPD) Model of Expertise:

The RPD view of expertise is that experts navigate complexity through pattern matching against their library of experiences. The richness of a candidate's example — and their ability to connect it to broader patterns — reveals the depth of their experience library, their ability to extract useful patterns from it, and their information seeking/pattern matching behaviors.


Appendix 3: An Example Technical Exchange In Practice

Here's how this might play out in an interview:

Interviewer: Tell me about a time when you made a mistake, but at the time you thought you were right.

Candidate: At my previous company, I pushed hard for adopting a microservices architecture. I was convinced it was the right approach based on the scaling challenges we were facing.

Interviewer: What were those challenges?

Candidate: We had a monolithic application that was becoming unwieldy. Load times were increasing, and developer productivity was declining because changes in one area affected others unpredictably.

Interviewer: Why did you think microservices were the right solution? How did you advocate for it? Were there other approaches considered?

Candidate: We didn't really talk about other approaches. There was a lot of conversation about microservices as a scaling solution at that time. I had read several case studies from tech giants who solved similar problems this way. We had one experimental microservice already live, and it was one of the most reliable parts of our system so people instinctively bought into the vision. I was the first one to call it out publicly and my manager rallied other teams to buy in pretty quickly because my previous suggestions were pretty good.

Interviewer: How and when did you learn you were wrong?

Candidate: About six months into implementation, we realized we had underestimated the operational complexity. Our team wasn't prepared for the challenges of distributed systems debugging, and our deployment pipeline wasn't mature enough. We were moving slower than before, not faster.

Interviewer: "We?"

Candidate: Yeah, my team and I, and a few other teams. Turns out running one microservice is different than an entire fleet of them! A more senior member pointed out that we lacked SRE/DevOps expertise. We had feature engineers, but no one dedicated full time to managing platforms. I spent time learning about platforms management but I wasn't an expert. So teams ended up implementing inconsistently and our new problem became orchestration, on top of all the old problems.

Interviewer: How did you address it? What happened next?

Candidate: We had invested too much in the migration by the time we realized it might not be the right path. Executive leadership forced us to pause further decomposition in favor of new feature development. In the meantime, I convinced my boss to let me run a few sprints focusing on improving our operational tooling and monitoring. Our architecture remained in a semi-modularized phase and as a team we burned some trust on that project.

Interviewer: What lessons did you learn from it?

Candidate: The biggest lesson was that architectural patterns aren't one-size-fits-all. What works for Google or Netflix didn't work for a team of our size and maturity. I realized we need to evaluate technology decisions not just on technical merits but on organizational readiness. Nobody blamed me for it, but I felt personal responsibility for having pushed for it. On the bright side, I feel much more comfortable with my platform management skills. I used that knowledge a lot in my next role.

This exchange reveals the candidate's technical judgment, how they influence others, their ability to recognize and correct course, and how they extract broader principles from specific experiences.

]]>
<![CDATA[Identifying Signals of Expertise]]>One of the most useful questions I've used for evaluating expertise during a hiring interview is:

Tell me about a time when you did something you thought was right, and later it turned out to be a mistake.

That kicks off a series of additional questions and followups:

]]>
https://sharedphysics.com/signals-of-expertise/682659ade885df3fd0c25ec8Thu, 29 May 2025 20:00:17 GMTOne of the most useful questions I've used for evaluating expertise during a hiring interview is:

Tell me about a time when you did something you thought was right, and later it turned out to be a mistake.

That kicks off a series of additional questions and followups:

  • What was the context?
  • Why did you think you were right and how did you advocate for it?
  • How and when did you learn you were wrong?
  • How did you address it?
  • What lessons did you learn from it?
  • What needed to be true for you to have been right?

This drip of questions typically takes 10-15 minutes of interviewing time. It's a variation on the "tell me about a time you changed your mind about something" question, but provides very different signals. Importantly, it is not a "tell me about a mistake you've made" question, which looks similar on paper but misses the point. And that point is: when did a candidate do something that at the time seemed right enough to them and to others, and only in hindsight was revealed to be the wrong approach in some critical way?

Here's what it does, why it works, and the signals it unpacks:

1. It interrupts interviewing autopilot

It's an uncommon framing that breaks common interviewing patterns, forcing people to think in real time — something AI and memorized answers struggle to fake. I sometimes open the interview with this question to set the tone for an authentic conversation and get someone off of performance mode.

2. It checks for introspection and situational awareness

Introspection and situational awareness are critical pieces of expertise: thinking about yourself, understanding your behaviors, putting them in context, and changing based on what you learn. This question pokes at that mechanism; everyone's been wrong before, but not everyone reflects on it or adapts their behavior in response.

Furthermore, a breadth of lived experience suggests some accumulation of mistakes, errors, and general wrongness along the way. All things considered, it's extremely unlikely you're interviewing someone who is perfect (possible but not probable). This is a good thing! Mistakes and errors are an inevitable part of growth, and these questions aim to uncover examples (and awareness) of such growth. Probing into what a person did after they learned they were wrong helps you understand their ability to react to new information that might be different from what they already had in mind — a good signal of their decision-making and information-seeking behaviors.

3. It identifies level-appropriate thinking

The magnitude of the mistake should match the seniority of the role. A senior architect who's never made anything worse than a syntax error either hasn't been given senior-level responsibilities, lacks good feedback systems, or lacks the self-awareness to recognize their strategic missteps. Conversely, a junior developer who talks about betting the company on the wrong database is either inflating their actual influence or has been working in an extremely immature company. The sweet spot is when candidates describe mistakes that match their claimed level of responsibility — senior folks should have examples involving architecture, strategy, or team direction. Their examples should show they understand the weight of irreversible decisions and have lived with the long-term consequences of their choices. If they haven't, they're probably not as senior as they claim.

4. It evaluates potential vs. actual bounds of expertise

The framing of the question seeks out an interesting situation: the candidate was allowed to take on some work, but ended up being wrong in some way about it. This situation describes the upper bound of a candidate's expertise at some problem set at a point in time.

That they were allowed (or assigned) a certain level of work means they had organizational trust to take it on (most people assign work to meet a person's level). That they did not meet that goal in some critical way means the work had elements beyond the candidate's capabilities at the time. That's their local upper bound at that moment: the line between their perceived expertise and their actual expertise. The difference between the two shows how close (or far) they are from closing that gap.

Of course, not every example sends this signal. A strong followup to probe whether it's a true signal is: "Was this kind of decision/project representative of the work you were doing at the time?"

5. It also describes the candidate's operating environment

When interviewing engineers, the error they describe making is a good baseline for the level of autonomy and trust they had in making decisions. It's the difference between "my error was a bug" vs. "I prototyped a production system on Google Apps Script and was stuck maintaining the prototype for a year" vs. "I chose the wrong language for the project."

How such errors were caught and corrected says a lot about the support systems around them — whether they learned early through mentorship or only after consequences. It's the difference between "… and my manager caught it in review and told me why this was wrong and explained the right way of doing it" vs. "… and we shipped that code and six months later we had to rewrite the whole damn thing because it became unmaintainable under pressure." That typically gives me a range of whether the candidate is a generalist who can do a lot on their own or someone who really thrives on a team with specialization and well-defined roles/responsibilities.

6. It explores comfort with learning

At Amazon, there's a leadership principle that goes:

Leaders are right a lot. They have strong judgment and good instincts. They seek diverse perspectives and work to disconfirm their beliefs.

In building a high-learning team, we flipped this on its head: team members are allowed to be wrong, a lot. But they're wrong in constantly new ways. They test boundaries, push their capabilities, and experiment. The only "sin" in a learning environment is repeating the same mistakes over and over.

Given that growth comes from working at the edges of your expertise, my corollary is a belief that you learn more from mistakes than successes. Our team's operating philosophy was to lean into mistakes as learning opportunities and good signals to check our understanding and assumptions around a problem. So having someone comfortable with airing out embarrassing details and thinking critically about them was a good cultural signal.

7. It pokes at confidence and ambition

It's critical to understand if candidates will speak up and be willing to be wrong in public ways — their confidence in their expertise and standing among peers. How candidates advocated for their (wrong) ideas shows their willingness to speak up and defend what they thought was right — critical for healthy technical discussions. It's a reliable signal for both personal confidence (speaking up) and technical confidence (details of their solution). Someone's level of confidence to put themselves out there is generally a good sign of someone's tolerance for risk and potential for growth.

However, you don't want to bias your interviewing for overconfidence. Double-click into why someone thought they were right and why they were willing to defend that position. The useful followup around this is: Why did you believe something? What path of analysis or thinking got you to that place, and got you to defend that position?

As a corollary to unpacking a candidate's operating environment, a good followup is: "Was it normal for suggestions on how to solve a problem to come from the team or was there something unique in this situation?"

8. It shows how someone generalizes information

I care about what they take away from the error. Some candidates learn great lessons; others have takeaways I would have facepalmed myself over, had I not been on camera. Do they view mistakes as valuable learning opportunities or as failures to be minimized? Do they generalize a lesson or walk away with a narrow, situation-specific takeaway?

The prompt for "bigger mistakes" usually has a stronger correlation to more interesting lessons. A memorable exchange with one candidate was about how a technical solution that worked for them at one company failed when they applied it to another company — all sorts of healthy discussion around that!

Following up with "... and what might have needed to be true for you to be right?" probes someone's openness to change and where they place responsibility. It's the difference between "other people should have been different" and "I should have known to check for X details first."

9. It reveals individual contribution (not team achievements)

One of the most critical signals this question uncovers is the difference between individual expertise and organizational expertise. Many candidates unconsciously slip into "we" language when describing their work: "We decided to implement microservices," or "We realized the approach wasn't working."

This matters because you're hiring an individual, not their previous team. When you hear "we," always follow up with clarifying questions: "To clarify, you personally made that decision?" or "What was your specific role in realizing the approach wasn't working?"

The best candidates can clearly articulate their personal contributions while still acknowledging team dynamics. They'll say things like "I advocated for the approach, and convinced the team because..." or "The team was split, but I pushed for X because Y." This precision reveals both their actual expertise and their self-awareness about their role in group decisions.

Watch out for candidates who can't differentiate their contributions from their team's. If pressed for specifics, they either deflect ("It was really a team effort") or claim credit unconvincingly ("Yes, I did all of that"). Both responses suggest either a lack of individual impact or a lack of honesty — neither of which you want. And if you're not confident in their answer, continue to push into the details; an intricate understanding of the details – and the ability to navigate them – is the lifeblood of expertise.


Will It Work For You?

Before writing this post, I shared an abridged version in a forum of colleagues. Within days, folks began to share stories about what they were able to identify in candidate conversations that wasn't obvious before. Here's an example:

"I used it this week to good effect. The candidate positioned themselves reactively. Their decisions were good, but they didn’t really come across as the protagonist. It indicated a need for a bit more structure, but we’re interviewing for a newer role where we need them to help define what the job actually is."

This technique helped identify not just technical competence but also how candidates operate within organizations—whether they drive initiatives or primarily respond to direction, whether they learn from mistakes or repeat them, and whether they can adapt to your specific context.


Closing Thoughts

To implement this in your own hiring process:

  1. Make the time, set the tone
    Introduce the question early in the interview to set an authentic tone. Make sure you can spend time digging into the details and pulling on the various threads that come up.
  2. Listen actively, follow up
    Listen for signals beyond the technical details of the mistake. Don't make assumptions about the candidate; ask followup questions.
  3. Double down on specificity
    Seek out individual experience. Avoid hypotheticals or generic answers. Seek out the context around the answers.
  4. Compare patterns across candidates for the same role.

Remember, the goal isn't to judge candidates for making mistakes — it's to understand how they process, learn from, and adapt after those mistakes. That capacity for growth and self-correction is often a stronger predictor of success than any perfect track record. And by leading with vulnerability (asking about mistakes), you create permission for honesty. It doesn't just reveal how candidates think — it changes how they engage.

The best interviews with this question don't feel like interviews at all. They feel like two people working a problem together.

And isn't that exactly what you're trying to understand?


Appendix:

Interested in continuing to learn more about teasing signals of expertise from hiring interviews? This post has an appendix with additional techniques and information:

Appendix to “Identifying Signals of Expertise”
In “Identifying Signals of Expertise”, I ended up cutting almost 2000 words to keep things succinct and focused. But that stuff was useful! If you’re looking to upskill your interviewing skills further and dive deeper into identifying signals of expertise, read on; I’ve included three sections that…
]]>
<![CDATA[Field Testing Claude vs. ChatGPT for Marketing Strategy and Advertising Analysis]]>https://sharedphysics.com/field-testing-llms-for-marketing-and-advertising/681b8007e885df3fd0c2590dTue, 13 May 2025 21:16:32 GMTOver two weeks, I put Claude 3.7 and ChatGPT 4o through their paces on analyzing real marketing data and creating strategic and tactical artifacts. My goal was to determine if LLMs (Large Language Models) could actually steal my job or just make it easier.

The verdict? LLMs excel at busywork but struggle with nuanced analysis. LLMs swerve toward the generic unless heavily prompted otherwise. They regularly get you to 80% done, but human intervention is required to cross the last 20%. They're powerful copilots for experts but potentially dangerous autopilots for novices. Oh, and templating prompts doesn't deliver the magical-incantation-for-great-results effect that some AI influencers suggest... but templates are still great tools for forcing you to organize and prepare information for better inputs and outputs.

Here's what I learned from feeding five years of ad data, eight years of order data, website and marketing copy, and hundreds of customer reviews into these models.


Methodological Details

  • Data: Export of five years of Meta advertising data (all available columns, aggregated by week), eight years of order data, four years of customer reviews, export of website copy (homepage, product pages). Personally identifiable data was removed or anonymized as appropriate.
  • Prompts: Nine prompt sequences about marketing strategy, five prompt sequences on data analysis, two templating experiments, and notes from various other experiments. I provided limited prompting to not bias the models toward foregone conclusions—I wanted to see what they would come up with, not what I had already concluded. I used identical prompts for both models, with exceptions when one model suggested additional deliverables or needed a nudge to get to a comparable outcome.
  • Models tested: Claude 3.7, ChatGPT 4o. Gemini (2.0 and Reasoning) disqualified itself by not being able to ingest .xlsx or .csv data to even get started.
  • Caveats: LLMs have a "temperature" setting, which controls the randomness of outputs. This means the same prompt and same starting point can generate different outcomes. This makes controlled experiments and replicable results really hard! So this experiment is not a benchmark – my goal is to convey learnings from real-world usage on a limited-scope project.
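As an aside on mechanism: temperature is just a rescaling of the model's token distribution before sampling. Here's a self-contained numpy sketch of that idea — the logits are invented for illustration, and this models the mechanism, not any vendor's actual API:

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Divide logits by T before softmax: low T sharpens the distribution
    (near-deterministic picks), high T flattens it (more randomness)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
rng = np.random.default_rng(0)
low_t = [sample_with_temperature(logits, 0.1, rng) for _ in range(100)]
high_t = [sample_with_temperature(logits, 2.0, rng) for _ in range(100)]
# Low temperature almost always picks token 0; high temperature spreads out
```

This is why identical prompts diverge between runs: unless temperature is pinned near zero, every generation is a fresh draw from that distribution.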

Index:

  • Part 1: Advertising Data Analysis
  • A Quick Detour: Do Prompt Structure and Template Design Actually Matter?
  • Part 2: Marketing Strategy & Persona Development
  • Part 3: Creative Generation
  • Part 4: A Meta, Meta-Analysis (Editing This Blog Post)
  • Conclusions and Takeaways

Part 1: Advertising Data Analysis

I ran five years of Meta advertising data through both models for analysis and insight. Here’s what I learned:

Interesting Insights

Both tools created visualizations. Charts and plots are always useful, even for things as simple as “a metric over time”.

A sample chart ChatGPT generated from a long-running campaign.

Claude identified seasonality patterns, highlighting recurring performance issues in Q1 and Q2 with increasing severity each year. It discovered this from advertising data alone. My partner and I had guessed at this seasonality, but it was easy to miss in order data alone due to non-advertising marketing efforts during those periods.

Claude further identified audience saturation, creative fatigue, and conversion funnel issues as root causes of recent performance drops. It also suggested investigating competitive landscape changes and algorithm changes outside what could be gleaned from the dataset. Audience saturation and creative fatigue proved correct on independent analysis, but conversion funnel issues were disproven when I checked against my actual website data. The 'change in ads algorithm' assumption was spot-on, correctly pinpointed to Q4 2024.

Most importantly, it highlighted Meta's tracking gaps, which could lead to incorrect conclusions and bad decisions in self-optimizing campaigns.

ChatGPT’s analysis was much more basic, with gems like “The plot shows a clear upward trend in ‘Cost per Results’ recently—confirming that ad performance has worsened (more cost for the same result).” Not quite the level of insight I was looking for! It also highlighted a potential conversion funnel issue… but again, bad data in means bad conclusions out, and that’s on Meta.

Raw Analysis Limitations

Insight: LLMs struggle with undirected and unprocessed data analysis. They're bad at discovering useful correlations in messy data sets.

Claude and ChatGPT both struggled with undirected analysis and unprocessed data.

Real-world data typically has many direct relationships in its columns. For example, a healthcare data set may include a hospital name, shortcode, and address that can all be described as “structured atomic units of a single datum”. Marketing data has similar interrelationships between columns. This messiness results in superficial LLM gems such as “Campaign name had the strongest correlation with daily spend” or "Results count has the strongest correlation with Cost per results." True, but often not useful.
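A toy illustration of why undirected correlation hunting surfaces tautologies. The data below is synthetic, with column names only loosely modeled on an ad export (nothing here comes from the real dataset) — when one column is derived from another, a blanket "find correlations" pass rediscovers the derivation first:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
spend = rng.uniform(100, 500, size=52)           # weekly ad spend
results = spend * rng.uniform(0.08, 0.12, 52)    # results scale with spend
df = pd.DataFrame({
    "Amount spent": spend,
    "Results": results,
    "Cost per results": spend / results,         # derived from the other two
})

# The strongest correlations are the structural ones: true, but not useful
print(df.corr().round(2))
```

Cleaning out these pseudo-duplicate relationships before analysis (or telling the model to ignore them) is what turned the output from tautologies into insight.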

Separately, the models both began to hallucinate when prompting took the insights back into deliverables. For example, a prompt around "funnel metrics to monitor" made up columns that did not exist in the data but looked like they could have.

Model Output Styles

Takeaway: I preferred Claude’s plainer style over ChatGPT's embellishments.

Claude was generally verbose, analytical, plainly written, and provided more interesting insights. It accurately identified plausible algorithm changes and their dates, made reasonable assumptions about performance changes needed, generated better visualizations, and produced more robust analytics code compared to ChatGPT. But that verbosity came at a cost: it frequently hit artifact length limits, requiring reprompting. Longer conversations led to degraded output and occasional interface bugs.

ChatGPT’s outputs showed a preference for emojis and overstyling text (bolding, italics, bullet points, etc.). In short conversations, this was useful. But in longer outputs it felt juvenile, as though someone decorated the output with stickers. ChatGPT also often meta-analyzed a prompt instead of executing it: it would tell me how it would do the work, then ask if I wanted to proceed as described. This pattern appeared more frequently after OpenAI's sycophancy rollback and led to hitting usage limits faster.

ChatGPT also routinely applied a much heavier hand in editing for voice, tone, and content. Any output run through ChatGPT — even with prompting to maintain tone and style — resulted in a variation of the distinct “ChatGPT Voice” that folks complain about. This was especially noticeable when comparing LLM outputs.

Data Gaps and the Principal-Agent Problem

Insight: Any data set is a selective view of the world. Gaps in data lead to poor analysis and inadequate conclusions. Paired with a misalignment between platform and advertiser goals, this can lead to poor optimization.

In reviewing advertising data alone, both LLMs suggested performance dips due to gaps at the point of sale (website funnel and checkout). However, I knew that nothing on the website had changed in 12 months, and website data showed the conversion funnel remained steady. The gap was in the advertising tracking. A self-optimizing campaign at this point would have started to optimize for the wrong (or at least, incomplete) data.

Further prompting and context/data for the LLMs helped to clear this up and come to better conclusions, but autonomous LLM deployments don't have that benefit.

This aligns with a broader concern: Meta and Google's ad optimization incentives are misaligned with user goals. Their interest is in increasing ad spend, with customer outcomes as a corollary, not an end goal. Better performance can lead to less ad spend, and platforms optimize for more spend. (Separately, this is why I regularly opt out of platform-suggested optimizations.)

Claude noticed this pattern in subsequent prompting:

Budget Utilization
- Despite decreasing results, your campaigns are consistently spending the full daily budget
- This indicates Facebook's algorithm is struggling to find efficient conversion opportunities

- The platform is prioritizing spend over performance optimization

Improved Analysis Through Applied Expertise and Supporting Context

Insight: Domain expertise dramatically improves results by enabling better prompting but changes the LLM-User relationship.

Analysis gaps stemmed less from LLM limitations and more from the fundamental garbage-in, garbage-out nature of statistical analysis. The difference between asking "find any correlations" and specifying a "multi-variable ANOVA-validated analysis with p-value filtering" is enormous. Domain expertise dramatically improves results by facilitating better prompting.

Once the data was cleaned of pseudo-duplicate columns and the LLM was given additional instructions (what to look for and what to ignore), it produced useful insights. Here's an example of such prompting:

"What about a correlation between 'Cost Per Result' with metrics like 'Frequency', 'Adds to Cart', 'Checkouts Initiated'? Nothing on the website or product pricing has changed since these ads started, leading me to believe that recent performance drops are related to something in the advertising performance data. My goal is to identify leading indicators of performance and monitor them when implementing a new ad strategy."

However, this changed the relationship with the LLM. It became less about “can it generate insights for me” (strategic partnership) and more about “can it do some of the typing/grunt work for me” (tactical execution). Still useful, but a different use case.
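For the curious, here is roughly what that directed analysis looks like when you run the statistics yourself: test only the hypothesized drivers and keep the significant ones. This is a minimal numpy sketch with synthetic data — the column names are borrowed from the prompt above, and a permutation test stands in for the heavier ANOVA machinery:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic weekly metrics: Frequency genuinely drives Cost per Result,
# while Adds to Cart is unrelated noise
frequency = rng.uniform(1, 5, 52)
cost_per_result = 2.0 + 0.8 * frequency + rng.normal(0, 0.5, 52)
adds_to_cart = rng.uniform(10, 100, 52)

def perm_pvalue(x, y, n=2000):
    """Permutation p-value for |Pearson r| (avoids a scipy dependency)."""
    r_obs = abs(np.corrcoef(x, y)[0, 1])
    perms = [abs(np.corrcoef(rng.permutation(x), y)[0, 1]) for _ in range(n)]
    return r_obs, (np.sum(np.array(perms) >= r_obs) + 1) / (n + 1)

for name, col in [("Frequency", frequency), ("Adds to Cart", adds_to_cart)]:
    r, p = perm_pvalue(col, cost_per_result)
    if p < 0.05:  # p-value filtering: only report significant drivers
        print(f"{name}: |r| = {r:.2f}, p = {p:.3f}")
```

The point is that the human supplies the hypotheses and the significance threshold; the LLM (or a script like this) just does the typing.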


A Quick Detour: Do Prompt Structure and Template Design Actually Matter?

If you're a close reader, you might have paused at the section above thinking: "Those are awful prompts! Aren't you using best prompting practices and templates?"

Fair question. I was curious about this too and ran some separate experiments to test if prompt templating significantly impacts output quality.

The Template Experiments

I ran two experiments with ChatGPT:

  1. Create a children's fable about a fox
  2. Create a press release for a new product launch

Each experiment had four different prompts:

  1. A "lazy" prompt (minimal details in a text blob)
  2. An unstructured but detailed prompt (blob of text, but with more relevant details)
  3. Two different template variations (for the press release, they were the same 'template' but had different levels of details and color; for the story, they were different templates).

Each prompt was run twice in temporary windows to avoid chat history bias. All four prompts generated similar stories and press releases with interesting unprompted similarities:

  • All fox characters had names starting with "F"
  • Stories were set in woods with rhyming names
  • Press releases assumed San Francisco headquarters
  • Included quotes from female company representatives
  • Featured client quotes from the midwest

Re-runs generated different outputs due to LLM temperature settings creating randomness, and no prompt produced consistently superior results. The prompts and results are available here for review.

My takeaways:

Templates are guides, not magic incantations
Conversations about prompt templates mirror debates about PRDs (Product Requirement Documents)—hundreds of templates exist and everyone seems to have a strong personal preference. But templates serve one purpose: guiding you through organizing information to reach desirable outputs. If you need the structure, use them; if you can organize information effectively without them, skip them.

Frankly, good "prompting" isn't too different from providing good requirements and instructions to a human team. The same principles apply: provide necessary context, details, expected outcomes, and any other critical operational information. In that sense, a great template is – to some degree – a distillation of operational expertise.

Iteration and information matters more than format and structure
Every output needed refinement. Reading outputs revealed obvious directional and informational gaps in the original prompt better than any template. Rather than treating templates as a holy grail, use them as a starting point for iterative development. Information content matters more than information structure, and too much information could be just as bad as too little information.

Production use is different from one-off prompts
There's a crucial distinction between one-off prompting and embedding LLMs in an application. Given output variability, figuring out how to get to reproducibility is critical for productization. Well-defined, multi-layer, iterative, and extensively tested templates are essential for embedded, repeatable use cases and application design.

Bottom line
For the sort of "field deployment" that I'm working on, spending time perfecting (or finding the perfect) prompt templates yields diminishing returns. But I would certainly recommend templates for anyone who is getting started with LLMs as a good way to start thinking about how to effectively engage chat-based interfaces. Templates are also a useful shortcut for figuring out how to deploy an LLM for new use cases and have some merit as a document of creativity and tacit expertise.


Part 2: Marketing Strategy & Persona Development

Both Claude and ChatGPT generated marketing artifacts: strategy guides, personas, advertising content suggestions, positioning, SEO analysis, and landing page rewrites. Both models performed adequately. If a junior marketer had created these outputs, I would have been satisfied enough to move forward.

What Worked Well

  • Comparative Performance
    Claude and ChatGPT produced similar and complementary outputs. They identified broadly similar personas while suggesting different campaign ideas. Both gave better marketing recommendations than the agencies I'd been speaking with—despite agencies having access to the same data (my website) to work with.
  • Validation of Intuition
    LLM recommendations aligned closely with what my partner and I intuitively knew and had been planning to work on. This occurred even when I reviewed my prompts for potential bias (none found). As an extremely analytical person who doesn't always trust his gut feelings, this was healthy reinforcement that I was on the right track. It helped break me out of analysis paralysis and move forward with implementing ideas I had been on the fence about.

LLMs Excelled in Busywork

The 80% Rule: LLMs excel at synthesis and artifact generation but need human refinement.

LLMs excelled at marketing busywork. For example, I don't know a single person who likes creating personas... and consequently most marketing personas suck and exist only to check boxes. But both ChatGPT and Claude created — for probably the first time in my life — actually useful personas. (Not revolutionary, but very practical for purposes of using them to segment advertising campaigns). This included:

  • Technical details for targeting criteria, demographics, and keywords
  • Advertising and media ideas
  • Real quotes from review data
  • Statistical validation of suggestions (e.g., "80 mentions of Dragon Blood Balm for pain management, but only 15 mentions for joint issues")
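That mention-counting kind of validation is also easy to spot-check outside the LLM. A minimal sketch with invented reviews and keywords (the product names and counts from the real data are not reproduced here):

```python
import re
from collections import Counter

# Hypothetical review snippets, purely for illustration
reviews = [
    "This balm worked wonders for my back pain.",
    "Great for joint stiffness, but shipping was slow.",
    "Pain relief within minutes. Buying again!",
]
keywords = ["pain", "joint"]

# Count whole-word keyword mentions across all reviews
counts = Counter()
for review in reviews:
    for kw in keywords:
        counts[kw] += len(re.findall(rf"\b{kw}\b", review.lower()))
print(dict(counts))
```

Checking an LLM's claimed counts against a script like this catches hallucinated statistics before they end up baked into a persona.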

Synthesizing datasets into usable outputs typically takes me a week of manual work. Having it done in an afternoon of prompting felt revolutionary. But outputs weren't final—I needed another full day of manual review for adjustments and cleanup.

Meta-Analysis Capabilities

After several rounds of prompting, I asked both Claude and ChatGPT to generate documents and guardrails for further prompting. I then ran that document as supplementary data in a prompt to generate a landing page. The results were promising but left me wary of potential content drift over time.


Part 3: Creative Generation

After all that prep work, were the LLMs good at generating media, content, and advertising copy?

Kind of, but not really.

The Generic Trap

Core Problem: LLMs will always swerve toward the generic, the common, and the default unless told otherwise.

Both generated interesting headlines/body copy, but I wouldn't have been comfortable using any without rewrites. For example, one LLM generated: "Plant Power, Not Petroleum”. Snappy, but am I selling clean and renewable energy or a balm for skincare? A rewrite to "Heal with Plants, Not Petroleum" captured the same essence but with more clarity and focus. Or to keep pumping that theme — pun intended — "Jelly should go on your toast, not on your hands", "Petroleum is for engines, not your body". See what I did there? See what the LLM didn't?

Other ads were just slop (and not even AI-level slop). "Harness the Plant Revolution" is a line that could be applied to ... checks ad bank… any wellness-oriented product. Or: "Don’t let pain slow you down". That one's good for... checks ad bank again... any pharmaceutical. That's day 1 intern-level copywriting.

What Makes a Good Ad?

Beyond classic performance metrics, good ads stop your scroll, catch your attention, and encourage engagement. Great advertising goes further—it's a mechanism for discovery, helping articulate feelings and needs people had but may not have expressed yet. They can create interest where none exists. Great advertising balances targeting the right audience with speaking to them effectively.

There’s very little great advertising out there; most people are familiar with bad ads, and bad ads are thus over-represented in training data. They’re bad because they’re targeting the wrong people (and thus are jarring) or because they’re targeting the right people but are boring or annoying (sometimes offensively so).

Both good and great ads tend to bat left-field. Consider the following ad I ran in 2018:

This was before Oatly used that voice with a national campaign. I thought I was pretty clever with this at the time. Did it work? Given that we're still in business, it certainly didn't hurt ;)

I don't want to go so far as to say that an LLM can't come up with something like this (or so self-unaware as to pat myself on the back for this as a great ad), but I'm pretty sure an LLM wouldn't come up with something like this without extensive and extremely specific prompting.

Or consider the following:

Is this a great ad? Probably not, but it’s better than "Harness the Plant Revolution". At the least, it gets you to stop and read. And yes, you're not alone if it reminds you of a 90s magazine ad.

Again, you could figure out the right prompts to generate something like this, but at that point you fall back into the expert/novice dichotomy in how you’re using LLMs. An expert’s creative direction would get you there but they wouldn’t need the LLM. A novice would get you to slop, and the novice probably wouldn't understand why it’s slop.

I can't emphasize enough that LLMs are statistical best-fit models and so you're always going to get the most probabilistic/common output based on what you put in. Said another way, LLMs will always swerve towards the generic, the common, and the default setting unless told otherwise. And when told otherwise, they will continue to swerve to the version of generic, common, and default that fits your refined prompt.

Content and Landing Pages

Many of the same principles applied to content generation—blog posts, newsletters, and landing pages—with the familiar refrain: garbage in, garbage out.

  • Context is everything for content
    Prompts lacking context and information produced boring or bogus content. The more detail I provided, the better results I got. My most successful iterations started with loose content outlines and supporting details, then I worked with the LLMs to refine the details and presentation.
    ChatGPT outperformed Claude at pulling in additional information, but did not always outperform a web search. The key insight: drafts that started from nothing (e.g., "Draft a blog post about X ingredient") resulted in throwaway work.
  • LLMs can't replace experience and perspective
    Blank-slate drafts resulted in throwaway work because they lacked a certain je ne sais quoi – if all you're doing is generating the most statistically probable content from a prompt, you're going to get the most statistically probable outcome... which in content land, means the equivalent of SEO-farm content. Blog posts that started with a draft outline that I provided were better because they offered a perspective and angle. The best content tended to be LLMs augmenting personal experiences and perspective that I brought to the table with their editing – I brought things that they didn't have, and they provided processing power.
  • Landing pages were a multimodal challenge
    Both LLMs struggled with landing page content, which is inherently multimodal. As any designer knows, describing website layouts with text alone is frustrating—and that's exactly what LLMs force you to do.
    While both Claude and ChatGPT could convert descriptions into HTML mockups, the results differed significantly from my actual site. Worse, generating SVG icons or CSS styles consumed significant context window space. The disconnect between textual description and visual execution highlighted a fundamental limitation: LLMs excel at text but struggle with the spatial and visual thinking essential for effective web design.

The pattern held: LLMs are powerful assistants for ideation and initial drafts, but the gap between AI-generated templates and production-ready content remains substantial.

Technical Note: Image Generation

Separately (and this probably goes without saying) — ChatGPT’s image creation outperformed Claude’s. No comparison here. This meant ChatGPT was more useful for multi-modal content creation. Still, even ChatGPT’s new model wasn’t perfect… Here’s one recent outtake from the post-Ghibli era:

That arm... That hold... That smear of balm...

Part 4: A Meta, Meta-Analysis (Editing This Blog Post)

After drafting my initial conclusions, I ran this post that you're reading through both Claude and ChatGPT for editorial review and asked them to rewrite it based on their suggestions.

Voice & Structure

Takeaways: Claude provided light updates, while ChatGPT generated significant rewrites. Use (and compare) both models for the best results.

Claude typically provided minor updates, removing fluff and cleaning typos. It maintained the original voice and structure of my drafts.

ChatGPT provided complete rewrites with extensive tonal changes (even with prompting to not do that), shifting content towards the "ChatGPT voice" in either large or small ways.

To take a random example, consider this section:

My original draft:

Claude and ChatGPT both struggle with undirected and unprocessed data analysis. Specifically, they are bad at discovering useful correlations in messy data sets. Real world data (such as advertising and healthcare data) typically have many columns that have direct relationships with each other. For example, a hospital name, code, and address are all directly related to each other. A clickthrough rate, volume, and metric per 1000 clicks will also typically have strong correlations. This sort of 'dumbness' results in insights such as "Results count has the strongest correlation with Cost per results." True, but not useful.

Claude rewrite:

Claude and ChatGPT both struggle with undirected and unprocessed data analysis. Specifically, they are bad at discovering useful correlations in messy data sets. Real world data (such as advertising and healthcare data) typically have many columns with direct relationships. For example, a hospital name, code, and address are all directly related. A clickthrough rate, volume, and metric per 1000 clicks will also typically have strong correlations. This sort of 'dumbness' results in insights such as "Results count has the strongest correlation with Cost per results." True, but not useful.

ChatGPT rewrite:

Claude and ChatGPT both struggle with messy or undirected data. Real-world marketing data (like ad performance metrics or healthcare data) has many columns with known interrelationships—think "clickthrough rate," "volume," and "cost per 1000 impressions." Yet LLMs often miss the obvious. Example:
"Results count has the strongest correlation with Cost per Result."
Technically true, but completely unhelpful.

My Final Version:

Claude and ChatGPT both struggled with undirected analysis and unprocessed data. Real-world data typically has many direct relationships in its columns. For example, a healthcare data set may include a hospital name, shortcode, and address that can all be described as “structured atomic units of a single datum”. Marketing data has similar interrelationships between columns. This messiness results in superficial LLM gems such as “Campaign name had the strongest correlation with daily spend” or "Results count has the strongest correlation with Cost per results." True, but often not useful.

Differences in Analysis vs. Artifacts on Iteration

Observation: Both models occasionally struggled to translate their own analysis into artifacts. Iterative prompting introduced output inconsistencies.

ChatGPT and Claude both periodically struggled with translating their own analysis into artifacts (i.e., Prompt 1: "Analyze this text", Prompt 2: "Implement the recommendations"). A close read of both the analysis and subsequent artifacts revealed an occasional lack of continuity between the outputs. Claude had minimal differences, but ChatGPT often went wild with rewrite upon rewrite at each prompt, including regularly changing details, inventing (hallucinating) things, and sometimes changing the intent of a section or providing a different conclusion.

This behavior wasn’t consistent. Sometimes repeat prompting resulted in net improvements over time. Other times, repeated prompting felt like running the same Photoshop filter over an image again and again, gradually degrading it.

That said, I’d still advocate for iterating on an artifact. My go-to prompt was along the lines of, “I rewrote the article and implemented your suggestions. Please go through [analysis details] and suggest changes.” Like collaboration with any editor, subsequent revisions could reveal new potential improvements… or hijack the tone of a work.

All in all, it reinforced the importance of human oversight in the editing process.


Part 5. Conclusions and Takeaways

First and foremost, none of this experimentation would have been possible without 8 years of advertising data, reviews, surveys, and content. Blank-slate experiments produced generic, boring, nearly useless outputs. But what seems "generic and boring" to an expert may be valuable to beginners, so your mileage may vary.

I started out trying to validate how effectively LLMs could process real data and be strategic partners. I was wondering if an LLM could “steal my job” (and hoping it could, because I left marketing as a career years ago and hate being dragged back).

By the end I was alternately surprised and disappointed. LLMs are both extremely useful and incomplete. I'm not worried about an LLM replacing me (though a manager may feel otherwise). I had to put in a lot of work to make the LLM pretend to be able to replace me, which kind of defeats the point. I would also advise my team not to worry about being replaced by LLMs.

That said, if you’re not putting an LLM in your toolbox (beyond following the crowd to make Ghibli-themed action figures), you’re missing out. But if you’re deploying LLMs uncritically, that's worse. Taking full advantage of an LLM requires strong meta-awareness (thinking about how you think about things), and that's a skill worth exercising whether you're applying it to LLMs or any other work.

1. LLMs Are Statistical Best-Fit Algorithms

Even when an LLM looks like magic (or like emergent properties, for more technically-minded folks), it's helpful to remember that their core is a statistical model for predicting the best next token – kind of like how many web applications are basically data storage and retrieval systems with fancy interfaces. Keeping this in mind makes it easy to notice and avoid certain kinds of LLM-output traps, and to orient yourself for embedding them in applications or using them for one-off projects.

The two biggest applications of this reminder during this field test were that (a) LLMs will always default to the default, and (b) your perspective, expertise, and lived experience are unique things that an LLM can't replicate (but can augment) and that's something worth leaning into. For example: an LLM can create an "explainer post", but it can't create a post that talks about the lived and personal experience of working with the thing that's being explained.

2. LLMs Are Great for Experts, Dangerous for Novices

LLMs excel at ideation, reacting to specific prompting, and artifact generation. But they're unreliable analysts. My review of outputs suggests that it's extremely easy to be misled or to come to incorrect conclusions. Recognizing quality issues, identifying hallucinations, structuring useful prompts, and validating outputs requires at least some basic domain expertise.

With low barriers to use, it’s easy to go wrong without understanding why or where. Users need to bring a bar of competence to compensate for the gaps in LLM capabilities and behaviors.

3. Prompting Matters, But Not Always In Obvious Ways

Input variability, prompt design, and prompt sequencing significantly affect outcomes. But understanding what your inputs include and exclude is equally critical. Any dataset is necessarily a subset of reality and omits critical details. Understanding what's left out is as important as what's included.

Prompt designs can easily bias your outputs (for better and worse). Harking back to expertise, experts can create great prompts but those same experts already have a good idea of what they want. My guess is that as with any tool, experts in a field will use LLMs very differently than beginners and novices.

4. Hallucinations and Accountability

LLMs routinely invent data based on statistical inference from training data. This problem worsens in domains underrepresented in training data. Catching these fabrications requires close attention and cross-checking.

This is why "AI agency" concerns me—tools can't take responsibility for outcomes or be held accountable. "AI agency" obscures critical human responsibility and necessary oversight. Innovations such as Retrieval-Augmented Generation (RAG/Reverse-RAG), Clustering Using Representatives algorithms (CURE), and multiple layers of cross-prompting significantly improve output quality, but don't address the ultimate accountability and responsibility questions raised when a tool produces bad (or harmful) output.

5. Training Data Is Your Differentiator

My experience suggests custom training data is key for LLM deployment in technical fields. Many companies focus on prompt engineering and basic API integrations. I strongly believe that effective (or innovative) LLM implementation requires, at minimum, custom training datasets or fine-tuning. Augmenting prompts with data is a hack that consumes context windows.

6. The 80% Trap (Copilots, not Agents)

LLMs regularly get you 80-90% of the way to ‘done’. The last 10-20% makes the difference between slop and impact. As output becomes easier and faster, editorial judgment and human refinement become critical skills. Multiple rounds of LLM iteration are helpful, but human synthesis and taste remain essential.

7. If You’re New to AI, You've Got a New Vocabulary

Prompting, artifacts, context windows, training data, tuning layers, temperatures, and different model names are all required learnings for anyone diving into working with LLMs. It is also extremely helpful to understand different kinds of “AIs”, especially the differences between LLMs, ML (Machine Learning), and NLP (Natural Language Processing) systems. It’s also helpful to understand where LLMs lead to genuinely emergent behaviors (or emergent-like) and where they behave more like a linear regression model.

8. What This Means for You

  • If you're a marketing expert:
    You can/should use LLMs for data visualization and reporting, persona creation, and refining first drafts. They're powerful copilots for ideation and busywork but should be closely monitored for quality control. Maybe you don’t need that junior hire you were thinking about.
  • If you're new to marketing:
    Be extremely cautious about trusting LLM outputs without expert review. The confidence of an LLM output masks all the potential ways in which it is wrong. Working outside of LLMs (books, people) is probably more critical than ever for your development. Lean on LLMs to create fast feedback loops but be wary of their propensity towards positive validation. Work on understanding your environment (company, industry); lean into developing taste by keeping your eyes open extra wide to everything around you.
  • If you’re an agency:
    LLMs are coming for commodity work. If you’re not augmenting your pitches with LLM analysis, you’re falling behind. Deploy LLMs for the grunt work and specialize in strategy and creative insight.

9. Practical Workflow Recommendations for Marketing LLM Use

  1. Prepare your data: Clean datasets beforehand, remove pseudo-duplicate columns.
  2. Structure your prompts: Be specific about analysis methodology and objectives. Provide adequate context. Present some idea of what you’re trying to accomplish.
  3. Plan your sequence: Use cumulative approaches, building on insights across multiple prompts. If you have chat history on, turn it off if you find a model going off the rails, and start over.
  4. Cross-validate: Use personal intuition and multiple LLMs for different perspectives. Be critical.
  5. Plan for refinement: Always allocate time for the critical last 10-20% of human polish.
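To make step 1 concrete, here's a minimal Python/pandas sketch (the column names, toy data, and 0.95 threshold are all hypothetical) of stripping pseudo-duplicate columns before handing data to an LLM, the kind of columns that produce trivially true findings like "Results count has the strongest correlation with Cost per results":

```python
import pandas as pd

def drop_pseudo_duplicates(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop numeric columns that are near-perfect correlates of an earlier column."""
    corr = df.select_dtypes("number").corr().abs()
    cols = list(corr.columns)
    to_drop = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if b not in to_drop and corr.loc[a, b] > threshold:
                to_drop.add(b)  # keep the first-seen column, drop its echo
    return df.drop(columns=list(to_drop))

# Toy ad data: "ctr_pct" is just "clicks" rescaled, so it adds no information.
ads = pd.DataFrame({
    "spend":   [100, 400, 150, 300],
    "results": [12, 7, 25, 9],
    "clicks":  [120, 260, 350, 480],
    "ctr_pct": [1.2, 2.6, 3.5, 4.8],  # clicks per 10,000 impressions
})
clean = drop_pseudo_duplicates(ads)  # drops "ctr_pct", keeps the rest
```

Any correlation analysis that follows, whether yours or the LLM's, then runs on columns that can actually disagree with each other.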

The future isn't LLMs replacing marketers; it's marketers using LLMs to do more interesting work while leaving the busywork to algorithms.

]]>
<![CDATA[Using Risk to Drive Outcomes and Innovation (In Healthcare vs. in Marketing)]]>https://sharedphysics.com/what-healthcare-gets-right-about-risk/680e2245e885df3fd0c25591Sun, 27 Apr 2025 21:55:57 GMTSubtitle: How Studying Healthcare’s Contracting Models Changed the Way I Buy Ads

The Promise of Outcomes

Raise your hand if you’ve ever worked with or had to hire an advertising/performance marketing agency and been underwhelmed by the promise – but not guarantee – of improvements and value.

✋ ← That’s me.

I recently went to hire a digital ad agency for Dragon Blood Balm and every pitch sounded the same:

“We’ll create media. We’ll run ads. We’ll optimize through experimentation. We’ll grow your ROI.”
- Every Agency

But the pricing? A kickoff fee, monthly retainer, and collection of a percentage of ad spend. All risk-free... for the agency.

Everyone talked about outcomes, but no one wanted to get paid based on them. I asked a dozen marketing agencies to tie their pay to actual sales growth, and they all said a variation of the same thing:

"We can't/won't do it. We can't guarantee results."

That’s when I realized: I’ve seen this before… in healthcare, of all places.

In the traditional US healthcare model, doctors get paid for every test, visit, and prescription—regardless of whether patients get better. But healthcare has been shifting. Slowly, painfully, but perceptibly.

And what healthcare learned about outcomes and incentives, performance marketing still hasn’t. Why’s that?

How Healthcare Aligns Risks and Incentives

There’s a concept in healthcare called Value-Based Care (VBC). Stripped to its core, it means:

Pay for results, not just effort.

VBC is a major and long-running shift in healthcare, but you're probably more familiar with its opposite: the traditional fee-for-service (FFS) model where care providers and vendors get paid for doing things—whether those things make patients healthier or not. VBC flips the incentive: it’s outcomes, not activity, that get rewarded.

To illustrate, imagine you run a company that helps reduce flare-ups in a chronic condition.

  • Value-Based Care (VBC):
    In the simplest VBC implementation, you’re compensated a fee based on decreasing the average volume of flare-ups among your patients. There's a control group against which you're measured to account for externalities. The more you move the needle, the better you're paid.*
  • Fee-For-Service (FFS):
    You get a flat fee every time you perform a service—regardless of results. Your work maps to a CPT code (Current Procedural Terminology) with a standard payment plus or minus some time, effort, and complexity considerations. Want to make more money? Do more services. Whether patients improve is beside the point.

(* Fastidious note: There are many types of VBC models. “Getting paid” often means sharing a pre-negotiated percentage of savings, and outcomes are usually risk- and impact-adjusted. Quality and process metrics can be included in this calculation, above and beyond ‘hard’ clinical outcomes such as “patient didn’t get sick.”)
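As a toy, back-of-the-envelope illustration of the contrast (all dollar figures and the 50/50 savings split below are hypothetical, and real contracts risk-adjust these numbers):

```python
def ffs_payment(services_performed: int, fee_per_service: float) -> float:
    """Fee-for-service: paid per activity, regardless of outcome."""
    return services_performed * fee_per_service

def vbc_payment(control_flareups: float, treated_flareups: float,
                cost_per_flareup: float, savings_share: float = 0.5) -> float:
    """Simplest shared-savings VBC: paid a share of the avoided flare-up
    costs, measured against a control group."""
    avoided = max(control_flareups - treated_flareups, 0.0)
    return avoided * cost_per_flareup * savings_share

# FFS: 200 visits at $150 each pays the same whether patients improve or not.
ffs = ffs_payment(200, 150.0)            # 30000.0
# VBC: cutting flare-ups from 500 (control) to 350, at $800 per flare-up.
vbc = vbc_payment(500, 350, 800.0)       # 60000.0
# VBC with no improvement pays nothing:
vbc_miss = vbc_payment(500, 520, 800.0)  # 0.0
```

The arithmetic is trivial on purpose: the point is that the VBC function pays nothing for activity alone.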

VBC (and Risk) Leads to Innovation

VBC aims to align incentives between payers (Insurers, Hospitals) and care providers. Better care results in better outcomes, which results in better compensation.

Moreover, untangling the payment model from the activity of care has allowed providers and vendors to experiment with different models of care. There is no financial penalty, in the form of lost fees, for stopping an activity that’s not working. And there’s no penalty (in the form of uncompensated work) for trying something new.

In the past, the only care providers or organizations that could afford to experiment with care were ones that had huge endowments or financial subsidization, direct payment relationships with patients, or academic backing. Payer openness to VBC arrangements has opened the door for any organization to drive care forward through experimentation, innovation, and challenging conventional wisdom and common practices. That’s a good thing for everyone involved.

But Implementing VBC Isn’t Easy

And if you talk to healthcare executives, you’ll hear a near-unanimous consensus that VBC is the way to go. Even federal agencies have been pushing (slowly) toward it. Yet much of primary care and hospital contracting in the U.S. is still FFS-driven.

That’s because VBC is hard to implement. Here’s why:

  • Stability, Familiarity
    FFS is stable, easy to administer, and has a predictable payment structure. It's easy to work with, even if inefficient and disliked. It's been around forever, and its implementation is standardized... whereas VBC contracts are all uniquely tailored to the arrangement between a client and vendor.
  • Complex patients = complex care
    Healthy people don't need outcomes-based contracting. Complex populations — the people who need better outcomes — are much harder to drive outcomes for, are likelier to have multiple caregivers and vendors involved, and improvements aren’t always immediate or linear.
  • Attribution is messy
    It’s hard to cleanly tie specific actions to specific outcomes, especially when multiple vendors are involved. Everyone wants to claim credit. Moreover, the “shared savings” pool for a given patient population can quickly evaporate as additional vendors are added in.
  • Believability gap/Burden of proof
    Everyone pitches outcomes. Few can prove them, and real results take time. Organizations often start with a FFS model (or hybrid, more on that below) to de-risk the proof-generating phase. Later vendors try to renegotiate for a bigger share of the upside. By then, the contracting organization has little incentive to give up a low-cost, high-return setup.
  • Access to Data
    Proving impact requires good data... and most healthcare orgs don’t have it. Vendors often depend on clients for data access and impact analysis. This is why typically only large healthcare organizations (hospitals, payers) can offer VBC arrangements: they have the resources to manage control groups, untangle attribution, and absorb financial risk. It’s also why VBC tends to reinforce healthcare’s existing cash flow and power dynamics (but that’s for another post).
  • Impact
    VBC and taking on risk are no guarantee of results. There are plenty of VBC programs that fail. Many organizations have a low tolerance for vendor failure and negotiate VBC arrangements by leaning into managing the downside risk, while vendors and providers negotiate to raise their upside value capture. The negotiations pull in opposite directions.

“There is No VBC, There’s Only the Hybrid Model”

None of this is insurmountable. But it is difficult. So instead of pure VBC, many settle on hybrid models that offer value-based upside payment bonuses while managing risk through FFS-like payment floors.

The two most common hybrid models are:

  • Capitated Risk
    You get paid to manage a population. This is often simplified to "PMPM" or "per member per month". This accounts for a population where the risk is unevenly distributed. You’re compensated for the healthy and the sick alike—extending the insurance pooling model to subcontractors. The flipside is that if a patient population has low utilization or engagement in a given year, you'll get pushback to lower costs or shrink the population. Because no one wants to pay for unused services, ya know?
  • Bundled Payments
    Another model is called "bundled payments" which can be better described as “FFS+”. Bundled payments create a single custom payment for a package of services (often tied to a multi-day episode of care), and flattens what would otherwise be multiple CPT codes and procedures into a single cost "package". This flattens the risk of FFS justifications/denials at each step of care and reduces administrative burden that would be associated with piecemeal billing. (This model gained a lot of popularity when it was subsidized by government programs for a few years).
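A toy sketch of the two hybrid structures, with invented figures throughout (member counts, PMPM rate, and episode rate are all hypothetical):

```python
def capitated_payment(members: int, pmpm_rate: float, months: int = 12) -> float:
    """Capitated risk: paid per member per month, healthy and sick alike."""
    return members * pmpm_rate * months

def bundled_payment(episode_rate: float, episodes: int) -> float:
    """Bundled payments ("FFS+"): one flat rate per episode of care,
    replacing a pile of per-procedure CPT claims."""
    return episode_rate * episodes

# Hypothetical: 2,000 members at $45 PMPM for a year, vs. 150 surgical
# episodes bundled at $18k each.
cap = capitated_payment(2_000, 45.0)     # 1080000.0
bundle = bundled_payment(18_000.0, 150)  # 2700000.0
```

Note how neither formula contains an outcomes term; that's exactly why hybrid contracts bolt benchmark bonuses on top.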

Most hybrid models will also include some bonuses for hitting certain outcome benchmarks (some more of that upside without giving away the cow). Additionally, risk of poor outcomes outside of a provider's control for any single patient is offset by the average net positive results for a population.

These models all wrestle with the Principal-Agent Problem: how do you align the incentives and coordinate the activity of two independent actors, when one is supposed to act on behalf of the other?

The business answer is to balance stability and risk-sharing. These hybrid models exist because they help hedge financial risk, create financial predictability, and ease operational challenges while nudging incentives toward better outcomes. Hybrid models protect both the vendor and the payer (insurance or hospital). Base fees smooth cash flow, while value components incentivize performance.

And crucially, they provide a model for what we should look for in other industries where results matter more than the activity itself.

The Curious Absence of Risk in Marketing

Let’s revisit the standard marketing agency pitch with that in mind: flat fees (which often grow the more you spend), the promise of outcomes, and no downside risk for failing to deliver on that promise. And every pitch offers the same set of services: media generation, implementation, data analysis.

Compared to healthcare, why is there so little appetite for risk in marketing?

  • Is it because attribution of results is hard?
    They have access to all of my data and it’s way simpler and less regulated than healthcare data.
  • Or because they lack the data to measure their true impact?
    Again, they have all the same data I do.
  • Or because they’re not confident in their ability to deliver adequate results?
    Valid for agencies that don’t handle implementation and aren’t given full implementation control. But for Agencies of Record that own the full delivery cycle, it’s all in their hands!
  • Or because potential clients are too small to capture meaningful revenue from risk-based arrangements?
    Fair concern, but an engine for turning $1 into $2 is limited only by CPG production and fulfillment limits. Prove the model and growth is unlimited.
  • Or is it because there is risk that a client won't actually get results?
    Risk-based models work by aggregating the risk of any one campaign/client failing across the average success of all clients. The risk of failure should be priced into the cost of services, informed by knowledge of how clients perform in the aggregate.
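To sketch that pooling logic (the success rate, income target, and revenue figures below are invented for illustration): a portfolio of clients lets an agency set a revenue share so that the winners cover the losers.

```python
def required_rev_share(target_income_per_client: float,
                       success_rate: float,
                       avg_net_new_revenue: float) -> float:
    """Revenue share needed so the *expected* payout per client hits the
    target, even though failed campaigns pay $0."""
    expected_revenue = success_rate * avg_net_new_revenue
    return target_income_per_client / expected_revenue

# If 70% of campaigns succeed, averaging $50k net-new revenue each, and the
# agency wants an expected $6k per client, it should ask winners for ~17%.
share = required_rev_share(6_000.0, 0.7, 50_000.0)
print(round(share * 100, 1))  # 17.1
```

This is the same insurance-pooling move healthcare uses: no single contract has to be a sure thing for the book of business to be profitable.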

If anything, performance agencies have more data and ability to tie payment to outcomes than healthcare organizations, but they routinely don't.

Is There a Relationship Between the Lack of Risk and Commodification?

I'm inclined to believe that for standard campaigns (I don't think I'm special here), performance marketing has become commodified.

In a commodified field, there is no secret knowledge; all that matters is execution quality. It also means that vendors are largely interchangeable with downwards price pressure – you can easily pit vendors against each other on easy-to-measure and easy-to-compare axes.

And to be sure that I don’t hand-wave all marketing agencies away, there can be some secret sauce, if operating at a large enough scale:

  • Aggregation of learnings across clients and practice (operational efficiency).
  • Scale efficiencies if you're buying advertising from traditional sources (buy at scale and resell inventory across your client base).
    Relationships with third parties that can be leveraged (especially important for influencer and media relations).
  • Population Taste. Extremely large agencies add value in the form of practical population insights, creative testing frameworks, and culture-setting.

Without risk, agencies are encouraged to run the same best-practices playbooks that everyone else is running. After all, it’s a “best practice” for a reason. And that reason means that there is little appetite for trying to do something different.

Doing something different requires you to sell a more complicated pitch, and to balance the cost of potential failure against the greater upside of potential success. Risk-based arrangements like VBC give organizations greater operational flexibility to experiment by decoupling the work from the outcomes. That’s why healthcare is seeing a surge of innovative care models… and marketing isn’t seeing a surge of innovative anything.

Risk-sharing: an antidote to commodification, a force to drive experimentation.

Back to the Drawing Board: A Risk-Based Model for Performance Marketing

So, I flipped the pitch.

Instead of hiring on a fee basis, I’ve been going back to every vendor that pitches me and offering two models:

  1. The Unlimited Upside Model
    Get paid a meaningful percentage of net new attributed revenue. No base fee. No cap on upside.
  2. Capped Downside Model
    Modest base fee. Smaller upside share. Measurable results within 3 months or contract ends.
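As a toy comparison of the two structures (the percentages and dollar figures are mine, purely illustrative):

```python
def unlimited_upside(net_new_revenue: float, rev_share: float = 0.20) -> float:
    """Model 1: no base fee, no cap, a meaningful share of attributed revenue."""
    return net_new_revenue * rev_share

def capped_downside(net_new_revenue: float, base_fee: float = 2_000.0,
                    rev_share: float = 0.08) -> float:
    """Model 2: modest base fee plus a smaller upside share."""
    return base_fee + net_new_revenue * rev_share

# At $50k of attributed net-new revenue:
model_1 = unlimited_upside(50_000)  # 10000.0 (all upside, all risk)
model_2 = capped_downside(50_000)   # 6000.0 (floor of $2k even at $0 revenue)
```

With these illustrative rates, the crossover sits around $16.7k of attributed revenue: below it the agency prefers the fee floor, above it the pure revenue share pays more.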

This structure aligns incentives perfectly:

  • The agency gets paid more if we collect more revenue.
  • We can earmark that cost to an acceptable bound within our profit margin.
  • The agency is incentivized to get results, not to do 'stuff'.
  • Growth is built in, because the incentives drive everyone to build a machine for turning $1 into $2.

It’s simple. It’s fair. It puts outcomes at the center.

(And if you’re an agency that’s interested in taking on a risk-based contract, let me know!)

Skin in the Game

The other nice thing about a commodified field with best practices is that those practices are learnable. And in marketing, there is no gatekeeping.

For a small consumer packaged goods brand like Dragon Blood Balm, growth is too important to outsource. If execution is the differentiator, I can do that myself (with the benefit of becoming a more informed client in the future). And if no one wants to bet on results, I’ll bet on myself.

Final Takeaways

If you’re hiring a marketing agency, ask them to share risk and to tie payments to outcomes. If they won’t, dig into why, and use the gaps you identify to drive process improvement. Get curious. Lean into understanding what they’re worried about, because their blockers and concerns about success are ultimately going to become your problems, one way or another. Give your vendors (or team!) the flexibility, guardrails, and ownership to try different things. And if you can’t find a good partner, learn the game: buy your own ads, run your own processes.

Because the best growth (and innovation) engine is one you control.

Execution beats promise. Risk and accountability create opportunity for success and innovation. Healthcare learned that the hard way. Marketing is next.

]]>
<![CDATA[Goodbye Books, Post-Literacy is Here to Stay]]>https://sharedphysics.com/the-dawn-of-post-literacy/67dadbcbacfd7e3c658e256dThu, 20 Mar 2025 20:18:42 GMTIn the early 2020s, FTX founder (and eventually convicted criminal) Sam Bankman-Fried remarked, "I'm very skeptical of books. I don't want to say no book is ever worth reading, but I actually do believe something pretty close to that. I think, if you wrote a book, you fucked up, and it should have been a six-paragraph blog post."

A lot of folks in the "I write and read things" space reacted with incredulity and mocking. I did too. But Bankman-Fried's sentiment is worth taking seriously. It's more common than folks who have a stake in reading and writing might want it to be.

Recalling Bankman-Fried's comments was a coda on a series of "noticings" in 2024, starting with a silly but thought-provoking article about Star Wars by Ryan Britt: "Most Citizens of the Star Wars Galaxy Are Probably Illiterate".

Here's his take:

[F]inding a science fiction or fantasy universe richly populated with its own indigenous art—and more specifically, its own literature—is rare. As Lev Grossman has pointed out, “No one reads books in Narnia.” Harry Potter himself doesn’t really have a favorite novelist, and most of the stuff Tolkien’s Gandalf reads comes in the form of scrolls and prophecies…not exactly pleasure reading. Fantasy heroes don’t seem to read for pleasure very often, but usually you get the impression that they can read.

[But if] you simply stick to the Star Wars films, there is no news media of any kind. Despite the fact that we see cameras circling around Queen/Senator Amidala in the Senate, they don’t seem to be actually feeding this information anywhere. Are they security cameras, like the ones that recorded Anakin killing little tiny Jedi kiddies? This theory achieves a little more weight when you consider that the conversation in The Phantom Menace Senate scene is all about how Queen Amidala can’t verify the existence of a coming invasion. She’s got no pictures, and stranger still, no reputable news source has even written about the blockade of Naboo. Even if we put forth that cameras in Star Wars are only for security and not for news, that still leaves the question of why there are no journalists. A possible answer: it’s because most people don’t read, which means that over time most people in this universe don’t ever learn to read.

Britt continues to list example after example, illustrating a culture where there are almost no books, where buttons and interfaces are all pictograms, where almost all communication and record keeping is through audio and video (hologram), and where society has long fallen into a "highly functional illiteracy... Surely, for these cultures to progress and become spacefaring entities, they needed written language at some point. But now, the necessity to actually learn reading and writing is fading away. Those who know how to build and repair droids and computers probably have better jobs than those who can’t. This is why there seems to be so much poverty in Star Wars: widespread ignorance."

Here's the kicker:

Maybe the humans and aliens populating A Galaxy, Far, Far Away are totally boring people who simply used the written word for the purposes of getting their basic culture off the ground – for commerce only, rather than for reflection or pleasure.

The final nail in the coffin which proves widespread illiteracy is how fast stories of the Jedi mutate from a fact of everyday life into legend, seemingly overnight. This is because the average citizen of the galaxy in Star Wars receives his/her/its information orally, from stories told by spacers in bars, farmboys on arid planets, orphans in crime-ridden cities, etc. Without written documents, these stories easily become perverted and altered quickly. This is the same way Palpatine was able to take over in Revenge of the Sith. He simply said “the Jedi tried to kill me” and everyone was like, “okay.”

Britt wrote this in 2012 but it's too close to home in 2025. You can read it as a satirical article poking fun at a made-up universe, but I think there's a non-zero probability that it's describing a realistic scenario for our society as well. For example, in Debt: The First 5,000 Years, David Graeber makes a not dissimilar argument that commerce and exchange (and more specifically, the element of debt/indebtedness) drove much of the development of language and culture. This matches some of the archeological record, which is dominated by technical texts and record-keeping logs (though love poems and records of rulers also exist).

That's ancient history, but it's a good starting point. It gets us from pre-literacy to literate culture, and a reason (however contested) for that transition. Getting from literate to post-literate isn't as straightforward, but we're seeing the start of that trend. For example, consider the thesis in Samo Burja's "The Youtube Revolution in Skill Acquisition":

Yego [an Olympic javelin thrower]’s rise was enabled by YouTube. Yet since its founding, popular consensus has been that the video service is making people dumber. Indeed, modern video media may shorten attention spans and distract from longer-form means of communication, such as written articles or books. But critically overlooked is its unlocking a form of mass-scale tacit knowledge transmission which is historically unprecedented, facilitating the preservation and spread of knowledge that might otherwise have been lost.

Tacit knowledge is knowledge that can’t properly be transmitted via verbal or written instruction, like the ability to create great art or assess a startup. This tacit knowledge is a form of intellectual dark matter, pervading society in a million ways, some of them trivial, some of them vital. Examples include woodworking, metalworking, housekeeping, cooking, dancing, amateur public speaking, assembly line oversight, rapid problem-solving, and heart surgery.

Before video became available at scale, tacit knowledge had to be transmitted in person, so that the learner could closely observe the knowledge in action and learn in real time — skilled metalworking, for example, is impossible to teach from a textbook. Because of this intensely local nature, it presents a uniquely strong succession problem: if a master woodworker fails to transmit his tacit knowledge to the few apprentices in his shop, the knowledge is lost forever, even if he’s written books about it.

I love that framing: "tacit knowledge as an intellectual dark matter", underpinning most of our world but not documentable by text. Tying it back to illiteracy in Star Wars for a second, this describes a lot of that galaxy's operational knowledge: tacit knowledge in repair, manual work, trade, business, learned through apprenticeship or holograms. Tacit knowledge is so foundational to expertise development and to skill in business that CommonCog founder Cedric Chin wrote a whole series on tacit knowledge that has bled into business expertise, accelerated learning, case studies of Asian conglomerates, decision making, and more. Tacit knowledge is the truth behind "In theory, there's no difference between theory and practice. In practice, there is."

Burja posits that YouTube's rise as a medium for skills acquisition was facilitated by (1) cheap, quality digital cameras with mass adoption, (2) mass broadband internet access for uploading and transmitting video at scale, (3) search engines to help surface those videos, and (4) the ubiquity of portable screens.

He also makes the case that this is a net good: it has enabled more knowledge transfer, faster skills development, and more advanced skills development, for more people.

So that gives us the starting point (text-based literacy developed at least partially, but critically, to aid commerce and record keeping) and a potential jumping-into-the-future point (tacit knowledge, more pervasive and practical than written knowledge, is best distributed through non-text sources).

Now hold that thought for a second.

Aside from "videos are being used for learning," there's been a whole lot of chatter over the last few years about how online information-seeking behaviors are changing. In 2022-23, it was all about how Youtube and TikTok were going to destroy Google's search monopoly. While people have been talking about a "pivot to video" for years (including one bloviating self-styled Caesar), I've personally watched both tech-literate and non-tech-literate friends steadily shift their information seeking to Youtube/Tiktok-first approaches... and even with text searches as a starting point, the ultimate destination is often video content.

The narrative shifted in 2024-25: it became about AI-first search, with OpenAI and Perplexity leading the conversations (the TikTok ban didn't hurt), but the vibe stayed consistent: it was no longer about text-based research (in the way that "ten blue links" isn't radically different from pulling "ten books onto a table" in a research library), but about being given the answers... something that Google had been trying to do with generative summaries for years (and been lambasted for on their various implementations). It's the heart of the Jobs To Be Done framework: you're looking to hang up a picture, not to use a tool to make a hole in the wall. So with information-seeking behavior: you're looking for an answer, not a set of things to comb through.

There's a lot of folks who argue that generative text summaries are a poor substitute for research and shouldn't be trusted, but I want to wave those concerns away as theoretical. It's like the 2000s all over again, when teachers and professors would rally against Wikipedia as a valid source. People will use AI answers without additional verification in the same way that people use Wikipedia without clicking through to the sources, unless the information is wildly and obviously incorrect. Even with advances in retrieval-augmented generation (RAG) that annotate text with source materials, the thesis holds – how often are you diving deep into those sources to verify that the text matches, when most questions have trivial stakes for accuracy? (Trivial being contextualized against, say, medical decision-making.)

Be honest: how often do you consider it, but only do the most superficial spot checking?

Be honest: how often is that spot checking just a glance to make sure the source URL looks reputable enough?

It doesn't even matter that LLMs are bad at summarization, their most common use case. It doesn't matter that what they are doing is truncating and compressing information rather than summarizing (which requires some understanding of implicit messaging and causal relationships in text), if what the user is looking for is a compressed answer to a query that meets the "good enough" bar.

So we have two trends in learning: answer-first interfaces, and video for skills acquisition and information transfer. Couple that with voice-first interfaces (Alexa, Google, Siri, and all the LLMs doing audio work), and you've got the recipe for post-literacy in the sense of a steadily decreasing reliance on text (Terminal to GUI, anyone?) in favor of graphical and audio interfaces.

That gets us to "post-literacy" as a non-hypothetical evolutionary endpoint if trends persist. So the next logical question is, "Are those trends actually persisting (or at least prevalent enough)?" Given that this is a blog post and not a formal paper for publication, I'll stake a "yes" in the ground. It's my opinion and particularly my vibes-based opinion (the word of late 2024, eh?).

Here are the receipts for my "yes":

  • "Less Writing" in Software Development
    We're seeing the growth of legitimate AI-first coding (not just fancy auto-complete). As in, "tell an AI to build a program... and it will". Maybe we're not at full-on "rebuild Salesforce" yet, but we're getting there. For example: "How WikiTok Was Created":

    > Gemal started his project at 12:30 am, and with help from AI coding tools like Anthropic’s Claude and Cursor, he finished a prototype by 2 am and posted the results on X. Someone later announced WikiTok on Y Combinator’s Hacker News, where it topped the site’s list of daily news items. “The entire thing is only several hundred lines of code, and Claude wrote the vast majority of it,” Gemal told Ars. “AI helped me ship really really fast and just capitalize on the initial viral tweet asking for Wikipedia with scrolling.”

    That's 90 minutes from idea to full working MVP. Forget Notion and Airtable prototypes. My own experience matches this: I made a robust Google Chrome widget for extracting information from websites in... 20 minutes. 60 minutes altogether with troubleshooting, testing, and further customization.
  • Less Writing, Period
    Non-tech fields are impacted as well, as we see mass layoffs of content creators, starting primarily with text-based creators (sorry to all my copywriting friends out of jobs posting on LinkedIn).
  • The Covid Dip
    Covid and school lockdowns accelerated the transition to video-based learning, which in turn has driven dips in literacy and reading skills. You can do a quick search on this and pull up 20 different articles highlighting it, but here's just one from the New York Times: "America's Children's Reading Skills Reach New Lows" (2025).
  • The Reading Dip More Broadly
    And it's not just children: just a few months ago, the Atlantic published a viral and controversial article about "The Elite College Students Who Can't Read Books":

    > Lit Hum often requires students to read a book, sometimes a very long and dense one, in just a week or two. But the student told Dames that, at her public high school, she had never been required to read an entire book. She had been assigned excerpts, poetry, and news articles, but not a single book cover to cover.

    This couples nicely with the US Bureau of Labor Statistics' survey on how adults spend their leisure time (20-30 minutes per day on reading, a figure that holds across gender, age, and education level, compared to the time spent watching TV or partaking in physical activity; the two outliers being people over 70 years old and people with advanced degrees) and with "Reading for pleasure is going down" from the Pew Research Center... in 2023:
Data from US Bureau of Labor Statistics, 2023
Pew Research, 2023
  • Alternatives to Reading For Pleasure
    In terms of reading for pleasure, adults are spending 2-10x the amount of time on alternatives such as videogames and watching television. TV has had a lot written about it – I'm told we're in a peak golden age of TV drama – but I've also noticed a trend of "visual novels" and gaming as quality alternatives to the role fiction, epics, and mythologies may have filled in the past. I've come across an entire genre of narrative games / visual novels that are better than most novels/pleasure reading I've done in a decade. This has included Citizen Sleeper (1 & 2), Disco Elysium, and 13 Sentinels: Aegis Rim on the more visual-novel end of the spectrum. It also includes God of War (and sequel), Death Stranding, The Last of Us (Part 1, Part 2), Horizon Zero Dawn (and sequel), and Cyberpunk 2077 (go ahead and hate on it) on the more game-but-with-a-beefy-story end of the spectrum. I've been happily engrossed in many of these for 40-60 hours at a time, longer than I spend on books. (With some irony, Star Wars Outlaws comes to mind as a game with an embarrassingly cheap and lazy story, for all the richness of that world and its impetus for this whole post.)
  • Banning the Books!
    Meanwhile, we're also seeing an increase in book bannings across the country. The destruction of books and knowledge isn't anything new (I'd point to Burning the Books by Richard Ovenden and A Universal History of the Destruction of Books by Fernando Baez as guideposts), but it is troubling when viewed alongside the decreasing literacy trends.
  • Broad Attacks on Humanities and Post-Primary Education
    And need I remind any reader on the prolonged (decade-plus) attack on teaching arts and humanities in both K-12 and undergraduate levels, in favor of STEM or vocational subjects? And related, the current administration’s efforts to shut down the Department of Education, which will likely affect the support that exists for undergraduate and graduate education via student loan distributions?
  • Do Books Even Work?
    Lastly, "Why Books Don't Work" makes a compelling case that the core axiom of learning from books – that "people absorb knowledge by reading sentences" – is likely flawed, especially in light of Burja's aforementioned article on skills acquisition. It's worth a full (and nuanced) read, because it makes the case that while books are an amazing store of information, they're very inefficient for learning, and lack replicability. Books as education work for some people, some of the time – which research backs up with studies on differences in people's learning styles.

Does this mean that we're all about to stop reading and switch to holograms and become an illiterate society? I don't think so. It's worth emphasizing a difference between illiterate and post-literate, though with a long enough time horizon they may end up in the same place. Text has been reinvented multiple times through history, and will probably continue to be naturally reinvented as a low-tech communication and documentation fallback.

What’s different this go-around is that there are environmental differences that shift a person who doesn't read from illiterate (i.e., "unable to read or write/having little or no formal education/marked as inferior") to post-literate (i.e., "educated and informed but lacking traditionally-measured text-based literacy skills"). Driving that shift are Burja's four factors for why Youtube is creating a revolution in learning tacit knowledge. So I don't want this to be read as "we're getting dumber," even if literacy and intelligence have historically gone hand in hand.

I'd argue the opposite – the speed of development of skills and expertise and the range of basic knowledge available (and built upon) is tremendously faster/larger than anything that has come before us. I don't believe literacy-based intelligence assessments such as reading comprehension tests capture this well, and may in fact show a general decline, but that's because those tests aren't keeping up as an effective way of evaluating a post-literate society.

In that sense, lamenting a decrease in reading comprehension at the dawn of post-literacy is like lamenting the decline of oral culture at the dawn of mass literacy. Or, it's like professors and teachers complaining about Wikipedia (or rather, the "Internet") being an unreliable source in the early 2000s and preventing it from being cited. There's value in being able to write and read and research on your own without computer aid, just like there's value in handwriting and being able to compose your own essay on a topic. But we shouldn't conflate what's valuable about those activities with doing those activities for their own sake. Being able to compose (or understand) a well-reasoned and well-argued position on a topic is not the same as measuring your ability to write a 5-paragraph structured essay, even if the ability to write a 5-paragraph essay may have been a good proxy for evaluating your reasoning and arguing abilities. I'd even go so far as to argue that the ability to transform and verify AI-generated raw content for an applied purpose is probably the "next step" comprehension assessment we should be moving towards; it requires the same critical thinking skills, even if you use more AI to accomplish it.

So my general argument – besides that we seem to be trending in a post-literate direction – is that the primacy of text-based literacy may be a historical aberration, given a long enough timeline of technology development. As in, the literacy that has underpinned most of our recorded knowledge and subsequently the grand measure of culture today might be just a stepping stone to a better version of what came before it. And that's... weird. At least for me.

So, it's an observation. It may also be in itself a compelling narrative with cherry-picked facts, rationalizing a neat opinion. But I’m taking it quite seriously, even if it isn't immediately useful (or possibly even correct - the factfulness of an idea does not always correlate with its usefulness).

That said, I'm still figuring out what to do with it as an observation.

One approach is in the vein of "is blogging/writing still useful in the age of AI/Video"? I think so. There's still business utility in it – creating a documented persona, building a following, facilitating asynchronous communications with low production costs, etc. It may not be the most effective way to do things, but it's still an avenue in the way that releasing music on vinyl is still an avenue.

Another answer is that yes, writing is useful for organizing and clarifying thoughts. I write to figure out what I think. Sometimes it's more typing than "writing", because the process of putting things down is the process of thinking. That's what results in "it's easier to write something long than to write something short," to paraphrase a quip often attributed to Mark Twain. Or as Joan Didion quipped, "I write entirely to find out what I'm thinking, what I'm looking at, what I see, and what it means."

But if I was a speculative venture kind of guy, here are the threads I'd pull at and extrapolations I'd take a serious look at:

  • Training and Education, Vocational
    Peering down the MasterClass/Khan Academy rabbit hole, but for vocational training and skills. I.e., "How TikTok is changing the Blue Collar Trades" and "Blue Collar Workers are the New Social Media Stars". I'd also take a second look at what private equity is doing in rolling up blue-collar businesses (something Matt Stoller had written a lot about in 2023-24) with an eye towards vocational education and staffing pipelines. Yes, Community's Greendale Air Conditioning Repair School comes to mind.
  • Training and Education, Executive
    Peering down the Lia Dibello / ACSI Labs, FutureView rabbit hole to figure out how recreating physical environments and business loops virtually can help with building business expertise and experimentation by creating iterative, replicable, and fast-feedback loops for leadership and decision-making practice. I'd love to see some commodification to more broadly apply some of the bespoke work they're doing.
  • Hardware/Software, Commercialization and Commodification: I think we're going to see a lot more embedded AI models in hardware in the next 5 years. This will be driven by faster cycle times to build and compress locally-run models (Llama, DeepSeek), as well as increases in chip capacity and decreases in their cost and size. Apple Silicon is leading the way on this, but I expect there to be Nvidia-quality Raspberry Pis running local models without breaking a sweat at some point. If not, someone should get on that! This trend will also be driven by the commodification of applied AI modules (i.e., machine vision, video, and audio processing). Humane and Rabbit R1 were ahead of their time (and an aberration in form), but we'll likely see more things like that, or actually-smart appliances (i.e., a fridge that can adjust cooling based on sensors and manage a real-time log of what's inside at any time?).
  • Hardware/Software, Robotics: One of the biggest gaps in AI is the inability to interact with the world ("agentification" notwithstanding). The gap is that there's only so much that can be learned from text without engaging with the world. (This thesis makes a lot of sense considering everything we just talked about re: tacit knowledge as intellectual dark matter.) As an illustration, one recent research paper asks "How Large are Lions?" and posits that it takes an LLM/NLP system a massive volume of information to create probabilistic estimates of the size of lions relative to other animals (i.e., wolves), but anyone with vision can just look at a lion relative to its environment and immediately understand its size (a tiny dataset). So having AI process and react to real-world input is extremely interesting, and we're already seeing where this is going with Google's Gemini Robotics division. In the spirit of this post starting with Star Wars, I think recreating a fully functional C-3PO or R2-D2 today is not just possible, but probable and likely even unimaginative.
  • On Demand Media: This is the dumbest forecast, but I think we've only scratched the surface of what's possible with generative media (art, video, audio, text). If I were at Netflix (for their user data) or a broadcast station (with a massive media archive), I'd currently be over-investing in AI models to try to generate on-demand shows, episodes, news, and more. I'd be pitching the board on a vision of "Imagine sitting down on your couch and asking your TV, 'play me a scifi romcom in the style of Nora Ephron but with aliens and John Wick-style fights, but it needs to be an hour and a half because I need to go to sleep early so skip the exposition and make the dialogue very clear' ... and getting exactly that." I've already tried this with folk tales and bedtime stories, and I've seen versions of this used by creative directors for exploratory work. I'm bullish on this because unlike the requirements for veracity and verifiability in non-fiction, generating fiction only requires quality and internal consistency.
  • The Implications of Post-Literacy on LLM AI Models
    ... particularly in what happens with model collapse as AI slop floods the training data and people move to post-literacy. I'd spend a lot of cycles figuring out how to do for audio and video content what was already done for written content.

So those are some initial thoughts on how to carpe the post-literacy diem, but the internet is far and wide and smart, and I'm sure someone will message me with better ideas as soon as this goes out into the world.

But in the meantime, luddite that I am, I'm going back to reading a book for pleasure.

]]>
<![CDATA[A Year In Reading, 2024]]>In ten years of public reading and writing, last year was an outlier.

I wrote almost nothing public last year, and read the least since I started tracking it in 2017. I'm chalking this up to work, fatherhood, and training.

  • Work: I wrote a lot this year and
]]>
https://sharedphysics.com/a-year-in-reading-2024/679e752aacfd7e3c658e2239Wed, 19 Mar 2025 16:41:01 GMTIn ten years of public reading and writing, last year was an outlier.

I wrote almost nothing public last year, and read the least since I started tracking it in 2017. I'm chalking this up to work, fatherhood, and training.

  • Work: I wrote a lot this year and all of it was private. Most of it was healthcare research (just ask me about what drives hospital readmissions or how to model efficient routing!). Almost none of it is publishable because it is hyper-specific to our company's operations and lifecycle. As much as I love generalizable takeaways, most of my work these last two years has leaned into bypassing best practices and industry generalizations to become laser-focused on the specific problems we have and avoid the problems we don't have. (That might be a blog post in itself, but I'm not sure what the useful bits might be.) It's made for some very boring PR pitches this year – "We don't solve industry problems, we solve our own problems really well to get differentiated outcomes" – but it's also made for some honestly exciting internal conversations. I did, however, put together a few internal Q&A presentations, one of which I'll eventually publish here.
  • Fatherhood: My little girl is awesome! But also, let me rattle off some cliches that I have lived the last year and a half: kids grow up fast. The days are slow but the weeks are fast. If you don't make the effort to be there for them now, they won't need you later. Kids are a lot of work, especially in the beginning, and then even more so later. You don't regret the time you spend with your kids. So I've spent a lot of time in dad mode.
  • Training: I've been training for a marathon on and off for almost five years now. First I was derailed by covid in New York. Then I injured my knee running in the mountains in Colorado. Then I moved to Arizona and was knocked on my butt by an insane 8-month heat wave. Then I became a dad. Each year, I got to comfortable half marathon distances and improved my time. I was on track for under 2 hours in Arizona but then got caught in a monsoon midway through the race. This year I made training out of stroller runs and managed to squeeze out a PR with a 2:05 clock time in the Boulderthon in Colorado. Will this be the year? Probably not, but I'll keep trying.
Left to right: (1) Training in the desert is no joke. It gets hot with no shade and so I regularly ran with at least 2 liters of water... is that light rucking? (2) Flash monsoon during the 2023 Gilbert Half Marathon, including some brief sleet. Arizona streets aren't designed for rain. (3) At the start of the 2024 Boulderthon. (4) At the finish of the 2024 Boulderthon Half Marathon.

The Best Things I Read This Year

  • The Power Law: Venture Capital and the Making of the New Future by Sebastian Mallaby (2022)
    I picked up this book and Scott Kupor's "Secrets of Sand Hill Road" around the same time. It was slow to start but once I got past the second chapter, I couldn't stop. I read Kupor's right afterwards and couldn't get past 30 pages – it felt skimpy in comparison. I knew very little about the business, economics, and history of venture investing, and like any good history, it was filled with vibrant characters and enough context to make sense of why VC looks the way it does today. Worth a read for anyone who works at a startup or is thinking of starting a business (even if it isn't VC-funded).
  • Blockchain Chicken Farm and Other Stories of Tech in China's Countryside by Xiaowei Wang (2020)
    I'm a giant near-future sci-fi geek and Wang's "Blockchain Chicken Farm" hit all the Gibsonian notes of "the future is already here, it's just not evenly distributed." Even better, it is nonfiction and grounded in the reporting of individuals' stories and not in vague and hand-wavey abstractions. It was also a refreshing perspective shift on what's possible outside of a US-centric worldview, challenging how technology can be applied to problems, emphasizing that "things can be different" without judging those differences. It was simultaneously hopeful and inspiring and mournful. Still futuristic vibes even though it was published five years ago.
  • Things Become Other Things by Craig Mod (2024, Special Projects Edition)
    Craig Mod is an amazing observer, documenter, reflector, and craftsman – to call him simply a writer feels like an injustice. These qualities all come to the forefront when he makes a book... and I do mean "makes". I've been loving the craft and thoughtfulness behind his carefully self-published books since I came across Koya Bound on Kickstarter. Each book leaps forward in design, binding, paper selection, and printing. He calls them artist editions, but they're just really quality, care-made books. But a nice book isn't worth much without great writing or photography to fill the pages: Mod's elegiac story of his childhood juxtaposed against the elegiac walk through forgotten-by-time towns off a pilgrimage trail in Japan is incredibly readable and oft-relatable, with each passage and photo given the same attention as the book itself. Excited to read his revised and expanded mass market version when it comes out.
  • Hilda: The Night of the Trolls by Luke Pearson (2023)
    The Hilda comics have been on my recommendations long list for a while now. I picked this collection up on sale and wish I hadn't waited so long! The paneling takes excellent advantage of the oversized pages, and the story has a lot of heart. This one in particular pulled all the parent/kid strings, so maybe it was a right time/right place for me, but I'm now looking forward to reading the rest of the Hilda comics.

Runners Up

I read a lot of memoirs this year: Crying in H Mart by Michelle Zauner and Stay True by Hua Hsu both made me tear up in a good way (but also in a sad way, consider yourself warned).

The Maniac by Benjamin Labatut was really, really good (though not as great as his When We Cease to Understand the World) and probably would have been a top pick for me if I hadn't already read Turing's Cathedral by George Dyson years ago and been totally blown away by it (one of my cornerstone books and seemingly a heavy source for Labatut) or watched the AlphaGo documentary that serves as the backbone for The Maniac's extended epilogue.

I read Glossy by Marisa Meltzer about Glossier and Going Infinite by Michael Lewis about SBF/FTX back to back and they felt like similar books going in opposite directions. (I did think that Michael Lewis was a bit unfairly dragged through the mud for being too close to SBF... I think his accounting at the end of the day was much more level than many reviews would have you believe, and I read the book after the trials concluded).

Leap Before You Look by Michael Wolff was a really useful monograph, and I don't say that too often. The production quality of the book was meh (many case/brand images were old, so the visual record looked like it was dug up from old scans or photos) but Wolff's thoughts and observations on branding were really useful. I copied down and shared out a lot of Wolff's quips, which qualifies it as one of the better books on design and branding I've come across recently. I knew of Michael Wolff only as half of Wolff Olins, the agency that did the branding for the 2012 London Olympics, so a lot of the work here was a big surprise to me.

I also really enjoyed The Dragon's Banker by Scott Warren. It's an "operational" novel about banking, financing things, and investing, told through a thriller-like/mystery-like story about running a merchant bank in a fantasy world with a mythical dragon as your secret client looking to move generational wealth. I joked to friends that I'm trying to expand my horizons, but aside from the fantasy setting, I think this is very much in the vein of The Phoenix Project (about IT) or The Rebel Allocator (about capital allocation and operations).

Like every year, I left a lot of books half-read (or didn't have a chance to finish them yet). Michael Mauboussin's The Success Equation is my perennially unfinished book, which is one of my biggest disappointments because it's a great book and I constantly recommend it. Dan Davies' The Unaccountability Machine was something I started, absolutely loved and recommended the heck out of, but I put it down one day midway through and then life happened. I'm hoping to finish it next year. I read a chapter of Kent Beck's Tidy First each month; the chapters are short but good, but there are more chapters than months so it's still unfinished.

Do Last Year's Books Hold Up?

Yes. How Big Things Get Done by Bent Flyvbjerg and Mismatch by Kat Holmes have been absorbed into my general vocabulary of work. I'd recommend them again (and I do). I also continue to stand by my 2022 books, which was an exceptionally good year of reading.

My 24/25 Firehose

I've pruned a lot of content off my firehose of information, and I still generally stick with newsletters, podcasts, rss feeds, and individuals as my preferred sense-making go-tos. I try to balance things that are immediately practical (tech) with things that are interesting but not relevant (fashion, manufacturing). I also try to include some folks that I disagree with to stay honest and aware of different perspectives. If someone writes on X or other social media, I'm intentionally blind to that.

So here's what's in the hydrant:

Here's what fell off since my last update:

One of the things I started tracking this year was whether I read books in print or as digital copies.

I have a Kindle, but I can count the number of times I used it on a single hand. (I had a Barnes and Noble Nook a decade ago and I used that a lot, but mostly because I routinely bet on the wrong horse. I also had a Creative Zen mp3 player and later a Zune instead of an iPod, had a Sega and not a Nintendo console, liked Digimon more than Pokemon, preferred Hipstamatic over Instagram, sold all my bitcoin in 2013, and have routinely rooted for the band/show/team that doesn’t become a breakout hit).

I like physical books for all the common reasons: they're a good disconnect, no distractions, easy to write in and annotate (if a book isn't marked up or dog-eared, I probably didn't like it). The tangibility of the books helps me with recollection (I can recall physically where a passage was, relative to the pages).

I like digital books because they're more portable and available in every which way. Again, it's all the obvious reasons: carry a library in your pocket, save and share annotations/highlights. They're dangerously easy to buy and start reading when inspiration hits.

(And I can't do audio books because I immediately tune them out and turn them into background noise).

Anyway, I read a lot of articles, websites, and RSS feeds, but it turns out I read very few actual books digitally. I've made an improvement on that this year because I've had to travel a lot (and packing with a little kid in tow doesn't leave much room for more than a slim paperback). I've noticed that most of the books I read digitally are either page turners or technical books. For example, I read a lot of science fiction digitally (most scifi novels I've read have been digital?), and I also read long, dense, and technical books digitally as well (Kill it With Fire, The Secret Life of Programs, and Accelerate come to mind). I also read business biographies (Super Pumped, No Filter, Glossy, Bad Blood, The Nvidia Way, etc) almost exclusively digitally.

All of these books tend to be "scannable" – easy to flip through, highlight important concepts, and more or less speed-read. They tend to support instant gratification (little effort to buy and read) or are extremely useful for highlighting and sharing those highlights out to a colleague (technical books).

The books that I tend to read on paper tend to be slower, more intentional books. The sort that I might start and stop, pause to think about (and not be distracted by an app or alert). Memoirs, more literary works, philosophical texts, history books, things I know I'll want to reread (or at least revisit), triggering memories from a bookshelf. Art books and artists' books (ala Craig Mod's Special Projects editions or unique print runs, zines, limited editions) make that cut too.

To tie all this together, I've been working on "noticing" as the first step to changing. In this case, noticing that I read certain kinds of books digitally, and certain kinds in print. Noticing this has been unintentionally useful – I'm not getting off the fence towards ebooks or print, but it has helped me see that I sometimes buy books in mismatched mediums, and never get around to reading them. As a result, I changed my book acquisition habits for 2025 and I've already ended up reading more.

Children's Books

I read a lot of children's books this year but I didn't document them. I guess any book that I can read in under 5 minutes, or where the circumstances are "me reading it to someone else," doesn't make the cut in my accounting of a year in reading?

It's a shame because I've come across some wildly hilarious and inventive children's books, far more than I have among adult books. These books are visually innovative, push narrative into interesting places, and are just plain enjoyable. I've also noticed a lot of darkness in children's books' humor... or at least in the ones I've read.

Here's some that come to mind:

  • The Day the Crayons Quit by Drew Daywalt
  • The Gruffalo by Julia Donaldson
  • I Want My Hat Back by Jon Klassen
  • A Color of His Own by Leo Lionni
  • Everyone Poops by Taro Gomi
  • Triangle by Mac Barnett
  • The Lorax by Dr. Seuss
  • They All Saw a Cat by Brendan Wenzel
  • I Thought I Saw An Alligator by Julia Nichols
  • Astrophysics for Babies by Chris Ferrie
  • Acorn Was a Little Wild by Jen Arena
  • What You Do Matters by Kobi Yamada
  • Knight Owl by Christopher Denise
  • Namaste is a Greeting by Suma Subramaniam
  • Dragons Love Tacos by Adam Rubin

I also read a lot of bad children's books. Books that were boring, too complicated, too on-the-nose, too pedantic, or books that were confusing (for example, why do you have "Barn Owl" as "B" in your alphabet book? That's way too specific. It's "O" for "Owl"). Every bad book inspires me to sit down and write better books – books that have humor that works on multiple levels (for adults and for kids), books that are fun to look at and easy to read, books that make more sense, books that are endlessly re-readable with new things to notice in the art each time (because you'll read some of your kid's favorite books at least 200 times). Maybe that will be a side project this year.


And Now, the Complete List

(In chronological order)

  • Stages of Rot by Linnea Sterte (2023)(Print)
  • Gratnin by Ronald Wimberly (2023)(P)
  • My Body by Emily Ratajkowski (2021)(P)
  • Cribsheet by Emily Oster (2020)(P)
  • Stay True by Hua Hsu (2022)(P)
  • The Maniac by Benjamin Labatut (2023)(P)
  • The Power Law: Venture Capital and the Making of a New Future by Sebastian Mallaby (2022)(Digital)
  • Surf Shacks, Volume 2 by Matt Titone (2020)(P)
  • Harsh Prospect by Will Tempest (2024)(P)
  • Same As Ever: A Guide to What Never Changes by Morgan Housel (2023)(P)
  • Burn Book: A Tech Love Story by Kara Swisher (2024)(P)
  • Glossy: Ambition, Beauty, and the Inside Story of Emily Weiss's Glossier by Marisa Meltzer (2023)(D)
  • Going Infinite: The Rise and Fall of a New Tycoon by Michael Lewis (2023)(D)
  • Things Become Other Things by Craig Mod (2023)(P)
  • Silos, Politics, and Turf Wars by Patrick Lencioni (2006)(D)
  • The Tusks of Extinction by Ray Nayler (2024)(P)
  • Leap Before You Look: The Heart and Soul of Branding by Michael Wolff (2024)(P)
  • Crying in H Mart by Michelle Zauner (2021)(P)
  • The Basic Laws of Human Stupidity by Carlo M Cipolla (2011)(P)
  • Blockchain Chicken Farm & Other Stories of Tech in China's Countryside by Xiaowei Wang (2020)(P)
  • The Dragon's Banker by Scott Warren (2019)(D)
  • In Training by Stephen Voss (2016)(P)
  • Hilda: Night of the Trolls by Luke Pearson (2018-2021)(P)

By the Numbers

  • 23 books total
  • 4 graphic novels, 1 photo book (22%)
  • 3 novels (13%)
  • 5 memoirs (22%)
  • 4 books published in 2024 (17%)
  • 7 books published in 2023 (30%)
  • 1 book published before 2010
  • 5 digital books (22%), 18 print books (78%)
  • 8 books from the non-default perspective (35%)

... and that's a wrap.

]]>
<![CDATA[A Year in Reading, 2023]]>What a year it’s been!

Eight months since the year ended and I’m just now catching up on my reflections. I’m closer to ‘books read in 2024’ than I am to 2023. It took a while to write this because last year

]]>
https://sharedphysics.com/a-year-in-reading-2023/66b97d53acfd7e3c658e2027Tue, 13 Aug 2024 02:06:29 GMTWhat a year it’s been!

Eight months since the year ended and I’m just now catching up on my reflections. I’m closer to ‘books read in 2024’ than I am to 2023. It took a while to write this because last year was a pretty crazy one:

  • I moved twice in the same year, and not because I wanted to. The first place we moved to had a lot of problems and we had to move again a month later, repacking everything up. That’s five moves in four years across three states and three time zones. Expert library packer here.
  • I started an intensive consulting opportunity midway through the year, which turned into an intensive new role as the VP of Product and Engineering at myLaurel. Exciting stuff, but it took up a lot of the time and capacity I had.
  • I wrapped up working on Recommended Systems (renamed to Field Report). I realized I didn’t have the time or capital to grow it as a business (as opposed to as a tool) and am instead going to carve out some time to open source the work.
  • Dragon Blood Balm had its best year ever: our sales doubled in the beginning of the year... and then doubled again in November and December. This meant a lot of effort was spent on keeping up customer service and fulfillment. I took over and spent some time trying to optimize production in the beginning of the year, but with the work at myLaurel had to hand production (and eventually fulfillment) back off to my partner in North Carolina. We also hired a person to help us out!
  • I started training for a marathon, and ran a half marathon. I want to write a blog post about the things I learned here, but the short of it is that training for a marathon is completely unlike recreational running in terms of time and effort you need to put in. I’ve never done competitive sports and this was a big learning curve for me on the running meta, the mechanics, and how to train beyond amateur land. (Spoiler: I didn’t end up running because of moving (again) and the next point).
  • And my wife and I welcomed a little kid into the world!

That’s a lot! Reading and my side projects took the biggest hits. I read the fewest books I’ve read in any year since 2017, and worked on zero side projects. I wrote nearly nothing last year. Life was very much restructured and I’m halfway into 2024 just catching up on things.


Top Picks

Last year’s (2022) top picks left a lasting impression -- “Creative Selection” by Ken Kocienda, “Quit” by Annie Duke, “The Outsiders” by William Thorndike, and “The Mountain in the Sea” by Ray Nayler. I regularly found myself thinking about these books, recommending them to others, and referencing them. 2022 was a good year.

2023 wasn’t without merit either. My favorite picks were:

  • How Big Things Get Done: The Surprising Factors That Determine the Fate of Every Project, from Home Renovations to Space Exploration and Everything In Between by Bent Flyvbjerg & Dan Gardner (2023)
    “How Big Things Get Done” left me with an idea per chapter, and I’ve been able to apply each one to my professional work. More than anything, it has allowed me and my team to derisk large projects and deliver to spec and to timeline. For a technical book (or the pop-sci equivalent of a technical book, by an author who has written plenty of technical tomes) it was a pleasant, breezy read.
  • Barbarian Days: A Surfing Life by William Finnegan (2015)
    Another wonderful book. It won a bunch of prizes when it came out but I ignored it until after I went on vacation to Hawaii.... and didn’t surf at all. But I still picked it up on the strength of those recommendations, because the trip felt incomplete -- as if I had left a piece of my imagination there, a “what if we stayed here” wishful thought brought about by a long and stressful year that left me burnt out. The book proved to be 100% vibes. I’m not sure how else to describe 500 pages of Finnegan finding and surfing waves across the world, first as a kid and later as an adult, especially when the waves themselves all blend together after a while. Reading it slowly over the course of a few months felt like stepping into a daydream.
  • Mismatch: How Inclusion Shapes Design by Kat Holmes (2018)
    Another technical, professional book. Easy to read and digest, and was eye opening for me on how accessibility and inclusive design betters the world for everyone, while unexamined biases end up creating a more painful one, sometimes physically so. Not a month goes by when I don’t think about some of the lessons here. Since I finished the book, I’ve noticed more how the world is largely inaccessibly designed, an observation heightened by new parenthood.
  • The Skull by Jon Klassen (2023)
    A children’s book, which starts from the trappings of a traditional Brothers Grimm tale and ends... in a very unexpected place. Morbidly funny and surprising. I sought out (and enjoyed) Klassen’s other children’s books, including “Triangle” and “I Want My Hat Back”, all of which had the same line of unexpected and dark humor. “The English Understand Wool” by Helen DeWitt was a close runner-up in this mode, and I particularly like the publishing initiative from New Directions: a ‘storybook’ series of novellas intended to be read in a single sitting. Lovely!

Runners Up

I always enjoy Joan Didion’s writing, including “Where I Was From” (recommended by a reader – thanks Justin Duke!). “Nike: Better Is Temporary” was one of those useful corporate monographs that left me inspired, though the title is what I come back to the most. “Craftland Japan” is a constant reminder of the heart of craft, though I worry that for me it simply perpetuates a fantasy of a life I never lived nor likely ever will. “Masters of Doom” and “Blood, Sweat, and Chrome” were great as well, for reasons I go into later.


Craftsmanship and Making Things

I found myself gravitating to books (and newsletters, articles) on making and makers. “Craftland Japan”, “Masters of Doom”, “Make Something Wonderful”, “Born Standing Up”, “No Filter”, and “Blood, Sweat, and Chrome” all went deep into the creative process and the business around it. Other books - the Nike monograph, “How Big Things Get Done”, “Echo”, “There and Back” are reflections on or compilations of produced work and stories from the production process.

I don’t have to like the subject matter; I’m not a Steve Martin fan (his movies or his comedy) but I enjoyed reading about how he developed and evolved his craft. I have only two pairs of Nikes (and a hat) but I loved the dive through their archives even if I don’t want to buy any of it. I hated Doom and Quake as a kid and was reluctant to read about Id Software and the Johns, but I found the pieces on John Carmack’s buildout of game engines to be some of the most interesting technical reading that year.

The industry doesn’t matter either: making shoes and apparel, traditional crafts, video games, movies, and internet software all share similar creative pains and tensions. The deep exploration of processes across all of them created moments of serendipity: drawing a line from building 3d graphics engines (“Masters of Doom”) to connect or contrast that to building the real-world stunts of Mad Max (“Blood, Sweat, and Chrome”), or the content moderation challenges of Instagram/Facebook (“No Filter”) back to the pop culture impact of video games. And all of them on the long and winding road to overnight success and/or the denial of it.


The Disappointments

I typically don’t finish books that are disappointing; I’ll leave them half-finished and donate them during my next move. But some books are disappointing not because they’re bad, but because they’re just... not good? Or don’t live up to what they could be, based on an author’s prior works? Teju Cole’s “Blind Spot” was disappointing in that sense. I love Teju Cole’s early fiction and his non-fiction writing, but “Blind Spot” – a collection of photographs paired with reflective paragraphs – was just not great, and I had to push myself to finish it, hoping for a reward on each page and getting it only once or twice. The stylistic throughline of W.G. Sebald was on display, but I think Craig Mod has done it much better.

Other books were disappointing because they had interesting concepts but hit cliche after painful cliche, or were plain sloppy. “Fistful of Pain” was a comic in that vein which – if not for its brevity – would have been more pleasant to use as kindling for a fire. Sorry guys. (As an aside – I’ve been doing most of my writing on iA Writer and have a mode turned on which highlights all my cliches and unnecessary words... I hate how often I reach for cliches and noticing is the first step of change.)

Other books were disappointments because they did not live up to the hype. I’ve heard so much about the Culture series of books, loved the concepts for each one and the high concept Iain Banks brings to the table, love the wiki summaries and long dissections online about the novels, but... I hate the books. They don’t pull me in. They’re adequate writing but a slog. I gave up on “Consider Phlebas” halfway and it took me three years and a series of very long flights to finish “Player of Games”. “Excession” is sitting in my to-read pile and again – I love the concept – but I pick it up and my eyes glaze over. I’d take a Charles Stross, Neal Stephenson, William Gibson book any day. Which speaking of – many people criticized Stephenson’s “Termination Shock” as tired, not as dense, bitter, and not as good as his other books... and while they may be right, I enjoyed it all the same.


Less Parables, More Specifics

The more I’ve read, the more I’ve spotted patterns across books, both fiction and nonfiction.

In fiction, there are certain tropes and hero-journeys a reader can come to expect. In non-fiction, there’s a format that gets more mileage than a grizzled trucker: the kind where the author starts with a personal anecdote to draw you in, ties that back to the theme of the book, brings in research to justify their anecdote and subsequent conclusion, and moves on to the next chapter. I’m okay with that and I’ve gotten pretty quick at glossing over the routine filler bits that pad what could have been an essay into a book.

Beyond that, I’ve started to notice (and find incredibly annoying) how certain case studies repeat themselves across books. An example that I saw frequently over the last few years was a California rail project between San Francisco and Los Angeles. I saw that case study used to illustrate:

  • Cognitive fallacies
  • Knowing when to quit/walk away
  • Poor project planning
  • Government waste
  • How political processes go
  • Infrastructural complexity
  • A certain Californian wishfulness
  • What should be invested in (one political reading)
  • What shouldn’t be invested in (different political reading)
  • ... and more.

It feels like the story of the blind men and the elephant, each coming to a different conclusion about what it is: a rope! A tree trunk! A boulder! A fan! A snake!

On one hand, the world is complex and a single event can be illustrative of many different things. In this reading, consider John Salvatier’s “Reality has a surprising amount of detail”.

On the other hand, it is lazy writing. There is so much world out there! And yet, authors keep relying on the same few stories. This is not just the California rail project... think of any pop-business book you’ve read in the last ten years that doesn’t make an obligatory (and often gawking) reference to Google, Apple, Facebook, Netflix, or their ilk when describing excellence and performative differentiation. I’m guilty of that as well because it’s easy. But it’s lazy, and after you read enough of the references, it’s boring.

These anecdotes are also wildly western-centric! I started to notice this as a byproduct of going out of my way to read non-white, non-American, non-male writers. What of the businesses in Asia, or South America, or Europe, or Africa? What of smaller, non-titanic businesses? Are there so few examples in the world that we can’t be bothered to identify them? Cedric Chin’s work across Commoncog is amazing in this regard: he writes about Koufu, of competing rice conglomerates, of the “Chinese Businessman Paradox”, and of building a point of sale system in Malaysia. These stories can illustrate unique and differentiated complexities about the world, and are more interesting to read because they’re not the same tired anecdote, repeated ad nauseam.

Those anecdotes eventually become secondhand and thirdhand stories, devoid of nuance, taking on almost mystical proportions. I mean, did you know that Mark Zuckerberg used to tell his teams to ‘move fast and break things?’ Did you know Netflix had an extremely critical performance evaluation culture that eventually even pushed out the creator of that culture? Did you know Google had 20% time that led to the creation of Gmail? Over time, these stories turn into parables.

Perhaps more gratingly, they are parables that no longer reflect the state of things. Facebook no longer moves fast and breaks things – it is slow and broken. Google no longer has 20% time; it has 20% layoffs and work done exclusively for performance evaluations. Apple’s build culture is different today than it was with Tony Fadell, Jony Ive, and Steve Jobs. Things change. I wish more books would take those stories and follow through to see what became of them. What happened to the turtle who won the race against the hare? Maybe the hare woke up to its mistake and never repeated it. Maybe the turtle never won another race. Is the lesson from that parable still valuable in such a future? Or is the lesson no longer “slow and steady wins the race”, but instead “exploit your competitors’ temporary weaknesses if you want to win uneven races”?

Anyway, I talked about this last year and again the year before when I wrote about the reflexivity of skill and how certain genre “classics” have not aged well. This continues to be true.

So aside from bitching about it, what’s the alternative?

Seeking out different perspectives is an option (and I do mean “seeking”, because these are not easy to find). So is specificity of experience. Consider Ken Kocienda’s “Creative Selection”: an author saying “I can’t give you persistent truths/truisms, but I can tell you about what it was like to make this thing, what worked, what didn’t, and what we learned along the way. Here is a story of a particular moment in time.”

In that vein I loved “Blood, Sweat, and Chrome” about Mad Max. It didn’t tell the story of all movies, just the very particular challenges of making one (amazing) movie. It didn’t attempt to draw huge life lessons or MBA-ize the experience into a case study; it told the interesting details of a very singular experience. And in telling that story, it revealed the complexity of the world. It is “to see a world in a grain of sand / And a heaven in a wild flower” (William Blake). Instead of 10 thematically different books referencing a well-worn parable, it is a richly detailed specific experience that can be read 10 different ways, depending on the lens you bring to the table. It has no agenda except to present the messy world as it was, as people experienced it.

Try it on for size: read it as a story of creative persistence in the face of repeated obstacles. Or as a management parable of hiring and inspiring people. Or a management parable about how many different people and skill sets come together in different ways to build a final product, where the “director” is a central thread holding things together. Or as a technical manual of how award-winning creativity and stunts came together. Or of how the Hollywood system operates and how it almost killed this film. Or as a story about the unevenness of creative throughput over a career. All of those are readable stories; the world is complex. “I am large. I contain multitudes.” (Walt Whitman)

The “Wet Streets Cause Rain” Problem

The last observation, a corollary to “write about specifics” and one that friends have tired of hearing me remark on, is the pain of Gell-Mann Amnesia. Here’s Michael Crichton saying it better than I can:

You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.

In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know.

Right: “Wet streets cause rain”.

Related to this is “Writing as an Operator” (https://lethain.com/writers-who-operate/) by Will Larson, where he cites the “disconnect between operational observations and eventually not having experience to draw on anymore” for why he does not want to be a full time writer. I fully agree!

Many journalists and writers never have operational experience. They try to explain the systems they write about from the outside in. They attribute logic and reasons in ways that make for good storytelling but that, if you work in the industry, you know are often patently untrue or backwards. This calls to mind related patterns, such as “Can a Neuroscientist Understand a Microprocessor?”, which posits that the methods and tools we have for evaluating complex systems from the outside in are not adequate to fully understand them.

Working in business and reading journalism about business, working in healthcare and reading journalism about healthcare, and working in technology and reading about technology heightens this sense for me: writers attribute malice and ill intent where what more often exists is gross incompetence or good intentions with awful implementations. Business, healthcare, and tech are easy to villainize – and oftentimes rightfully so – but the ghost of “wet streets cause rain” and Gell-Mann amnesia has made me more sensitive to giving things I read – and more recently, people I engage with – the benefit of the doubt rather than ascribing motives or narratives to them. (It doesn’t hurt that Hanlon’s Razor – “Never attribute to malice that which is adequately explained by stupidity” – often applies in these scenarios as well.)

That said, there needs to be fair warning about being sucked into the logic of the system – what folks call “drinking the kool-aid” – and losing the ability to objectively describe the way things are. Executives do this the most, as they are the proverbial kool-aid factory.

So the pithy observation here is that few professional operators are good writers and many cannot be trusted. But many writers are poor explainers and few get things right. So read critically, and don’t believe everything you read… or everything you think. Pith as promised!


The Stats

This year, I read:

  • 25 Books or book-like things
  • 8 were fiction (two of which were comics)
  • 4 can be best described as art books or monographs
  • 2 were children’s books
  • 6 could be described as memoirs or memoir-like
  • 3 were mostly-technical or for-professionals books
  • 12 were from writers who probably wouldn’t identify as the default setting in American publishing

And that wraps up my annual reflections.

RK, 2024


The Full List, 2023

Strangers to Ourselves: Unsettled Minds and the Stories That Make Us by Rachel Aviv (2022)

Craftland Japan by Uwe Rottgen and Katherina Zettl (2022)

Blind Spot by Teju Cole (2016)

⭐ Mismatch: How Inclusion Shapes Design by Kat Holmes (2018)

Where I Was From by Joan Didion (2003)

Nike: Better is Temporary by Sam Grawe (2020)

Project Hail Mary by Andy Weir (2021)

Player of Games by Iain M. Banks (2009)

Griz Grobus by Simon Roy (2023)

A Fistful of Pain by Lindsay/Joyce (2023)

Born Standing Up by Steve Martin (2007)

No Filter: The Inside Story of Instagram by Sarah Frier (2020)

Make Something Wonderful: Steve Jobs in His Own Words by Steve Jobs (2023)

⭐ The Skull by Jon Klassen (2023)

Echo: A survey of 25 Years of Sound, Art, and Ink on Paper

There and Back: Photographs from the Edge by Jimmy Chin (2021)

⭐ How Big Things Get Done: The Surprising Factors That Determine the Fate of Every Project, from Home Renovations to Space Exploration and Everything In Between by Bent Flyvbjerg & Dan Gardner (2023)

Masters of Doom: How Two Guys Created an Empire and Transformed Pop Culture by David Kushner (2004)

Mazda Miata 2023 Car Manual

Little Labours by Rivka Galchen (2016)

⭐ Barbarian Days: A Surfing Life by William Finnegan (2015)

Termination Shock by Neal Stephenson (2021)

The Mysteries by Bill Watterson (2023)

Blood, Sweat, and Chrome: The Wild and True Story of Mad Max: Fury Road by Kyle Buchanan (2022)

The English Understand Wool by Helen Dewitt (2022)

]]>
<![CDATA[Building and Running Software for Non-Technical Folks]]>https://sharedphysics.com/what-is-software-development/65543f56706449438ede2526Mon, 27 Nov 2023 03:12:48 GMTI've recently found myself explaining what software development is to people who are on the periphery of technical work but don’t know how to approach it.

Some of these folks are aspiring software developers. Others are product managers trying to find ways to work better with their technical counterparts. Others are business leaders trying to make sense of the engineering department and why headcount seems to grow so fast.

This is my reusable summary of what goes into “building and running software on the web” for people who aren’t software developers, based on those chats, emails, and conversations.


The “Work” of Building & Running Software

Software development can be split into six areas of activity:

  • UIs: Building user interfaces
  • Algorithms: Transforming data & logical operations
  • Data Management: Moving, storing, and retrieving data
  • Quality: Troubleshooting and testing
  • Operations: DevOps, managing environments and infrastructure
  • Technical Debt: Refactoring and performance enhancement
  • (And if you’re inclined: planning, estimating, scoping, and architecting are a seventh activity)

The UI Work: Building User Interfaces

The user interface (or UI) is the part of the program that most people interact with. This includes everything from accessing a web page and loading an app to clicking through a slideshow, playing a game, interacting with VR, and a whole lot of other stuff. If you interact with it, it’s a user interface!

UIs come in many different forms. The ones most people are familiar with are Graphical User Interfaces, or GUIs. You can also have text-based interfaces (such as a chatbot/ChatGPT), voice interfaces (such as Alexa or Siri), virtual reality interfaces, and command-line interfaces (CLIs) for interacting more directly with the computer itself. You can also have physical interfaces (buttons, switches), but those move into the realm of hardware engineering.

The goal of a UI is to let you interact with the program. A UI allows you to do something, captures that input, and then returns something to you.

That said, a UI is different from the UX, or User Experience. The UI is strictly about the interface, while the UX is a more holistic view of a user’s experience, perception, and goals. It is kind of like the difference between tactics (how you end up doing it) and strategy (what you are trying to do). Software development can sometimes conflate the two, but the practice of UX can encompass activities beyond software development, including communications and copywriting, growth, marketing, graphic design, service design, and education.

UIs are the most prominent and visible parts of a program, for obvious reasons. When most people think of software, they think of the software’s UI… and not what’s under the hood.

The Algorithms Work: Transforming Data & Logical Operations

UIs are the facade that hides software’s real work: the running of algorithms – rules for data transformation and logical operations. This is a fancy way of saying “turning inputs into outputs”.

At the most basic level, an algorithm is a set of instructions. For example, a recipe is an algorithm. When a developer writes code, they’re writing a lot of different instructions for the computer to follow.

Some of these algorithms are simple, such as addition and subtraction. Some are very well known, such as Binary Search or Quicksort. Others are extremely complicated: sets of instructions upon instructions, loops upon loops of branching if/then/for/when statements. Many algorithms come prewritten into the tools (the programming languages and frameworks) a software developer is using and can be called upon as a named “function”, and some of those tools are better or worse suited to the programming challenges at hand.
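
To make that concrete, here's a small illustrative sketch in Python, whose standard library ships with a prewritten binary search (the list of IDs is made up for the example):

```python
import bisect

# A sorted list of user IDs; binary search requires sorted input.
user_ids = [3, 7, 12, 25, 41, 58]

# bisect_left performs a binary search: it returns the position
# where 25 sits (or would be inserted) in the sorted list.
index = bisect.bisect_left(user_ids, 25)

print(index)                  # 3
print(user_ids[index] == 25)  # True: 25 is in the list
```

Instead of writing the search loop by hand, the developer calls a named function that someone else already wrote and tested.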

John Carmack is a very famous person in software development, and has created many innovations in gaming and graphics processing.

When we talk about algorithms in software development, most of the time it is either to transform data (i.e., 2+2=4 … or 2+2=22 if you want), or to perform a logical operation (such as if/then, or, and, for, when, etc). Here’s a pseudocode example:

if form_data="lettuce" {
    select field "color"
    update text="green"
}
else if form_data="radish" {
    select field "color"
    update text="red"
}

When you’re writing code, you can give it a name and then reference it again later. For example, if I write

define function foodColors() {
 (insert pseudocode from above)
 };

… then I can reuse the algorithm/function foodColors() by writing just the name instead of the entire block of code.
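
As a sketch of what that pseudocode might look like in a real language, here is a hypothetical Python version (the function name, foods, and colors are invented for illustration):

```python
def food_color(form_data):
    """Map a vegetable name from a form to a display color."""
    if form_data == "lettuce":
        return "green"
    elif form_data == "radish":
        return "red"
    return "unknown"

# Reuse by name instead of repeating the whole block of logic:
print(food_color("lettuce"))  # green
print(food_color("radish"))   # red
```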

Algorithms, data transformations, and logical operations are what the software does. In the same way that the UI is the piece that most people think is the software, the algorithms are the software-y part of the software.

The Data Work: Moving, Storing, and Retrieving Data

While you can write software that starts up, does one thing, and then ends, most software is designed to work over an unspecified period of time. You need to be able to move, store, and retrieve data to allow this to happen.

Data can be stored in many different ways. You can store it in the computer’s temporary memory or cache, in a file (such as a text or a .CSV – comma separated values – file), or in a database.
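
For example, here's a minimal sketch of the file-based option using Python's standard csv module (the file name and fields are invented):

```python
import csv

# Store: write rows out to a .CSV file, header first.
with open("contacts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "email"])
    writer.writerow(["1", "Ada", "ada@example.com"])

# Retrieve: read the rows back in, keyed by the header.
with open("contacts.csv", newline="") as f:
    rows = list(csv.DictReader(f))

print(rows[0]["name"])  # Ada
```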

Once you store the data, you also need to retrieve it… which is a little bit harder. You need to build a query for this, which is a combination of filters, sorts, limits, and identifiers that gets you the exact thing you’re looking for. If you don’t properly label and organize your data (such as with unique IDs per row), retrieving it can be a mess!

So organizing and labeling data is one of the most important factors in storage and retrieval. Data qualifiers and metadata like the aforementioned unique IDs are one piece of the puzzle. The other is deciding whether to make the data hard to read (organized for running the software, painful for analytics) or hard to write (organized for analytics, painful for running the software).

“Hard to read” is what’s happening when you ask a developer, “Hey, can you pull a spreadsheet of people’s contact information and the services they had recently?” and they give you a convoluted answer about how the data is split across many different tables and it will take a week to get you that report. It’s hard to “read” the data because it was organized in a way that makes sense for the application and its performance, but not in a way that makes it easy for a layperson to read.

Here’s another example: let's say you’re making a task management tool. As a developer, it would make sense to have one table store all tasks, a different table to store all users, another to store all comments, another for a list of all projects. It makes sense to separate them out because they might have different relationships; one task might be associated with different projects, some comments might not be visible to some users, and so on. Now if an analyst comes knocking and asks for a report on how many comments a user made on a certain project, the pain is that you have to join and filter a lot of different tables to answer the question. There's no one pre-existing table that shows you those relationships, because those relationships are generated on demand when the application needs them.
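To make the pain tangible, here’s a toy sqlite3 sketch of that scenario (all table and column names are invented for illustration). Answering the analyst’s question means joining three separate tables, because no single table holds the answer:

```python
import sqlite3

# Illustrative schema: separate tables, as a developer would design them.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tasks    (id INTEGER PRIMARY KEY, project_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, task_id INTEGER, user_id INTEGER);
    INSERT INTO users    VALUES (1, 'Ada');
    INSERT INTO projects VALUES (1, 'Website');
    INSERT INTO tasks    VALUES (1, 1), (2, 1);
    INSERT INTO comments VALUES (1, 1, 1), (2, 2, 1);
""")

# "How many comments did Ada make on the Website project?" There is no
# pre-existing table with that answer -- it has to be assembled on demand.
row = con.execute("""
    SELECT u.name, p.name, COUNT(c.id)
    FROM comments c
    JOIN users    u ON u.id = c.user_id
    JOIN tasks    t ON t.id = c.task_id
    JOIN projects p ON p.id = t.project_id
    WHERE u.name = 'Ada' AND p.name = 'Website'
    GROUP BY u.id, p.id
""").fetchone()
print(row)  # ('Ada', 'Website', 2)
```

Each individual table is clean and efficient for the application; it’s the cross-table question that takes work.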

“Hard to write” is the opposite: it means organizing information in a way that makes sense for an analyst’s report, but creates application inefficiencies and difficulties. Because of this, developers naturally gravitate towards the “hard to read” approach.

Of course, hard to read/write aren't mutually exclusive approaches. For example, a software developer can design a data “object” by pre-linking data sets with different primary/foreign keys to make it easier to pull together.

When it comes to where the data is stored, as with algorithms, different databases are optimized for different use cases. You might have databases designed like a spreadsheet (and read using a "structured query language", or SQL), or like documents with data nested inside other data (NoSQL)… or they might be graph/vector based datastores (I won't get into that). Even databases that look similar (say spreadsheet-style SQL databases) can be optimized for different problems, such as for columnar growth (adding more rows takes up very little room) or for tabular growth (adding more columns takes up very little room).

The Quality Work: Troubleshooting, Fixing Bugs, Testing

There’s an oft-quoted saying that “software development is 10% writing code and 90% fixing bugs”. Unless you’re a technical savant, it holds true.

This is because algorithms are extremely logical and precise, in the sense that you can’t hand-wave away the details. Let’s say that you’re writing instructions for making a sandwich. If you’re a human, you can deal with a lot of ambiguity:

  1. Get bread
  2. Spread peanut butter on it
  3. Spread jelly on it
  4. Put more bread on top
  5. Eat it

If you’re a program, you’ll end up running into a lot of different errors along the way. Here’s the same set of instructions, but interpreted by a program:

  1. Get bread… (What kind of bread? How much?)
  2. Spread peanut butter on it (Where did this peanut butter come from? What are you using to spread it? How much? Do you keep the peanut butter out or put it back?)

… at this point, most programs would throw an error of some sort, such as:

  • Undefined Error: “Bread” and “Peanut Butter” undefined
  • Order of Operations Error: No peanut butter found. Did you mean to open fridge first?
  • Operational Error: Tried to spread peanut butter but lid was closed
  • Overflow Error: Spread peanut butter all over your table because you didn’t define how much

... and so on. As Carl Sagan once quipped, 'If you wish to make an apple pie from scratch, you must first invent the universe'. Writing software isn't too different. Some programming languages are opinionated and might make a decision for you, some are not and might crash.
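Here’s a toy Python version of the first error above (the ingredient names are, of course, made up). The program halts the moment it hits a name that was never defined:

```python
# A toy sketch of an "undefined" error: the instructions reference
# ingredients that were never defined, so the program stops immediately.
try:
    sandwich = bread + peanut_butter  # neither ingredient exists yet
except NameError as err:
    msg = str(err)
    print(msg)  # name 'bread' is not defined
```

A human would shrug and go find the bread; the program refuses to continue until every assumption is spelled out.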

And that's before getting to user errors. So after you write your initial set of instructions (the 10% of the work), you’re left with all of the edge cases, exceptions, bugs, and scenarios that you did not account for (the 90% of the work). This is why many developers hate hand-wavey requirements such as “just make this thing”: there are always a lot of assumptions built into that “thing” which a user is too busy to think about and describe, but which the developer can’t avoid. It's also why most developers will ask a lot of “what about if…” scenarios and attach very specific caveats to what they say.

So the way this plays out for most developers: they write the initial instructions and run the program. They find the bugs that crash the program and work through those. They run the program again, find the bugs that create unintended consequences but don’t crash anything, and fix those. They run it again, find the bugs that result in not meeting the user requirements, and fix those. They run it again, find security holes, and fix those. This loop keeps going until the programmer has either figured out how to navigate every edge case and input, met all the requirements while ignoring the improbable bugs, or gotten bored and handed the code off to someone else to test (QA, the product manager, or the end user), buying themselves some time before someone finds yet another scenario that doesn’t work.

Brenan Keller is a very tall engineer at Snapchat.

Testing has its own variations as well. You can be doing unit testing (is a function working?), integration testing (does the software still work with the new code that was added?), or user acceptance testing (UAT, which is about evaluating if the requirements have been met). Some of this testing can be automated (unit testing), some manual (UAT), and some somewhere in between (end to end systems integration testing).
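As a minimal sketch of the first kind, a unit test is often just a function full of assertions run against another function. The names and cases here are illustrative:

```python
# A tiny unit test: check one function against inputs and expected outputs.
# The function under test and the cases are illustrative, not from a real app.
def vegetable_color(name: str) -> str:
    colors = {"lettuce": "green", "radish": "red"}
    return colors.get(name, "unknown")

def test_vegetable_color():
    assert vegetable_color("lettuce") == "green"
    assert vegetable_color("radish") == "red"
    assert vegetable_color("durian") == "unknown"  # the edge case

test_vegetable_color()
print("all tests passed")
```

Frameworks like Python’s built-in unittest or pytest automate running hundreds of functions like this on every change, which is what makes unit testing so easy to automate compared to UAT.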

More philosophically, there are many perspectives on what software 'quality' is and who should be responsible for it… is it on the developer, writing the code? Is it a dedicated quality assurance (QA) tester, or should developers be testing their own code? Is it on the product managers, when writing requirements? Is it on the user, who signed terms and conditions to use the software in a specific way (and why are they trying to enter paragraphs of text into a number field anyway)?

My perspective is that there are two types of “quality”: the software’s ability to meet user expectations and solve problems, and the reliability of the software to run repeatedly without errors.

The Operations Work: DevOps, Managing Infrastructure, and Environments

Making software isn't the only part of the work. Software developers also need to make sure the software runs somewhere, preferably somewhere safely accessible to its users. While software can run exclusively on a single person's computer, most of today's software runs in the cloud, often across many different systems concurrently.

Running software on the web requires answering many questions about infrastructure and environments; a developer needs to control or decide on a lot of things:

  • What kind of computer are they running it on?
  • What kinds of supporting software ("packages" and "dependencies") are they using?
  • What versions of the aforementioned dependencies are they using?
  • How are they moving the code from their computer to somewhere else?
  • How are they handling scaling and volume?

… and so on. This is sometimes referred to as DevOps, or developer operations. If software development is akin to building a house, DevOps work is akin to figuring out where you’re going to build it, understanding the climate and the regulatory environment, and then choosing materials for your house and shipping them to your destination.

DevOps is the answer to when a developer says, “Well it works fine on my computer!” Like I said before, most of today's software is hosted and run in the “cloud”, which is someone else’s very large and distributed collection of computers.

And again – because the specifics matter a lot in software development, even something as small as different versions of a dependency (version 3.1.1 vs. version 3.1.2) can make the difference between something working and something breaking. Making sure that the work of one engineer is compatible with the work of another means creating controls around the variables in their programming environment, either through “virtual” environments (a temporary workspace with predefined parameters that mimics everyone else’s workspaces) or through “container” environments (similar, but prepackaged to make them easy to share with others).

Setting up an environment and managing the supporting infrastructure are often both the first and last things a developer will do when writing code. It will be the first thing because this is how they get started. It will be the last thing because it is how they make sure that what works on their computer also works on everyone else’s computers.

In between those environmental bookends, there is also the work of moving the code from one place (your computer) to another (the server, or the repository). Code repositories such as GitHub or GitLab are where developers consolidate the work they've been doing (branches of code) into a unified codebase. Code is pushed (deployed to the codebase) or pulled (cloned to your local environment). Code gets merged into the codebase with a commit, and the more people working on the codebase, the more often differences between developers' branches need to be reconciled before merging.

But that's just the codebase; that code then has to make its way into the live software, either continuously (as in Continuous Integration/Continuous Deployment, or CI/CD) or through preplanned, managed releases. The “quality” of a developer operations program is determined by the speed and reliability with which code can be deployed and integrated with other code… but its success depends on the developers’ investment in testing automation and feature flagging (the ability to turn features on and off selectively for different users).
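Feature flagging, at its simplest, is just a conditional wrapped around a feature, keyed to a user. A toy sketch (the flag store, flag name, and user IDs are invented; real systems usually fetch flags from a configuration service and cache them):

```python
# Toy feature-flag sketch. The flag store and user IDs are made up;
# real systems pull flags from a config service rather than a constant.
FLAGS = {"new_checkout": {"enabled_for": {"user_42", "user_7"}}}

def is_enabled(flag: str, user_id: str) -> bool:
    return user_id in FLAGS.get(flag, {}).get("enabled_for", set())

def checkout(user_id: str) -> str:
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"   # the risky new code path
    return "old checkout flow"       # everyone else stays on the safe path

print(checkout("user_42"))  # new checkout flow
print(checkout("user_99"))  # old checkout flow
```

This is what lets teams deploy code continuously while releasing features selectively: the code is live for everyone, but the behavior only changes for the flagged users.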

And as with testing, a DevOps program can be programmatic and automated (infrastructure as code), or completely manual through documentation and individuals at the controls.

The Tech Debt Work: Refactoring and Performance Enhancement

After code is written and is delivered to users, something funny happens: the world changes. First it is almost imperceptible. People may use and like the software because it helps them do things. Then they realize that they want to do more things, or different things. Or the business grows and there are new problems to solve.

Code is almost always written incrementally, additively. Software developers don’t rewrite the whole program every time you need to add a new feature… they often just write some more code and add it to the code base. As an engineering team grows, more people end up adding things in. Over time, the size and complexity of the codebase increases.

This often results in weird and unintentional behaviors. In best case scenarios, this is a result of accumulating and overlapping algorithms creating unforeseen behaviors… aka emergent complexity. In the worst case, it may reflect natural human error or the politics/organizational design of a company resulting in different parts of a program behaving in different ways.

And as the size of a program grows (in terms of lines of code), it also becomes harder to manage. In a program that is 100 lines long, it is pretty easy to grok everything that is happening. When a program is 100,000 lines long, composed of 200 different files that reference each other, developed by 15 different people over the last 3 years (and that’s a simple example), things get really hard to keep track of.

Another thing that happens is developers take shortcuts to expedite code and feature delivery. After all, most businesses prefer fast delivery (“Can you do it by the end of the week?”). Such work accumulates future debt in exchange for a payoff today.

All of this results in what’s called tech debt. Every change becomes more difficult to make, the program becomes harder to understand and work with, and it breaks more often, for harder to discern reasons.

Tech debt is addressed by refactoring, which is to go back and clean up yesterday’s code to meet today’s business requirements. This is a matter of rewriting or simplifying code: valuable meta-work for developers, but not an obvious benefit to a business, because superficially it’s a lot of effort to make something that… does nothing new? I mean, have you ever tried to pitch to your CEO that the team needs to pause all new feature work while they clean up, often for a quarter or two? (For the record, I’ve listened to such conversations, and they went as ridiculously as you would think.)

But not all refactoring is to tackle tech debt. Some refactoring is to create performance improvements. Making things work faster, easier, and more reliably is an important business goal, whether that is faster for the application to do something (such as retrieve data or load a web page) or faster for a developer to create new features (because there is less tech debt, or better commented code).
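As a toy illustration of a performance-oriented refactor, here’s the same lookup written twice in Python. The behavior is identical; the second version just trades a repeated linear scan for a one-time index build:

```python
# Toy refactoring example: same result, better performance.
# Before: scanning a list of (id, name) pairs on every lookup -- O(n) each time.
records = [(i, f"user_{i}") for i in range(10_000)]

def find_name_before(user_id):
    for uid, name in records:
        if uid == user_id:
            return name

# After: build a dict once, then every lookup is O(1).
# Nothing about the function's outward behavior changed -- only its internals.
records_by_id = dict(records)

def find_name_after(user_id):
    return records_by_id.get(user_id)

assert find_name_before(9_999) == find_name_after(9_999) == "user_9999"
```

From the outside, the software “does nothing new”, which is exactly why this kind of work is hard to pitch but valuable to have.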


Ok! That sums up what the “work” of software development for the web looks like (and if you're working on things like LLMs, graphics, or software-hardware interfaces, there are a few other things worth talking about). And it's also not the end of the story. In the future (if I get around to it or if there's demand), I'd like to create a Part 2, looking at some of the specific tools and languages developers use, and why/where; and a Part 3, to unpack the different roles in software development.

Thanks for reading.

]]>
<![CDATA[When to Demo & When to Document (Two Approaches to Building Software)]]>https://sharedphysics.com/demos-vs-documents/65331b23daa90982ed42535bSun, 22 Oct 2023 23:22:31 GMTLots of product and engineering teams debate having “document-led” versus “demo-led” cultures of building.

These represent two opposite ways companies have for creating consensus and settling debates during planning and build cycles. A demo-first culture leads with tangible artifacts for people to react to and then iterates on that feedback. A document-first culture leads with documentation and analysis — whether a PRD, specs, PR/FAQ, RFC, or similar-style deliverable — to figure out work worth doing, define the scope, and set the intended outcomes.

The most visible implementations of these cultures have been at Amazon and Apple. As a quick summary:

Amazon is famous for their 6-page PR/FAQ documents that are aligned on by a leadership team before anything gets built. The bar for approval of a PR/FAQ is high, and the process becomes an effective focus-protecting/project-killing mechanism. (This is described in detail in “Working Backwards” by Colin Bryar and Bill Carr.)

Apple is famous for a culture of iterating through proof of concepts and hands-on demos, which are also reviewed by senior leadership. Each demo gathers crucial feedback that is used to continue problem-solving. Experience rather than business analysis is the crucial driver of decisions. Notably, these demos are started and iterated on with little-to-no original written specs and requirements. (This is described in detail in “Creative Selection” by Ken Kocienda.)

Most individuals I know have a strong preference for one culture or the other. Here’s a sample:

Demo-Forward

  • Likes: Fast to iterate and get customer feedback. Bypasses a lot of arguing and navel gazing in favor of speed and action. Focused on user experience and satisfaction. Great when the team is experienced, as this is dependent on product taste and tacit knowledge. Lends itself to personal initiative.
  • Dislikes: Can result in throwaway work, or working on the wrong things. When requirements and definition of done aren’t clarified up front, it can feel like grasping in the dark. Can be short-sighted. Can result in projects that scratch a personal itch but miss the big picture.

Document-Forward

  • Likes: Gets to clear requirements and understanding of why a project exists. Ability to focus attention and capacity on the things that matter. Figures out/validates details before any work starts and kills unnecessary projects. Work proceeds easier, quicker… when you get around to it. Typically business-outcomes oriented, business-initiatives focused.
  • Dislikes: Can feel like a lot of arguing before “work” starts. Very abstract, needs a (difficult) step to translate the business needs into actual deliverables. Business analysis can forget to include hard-to-quantify human perspectives. Too much emphasis on upfront requirements can turn into the worst of waterfall-style planning and hamper flexibility in execution. People who are hands-on find it hard to delay the execution/implementation side of the work.

However, most teams I’ve seen or worked with try to create a hybrid ‘best-of-both-worlds’ culture and implement it via a one-size-fits-all, we-all-use-the-same-process-across-all-projects approach. They end up with a combination of upfront documents (PRDs/briefs) and build-demo-build processes (sprint-style cycles of development). I've been guilty of this myself.

On the surface, it makes a lot of sense. Yet I’ve never heard anyone brag or speak positively about these hybrid approaches. Rather than getting the best of both cultures, they end up feeling like the worst: the documentation parts feel like bureaucratic pre-work and don’t validate ideas against customer needs (or get nearly enough projects killed), while sprint-style development gets to biweekly units of planning/work but mostly pays lip service to the importance of feedback and validation in a true demo culture.

My own conclusion is that hybrid approaches don’t work. At least, none of the ones I’ve personally experienced. Again: it feels counterintuitive because there are a lot of processes that appear to be complementary rather than in conflict, and merging them should be a 2+2=5 situation.

The reason hybrid approaches tend not to work is that the two cultures solve for different problems. Document-first approaches solve for the “why”: what work is worth taking on, why, and when. They answer business questions. Demo-first approaches solve for the “how”: they focus on experience, tangibility, and the specifics of execution.

Rather than creating a hybrid approach, you should selectively deploy one or the other method of building depending on the core constraints of the project.

Here are my rules of thumb/questions to ask:

Rule of Thumb 1: What’s your cost to build?

Cost to build can mean many things to many people, but I’m referring specifically to budget and the availability of people’s time and attention.

  • If the “cost to build” is high (low budget, or your team has low bandwidth to take on projects), then you need to lean towards a document/validation-first culture of development to kill more projects than you approve.
  • If the “cost to build” is low (large R&D budget, loads of engineers, lots of time), then you should lean towards a demo/ship-and-iterate style culture. If you don’t, people will be sitting around waiting for work.

Rule of Thumb 2: What’s your risk?

Every project carries some risk – whether financial, reputational, legal, delays, unforeseen problems, or otherwise.  

  • If your risks are high (let’s say you work in a tightly regulated field, proposing a significant investment, working on a hard-to-reverse decision, or working on a flagship brand defining feature…) then you should lean towards a document/validation-first approach to de-risk your work. Pre-mortems are an especially useful tool here.
  • If your risks are low, then you should lean towards a demo/ship-and-iterate approach. Getting things wrong in low-risk environments is a great way of learning!

… as a corollary to this, how much is unknown/unknowable, and how many assumptions are you making?

  • If you are making a lot of assumptions or a lot about your project environment is unknowable (i.e., the work of invention), then you should focus on experimentation via a build-and-demo culture. Action produces information that no desk research or surveying can uncover because either the answer doesn’t exist or there is no right answer.
  • If you are working with a well defined problem space and a lot of established research, save yourself the trouble: don’t reinvent the wheel, build on (and not against) solved problems. Preparation and pre-work will save you a lot of rework.

Rule of Thumb 3: What’s the size of effort?

Similar to “what’s your cost to build”, what's the size of your effort? Project estimation is loaded with all sorts of planning fallacies, but most of the time you can spitball an informed "this is a task versus this is a big effort" estimate.

  • If the deliverable is effectively a single task (or maybe a set of simple requirements) that will take a few days or a week to build or requires a single person for delivery, then just go ahead and build it. You don’t need a lot of business scaffolding around it.
  • If the deliverable is a large project, something that is expected to take multiple people, multiple weeks, and interacts with many different systems… then you should take the document-first approach to get everyone on the same page & figure out all of the dependencies you might not know about.

Rule of Thumb 4: What are you debating?

This gets to the heart of the differences between Apple’s and Amazon’s approaches: people are really bad at figuring out and making sense of abstract things, understanding probabilities, and debating relative merits.

  • If the contention/alignment is around tangibles or experiences (such as interfaces, designs, or communications), then a demo approach is the way to go. Kocienda describes it as ‘trying to debate which imaginary puppy is cuter, a ridiculous debate until someone shows you the two puppies’. The better way is to put something in front of someone and to react to the specifics.
  • If the contention/alignment is around strategy, business problems, goals, or similarly abstract concepts, then go with a document approach. In circumstances like this, jumping straight into demos or prototypes bypasses the important prerequisite questions… only for them to surface again more painfully later… Many doomed initiatives start this way and end with someone asking with exasperation, “why are we working on this again?”

Rule of Thumb 5: What are your team’s strengths and weaknesses?

Teams tend to be weighted towards one polarity of skillsets or strengths. Some people are naturally better at diving into building things while others are thorough researchers. It is hard to teach people to work against their natural strengths. So assign work to their strengths instead.

Instead of prescribing a one-size-fits-all process, focus on identifying the right problem-solution fits, set clear standards for outcomes, and maintain flexibility in how the team gets there.


What does this mean in practice?

Demos and documents are tools for solving different problems. Some projects may never need documentation, while others might skirt the demo. The best approach to planning and building things is to make sure the tools you use reflect the problems you're working on. That means processes can vary project by project, and you can change up your processes as you learn more about the problems at hand.

And that’s ok.


Continued Reading

If you're interested in exploring this topic more, I recommend the following:

Books:

Working Backwards” by Colin Bryar and Bill Carr (about Amazon's processes)

"Creative Selection" by Ken Kocienda (about Apple's processes)

"Shape Up" by the team at Basecamp (About 37 Signals' processes)

"Creativity, Inc" by Ed Catmull (about Pixar's processes)

Articles

"Putting Amazon's PR/FAQs to Practice", Commoncog

"Creative Selection: A Summary", Commoncog

"Action Produces Information", Commoncog

"Apple Demos vs. Amazon Memos", StatPost by Trung Phan

"Project Management at Big Tech, and the Curious Absence of Scrum", The Pragmatic Engineer

"The Cynefin Framework For Decision Making"

"What Should Be On the Roadmap?", SharedPhysics

"Some Common Planning Fallacies", SharedPhysics

"Premortems", Asana

]]>
<![CDATA[Product Chats: Building Teams and Products from Scratch]]>In February 2023, I recorded a podcast with Kayla Cytron-Thaler at Canny.io about the role of product management, building products from scratch, and more. This is the video, audio, and cleaned up transcript.

In particular, the transcript is a bit different from the recording – I'm taking

]]>
https://sharedphysics.com/building-a-product-from-scratch/6849a4cee885df3fd0c26621Wed, 22 Feb 2023 05:00:00 GMTIn February 2023, I recorded a podcast with Kayla Cytron-Thaler at Canny.io about the role of product management, building products from scratch, and more. This is the video, audio, and cleaned up transcript.

In particular, the transcript is a bit different from the recording – I'm taking it as an opportunity to clean up my thoughts, remove some redundancy, add in detail where it was missing in the moment. Not substantially different, but not a direct transcript either.


Building a Product From Scratch with Roman Kudryashov of Recommended Systems
Roman Kudryashov is the founder of Recommended Systems and co-founder of Dragon Blood Balm . He started his career in journalism – right when the industry was trans…

Transcript

Kayla: Thanks for tuning in to Product Chats. On today's episode, I talk with Roman Kudryashov, who is the founder of Recommended Systems. We talk about building out a product team and also building a product from scratch.

In a minute or less, can you tell us a bit about yourself?

Roman: Sure. My name is Roman Kudryashov. I am currently the CEO of Recommended Systems and Dragon Blood Balm. Before I started those two companies, I was the Director of Product at Pager, and previous to that I've held a number of other roles both in the product and marketing space, including as a Senior Director of Marketing and as a Director of Digital Operations. I've had a chance to work in a number of different industries from software to physical products, from the luxury sector to healthcare.

The Non-Traditional Path to Product Management

Kayla: Let's actually dive into that a little bit about where you started and what that journey has kind of looked like, diving into how you got into product.

Roman: I think that my road to product, like many other people's, was non-traditional. At least five to ten years ago, there weren't any courses you took in college that were explicitly about product management. I started my career in journalism, and that was when every single newsroom was going through a transition to digital, where print publications were going out of style and everybody needed to have a web strategy, a digital strategy.

As part of that, I had to transition into learning about how digital products are created, what works, what doesn't, how people come in and interact with web properties and software properties that were being put out there. That led me somewhat directly into marketing, where I had the opportunity to think about even deeper how to acquire users, how to keep them engaged, and how to make useful things for them.

There's often two big areas that people tend to specialize in, when working in marketing. One is the traditional realm of marketing communications (advertising, communications, email strategies, media, and so on), and the other one is product marketing. With product marketing, you're thinking about how to build a relationship between the company or the product that you're creating and the consumer that is on the other end of it.

When you think about that sort of relationship, you find yourself in a cohort alongside a number of other departments that are also thinking deeply about that. Product management and product development being one of them. So in doing that work, I found that marketing (as I was doing it) was starting to deeply overlap with solving the problems that product teams were working on. So I had a chance to informally start working with them, and then eventually I transitioned into a full-time product role where I was responsible for building out internal tools that were informed by our team's and company's needs, and when I was successful at that, to an explicit role building and leading a product management team.

Building Teams from Scratch

Kayla: With that, let's actually talk about building out teams and what kind of piece of advice or what that looked like for you.

Roman: My experience has been mostly around coming in to areas where a team may not have existed before or where a team needs to be rebuilt, both on the marketing and on the product sides.

The biggest question when you're starting something from scratch is: what is it that you're actually trying to accomplish? For example, when I come into a company, oftentimes the initial guidance that I'm given is more so in the form of a question or in the form of a challenge. Someone might say: "We have an opportunity of some sort and we think that product—or marketing, or product marketing—is the right way to address it, but we don't have the internal expertise to figure out what is it that we should do here."

So when I come in, the first thing that I need to do is figure out: how does this department actually align with the business goals? Where does it fit in? What already exists, and who does that work? Whatever the answer is to those questions, that informs the team dynamics and expertise you need to bring in. For example, do you have a fully robust engineering team that you're partnering with, or are you relying on an agency? Do you have a community education team that can augment the things that you're building, or is it going to be the responsibility of the product team to own some of those communications?

As you go deeper into that, you start to figure out what kind of expertise do you need to bring to the table and then also what kind of processes do you need to have in order to enable the department to succeed. There's never an out-of-the-box solution to pull from – I've always built teams in situational, contextual ways.

Another example: if you're in a more enterprise organization, it may be a very real situation where you're coming in and working with predefined contracts that you have to deliver on and you're responsible for figuring out how to deliver that. Your product team may need to lean into having project and client management skills. Whereas if you're in more of a direct-to-consumer space, you'll probably want a team that's more hands on with research methodologies, mass communications, and prototyping techniques.

And after you've figured that out, the most important thing is to show meaningful progress. You don't have an unlimited timeline for these things and you don't want to squander the trust you start with. That doesn't mean you have to make quick and rash decisions, but you do need to figure out how to quickly get to action. I think that anybody coming in—whether you're at the manager level, at a director level, or a leadership level—you have to figure out: How do I solve the problems that the company has and how do I demonstrate value? And then when you do begin to demonstrate value, you begin to build more trust and permission to try more things. And with that trust, your team begins to grow – in every way (practically, psychologically, size-wise). Your capacity to take on additional challenges begins to grow as well.

Demonstrating Value in Different Contexts

Kayla: I kind of want to dive into how you actually demonstrate that value hands-on, and what that looks like.

Roman: That answer is very context-specific. If I think about, let's say, in an enterprise organization, value may be around creating product utilization. For example, at Pager, we had a B2B2C model where we worked with payers and we indirectly helped our clients get patient utilization—we helped them get utilization through their membership. But the kicker is that it was a B2B2C model, so we didn't have control over the final utilization plans. We proposed ideas and models to the client, but the client had to implement it – we were a feature embedded in their bigger app. We couldn't follow a lot of "best practices" because we didn't own the end-to-end experience.

So as a product manager, one of the most important things for us was to figure out how to build a product that conforms to the ecosystem that our clients have and then within that, how to manage the relationship to the place where the client wants to follow – or at least is open – to our advice and strategies. Some of the ideas we got the client to implement had nothing to do with our product, but made the overall client app better, and as a result drove our own utilization up. We became a trusted consultative partner to our client, beyond simply being another vendor. That consultative relationship led to more and broader contracts beyond the feature set we had already built. That was the value being created, beyond any trackable metrics.

Whereas if you are a direct-to-consumer company, you probably have way more control over your product and as a product team, you may be more focused on traditional business metrics. For example, what do your referral rates look like? What does your utilization funnel look like? What do your returning users look like? And then moving the needle on that. It's much easier to measure, but just as valuable.

In that sense, product management is extremely deeply intertwined with the business model and the monetization strategy. It's not about making and launching features, though that might be what the work mostly looks like. It's about the target that the work is pointed at, and the target is a business target.

The Unique Role of Product Management

Kayla: It's always tying that back to the company goals, right? Our goal is to drive more revenue or our goal is to do this. I've seen a lot of successful product teams giving their teams the flexibility to choose their own path and obviously interview customers and figure out the why of why do you need this and figuring that out, but always keeping that bigger picture in mind of what are these company goals and not just feature building because a customer wants it, but actually building stuff out that aligns with what's our bigger company vision and where are we trying to get.

Roman: Right, that's a really good way to look at it. One of the things that I hear from product managers every once in a while—one of the things that I see broadly in the industry—is that it's very easy for product management to get caught up in the software development cycle. You see articles on Medium or on personal blogs that are talking about the successful delivery of features: "We shipped something in a timeline and it had no bugs." And that's wonderful, right? As a company, we all want to be able to ship software that has no bugs, software that gets used and hopefully delights the customer.

But when you think about the role of different departments in a company, there's a trifecta in that world. On one leg you have product management, on another leg you have design, and on a third leg you have engineering.

When you think about it in that sense, what is the unique thing that product management brings to the table? When you think about the successful shipping of features without defects, that's really a question for engineering. Has the engineering team built things that are reliable and that work?

And likewise, the questions around "Are people using it? Is the experience being created a good one?" are really the domain of designers to come in and think through. They're the ones that should be asking, "What are user needs? How do they interact with the interfaces that you've designed? How do they go through and navigate the application to be able to solve the problems that they're coming in to solve?"

Okay, well then what does product management do? Product management answers the question of, "What are the important business questions for us to tackle?" As an example, when you write a PRD, the thing that you are trying to solve for or answer is: if we solve this problem, will we be able to create business value?

Then you begin to define the KPIs around that. "So if we solve this problem, then we expect to see some sort of increase in utilization, or we expect to see some sort of increase in revenue, or we expect to see some sort of new opportunities or growth being opened up." 'Product' metrics and 'business' metrics often overlap, but not always. Same thing with 'marketing' metrics and 'business' metrics. It's really easy to get lazy about that and use them interchangeably, and then you get product managers talking about the number of clicks and page views, or marketing people talking about the number of impressions, and the business folks in the room get frustrated because they can't draw the line back to the world they're operating in, which is often financial metrics. Almost everything gets translated into financial metrics at some point, because that's what businesses are organized around. And that's not a judgement call – there are other organizations (such as governments, NGOs, non-profits) that are organized around other value metrics. But the purpose of a business is to survive and thrive, and that's a function of making money.

Then in addition to that, product managers also need to solve for the promise of taking on work. You've promised to take on some work, and given a convincing reason for taking it on. How do you make sure that work actually delivers on that promise? Because very often if you've built a really fantastic feature and you've shipped it but nobody's using it, then you've not actually delivered the value that you promised.

So it's not the work that's important, it's the results of the work. That's both terrifying and liberating. The terrifying bit is that there's a lot of advice out there that talks about prioritizing process over outcomes, and this sounds like I'm saying the opposite; process is great, and good processes are more likely to consistently get you to outcomes, so I'm very much for good processes. But if you're judged on outcomes and not processes, that can get scary. The liberating thing is that if you're judged on outcomes, everything before that outcome is flexible. Any problem can be solved in multiple ways. Say that you're tasked with "drive utilization." You can do that through a new feature deployment. You can do that through a marketing or communications campaign. You can do a webinar. You can talk directly to hundreds of customers. You can A/B test some interfaces and microcopy. You can come up with 15 other strategies for driving that outcome. Your goal isn't to stay fixated on one way of working, so get creative. Some tactics are cheaper or more expensive, easier or harder. Choose your own adventure on that.

So to sum it up, what product management brings to the table is this explicit business focus. It's not project management, it's not about trying to be a scrum master or product owner, it's not managing the software development lifecycle—it's solving explicitly defined business problems. And that's where the creativity, that's where the logic, that's where the analysis in product management comes from.

I'm really digging deep into this because there are plenty of times where the solution to a problem doesn't necessarily have to be a software solution, even if you work in the software space. Sometimes it's enough to build a landing page or to run a communications campaign, or to equip salespeople with content. Too many product managers only reach for the "build more software" toolkit.

This is important to remember because engineering resources are expensive. It requires a lot of people to build and run enterprise software, it requires a lot of people to build scalable things, and so you want to make sure that when that team is working on these projects, they're not wasting their time. They have limited time, so you want your partners to be spending that limited time on the most important things.

So as part of that, you're trying to figure out: "What are the right problems? And how do I solve those problems in the most efficient way?"

And then explicitly and intentionally, if the solution is a software solution, you're letting the designers and the engineers be participants in the process of creating solutions. The designers have a ton of expertise from their side thinking about the users. The engineers have a ton of expertise on their side knowing the systems and the architectures and what might be a simple solution versus what might be more complicated. You don't want to turn designers into pixel-pushers or engineers into typists. They have a lot of perspective and experience to bring to the table, likely more than you do in the areas that they work on.

So again, going back to that core point: As product management, you're really trying to figure out what are the right problems to solve as a company. And if you can do that, then I think you'll be extremely effective.

Getting Close to Customers

Kayla: You bring up a great point about not wasting engineering time because it's really expensive. I think that goes into something that's really important as a product manager or even as a product leader—actually getting in the weeds with your customers and actually sitting down and understanding them: interviewing them and then taking a step back and understanding what is the impact and what do they actually need, what's the pain that they're solving versus "Hey, let's build out A, B, C, D."

I know in some cases—you mentioned in the enterprise space you've promised things and that can be different—but in a lot of companies it's "Hey, let's actually take a step back." That's a huge role of product: Let's think of this on a bigger picture and what is the problem that they're dealing with? Not "Let's build this feature," like you mentioned. It could be a different landing page or it could be something else, and it's not always "Let's build out this feature." It's "Hey, let's take a step back and actually think critically about what they actually need and also the different solutions that we could potentially create."

Roman: Yeah, absolutely. I think that if as a product manager you're not speaking to customers or you're not addressing customer needs, then you're missing a core part of your role. And I say that with caveats, of course. There are times where, let's say, you can't speak to customers for whatever reasons. Maybe you're in the healthcare space and you can't talk to patients that might be using the platform, or patients don't want to talk to you. Maybe you're working with an enterprise organization and your direct relationship is with buyers as opposed to the end users of the software so the idea of who is even a "customer" doesn't match up perfectly.

But the customers are the ones that validate the purpose for your business's existence. Customers are existential. You have to figure out who they are, who makes buying decisions, what drives them to make those decisions. It's an existential question.

And not only that, you have to find ways to translate the problems and ideas that people are coming to you with, to how you might be able to begin solving it. That translation layer is extremely important because more often than not, people are actually coming to you with solutions-that-look-like-problems, and not the root problems.

When they come to you with solutions, they do two things. Number one is that they make a ton of assumptions about what already exists and what's possible, what's reasonable. And those solutions might be extremely costly, they might be extremely difficult, they might not even solve the right problem if somebody hasn't done a root cause analysis and said, "Well, what is this actually a solution to, and why is that even a problem?"

There are plenty of times when I've come in and people think that the problem is one thing, and so the solution that they come to the table with is a direct result of that. But if you think deeper about the problem, it turns out that their perceived problem is actually a symptom of a larger one.

So you have to go through a process of asking "why" multiple times. "Why is the fact that somebody's not going through and clicking on this button a problem?" "Well, it's because we built this feature out and we want people to use it." "Well, why do we want people to use it?" "Because we think that there might be certain results that you get from that."

Every time you go and you ask "why," it opens up the problem space more and more, and that opens up the variety of solutions that you can test out. That both gives you more flexibility, and makes sure you're actually solving the right thing.

The other thing about that conversation is, again, a lot of people think in terms of solutions. And I fall into this trap as well, because it's extremely easy to do this. People coming into the product space have a lot of taste, right? They've used a lot of different products, they have ideas about how products should work, and so it's very easy to immediately default to what a solution should be.

But everyone has had this experience, everyone has used apps and has ideas of what software should look like. Your taste is a useful input, but it's not what makes you an effective product manager. It's not even a core skill, because anyone can bring that to the table. If all you're doing is relying on taste, you're not doing your job. You can probably rationalize why your taste-led suggestion is the right one, but you don't want to be in the business of rationalization and pitting one opinion against another. But I'm getting a bit off track – the point is, good ideas are not siloed within the product management space. Anybody can come to the table with a good idea or a good solution.

So as a product manager, you want to stay focused on the problem and not on the solution. You need to be able to step back and evaluate multiple possible solutions, and experiment with different solutions. Your job is to make sure that the problem-solution matchup works, and you want to have all the solutions, everyone's ideas on the table for that.

Building a Product from Scratch

Kayla: On that piece about getting to the why and not solutioning but actually looking to the problem, let's actually talk about building a product from scratch and what does that look like for you.

Roman: Building a product from scratch is tough! A lot of times when you come into a company, there's something that already exists, right? Whereas when you're building something that has never existed, then on one hand it's an extremely exciting opportunity—you can solve anything, you can create anything—but at the same time, going back to that previous point that engineering time is expensive, so whatever you choose to work on, you need to make sure that you're doing something valuable.

So the decisions that you're making are – more than any other time – explicitly about the business outcomes. When you're starting a new company, you have a limited amount of time, you have a limited budget or runway, and you have a very limited number of people. So you can't afford to spend a lot of time building out a bunch of random features.

So the most important thing that you can do at that point is to invest the upfront time in talking to customers, understanding what their pain points are, understanding what problems they're willing to pay you to solve and what they're willing to live with (and are ambivalent about).

Again, going back to the idea that a landing page can be more effective than an actual product solution, this is where you want to try things that might not scale or aren't optimized, just to test the waters and validate your ideas.

For example, if you're building a chatbot, you may want to start by not having a chatbot at all but just talking to the customers to understand what kind of questions are they coming to you with. Or if you're building some sort of automated service, then maybe it looks automated but you're actually doing the work behind the scenes. You don't want to be in the business of building solutions in search of a problem.

A lot of people get started, especially when they're trying to start something new, by having an idea and having this rough opportunity in mind, and that's awesome—that's what gives you the initial direction. But once you have that, the most important thing to do is to actually talk to potential customers and see how they use it, if they're willing to pay for it, if they're providing any sort of feedback, if their behaviors are very much in line with what you expected before you invest in the time to optimize for that.

Because once you optimize your solution, you've essentially locked it in. It gets more and more expensive to change features that are already built. Tech debt accumulates. And especially when you're building something new, the sort of feedback and guidance that you're going to get from your first 10 users or your first hundred users is going to be extremely different as your next thousand users come on, or your next 10,000 users, and then again as your first million users come on. The challenges and the architectures that you're going to have are going to be totally different. A product for your first 10 people is a functionally different product, a different company, compared to a product for the next 10,000.

So the most important thing at each of those stages is to talk to the customers, learn from them, see how people behave in reality as opposed to in the assumptions that you have made about their behaviors. It's no different when you're starting a company, except that the stakes are much higher and there's no prior thinking/validation for you to rely on.

The Importance of Customer Validation

Kayla: Just to echo your point about listening to your customers—at the end of the day, whether you're building out a new product from scratch or whether you're working on an existing product, it's the most important thing to do. And especially when you're building a product from scratch, you don't have those resources—most people at least don't have those unlimited resources. So if you're truly listening, you can make sure that there's a market fit. Maybe you hear from people that your idea, even though you think your idea is the greatest thing ever, maybe there's not a market fit, and then you save yourself this heartache of "Hey, I built this out, I spent all this engineering time, and now I don't have a product."

So again, it's just so important to listen to your customers or prospects or potential customers, what they want, and also to make sure there's actual viable business value. So you're not just building out a product—unless your model is freemium—but confirming that there's a need and that people would actually pay for it.

Roman: Yeah, those are really good points. What I would add is that number one, everything that I'm saying here is not too far off from what Agile development tends to be, right? When you think about it in terms of iterative cycles and getting customer feedback, that's pretty much what I'm saying here as well. You really want to go in and you want to iterate based on real-world experimentation and learning.

I think a lot of people talk about agile, they talk about sprints and talk about cycles and talk about shipping. But if you have sprints, that doesn't mean that you're agile—it just means that you are trying to work in two-week blocks or three-week blocks or so. The most important part of Agile development is to talk to people and to use that as guidance to refine the things that you are building. It's to build a loop of learning and acting, learning and acting. If you just make a loop of acting (traditional sprint planning), you're missing the point.

The other thing—and I think that this is just a broader observation about the industry—is that because "building things" tends to be expensive, a lot of engineers end up starting companies, especially today when you have that skill set. That's a cost center that you don't have to invest in right away, and so you can start to build something out on your own and bring that to the market. Whereas if you're a product manager or you're a marketer or you're a business executive that has an idea, then you have to either learn how to engineer something if you're in the software space, or you have to hire an engineering team. And so it becomes that much more important for you to do that upfront research before you hire people on to build things.

But likewise, if you're an engineer starting your own company, the same thing applies. It's so easy to get caught in a build trap where you say, "Well, it's not expensive for me because I know how to do this." And then the next thing you know, you spend three months, six months building a thing and refining the architecture, trying to make it perfect and squashing every single bug before you've actually shipped it to customers. You've built an entire roadmap of "We're gonna have this version first, that one second, that one third." And then you bring it out to the market and maybe people don't buy it. Maybe you mistook what people said as "These are interesting things that they would want to use," but they don't have a budget for this, or this doesn't neatly fit into the way that people acquire software, or maybe that's not how they solve the problem. Oops, you've spent all this time building the wrong thing and you can't get that time back.

I think one of the biggest things is also when you're talking to customers, a lot of people will give you feedback that's helpful, but there can often be a disconnect between when somebody says "Oh, this would be awesome and I'd love to buy this" versus when people actually do open up their wallets and buy something. So in addition to just listening to them, you have to listen with a critical ear and actually validate it against how they act, as opposed to what they say they will do.

Advice for Aspiring Product Leaders and Founders

Kayla: On the subject of listening, for our listeners, I would love for you to share one piece of advice. You can take this one of two ways: You could share a piece of advice to someone who's founding a company with a product perspective, or just a piece of advice to an aspiring product leader.

Roman: That's a good question. I think that I'll try to give advice that applies to both, or maybe two pieces of advice.

On one hand, for people who are aspiring product leaders, I would say learn as much as you can, and especially early on in your career, learn from other people. Take the opportunity to understand how different departments work. Learn how finance works, learn how engineering works and how they make decisions, learn how you interact with marketing and how sales works and how decisions in those departments are driven—why they do the things they do.

This will make you a better product manager and a better product leader. The further up the chain of management that you go, the more you're working on the same business problems they are working on but just applying a different solution space to the problem. Product, sales, marketing, etc, they're all working on business problems at the end of the day, but applying different toolkits, tackling different parts of the problem. So after a while, you're working not just on technical expertise of how to write a good PRD or how to do good customer research or how to do analytics, but you're really trying to figure out: How do I fit into the broader view of an organization? When you build something, how do you need to work with sales to get them to sell it, or how do you need to work with marketing to get them to promote it, or how do you operationalize it and scale it for multiple users with engineering?

And then I think that advice is also helpful to anybody starting a company, because when you're starting out, you're taking responsibility for all of those departments. So if you don't understand how marketing and sales works, if you don't understand how finance works, if you don't understand how engineering works, having a great idea is barely getting you to the starting line. Everybody has great ideas. Everybody has hundreds of ideas or opportunities that they think would be great to solve for, but it's really about the validation and subsequent execution that makes a difference.

And to execute, you're really going to have to understand how all of these different pieces of a company come together to make a successful business. Not just a feature or product, but actual sustainable business. An idea isn't a business, a product isn't a business.

So no matter who you are or how you're trying to grow, definitely take the time to learn what drives other departments, how they make decisions, what they do and how they do it and why they do those things. If you decide to stay within product management, you'll be a better leader for it. You'll understand how the things you do fit in with other skill sets in your company, and what organizational pressures you have to help solve for. And if you do choose to eventually start your own company, well, you'll be armed with all that extra knowledge and skill that'll get you two, three, or four steps ahead of everybody else.

Kayla: Thank you, Roman!

]]>
<![CDATA[A Year in Reading, 2022]]>I've been tracking books that I've read since 2013 or so, and writing about it since 2020 (2020, 2021). This year, I read 30 books, graphic novels, and related publications.
]]>
https://sharedphysics.com/books-i-read-in-2022/63b37431daa90982ed424c7eWed, 25 Jan 2023 19:46:50 GMTI've been tracking books that I've read since 2013 or so, and writing about it since 2020 (2020, 2021). This year, I read 30 books, graphic novels, and related publications.


Top Picks & Honorable Mentions

NonFiction

It's hard to pick a "best of" list of nonfiction books because most of the books I finished were pretty good. If they weren't, I would have left them half-read!

That said, I found myself repeatedly coming back to the ideas in these books throughout the year. Because of that, I've ended up recommending them over and over to friends and colleagues.

  • F.I.R.E. - How Fast, Inexpensive, Restrained, and Elegant Methods Ignite Innovation by Dan Ward (2014)
    I bought this on a whim with little expectation, and I was wildly surprised at just how readable, practical, and useful it was. It's written with government/military-style projects as the primary case studies, but the lessons are relevant to any business or product management role.
  • The Outsiders: Eight Unconventional CEOs and Their Radically Rational Blueprint for Success by William Thorndike (2012)
    I picked this up on a recommendation while trying to learn more about finance and cash flow management, and I wasn't disappointed. It was a crash course on how financing and cash flow works and I learned just how central cash flow is to healthy business decision-making.
  • Quit by Annie Duke (2022)
    I enjoyed Annie Duke's "Thinking in Bets" and found myself struggling a lot with "should I quit [thing]"-type questions this year. This book gave me the emotional and cognitive foundation to figure out why I was struggling with the decisions, and the analytical framework for how to actually make a call. Really useful, and lots of great anecdotes and case studies to call back on.
  • Runners Up: Creative Selection by Ken Kocienda, Write Useful Books by Rob Fitzpatrick.

Fiction

I read fiction mostly for entertainment, but every once in a while a story manages to leave an imprint that doesn't go away. I found myself regularly recommending:

  • Tomorrow & Tomorrow & Tomorrow by Gabrielle Zevin (2022)
    Friendship, creativity, and how people grapple with their identities and work, told during the rise of videogames as art. It's immensely readable and a lot of fun.
  • When We Cease to Understand the World by Benjamin Labatut (2020)
    A series of haunting historical vignettes about how science escapes our ability to control and comprehend it. Both about the consequences of scientific pursuit for the sake of knowledge, and about the joy, wonder, beauty, and horror of discovery.
  • The Mountain in the Sea by Ray Naylor (2022)
    A slow and careful story about studying emergent animal behaviors while reflecting on what it means to be conscious and intelligent. Also a deep look into how humanity struggles to interact with things that challenge its monopoly on intelligent thought – whether that be animals, artificial intelligence, or anything else.

Some Reflections

Half-Read Books

I had left my employer to focus on my own businesses in 2022 and expected to have a lot more time to devote to reading and learning this year. In some regards, this was true – I've started, flipped through, or referenced in passing at least 50 different books during the last 12 months. Many of those now sit half-read or dog-eared, waiting for me to get back to them. Many have been technical or business books, which explains why I never finished them... I got what I needed and moved on.

Other books sit half-read because they weren't that great. Maybe it was my head space at the time, or maybe the books themselves weren't quality. I'm never quite sure whether I should add a 1/2 or 3/4ths-completed book that I've stopped and won't complete to my "read" pile. Over the years I've been pretty diligent about finishing books and only adding finished books to my list. But that also means the list itself isn't an accurate reflection of my reading... just my completion rate.

Other times still, I'll read halfway through a book and come back months or years later to finish it. A few of the books I've started in years past were "finished" this year by that count.

All in all, this is a taxonomy problem – how do you categorize things, and what does that categorization reflect for you? What makes one particular system valuable for you, and what does it hide?

Books and Pictures

Another taxonomy problem is how to "read" and record image-based books. Whereas a "words" book might take hours, days, or weeks to complete, an "image" book takes me only hours or days. Is a photo retrospective equivalent to a novel or technical manual? Does it need to be?

I try to "read" and record image books the same way as any other books. Maybe because of my art-school-adjacent upbringing, I devote quite a bit of time to looking at and reading each image. I get just as much out of them as I do other books, but other folks might look at my list and wonder if calling a catalogue of national park brochures and design systems therein (Parks) counts as a "book read". It does for me. Plus, it was super useful as a reference for a project I was working on at the time!

However, I don't record movies or TV shows I watched. Why not? Should I? I suppose I've not taken them as seriously as I do books and watch them mostly in the background. But maybe it's worth noting the full "media" diet each year. Should podcasts then also make the list? For some people, I suppose it should. What about magazines? Articles?

What's Valuable is the Specifics

One of the things I've found most valuable in my reading is specificity. Books that are more abstract (such as Dalio's Principles or philosophical tomes) have not been useful, and I've struggled to apply the lessons from them. But reading through case study-oriented books has been wildly useful, especially when the cases are written with a lot of specific and operational details. The Outsiders was a good example of this, as was Ken Kocienda's Creative Selection. Cedric Chin's recent case study collections (which were probably book-length but didn't make it into the list) were another example.

I find myself drawing multiple lessons from each case, including operational lessons that the author may not have intended to convey. "Focus on the Cases" by Cedric Chin and "Reality Has a Surprising Amount of Detail" by John Salvatier do a much better job than I do of explaining why this is so.

Lindy Effect vs. Recency Bias

In recent years, I've focused on buying fewer books and reading more of what I already had. It paid off this year, as most of my books were catch-ups on previous years' worth of acquisitions. I try to track this with the publication year as a proxy for my acquisitions (though I might also start adding acquired, started, and finished dates to the ex libris pages).

While I'm a firm believer in the Lindy Effect, I've noticed that I have a bias towards newer books, especially with technical books. To that end, only five books (13%) were not from the last decade, and only one (Douglas Adams, much overdue) was from before the year 2000.

When dealing with the specifics of technology, most books end up being out of date quite quickly. On one hand, this is the perfect example of the aforementioned Lindy effect – books that have aged well continue to be useful long into the future. It's worth seeking those out because "things that have stood the test of time" are also often the things worth studying – or as Jeff Bezos/Amazon says, "focus on things that don't change".

Yet so much of business operations and technology has changed over the past twenty years that some business "classics" appear quite dated or no longer useful. This is something I picked up on last year, and continue to find true:

I found that overly technical guides might be referencing long-unused frameworks and paradigms, while business books might be holding up as a case study a business that has aged extremely poorly. Branding agency Red Antler's Emily Heyward alluded to the speed with which her examples might fade out of relevancy in the first pages of her book – 'Some of these brands might be under fire or pariahs due to missteps of their founders by the time you read this' (and it was true - Away Luggage was under fire while other startup brands had faded from relevance).

That makes me question the core principles being communicated. After all, why trust a premise of building towards longevity if the examples used don't last more than a year?

Even older 'classics' such as "Built to Last", "Good to Great", "Positioning", "22 Laws of Marketing", and others of that vein have not aged particularly well, and I found myself putting them down with some disappointment as to the value of insights they purported to offer. Part of this is natural – it's a process of cultural reflexivity and raising standards. [In business,] Michael Mauboussin calls this the "paradox of skill": "As people become better at an activity, the difference between the best and the average and the best and the worst becomes much narrower. As people become more skillful, luck becomes more important."

Business and technology are reflexive in that an innovative practice today becomes a best practice tomorrow, and a baseline the week afterwards. The baseline continues to rise and new practices are introduced. When you're working with systems – business, technology, or otherwise – you're finding ways to work within and around that system. Those systems eventually catch up to you – that's the reflexivity – and you need to come up with new strategies to achieve results.

So one of the things I've sought out – in conjunction with aforementioned specificity – is replicability. One of the reasons why The Outsiders was such an interesting read is because it focused on business operators that have repeatedly delivered high-performing results across a number of different industries and market conditions.

I want to unpack that: what I mean is that most case studies tend to have a bias towards market exceptions. These books/case studies are written about as if they were standard-bearers, when instead they tend to be exceptions to the rule. This is a backwards way of going about things. An exceptional case is one that has succeeded in spite of its circumstances and it is extremely hard to untangle what was causal, what was correlational, and what was luck. Exceptional cases are often not reproducible and so there are few "useful" lessons to take away from that.

As someone wiser than me remarked, focusing on exceptional cases is like asking a lottery winner to tell you what their lucky number was and how they walked to the bodega to buy a ticket. That's great for them and might make for a good story, but you don't really learn anything useful from that. What's more useful is to focus on why almost all people win very little at the lottery (that's the group you're in, statistically speaking), and what people who win multiple times understand about the lottery that other people don't seem to.

So rather than seeking out exceptional cases, I'm more interested in understanding reproducible results. Most of the time this manifests as case studies of failure – why most businesses fail to find product market fit, or the cause behind why so many companies are firing folks today. The lessons there are often what not to do, which is still valuable; as Charlie Munger remarked, “It is remarkable how much long-term advantage people like us have gotten by trying to be consistently not stupid, instead of trying to be very intelligent.”

But every so often – and again, this is where The Outsiders and Creative Selection (or Legacy about the All Blacks, or Creativity Inc about Pixar) are worthwhile – you get case studies of success that were reproduced over different periods with different people and different conditions. This tends to isolate the consistent variables across those periods, without elevating situational idiosyncrasies to the level of causality. But I'm getting off on a tangent here.

Anyway, the Lindy-vs-recency issue is also true for speculative fiction and science fiction, which I'm an avid reader of. It's worth teasing out that a lot of "classics" can be split into historically important novels (valuable because they represent a clear paradigm shift or before/after in how/what stories are told) and great stories that have stood the test of time (often because they focus less on technology and more on how technology or some other macguffin changes how people relate to each other). I find that most of the historically important novels don't age too well, and I love the timeless tales.

But with Sci-Fi/Spec-Fi, I also find that the recency bias – books that are contemporary, fun, but might not be groundbreaking – is still worth indulging, because those books tend to be mirrors to today's preoccupations and concerns. Similarly, most of today's business books are a reflection of today's business landscape and concerns. Both will likely go out of date quite quickly, but that's not reason enough to always stay away from them, especially when they may have interesting perspectives and reveal your own blind spots about the present and near future.

A Diversity of Perspectives

I've tried to be intentional the last few years about picking up books from perspectives unlike my own – non-white/male/European perspectives. Here's last year:

Thirty percent of books I read were not by men, and only thirteen percent were by nonwhite authors. [In 2020], only six percent were not by men, and twelve percent by nonwhite authors. It's progress, but it's not great.

This year was roughly the same as last year. Eight out of thirty books (25%) were by women, and six (20%) were by non-white authors. However, books by non-white/non-male authors were in my top 3 best books in both fiction and nonfiction.

I take full accountability for the book selections I make. Still, there remain two challenges. The first is that I pick books up primarily because of the content and not the author. I also don't know the author's background beyond what they choose to present and make part of their explicit biography.

Secondly, part of the challenge seems to be in structural inequalities in publishing – the choices we make are constrained by the available selection. For many technical subjects (especially in business, computer sciences, and older Lindy-validated books), this means a bias towards white and male authors.

While the benefit a diverse perspective brings to fields such as the social sciences or narrative fiction is immediately clear, I've heard some hang-ups from folks about more technical work. If you're writing a guide to managing Python-based deployments of data models, does it really matter who is doing the telling?

The answer is yes, for many different reasons. I can't offer a comprehensive list, but the benefits I've seen from having diverse perspectives on even technical subjects include:

  • Clear identification of the ethical and moral concerns implicit in the work being done, and strategies for mitigating them.
  • A plurality of examples for relating to both abstract and technical concepts, which resonate differently across populations. These differences in framing make material more accessible and comprehensible.
  • A diversity of teaching techniques and practical examples to work through. I find practical examples to be my own preferred way of learning, and I also know that a good study project can spur my imagination to other applications within my own life. Good practice cases offer immediate and practical benefits.
  • Different strategies for relating to and working with other people/teams/contributors, given that most work is done in collaboration. Groups of people who experience structural inequalities have wildly different techniques for negotiating those spaces compared to people in an in-group.
  • Exposure to biases that you may not have known existed – the "unknown unknowns".

How true is this in practice? I think that Marianne Bellotti's "Kill It with Fire" (on managing aging computer systems) is wildly different from somewhat comparable/overlapping dev-ops and change management books such as "The Phoenix Project" (Gene Kim) or "An Elegant Puzzle" by Will Larson (again – where those books overlap topically). It's almost incomparably different from anything HBR has written on the topic as well. I believe this is partially because Marianne Bellotti is really good at what she does and partially because her background as a woman in tech leadership means that she doesn't have all of the affordances that a more commonly represented background might experience, and therefore needs different strategies to approach her subject matter.

Similarly, in Mismatch, Kat Holmes writes about inclusion and how most accessible design ends up benefiting a much broader population than it was intended to serve. Sidewalk cutaways, keyboards, TV captions, and many other innovations are the direct result of designing for inclusivity and accessibility, which is another way of saying design for people outside of the most-commonly-represented stereotypes. So again – diverse backgrounds benefit everyone.

I also look at Cedric Chin's Commoncog blog and know for a fact that we all benefit from his first-hand experience managing software projects in Southeast Asia, even if at first glance the specific challenges are not the same as where I work. Simply put, the world gets richer from a plurality of examples, and I'm glad to not reread the same three case studies of US-based venture companies over and over when people write about growth in software.

So looking into 2023, I want to continue being more intentional with my selections, and I'm going to try to make an effort to feature such perspectives. Please reach out if you're interested!


And Now, the Complete List

(In no particular order)

Narrative & Technical Non-Fiction

Tape Sucks by Frank Slootman (2011)

F.I.R.E. - How Fast, Inexpensive, Restrained, and Elegant Methods Ignite Innovation by Dan Ward (2014)

The Principles Sequence by Cedric Chin (2019)

Burnout: A Guide by Cedric Chin (2022)

The Founders by Jimmy Soni (2022)

The Man Who Solved The Market: How Jim Simons Launched the Quant Revolution by Gregory Zuckerman (2019)

Write Useful Books by Rob Fitzpatrick (2021)

Analogia: Emergence of Technology Outside of Programmable Control by George Dyson (2020)

The Outsiders: Eight Unconventional CEOs and Their Radically Rational Blueprint for Success by William Thorndike (2012)

How to Do Nothing by Jenny Odell (2019)

The Toaster Project by Thomas Thwaites (2011)

Philosophy for Polar Explorers by Erling Kagge (2006)

Quit by Annie Duke (2022)

Creative Selection by Ken Kocienda (2018)

Fiction

The Murderbot Diaries, Books 1-5 (Artificial Condition, Rogue Protocol, Exit Strategy, Network Effect, Fugitive Telemetry) by Martha Wells (2018-2020)

The Mountain in the Sea by Ray Nayler (2022)

Hummingbird Salamander by Jeff VanderMeer (2020)

When We Cease to Understand the World by Benjamin Labatut (2020)

Valuable Humans in Transit (and other stories) by qntm (2006-2022)

The Hitchhiker's Guide to the Galaxy by Douglas Adams (1979)

Tomorrow & Tomorrow & Tomorrow by Gabrielle Zevin (2022)

The Lying Life of Adults by Elena Ferrante (2019)

Art, Design, and Image Mediums

Abandoned Moments by Ed Kashi (2021)

Parks by Standards Manual (2018)

A Man & His Cat, Vol 1-3 by Umi Sakurai (2021)

Opus by Satoshi Kon (2014)

]]>
<![CDATA[What Should Be on the Roadmap?]]>https://sharedphysics.com/what-should-be-on-the-roadmap/63ab23a1daa90982ed4247c1Tue, 27 Dec 2022 20:38:48 GMT(A Deep Dive on Balancing Risks, Costs, and Rewards for Product Managers)

In my last long post, I wrote a lot about unblocking work, fixing prioritization problems, and getting things done. But I didn’t address the question of how to figure out what work is worth doing, what should be on your roadmap, and how those things should make it onto the roadmap.

Figuring out what to work on is about unpacking what the heck ROI is, how to figure it out, and how to actually get a ‘return on investment’. That’s the work product management should be doing, but it’s also broadly applicable to anyone who needs to make effective roadmaps and business decisions... regardless of role or department.

At the end of the day, choosing what to work on is one of the most important things a leader should do. And that decision is all about the balance between three things: costs, risks, and rewards.


Table of Contents


One quick thing!
This is a long post... it's roughly 11,300 words! To make it easier to read/annotate, I've also turned it into an epub ebook and PDF. If you prefer to read it that way (or you found this helpful and want to leave a "tip"), you can buy the ebook version for $1.99. This helps offset the cost of writing and editing. Thank you!

1. Introduction: Strong roadmaps are about opportunities, not solutions

A roadmap is a deeply political thing.

A roadmap represents the hopes and dreams of a company, what it expects the future to look like, how it plans on getting there, and what risks it is willing to take.

Because of this, roadmaps inspire debate from every corner of the company. Product and engineering teams, investors, marketers, customer success, sales teams, and every IC, manager, director, and executive has a perspective on what to do and why.

Good roadmaps find ways to strike a balance. They do this by focusing not on solutions, features or tactics, but on opportunities and problems to solve.

This might sound a little counterintuitive at first. Most public-facing “roadmaps” are actually upcoming feature lists, such as “added support for Bluetooth” or “ability to edit content”. Those are fine when you’re telegraphing to the general public what they should be expecting in the near future. In fact, that’s a great launch strategy. But it’s not a great business roadmap.

Consider the following examples of roadmap items for a hypothetical chat app:

  • A: “We need to build an integration to WhatsApp”
  • B: “We need to launch a WhatsApp integration to increase engagement and chat volume in Latin American markets.”
  • C: “We need to increase engagement and chat volume in Latin America and deflect emerging competition for our market share. If we create 10-25% more chat volume for customers or reach feature parity with competitors within the next 3 months (when contracts renew), then we won’t lose existing customers and will gain $X new revenue through renewed contracts. We’ll start by launching a WhatsApp integration targeting customers underserved through other channels (15 million people in the current market).”

What’s the difference between these?

In the A example, you have a task removed from any context. Why are you building that thing? How do you know if it made any difference at all? Has it solved any problem? Do you even know what problem you’re solving?

In the B example, you begin to have context. You now know the goals and target market. You’re still building the same thing, but now you know how it fits into the bigger picture. You also have some more idea about what the feature needs to do — for example, it needs Spanish-language support, and to be successful it needs to increase certain metrics.

In the C example, you skip the task entirely in favor of focusing on the business problem, with specificity. You’re no longer guessing at what success looks like. You begin to understand that the problem is complex and there are a few different options for solving it. You have a relevant timeline. All of this begins to inform a tactical plan for what you can do: you’ll start with a WhatsApp integration because it’s a quick win and easy to launch, then add in additional features based on interviews with the customers you service.
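To make the contrast concrete, here's a minimal sketch of how the A and C formulations differ structurally – A is a bare task string, while C carries measurable context. The field names and figures below are illustrative only, not a real schema:

```python
# Formulation A: a task with no context.
item_a = "Build a WhatsApp integration"

# Formulation C: the opportunity, with measurable success criteria.
# All field names and numbers here are hypothetical, for illustration.
item_c = {
    "opportunity": "Increase engagement and chat volume in Latin America; "
                   "deflect emerging competition for our market share",
    "success_criteria": {
        "chat_volume_increase_pct": (10, 25),  # target range for customers
        "or_feature_parity_with_competitors": True,
    },
    "deadline_months": 3,  # when contracts renew
    "reward": "retain existing customers + new revenue through renewals",
    "first_tactic": "WhatsApp integration targeting customers underserved "
                    "through other channels (~15M people)",
}
```

Notice that if the WhatsApp integration underperforms, `item_c` still tells the team what success looks like and what else they might try; `item_a` leaves them with nothing.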


Most people end up confusing tactical plans and to-do lists with business opportunities. This leads to a focus on building things rather than solving problems (what Melissa Perri calls “the build trap”). If you flip that and focus instead on opportunities, you gain flexibility and impact. You also gain the ability to draw a clear line between costs, risk, and rewards of pursuing any opportunity.

Moreover, costs, risk, and rewards are not nebulous, hand-wavey concepts that you can guesstimate through. They’re _specific_ factors that you can play with to find the best balance and thus the best business case for why you should work on one thing rather than another.

The other interesting thing that happens is that what you build ends up mattering less than the process of how it gets on the roadmap:

  • A good roadmap is about problems and opportunities, not about solutions.
  • Because any problem can be solved in many different ways, this gives you flexibility to try out different solutions instead of fighting over or getting stuck on any single idea.
  • Every opportunity is a balancing act between costs, potential rewards, and the risks/probabilities encountered along the way.
  • Teams that are able to go through the process of identifying and testing opportunities quickly tend to get to quantifiably better ideas and execute them faster. They are able to keep costs low by abandoning things that aren’t working, keep risks in check by creating effective feedback mechanisms, and keep the rewards coming by focusing on goals and intents rather than specific tasks.
  • So what you end up choosing to do matters less than having a system for moving through a loop of testing different opportunities and having the right feedback mechanisms to adjust your execution.


Why write this and who is it for?

This is for Product Managers who are doing the wrong jobs

Across companies, I’ve found too many product managers that are focused on doing the wrong job. There’s an awful anti-pattern of people calling product managers the “mini-CEOs” and having them end up acting as tiny dictators: build this, build that. You end up with legions of product managers pretending to be designers and software engineers, busy writing requirements for what the product should and should not do, how it should and should not look and feel.

When product managers become feature dictators, everyone loses. Designers become pixel pushers, engineers turn into typists, and marketing and sales folks feel like they’re walled off from the “technology” side of the house. And product managers end up bringing nothing unique to the table, just mimicry of the work of other departments.

If you flip that and have product management focus on opportunity validation, you give autonomy for designers to flex their understanding of human interactions and service design, and for engineers to give proper guidance (and forward planning) on what to build and how to build it, including what’s simple, what’s complicated, and how to best create something given the rest of the software architecture. It also brings the important customer-facing voices of sales, marketing, and customer service into the conversation and makes them partners in the success or failure of every effort.

This is also because most advice online is incomplete

The other challenge is that there exists a huge body of advice around everything I’m writing about, but most of that advice is piecemeal, incomplete, or contextless. That means that most of the advice is hard to practice (or won’t get you the results you want).

Without the proper context, you’re more likely to get a mismatch between a problem and what you’re implementing. Advice mismatches end up sounding like “this is what we did at my last company” or “I tried that earlier and it didn’t work as promised” or “but this best practice has been recommended by [insert company name] and they really know what they’re talking about”.

My experience has been that figuring out the right problem is 90% of the work. When you identify and define the problem correctly, the solution often becomes obvious.

So this guide — I’m not sure what else to call it — is designed to systematically approach the problem of navigating costs-risks-rewards when figuring out what to work on.

By teasing those things apart, it’s easy to see how different risks require different risk-management strategies, how costs run up and how to keep them down, how to create feedback loops to identify if you’re moving in the right direction, and how to figure out what is a real reward and what is reward-like but not actually impactful.

Along the way, I’m also going to highlight some stories and tactics that have worked for me to solve those problems. I’ll also try to illustrate it with examples (and failures) from the roadmaps I’ve had to put together or worked on or had been handed down to me.

Let’s get started.


2. What Business Decisions Are

Every business decision is a balance between:

  • Risks (probability that things will/won’t go right),
  • Costs (time, money, and effort), and
  • Rewards (the possible outcomes of things going right).

A roadmap is a commitment to a set of business decisions.

Making better decisions is about finding ways to play with those three levers: what you can do to skew those options, and what to do when you can’t.

Skewing the options means being creative with your constraints and asking yourself:

  • How can I decrease the risks associated with my projects to make them more likely to succeed?
  • How can I lower the costs to build so that I have more opportunities to keep trying (“shots at goal”)?
  • How can I choose problems where the potential rewards justify the work?

This might sound familiar to some folks: risks, costs, and rewards are really just another way of saying “ROI” or return on investment. But ROI has a popular usage and history, and I don’t think it’s useful to try to co-opt that and create inevitable confusion. At the end of the day, though, it’s still about what you need to invest (costs), what your return is expected to be (rewards), and what you need to get from point A to point B (risks).
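As a toy illustration of that balance, you can treat risk as a probability of success and compute a crude expected value for each opportunity. This is a sketch with made-up names and numbers – real roadmap decisions involve far more judgment – but it shows how the three levers interact:

```python
from dataclasses import dataclass

@dataclass
class Opportunity:
    name: str
    cost: float       # investment required (time/money), in dollars
    reward: float     # payoff if things go right, in dollars
    p_success: float  # risk, expressed as a probability of success (0-1)

    def expected_value(self) -> float:
        # Probability-weighted reward, minus the cost you pay either way.
        return self.p_success * self.reward - self.cost

# Hypothetical opportunities: a risky big bet vs. a safer incremental win.
big_bet = Opportunity("Enter a new market", cost=50_000, reward=400_000, p_success=0.25)
safe_win = Opportunity("Optimize checkout flow", cost=10_000, reward=60_000, p_success=0.5)

best = max([big_bet, safe_win], key=Opportunity.expected_value)
print(best.name)  # here the big bet wins despite its lower odds
```

Lowering costs, de-risking (raising `p_success`), or chasing bigger rewards each moves the expected value – which is what skewing the options means in practice.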


3. Understanding Rewards

“The purpose of business is to create and keep a customer,” he argued. And “What does our customer find valuable?” is the most important question companies can ask themselves.

- The Wisdom of Peter Drucker, from A to Z

If I skip to the punchline for a minute: the reason any business exists is to make money. I’m not trying to be calloused or jaded. Businesses are legally defined organizations designed to help groups of people work together to generate trade and commerce.

Businesses make money by creating, refining, or arbitraging something of value, something a customer — or a large enough volume of customers — is willing to pay for. Businesses survive and thrive when the amount of money they make exceeds whatever their costs to produce and deliver the products/services are.

(So if you’re interested in running a successful business, these are your two crazy tips: do something people are willing to pay for, and generate more money than you spend.)

It’s important to understand this context — and more specifically, how your business makes money and what your customers find valuable enough to pay for — to figure out how to chase the right rewards.


The first “roadmap” I worked with was a list of tasks and projects to complete. When that list was completed, management added more projects to the list, mostly to make sure we had work to do.

Much later, the first “real” roadmap I worked with was something that was handed down to me. It was a well-meaning spreadsheet with 40 different parameters evaluating all of the projects we had on our plate, plus things different teams wanted to work on in the future. There was another column that calculated expected value based on weighing certain fields. It was all pretty subjective, and it was still just a project list at the end of the day.

To use the “cart before the horse” metaphor, starting with projects and features is like trying to build a cart before you know where you’re going. Maybe you don’t need a cart to get there. Maybe a horse is good enough. Maybe you need a sled and pack of huskies. Maybe you need a boat. So where are you going and why?

But, I inherited that roadmap and it was as good a starting point as any. After a lot of discussion and weigh-in from different department heads, we ended up narrowing down 40 different parameters to eight fields, then five, and eventually three fields that really mattered.

These three reasons (which I ended up calling “Rewards” for reasons that will become clear later) have been applicable across every business I’ve worked at since. They are:

  • Reward 1: Expand existing business (increase existing revenue)
  • Reward 2: Enter new markets (create new opportunities/new business lines)
  • Reward 3: Increase profit margins (decrease costs while keeping revenue stable)

Everything ultimately ties back to these rewards/reasons. They’re the intent of the work — the “why” of what you’re doing.

By starting with the opportunity rather than with a solution, you can be nimble with figuring out what to do to reach that intent. This allows you to focus on outcomes rather than getting locked into specific tasks that might no longer be relevant if circumstances change.

However, this is easier said than done. Most people think in solutions-oriented ways. They skip over defining the opportunity and get straight to a proposal. If you let them, they’ll follow up with two or three more proposals. When someone comes to you with a brilliant idea that you should work on, it often takes a couple of questions to get to why they think the idea is so brilliant and what problem it solves (if any... too many companies today seem to be solutions in search of a problem). Only then, when you’ve gotten to the “but what’s the problem” stage, can you have a real conversation about if that’s a problem worth solving, and if there are other possible solutions to that problem.

That 40-parametered roadmap I inherited was an example of this: it was a series of solutions the team had backed into, with a bunch of justifications for why we should build them. When we investigated further, many solutions were different ways of solving the same two or three opportunities we had. Once we rewrote many of the projects to focus instead on the opportunity rather than on the specific solution, we were able to consolidate different efforts across the company and give autonomy back to the executing team to iterate to the right solution by balancing costs, rewards, and risks.

So let’s unpack those rewards:

Reward 1: Expanding existing business (increase current revenue)

Expanding existing business is exactly what it sounds like. You already have some customers and sales channels, and your goal is to do more of that, to generate more revenue there.

At Dragon Blood Balm (a niche consumer product goods company), this meant focusing on increasing average order size (often by playing with price and promotions), increasing the number of orders per customer (through marketing and communications), and increasing the spend and subsequent reach/conversions of our advertising.

When I was at a B2B enterprise healthtech company, the types of projects that fell into this bucket included increasing our utilization volume (for cost-per-service activities), creating new capabilities that could be upsold to existing customers, and selling our existing platform to new clients.

However, increasing existing business often has an upper limit. Within the company’s area of healthtech, there were 300 or so major insurance companies that were viable sales targets. If insurance companies were our only business, then we would eventually hit a point of full market capture, after which no further growth would be possible. If we tried to maximize revenue capture by bundling or increasing prices for any single insurance client, they would get wise and eventually try to find unbundled and cheaper alternatives. Same for Dragon Blood Balm: there was a ceiling on how much we could increase the average order size and frequency given the nature of our products and pricing. As we approached those points, squeezing out further growth had a higher and higher cost to us with less and less efficacy.

That’s why you need to also think about Reward 2, entering new markets.

Reward 2: Entering new markets

If increasing existing business is often a “do more, do better” of what you’re currently doing, entering new markets often requires doing things differently — including leveraging different people, processes, and products.

Because of this, it’s often harder to create new channels for growth than it is to make existing channels more valuable.

To continue with the two examples, at Dragon Blood Balm we knew that our existing product worked and had an audience. New business lines meant launching new products & new sales channels. New products allowed us to reach new people with new use cases, but required R&D efforts and changing our production process to accommodate different formulas. New channels such as wholesale (we were primarily DTC at that point) meant new (and potentially lucrative and recurring) revenue streams, as well as stability of order volume… but the tradeoff was that we needed to bring on and train a sales team to generate that business, invest in new packaging that would work for retail stores, and direct our marketing spend towards supporting retail partner success.

At our health tech company, this meant discussions about taking our product from the B2B insurance space to the DTC space, or expanding to other industries such as Pharma or hospitals. Each one would require a change in sales strategy, new product capabilities to support different workflows/integrations, and a change in collateral and ROI models we were pitching.

All to say, you often can’t fully take advantage of the scaffolding you’ve already built up when creating new lines of business. However, the upside is a much larger potential market and increased revenue opportunities, especially if you’re starting to hit the upper limits or diminishing returns on effort for your existing market.

For mature businesses, this is a logical next step in growth. But for new businesses, this is what finding product-market fit is all about.

Reward 3: Optimizing Profit Margins

Operational costs are just as important as overall revenue growth. If you have a stable line of business, it makes sense to think about how to optimize operations so that you can decrease your cost of doing business. You can then capture this difference as increased profit margins or pass along the savings to customers to incentivize further growth.

To stick with Dragon Blood Balm, our optimization came primarily from taking advantage of scale effects: as our volumes of sales increased, we were able to order ingredients and materials in higher volumes and thus at a lower cost per unit. Our cost of production went down, which allowed us to invest in more marketing and advertising to increase sales, which in turn helped increase volume and decrease costs.

But optimization has three key drawbacks:

  • It Creates Lock-In
    Firstly, most optimization comes from a lock-in to some way of doing things, which can prevent you from being flexible in responding to changing needs and conditions. A good example is an assembly line: while an assembly line can increase throughput and decrease costs, it locks you in to doing the same thing over and over. A bespoke manufacturing process on the other hand might be more expensive and slower, but can allow you to create many different variations of a product to identify the right one to optimize for. Therefore, beware of premature optimization if you need flexibility.
  • It Needs Scale
    Secondly, optimization itself is a reward only at scale. For example, if you have a SaaS service that engages a million customers per day and captures $50 from each of them, a 1% increase in conversions is an extra 10,000 customers and $500,000 in revenue. If you service 100 customers per day, a 1% increase is 1 customer and $50 reward, effectively a rounding error on any day. In the first case, a 1% increase can allow you to hire a few new team members, while the second case would allow you to buy coffee for a week, assuming you’re not ordering anything large or fancy.
  • It Has Strict Lower Bounds & Diminishing Returns
    Thirdly, just like ‘expanding current businesses’ has upper bounds (you can’t capture more than 100% of your market before you need to expand), optimization has lower bounds: you can’t optimize to less than zero, and often hit diminishing returns vis-a-vis effort put in.
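The scale math in the second drawback can be sketched in a few lines. The customer counts and per-customer revenue below are the hypothetical figures from the example above, not real data:

```python
# Hypothetical figures from the SaaS example above; not real data.

def lift_value(customers_per_day: int, revenue_per_customer: float,
               lift: float = 0.01) -> tuple[int, float]:
    """Return (extra customers, extra revenue) for a given conversion lift."""
    extra_customers = round(customers_per_day * lift)
    return extra_customers, extra_customers * revenue_per_customer

# At SaaS scale, a 1% lift is real money...
print(lift_value(1_000_000, 50.0))   # (10000, 500000.0)

# ...while at 100 customers/day it's a rounding error.
print(lift_value(100, 50.0))         # (1, 50.0)
```

The same 1% improvement, applied at two different scales, is the difference between new hires and a week of coffee.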

Meanwhile at the health tech company, one such misguided optimization effort was around our cost of delivering care services. We estimated that we could save at least 20 minutes per interaction through automation, and another 20 minutes of back office work through integrations with medical records systems. This would ultimately result in each individual on the care team being almost 3x as efficient. However, without increasing the volume of interactions we had, our efficiency gains would have been wasted. Moreover, the cost of building out that automation and integration was much higher than the efficiency gains we would have realized at that scale.

Other kinds of rewards

There are, of course, many reasons to do something. Any sufficiently large company might find itself with projects and goals around:

  • Virtue signaling or goodwill
  • Compliance and certification
  • Things that are good for team morale
  • Increasing customer satisfaction
  • Protecting market share
  • … and many others.

However, these often tie back to one of the three rewards we’ve already looked at: virtue signaling can help offset bad press and increase brand affinity and differentiation (which translates into revenue or price justification). Compliance and certification is often the cost of doing business with certain customers. Doing things for team morale is important if there is a morale problem that threatens business operations and thus is a stop-loss strategy. Increasing customer satisfaction is a proxy for customer retention and word-of-mouth growth. And protecting market share is important, but it can turn into a process of mirroring your competition without understanding if those things add value or just sound great in the competition’s marketing. Moreover, losing market share is often an indicator that you’re not solving the right problems (or in the right way) for customers. Doing any of those things without proper justification turns them into cost centers.

Beware of things that look like rewards but aren’t

“The great public sector mistake is to judge policies and programs by their intentions rather than their results.”

- Milton Friedman

It’s really easy to confuse results for rewards.

Results are outcomes of your actions. Results show that something happened. But not all results translate into business rewards. Increasing click-throughs, improving engagement metrics, decreasing process running time, and increasing booked meetings are quantifiable results, but if you can’t draw a line between that and business rewards, there’s a high possibility that what you’re doing might feel valuable but actually isn’t.

This is especially important to understand in the context of all the layoffs happening in the tech world: the people, roles, and projects that survive (and get rewarded) are the ones that have a clear line between the work they do and the value they create from that work.

If this scares you, good! It should be scary. This is what accountability looks like.

Rewards are concrete and specific

"Consider the scenario. Two people have imagined two cute puppies. I assert mine is cuter. What do we do now? Do we have a cuteness argument? How can we? We have nothing to go on. The scenario is ridiculous. There's no way to resolve this conflict. Without a concrete and specific example of a cute puppy, there's no way to make progress."
- Ken Kocienda, "Creative Selection"

There’s a huge difference between “we should build Feature X because customers want it” and “we should build Feature X because it’s a cost-effective and low-risk solution to a problem that four clients are willing to each sign a $2 million contract to deploy, because it solves a costly headache for them”.

Bad rewards are generic and unclear. Good rewards are concrete and specific.

By being concrete and specific, you allow opportunities to be compared and pressure tested. You also create clear criteria for matching one proposal against another.

Concrete and specific success criteria naturally generate their own benchmarks to check progress against.

A good description of a reward should be able to provide dollar amounts, user justifications, and timelines. It answers the question of:

IF we solve [a] problem for [b] customers,
THEN we will see [c] rewards,
BECAUSE of [d] reasons.
We can approach this problem with [e, f, g] potential solutions.

... and on the feature level:

I BELIEVE THAT [e/f/g feature description]
WILL [do something]
FOR [who is this for?]
BECAUSE [reasons].

When rewards are not concrete and specific, that’s usually an indicator of hidden risk and uncertainty. It usually means that a lot of guesswork and assumptions were used (if there was any justification at all). It’s hard to compare abstract rewards and harder to know if you’ve succeeded in reaching them.

Which brings us to the next section, Understanding Risk.


4. Understanding Risk

“It is remarkable how much long-term advantage people like us have gotten by trying to be consistently not stupid, instead of trying to be very intelligent.”
- Charlie Munger
“Many people who appear to be famous risk takers are actually experts on capping the downside and imagining worst case scenarios.“
- Tim Ferriss

Nothing is risk free. All roadmap decisions are decisions made about the future, and that is yet unwritten, unhappened. So the question is, how much risk are you willing to take on, and what can you do to decrease the risks you can’t avoid?

While almost all risks are due to not knowing the future, there are different reasons for not knowing what the future will potentially look like. Understanding different types of risks can help you plan for how to address them, and ultimately reduce the size of the risk.

The major types of risk are:

  • Risk due to missing validation
    (Such as missing information: you don’t know something you should know, often resulting in picking the wrong problem or solution)
  • Risk due to execution
    (Such as process-related problems, failure when acting)
  • Risk due to dependencies
    (Such as increasing problem complexity and scope, which means more things need to go right)
  • Risk due to probability & lack of control
    (Such as when things are outside of your control and thus ultimately probability-based)
  • Risk due to change
    (Such as black swans, paradigm shifts, and more mundane forms of changing requirements)

Risk due to missing validation

Risk due to missing validation is when you don’t know something that you should know. Most of the time, this manifests as “we don’t know if this is a good idea or not”, or as a lack of specificity/details about the rewards.

I've adapted this with some modification from Itamar Gilad's excellent "Confidence Meter"

Luckily, this is also the easiest risk to manage:

  • If you’re a new employee and you don’t know enough about the industry or the company, you can talk to internal experts.
  • If you don’t know enough about customer behaviors and needs, you need to find ways to engage with customers. Talk to them, survey them, do over-the-shoulder observational studies. Learn good interviewing techniques that aim to uncover new and valuable information.
  • If you don’t know enough about the market needs (people who are not yet customers), then you need to find ways to engage with those people. Events, surveys, paid consultations, cold outreach, and sales/presales tactics are great ways to engage and get validation.
  • If you lack specificity, you need to drill down into why. More often than not, it means that you haven’t spoken to enough customers or potential customers to be able to confidently propose a range of realistic numbers that you could measure against in the future.

One of the ways we evaluated knowledge risks with my team was to ask why someone was confident in their proposal and details. The level of validation someone could provide could easily be mapped to the chart above and provided a clear direction for where more validation was needed.

You don’t need to be able to predict the future to act, but you do need to have some good reasons for thinking a specific future is likelier than another future. The less you know, the more you expose yourself to uncertainty about the outcomes. Thus the work here is to reduce the amount of uncertainty — and hence the amount of risk — as much as possible by interacting with customers and potential customers.

The consequence of risk due to missing validation is usually picking the wrong problem or picking the wrong solution to the problem.

Picking the wrong problem looks like work that isn’t actually valuable, or working on problems that aren’t properly validated. It often happens when a market/opportunity is properly sized but isn’t validated against customer needs. The most immediately familiar example is Meta’s quest to “capture” the VR market early (and hence reap all of the benefits of owning their own platform). While this makes strategic sense, the problem is that VR doesn’t seem to be a problem any customer needs solved. There is not yet a great use case for VR, and Meta’s push into business applications such as Horizon Worlds flies in the face of their own logic for asking people to come back to the office for in-person collaboration.

Not a real problem
"As a user, I want to use two joysticks to type on a virtual keyboard while looking at a screen inside of a heavy headset, so that I can be tracked and monetized in Meta's quest to capture and own a hardware space." ... said no one, ever.
https://mobile.twitter.com/carnage4life/status/1595133298401218561

Picking the wrong solution is when you identify the right problem, but you solve for it in the wrong way. To give a closer-to-home example, I fell flat on my face one time when I identified the right problem (increasing utilization) but proposed a solution that was a non-starter for the client (push notifications and marketing) because it ran afoul of regulatory rules against incentivizing healthcare services. Oops!

For any problem/opportunity, there are near-infinite possible solutions. Picking the wrong one happens when you cannot gain more information to increase the confidence between options.

The solution to risks due to missing validation is usually to speak with customers or other stakeholders and learn more about them, then to validate the solutions with iterative development and tight feedback loops. While it won’t help you choose the right solution, an iterative approach will ensure that you don’t overcommit to doing the wrong thing for too long.

Risk due to execution

As they say in Hollywood, “some ideas are execution dependent”. It means that the key success factor is how well something is made. This makes sense in the movies — most ideas are narrative tropes that have been told hundreds of times. So the success or failure is dependent on how good that version of the telling is. Hence, it is execution dependent.

This is true in business as well. David Heinemeier Hansson from Basecamp makes the point that Basecamp is essentially a to-do list, which has been done hundreds of times by hundreds of businesses. "The vast majority of businesses succeed or fail on the basis of their execution and their timing." So if Basecamp succeeds, it is because they execute well.

And while you can fail because you chose the wrong problem or proposed the wrong solution, most businesses aren’t making extremely novel and innovative solutions (despite what they’d like to think). So risk due to execution is the most likely challenge any team will need to deal with.

This is caused by people, processes, lack of follow-through, and incomplete planning.

At my last employer, we had a lot of risk due to execution because we were bad at shipping code. We had too much work in progress, we had teams with too many dependencies to be able to deliver anything on their own, and we shipped things with too many defects to get traction with users and clients — something I wrote extensively about in “When Everything Is Important But Nothing Is Getting Done”.

For others, risk due to execution might mean being slow at iterating through problems or making decisions. It may mean overspending on capabilities or overstaffing/understaffing in proportion to the work at hand. For others still, it might be an inability to actually get work done on anything that isn’t already in progress.

It also means being aware that success does not end with shipping something. In software development, too many product managers limit their attention to the software development process and are happy to completely hand off responsibility for getting their solutions into customers’ hands to the marketing, sales, and customer success teams. This is a mistake. Getting solutions into people’s hands is just as important as making the solution… and it’s also where you learn if the solution is successful or not, and how you get the feedback necessary on if you need to pivot.

I’ve seen more initiatives fail from this lack of an action plan and rollout effort than from shipping defective code. This is because getting products into people’s hands is a hard problem. Remember: if you make something and no one uses it, that’s even worse than not making anything at all because of all the costs that went into it. So if your biggest risk is utilization, dedicate a fair amount of time to accounting for this risk.

Risk due to execution can be measured by the complexity of the problem/solution, the capacity (and available skill sets) to tackle that problem, and historical patterns of delivery. Resolving it is about finding opportunities for process improvement, which is what frameworks such as DevOps, SixSigma, Total Quality Management and other operational excellence systems aim to solve for. It’s about how to do work better, rather than how to choose better work.

Risk due to dependencies

A corollary to risk due to execution is risk due to dependencies. Dependencies are like a Rube-Goldberg machine. The more things that are required to go right, the less likely it is that all of them will do so.

This is especially true when a dependency is actually a prerequisite. Annie Duke describes this quite well in her book “Quit”, when she talks about Google X’s Astro Teller and his way of evaluating difficult dependencies. Here’s the part, using an example of how to create a “juggling monkey show”:

Teller recognizes that there are two pieces to becoming successful at this endeavor: training the monkey and building the pedestal. One piece of the puzzle presents a possibly intractable obstacle in the way of success. And the other is building the pedestal. People have been building pedestals since ancient Greece and probably before. Over two-plus millennia, pedestals have been thoroughly figured out. You can buy one at a furniture store or a hardware store, or turn a milk crate upside down. The bottleneck, the hard thing, is training a monkey to juggle flaming torches. The point of this mental model is to remind you that there is no point building the pedestal if you can’t train the monkey.



… In other words, you ought to tackle the hardest part of the problem first. […] You already know you can build the pedestal. The problem is whether you can train the monkey. On top of that, Teller realizes that when you’re building pedestals, you are also accumulating sunk costs that make it hard to quit even as you find out that you may not be able to train the monkey to juggle those torches. By focusing on the monkey first, you naturally reduce the debris you accumulate solving for something that’s, in reality, already solved.

The dependencies problem can be modeled with simple math: let’s say you have a 99% chance of something succeeding, and it has one dependency (also 99% chance of succeeding). Your success rate is 0.99*0.99, which is 0.98, or 98%. But most of us aren’t working with things that have a 99% certainty rate and only a single dependency. Let’s say you have a project that seems 80% certain to succeed, but is dependent on three other things happening as well. One of those things is a 50% chance, another is 99%, and another is 80%. The likelihood of you succeeding is not 80%, but 31.7%. Even though every part seems likely, the final result is anything but. So if you have a dependency that is extremely high risk, it could tank your entire effort. You should focus on fixing that first, before embarking on anything else.
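That compounding can be checked directly. A quick sketch, assuming the steps are independent and using the probabilities from the example above:

```python
from math import prod

def chained_success(*probabilities: float) -> float:
    """Probability that every independent dependency succeeds."""
    return prod(probabilities)

# Two 99%-likely steps still compound downward:
print(round(chained_success(0.99, 0.99), 4))              # 0.9801

# An 80%-likely project with 50%, 99%, and 80% dependencies:
print(round(chained_success(0.80, 0.50, 0.99, 0.80), 3))  # 0.317
```

Every added dependency multiplies another factor below 1 into the chain, which is why the weakest link dominates the outcome.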

Which brings us to….

Risk due to probability

Risk due to probability is when things are ultimately outside of your control. You might select the right problems, the right solutions, execute them well... but still fail.

For example, if you’re working on a software problem, you may have full control over the outcomes. If you write a new script or optimize the existing architecture, you may see a direct result in say, higher throughput or processing speed. Conversely, if you’re working on a sales pitch, you might do ‘all the right things’ and still find that the deal is lost because you’re not the one making the buying decision.

Of all of the risks, probability is the only “formal” risk in the sense that even with perfect information, you still don’t know the outcome because that outcome is outside of your control.

Managing this kind of risk is about having fallback plans. Because you don’t have full control over the situation, you need to (a) understand the possible outcomes, their likelihood, and their impact, and (b) decide what your fallback plan is for the most probable and impactful ones.

Finance and gambling tend to have the most robust management models for hedging against probability-based risks. This includes tactics such as distributing and hedging bets against each other, creating options around black swan-like events, and bet-sizing criteria derived from probable outcomes and the availability of funds. To translate that into business terms, it means tackling multiple problems, having a premade plan for when things go wrong, and figuring out the right balance of cost-to-rewards to pursue based on existing runway and risk profile of the proposal.

Risk due to change

The last risk is risk due to change. Change over time is inevitable. Circumstances, budgets, people, and requirements change. You might learn new information that changes your decision-making calculus. Someone might pressure you to add something to the backlog. Someone might quit. A customer might have their budget slashed.

Each of these things (and many others) is a change in circumstances that you need to consider. By acting in the world and learning from those actions, you have an opportunity to proactively modify your business calculus for the better (including sometimes killing projects and sometimes doubling down on them). Conversely, when the change is out of your control you need to be reactive (and it’s more likely than not to be a setback).

The way to address risk due to change is by increasing speed or scoping projects smaller. This is something I wrote about in “When Everything Is Important”, but it’s worth repeating:

When projects are scoped too large, a few things happen:



(1) They take a long time to deliver, which means that work is locked up for long periods of time without creating value. 



Said inversely, value is created only when someone uses the deliverable. Until something is shipped, zero value is created.



(2) They introduce the possibility — nay, the inevitability — of scope creep. Over time, more needs are discovered, customer preferences evolve, management changes, patterns are discovered, and so forth. The longer a project goes on, the larger the calendar window is for introducing a new request into the pipeline. The more requests are put in, the harder it is to keep saying no without coming off as difficult or uncollaborative or unresponsive to new business needs. 



Said inversely, the shorter the project work window is, the less opportunity surface there is for scope change to be introduced. A two week sprint has a smaller surface than a two month project, or a two year epic. It also means that you can be more responsive to those requests because you’re working in shorter intervals. With the conclusion of each interval, you can change scope without it affecting your existing commitments.



(3) They introduce systemic problems in the form of people leaving (or new management entering), contracts/budgets/business needs changing, and so on. Each one of those represents a potential existential risk to a project’s scope and each one of those becomes much more likely the longer a project goes on (again: a larger calendar surface area).

How much do you need to know to be comfortable with a decision?

When I was a brand-new lieutenant, I asked my father, “How would I know if somebody that I worked for or worked for me was going to be a good commander in combat? ... How would you tell in peacetime?” He says, “You won’t. You won’t know because people have capabilities or coping mechanisms that in peacetime look fine, that doesn’t play well in war.”

Then I asked him, “Okay, when you’re in combat, how do you know?” He said, “Some people keep asking for more information and what they’re trying to do is drive uncertainty to zero so that there’s really not a question on the right course of action because you know everything. But you can’t do that. It’s not achievable. So they become hesitant. They become tentative, and they become focused on getting more and more information to ratchet the uncertainty out of the situation and they don’t act.”

- Ret. Four-Star General Stanley McChrystal

Risk is inherent in everything we do, and can never be fully removed. There are ways to evaluate and understand risk, to manage risk, to document and create plans to address risks, but risk itself will always be present. Moreover, risks don’t exist in isolation. Risks overlap and compound. Risk due to not knowing begets bad decisions and hidden dependencies, each of which lowers the probability of things going right.

Some things you have more control over, and others you have less control over. The work is in figuring out what risks you face to understand what actions you need to take to manage or reduce those risks.

And most importantly, it’s hard to properly quantify most risks. Most of the time, attaching a number is a matter of guesswork. It’s possible to have enough data to extrapolate the probabilistic risk of something, but most organizations that I’ve worked with or know about don’t have clear, structured data over a long enough period of time to generate statistically significant numbers. So most people end up resorting to their intuition to say something along the lines of “we have a 50% risk of things not working out because the team is stretched thin and priorities change weekly”. What does 50% mean, and how did you come to that number? When I hear that, I am less interested in the probability and more interested in the plan for if that happens. OK, let’s say the team gets stretched thin: what do you do now? Do you give up? What do you change?

Other folks will attach an exponential scale to the risks as a modifier for doing calculations. Itamar Gilad, for example, uses a log scale of 0.01x for low confidence to 10x for high confidence as a multiplier for measuring projects against each other. My experience has been that this “math” is more often used to create a feeling of certainty than as a useful planning tool. Again, this is because the important thing about risks is not just how risky something is, but what you can do to change that risk profile.
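For context on what that kind of scoring looks like in practice, here is a minimal sketch of confidence-multiplier scoring in Gilad's style. Every project name, score, and multiplier below is made up for illustration:

```python
# Hypothetical projects scored impact/effort, then weighted by a confidence
# multiplier on a log-style scale (0.01x for a pure hunch up to 10x for live
# test data). All names and numbers here are invented.
projects = {
    "Feature X": {"impact": 8, "effort": 5, "confidence": 0.1},  # opinions only
    "Feature Y": {"impact": 4, "effort": 2, "confidence": 3.0},  # user-test data
}

def score(p: dict) -> float:
    return p["impact"] / p["effort"] * p["confidence"]

ranked = sorted(projects, key=lambda name: score(projects[name]), reverse=True)
print(ranked)  # the better-validated Feature Y outranks the flashier Feature X
```

The output is only as good as the confidence numbers fed in, and the multiplier says nothing about how to change a project's risk profile — which is the real work.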

Other times I’ll get asked, “how many people should I talk to before I know I have good information?” There’s never a good answer to this because it’s case by case. If your target audience is a specific customer, you might need to talk to two or three people. If you’re targeting an addressable market of millions, you should probably talk to more. (For what it’s worth, my rule of thumb is to talk to enough people that in your next interview, you pretty much know what the person will say. If you’re learning something new in every interview, then keep learning.)

At some point, you need to become comfortable with the risks to move forward to action; “Wanting more information is often just a form of procrastination,” as Russ Roberts once wrote. Or as Cedric Chin wrote, “Action produces information”.

The question is, what is an acceptable amount of risk for you, your project, and your team? And that answer is all about costs and rewards.


5. Understanding Costs

“Time is the fire in which we burn,” says the poet. It is our most inflexible and valuable commodity, the one thing with which you should not be generous. Squander money, you may earn it back. Squander time, it is gone forever. 

- Scott Galloway

Risk is an intermediary between rewards and costs. It’s all of the work that you need to do to ensure that the costs you put in generate the reward you expected.

To continue on the question of “how much information do I need to be comfortable,” this tends to follow a rule of proportionality. Here’s Ben Kuhn, CTO of Wave, on this proportionality:

The most important thing to remember when sampling from heavy-tailed distributions is that getting lots of samples improves outcomes a ton.



In a light-tailed context—say, picking fruit at the grocery store—it’s fine to look at two or three apples and pick the best-looking one. It would be completely unreasonable to, for example, look through the entire bin of apples for that one apple that’s just a bit better than anything you’ve seen so far.



In a heavy-tailed context, the reverse is true. It would be similarly unreasonable to, say, pick your romantic partner by taking your favorite of the first two or three single people you run into. Every additional sample you draw increases the chance that you get an outlier. So one of the best ways to improve your outcome is to draw as many samples as possible.

If you have high costs, then you need higher certainty and a lower risk profile. If your costs are low and/or your impact is small, then you can get away with a higher risk profile.
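Kuhn's distinction can be simulated in a few lines. This sketch uses a Pareto distribution as a stand-in for a heavy-tailed context and a uniform distribution for a light-tailed one; both choices, and the sample sizes, are purely illustrative:

```python
import random

random.seed(42)  # reproducible illustration

heavy = [random.paretovariate(1.2) for _ in range(1000)]  # heavy-tailed draws
light = [random.uniform(0, 1) for _ in range(1000)]       # light-tailed draws

for n in (3, 30, 1000):
    print(f"n={n:4d}  best heavy draw: {max(heavy[:n]):10.1f}  "
          f"best light draw: {max(light[:n]):.3f}")

# The light-tailed best is capped near 1 no matter how many samples you draw;
# the heavy-tailed best typically keeps climbing as n grows.
```

In the light-tailed case, a few samples get you close to the best you will ever see; in the heavy-tailed case, every extra sample is another shot at an outlier.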

So costs are the last part of the Rewards/Risks/Costs equation and there are four important aspects for evaluating costs:

  • Cost to build
  • Cost to manage
  • Opportunity costs
  • Sunk costs, future costs

Cost to build

When people talk about building software, they often talk about how inexpensive it is (compared to ventures in manufacturing or retail) and how anyone can spin up a website and just get things going. But that's not entirely true. Running and scaling software can be relatively inexpensive, but building it is often very costly.

Software engineers are not cheap. A software engineer’s salary often starts in the six-figure range and most projects are supported by multiple engineers managing very different parts of building and running software. With that in mind, if you have an idea and need someone (let’s say a team of three or four engineers) to build it for you, you can easily be looking at a half-million dollar burn rate per year just from staffing engineering expertise.

This is also true for any proposal where you’re tackling a net-new business gain. As we discussed earlier, if you’re pursuing a new line of business as the reward, then you often can’t take advantage of existing scaffolding, staffing, or infrastructure, and need to build up new scaffolding to support it.

This is the “cost to build”. When your cost to build is high and you have a finite amount of money, the consequences of not getting it right are also high. When your cost to build is high, you end up having fewer ‘shots at goal’: fewer chances to start over or pivot, fewer experiments you can run, fewer changes you can make in response to new information.

This cost to build can be estimated as an actual total dollar cost based on the time and number of people it takes to do something, and the associated costs of any supporting infrastructure or process development.

Moreover, the numbers on “cost to build” continue to run up until a project is delivered and in customer hands. Value is created only when someone uses the deliverable, so until something is shipped, zero value is created.

So if your cost to build is high, you need to find ways to lower it.

Decreasing the cost to build is primarily about playing games with scoping. By focusing on delivering to customers more frequently (for example, putting something usable into people’s hands every two weeks), you accomplish three things:

Firstly, you lower the cost per deliverable. In terms of time, effort, and dollar value, a two-week project is less costly than a one-month project, or a three-month project, or a two-year project. This is important because when you’re accounting for the value you created, you want to be able to say that you eventually made more value off a feature/project than it cost you to make it.

Secondly, you put a ceiling on the costs per deliverable. This limits the risk that a project’s costs will balloon and creates natural points for evaluating progress towards your goals. Otherwise, you might find yourself with long-running projects caught in a perpetual development loop that never become ready for anyone to use. Business-wise, that describes a money pit with a giant furnace at the bottom.

Thirdly, you create opportunities to learn from real-world customer behaviors and create natural points for evaluation against preset benchmarks. This real world feedback is essential for adjusting your risk/cost/reward calculus.

Ignorance about real cost-to-build is what allows teams to spend months working on small features and incremental improvements that deliver little real-world value. This pain is especially acute for early-growth companies that depend more on venture funding than on self-supporting revenue streams.

As an example, I once inherited a team that ultimately took two years (against an estimated six months) to deliver an enterprise contract worth $4 million. The team was composed of 12 developers, a project manager, a customer success manager, and two product managers: 16 people with an average salary of $150k/year. That means the contract cost the team $4.8 million to deliver (not counting the hours put in by the sales, legal, and other supporting teams). At the end of the day, the project was a net loss of $800,000. Rather than being the “game-changing contract” the company had expected to profit from, it ended up being a massive loss-leader for getting in the door with a large enterprise client. Legality aside, it would have been almost cheaper, and definitely quicker, to simply pay the client $1m to be their preferred vendor.
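The arithmetic in that example can be reproduced in a few lines, purely as a check on the numbers above:

```python
# Recreating the arithmetic from the example above.
team_size = 12 + 1 + 1 + 2     # developers, project manager, CSM, two PMs
avg_salary = 150_000           # average salary per person per year
years = 2                      # actual delivery time (vs. six months estimated)

cost_to_deliver = team_size * avg_salary * years   # $4.8 million
contract_value = 4_000_000                         # $4 million

net_reward = contract_value - cost_to_deliver
print(f"${net_reward:,}")  # → $-800,000
```

Note what the calculation leaves out: sales, legal, and opportunity costs. The real loss was larger than the headline number.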

Cost to maintain

The flip side of cost to build is the cost to maintain. While cost to build is mostly for net new things, cost to maintain is racked up once something is out in the world.

The cost to maintain is slightly harder to calculate, because it needs to factor in:

  • Continued staffing and infrastructure costs against service delivery
  • Continued management of defects and stability as something scales in use
  • Increasing complexity of code/services, which can decrease overall velocity and increase execution risks (such as bugs and defects)

Relying on adding “maintenance and support” tasks to preexisting teams can be dangerous if it ultimately increases their work in progress and slows down their ability to deliver.

Moreover, many maintenance costs continue (and sometimes grow) over the lifespan of a feature or capability. When you build something, you commit to supporting it until you get rid of it. And for every future feature, you now also need to consider how it will interact with this one.

A Digression on Technical Debt

Cost to maintain can also manifest as the “cost of shortcuts”. In managing cost to build, I referred to playing games with scoping as a way to reduce upfront costs and reduce risks through faster release-and-feedback loops. The value of getting real-world customer feedback on whether something is worth doing always trumps the desire to write clean and well-architected code, especially if there is no certainty as to whether that work creates any value (it might get scrapped).

But many cycles of this can generate significant technical debt, which results in accidental complexity and eventually leads to bugs, defects, slower turnaround times, and cumulatively more technical debt. The solution is almost always rework, which can be hard to get buy-in for.

My experience with tackling this technical debt was to rely on the same process of rewards/risks/costs as for any other proposal:

We kept asking "why is this valuable" until we got to the reward questions. It's often a few layers deep. Being unable to quantify the value of rework was often a signal either that the tech debt was an emotional grievance or that engineering leadership had no clear visibility into the value of the work they did or the real size of the problem. Being unable to quantify the reward of the rework meant that we were unable to prioritize it against other projects, and risked refactoring projects that took longer than the time they ultimately saved.

Opportunity costs

I’ve briefly alluded to “shots at goal” and “competing proposals from other departments” earlier. Beyond the cost to build and cost to maintain is the opportunity cost of any proposal, which is to say: if you do “X”, what are the things you can no longer do?

Opportunity costs take the form of:

  • Budgetary limitations
    I.e., in marketing, if I spend money on one channel, I can’t spend that same money on another channel.
  • Capacity limitations
    I.e., you can’t keep adding more work onto someone’s plate without their productivity diminishing. People can take on a limited number of projects at a time.
  • Timeline limitations
    I.e., managing time-sensitive projects (such as when someone needs an immediate solution, as was the case during Covid) or limited runway (we have 6 months to make something valuable before we run out of time).

Being clear about opportunity costs helps to tackle the organizational problem of hidden work: when the work being done diverges from the work that’s on the roadmap. This can happen for benign reasons (someone helping someone else out) or otherwise (such as legacy projects that continue because no one explicitly told anyone to end them). Being clear on what isn’t going to be done gives people and teams permission to free up their capacity and say no to distracting projects, which is a major risk during execution.

Sunk costs, future costs

Lastly, you need to evaluate sunk costs and future costs. This one is tricky for most people.

Here’s Annie Duke in “Quit” again:

Richard Thaler, in 1980, was the first to point to the sunk cost effect as a general phenomenon, describing it as a systematic cognitive error in which people take into account money, time, effort, or any other resources they have previously sunk into an endeavor when making decisions about whether to continue and spend more.

The sunk cost effect causes people to stick in situations that they ought to be quitting. When deciding whether to stick or quit, we are worried that if we walk away, we will have wasted the resources we have spent in the trying. You might be experiencing the sunk cost fallacy if you hear yourself thinking “If I don’t make this work I will have wasted years of my life!” or “We can’t fire her now, she’s been here for decades!” Sunk costs snowball, like a katamari.

The resources you have already spent make it less likely you will quit, which makes it more likely you will accumulate additional sunk costs, which makes it again less likely you will quit, and so on. The growing debris of your prior commitment makes it increasingly harder to walk away.

The fallacy with sunk costs is that the money is already spent and can’t be recovered. Therefore, you need to focus on future costs rather than past (sunk) costs.

As an example, Duke refers to a public works project in California whose costs had ballooned to $10 billion over the initial estimate. While there was plenty of outrage over the price tag, the project continued because “it was a sunk cost” and the only way to justify that cost was to complete the project. But completing the project, given its current status and known cost overruns, would cost another $50 billion or so. So the real question isn’t about the $10 billion already spent; it’s whether the finished project justifies an additional $50 billion in future costs. If the original project wouldn’t have been approved by the legislature at $50 billion, it’s cheaper to write off the sunk cost than to keep accumulating future costs.
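That decision logic can be sketched in a few lines. The `should_continue` helper and the $30 billion “value of finishing” figure are invented for illustration; only the $10B/$50B numbers come from the example above:

```python
# Illustrative sketch of the sunk-cost logic: the decision should depend
# only on future costs versus the value of finishing, never on money
# already spent.

def should_continue(future_cost, value_of_finishing, sunk_cost=0):
    """sunk_cost is accepted as a parameter only to show it's ignored."""
    return value_of_finishing > future_cost  # sunk cost never enters the comparison

# If the finished project isn't worth $50B of future spending (say it's
# worth a hypothetical $30B), continuing is the wrong call, no matter
# that $10B is already sunk.
print(should_continue(future_cost=50e9, value_of_finishing=30e9,
                      sunk_cost=10e9))  # → False
```

Writing the sunk cost in as an unused parameter is the whole point: it makes the temptation visible while keeping it out of the math.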

However, not all sunk costs are bad. A good use of sunk costs is to ask, “we already have X built, can we use it for anything?” To go back to the example of the $4m contract our team delivered at a loss: we were ultimately able to productize many of the platform customizations and sell them to other enterprise clients with similar needs.


6. Putting it all together: Speed, Cycles, and Safeguards

“The true method of knowledge is experiment.”

- William Blake

At the end of the day, nothing validates an idea better than actually engaging with reality. Again: “action creates information”. What ties together all of the risks, costs, and rewards is the ability to launch and get things out to customers. You need to engage with the real world, and not your idea of the real world.

This should be a “duh” sentiment, but people seem to forget it. More importantly, it’s central to everything we’ve been covering here:

  • Until you’ve shipped something (put something out into the world and into people’s hands), you’ve created no value and you’ve created no opportunity to capture value. You’ve only run up the books on costs to build. And the longer something takes to ship, the more costly it is.
  • The more costly something is, the larger reward it needs to reach in order to justify that cost.
  • Any (and every) idea can be proven, disproven, or modified to be valuable or not, once it’s out there in the world. This real world feedback is the most valuable information you can get.
  • So you’ll make better decisions if you’re able to use the real world as a guide.
  • Which means that the faster you’re able to put things into customer hands, learn from customers and validate your ideas, and react to that, the faster you’ll get to better ideas and ultimately be more successful.

The organizations that succeed are the ones that engage with the real world, listen to its feedback, and react to that… rather than the ones that live inside their own heads and build without engaging with customers.

Which is to say:

  • Doing successful work is a matter of going through cycles of learning-acting-learning-reacting.
  • Companies that are able to go through those cycles with the most speed are the ones that will orient themselves best. This is true even if they start with the worst ideas possible, because the cycle forces you to learn and react. The company that takes six months of iteration to reach a valuable idea and the company that takes six months of research and arguing may find themselves in the same place, but the one that engaged with customers through iteration will have the benefit of additional real-world knowledge, experience, and a habit of getting things done.
  • Both learning and acting need to have safeguards — criteria for determining if you’re on the right or wrong track, and what to do in either case. This prevents you from sticking with bad ideas for too long and ensures that you don’t get stuck in loops of action-without-learning or learning-without-action.

This means that Cycles, Speed, and Safeguards are the second-order impacts of risk, cost, and reward management:

  • Cycles & Risks
    Cycles allow you to learn and adapt based on real-world feedback. Cycles are a way of collecting more information to adjust your risk profiles.
  • Speed & Costs
    Speed is the most effective way to keep costs down and decreases the volume of risk you will encounter. Coupled with quick movement through cycles, it ensures that anything too stupid can get addressed quickly.
  • Safeguards & Rewards
    Safeguards are generated by rewards and risks. Safeguards derived from rewards are the KPIs and metrics that let you know if you’re on the right track. Safeguards derived from risks are the “kill” or “adapt” criteria that let you know when your likelihood of success is being challenged.
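As an illustration of what safeguards might look like when written down explicitly, here’s a hypothetical sketch; the function, metric, and thresholds are all invented:

```python
# Hypothetical sketch of safeguards as explicit, preset criteria:
# a reward-derived KPI target tells you if you're on track, and a
# risk-derived floor triggers the "kill or adapt" conversation.

def evaluate_safeguards(actual, kpi_target, kill_floor):
    """Return a coarse signal for the next planning checkpoint."""
    if actual >= kpi_target:
        return "continue"          # on track against the reward KPI
    if actual < kill_floor:
        return "kill-or-adapt"     # risk threshold breached
    return "investigate"           # between the two: dig into why

# E.g., weekly active users checked against benchmarks set before launch:
print(evaluate_safeguards(actual=1_200, kpi_target=2_000, kill_floor=500))
# → investigate
```

The value isn’t in the code, it’s in the discipline: the thresholds are agreed on before the work starts, so the checkpoint conversation is about data rather than sunk-cost emotions.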

... and finally, the roadmap

All of this was around evaluating and understanding a single opportunity. But a roadmap is more than that: it’s a set of opportunities that you commit to working on, sequenced in the order that business capacity will be allocated to support them.

Understanding rewards helps you figure out if something is worth doing or not. Understanding costs helps you understand how much time, money, and people will be needed to get it done, and more importantly what that means you won’t be able to do. Understanding risks helps you validate the opportunity, figure out a tactical rollout plan, measure your progress, and cut your losses when you’re way off track.

Choosing between those — building the roadmap itself — is then the work of politics and negotiation between different stakeholders. Every company differs in its available runway, its appetite for risk, and the potential size of rewards it can tackle. Building a roadmap and getting buy-in is a matter of understanding those preferences at the decision-making level, and negotiating between them.

And as I stated earlier: if done right, those politics shouldn’t be too bad because what is on the roadmap matters less than how it’s on the roadmap. And that “how” is: with safeguards and iteration, focused on the opportunity and not the specific solution. It’s not to say that any “solutioning” should be banished from the roadmap, but that any solution is ultimately a tactic being tried on for size and other solutions may come and go before the opportunity is won or moved on from.

Lastly, it’s worth mentioning that roadmaps are ultimately documents of alignment and persuasion. The level of detail needed to evaluate all the opportunities is likely not the level of detail that gets presented to leadership (opt for the abridged version and keep the rest of the data in your pocket to pull out as needed), and the document you present at an all-hands isn’t the one you bring to a team planning meeting. But however the level of detail varies, it should always include the parameters of potential reward, expected costs, and risks along the way. And once you’ve gotten buy-in on the problem/opportunity space, you can add the tactics and safeguards as part of your plan of action.


Phew!

That wraps up this post. Having reached this point, I hope that you can systematically understand how different risks, costs, and rewards can have implications on your plan of action, how to build systems for iterating through opportunities, and how those two things can help you identify the right stuff to work on.

In a future post, I’ll try to tackle some of the more practical implications of this, such as how to write useful product requirement documents and how product, engineering, and design can best collaborate while avoiding common anti-patterns.

]]>