The Wrong Question Everyone's Asking About AI

Image by Kyle Head on Unsplash - https://unsplash.com/@kyleunderscorehead

Last time we talked about the problem facing the C-suite when crafting a credible AI story.

That post gave a simple structure for pursuing AI ambitions without falling into either of two traps: thinking only about AI adoption, or being caught on the wrong side of history when the market narrative swings away from AI being magical.

What is a good way to reason about the tools themselves, though?

This is harder than it should be because blog posts, demos, and even talking directly to Claude can easily lead to a distorted perspective on how "intelligent" these intelligent tools are.

We'll cover that today.

An interesting paper came out a few years ago exploring exactly this problem. And it's a serious academic work, not some biased sales pitch from an AI Boomer or an angry Doomer rant.

The paper, "Role-Play with Large Language Models," by Shanahan, McDonell and Reynolds in 2023 (later published in Nature), proposes a metaphor for talking about large language models that works very well for filtering out the chaff.

Let's meet Bob. Bob is a very, very good actor, and he is especially good at improvisation. Bob can play many roles, but today he's playing a software engineer. He's studied projects, all the languages we code in, architecture, testing, deployments, you name it. He is so good that if he turned up to a hackathon, no one would know he wasn't a real developer. Well, maybe. Let's come back to that.

Bob may know a lot of detail, but of course, he can't know it all. So like any good improviser, sometimes, when he hits the edges of his reading, he has to wing it. And that's fine. After all, he's an actor, not a developer. When he wings it, everything remains plausible, and no one can tell. Well, maybe. Let's come back to that.

Before anyone gets a downer on Bob, let's be clear about a few things. He's not expensive. His code works. He never complains. He's fast. Is his code good? Well, a lot of code is pretty bad. Where Bob has read the most, he's on solid ground.

What's important when we compare Bob to real developers is that Bob is always winging it. Because he's not a developer. He didn't struggle through simple projects in his youth or work on a bug for six hours only to solve it the minute he took the dog for a walk. Real developers can feel their edges in ways Bob can't.

Bob has the same confidence whatever he is saying. Developers can feel their confidence rise and fall. And when it falls below a certain point, they can ask for help, or research the problem, or let someone else take over.

Boomers and Doomers will argue about whether models are really intelligent or not, but when we translate this to whether Bob is a real developer or not, you can see why it's the wrong question.

Can Bob code? Yes, he can. Does he get things wrong? Yes, and so do other developers in the room.

Is Bob as good as all the best bits of all the developers in the world? No. No developers are either. It makes no sense to compare anyone to that standard.

So what is the right question?

The right question is simple.

Do we know where Bob's edges are?

No, we don't, and neither does Bob.

We need edge-awareness for Bob. We don't need to supply it for real developers, because they can feel their own edges and have other options when they reach them.

What does edge-awareness look like? The other developers in the room and the systems they create to catch their own issues.

How do we know whether we can afford Bob on the team?

Well, we need to know a few things that are quite hard to truly know today. We need to know the all-in cost of Bob plus the residual developer and oversight cost his edges impose. And all those costs need to be less than the value we get from Bob before he's replaced with Alice, another actor who's also very, very good (better than Bob) but who has the same challenges in other areas.
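The comparison above can be written down as a simple inequality. Here is a minimal sketch in Python. The function name, parameter names and figures are all hypothetical, made up purely for illustration, and each amount is assumed to be a total over the period before Bob is replaced by Alice.

```python
def can_afford_bob(
    all_in_cost: float,
    residual_developer_cost: float,
    oversight_cost: float,
    value_before_replacement: float,
) -> bool:
    """Return True if Bob's total cost is less than the value he delivers.

    Every argument is a total over the same period (the time until Bob
    is replaced by Alice) and in the same currency. These are illustrative
    inputs, not figures from any real team.
    """
    total_cost = all_in_cost + residual_developer_cost + oversight_cost
    return total_cost < value_before_replacement


# Hypothetical numbers: Bob is cheap, but his edges still cost us
# developer time and oversight.
print(can_afford_bob(10_000, 20_000, 5_000, 50_000))   # costs 35,000 against value 50,000
print(can_afford_bob(10_000, 30_000, 15_000, 50_000))  # costs 55,000 against value 50,000
```

The arithmetic is trivial. The hard part, as noted above, is that the residual developer cost, the oversight cost and the value are the numbers we struggle to truly know today.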

No one is saying Bob is bad. His role-playing is off the charts. But we are not saying he's infallible either.

What the Shanahan, McDonell and Reynolds paper showed was that if we stop assuming that models have goals, beliefs, understanding, reasoning, etc., and instead see them as role players (in software development, email writing, meeting summarisation), we can see the simulator, not the simulacra.

And it's role play all the way down from the persona to the responses to the words to the tokens.