
Ep. #9, The AI Coding Paradigm Shift with Simon Willison
On episode 9 of High Leverage, Joe Ruscio sits down with Simon Willison to unpack the rapid evolution of AI coding tools and what they mean for software development. They explore the shift from vibe coding to agentic engineering, how coding agents are reshaping workflows, and why experience still matters. The conversation dives into trust, security, and what breaks when code becomes cheap.
Simon Willison is the creator of Datasette, an open-source tool for exploring and publishing data, and is a co-creator of the Django web framework. He has been an influential voice in web development through his blog, simonwillison.net, since 2002. Previously, he co-founded Lanyrd, a Y Combinator-funded company acquired by Eventbrite.
- Simon Willison’s blog
- Simon Willison’s newsletter
- Django web framework
- Stable Diffusion
- OpenAI ChatGPT
- Anthropic (Claude)
- OpenAI Codex
- GitHub Copilot
- Mistral AI
- DeepSeek
- Qwen models (Alibaba)
- Communications of the ACM
- THE PEOPLE DO NOT YEARN FOR AUTOMATION (The Verge)
- Matthew Yglesias Tweet
- Introducing Laguna XS.2 and Laguna M.1 (Poolside)
- DeepSeek V4 Preview Release
Transcript
Joe Ruscio: Welcome back, everyone, to another episode of High Leverage. I'm super excited to be joined today by someone who needs no introduction, I believe, but I'll do so anyways. Simon Willison. Thanks, Simon.
Simon Willison: Hey, I'm really excited to talk to you.
Joe: Yeah, I just want to start before we get into it by just saying how much I've appreciated, as I'm sure many of the listeners here have, all the work you've been doing the last two years, providing what I think is a very needed and balanced perspective on what's actually happening in AI from week to week and month to month.
Simon: Yeah. I realized the thing I picked for myself, I decided that my beat in all of this, my journalistic beat, was going to be: What can it do? Because there's so much to talk about around what could it do, and what are people saying, and all of that, and that's fine.
But if you stick to just what actually works today, there's such a rich vein of information to dive into and explain to people, and that's just super useful. Like, nobody can complain about being kept up to date on what the new models can actually do right now. So that's what I've been focusing on.
Joe: Right. Which I think really comes through. And yeah, like you're saying the value-- There's just so much and it's understandable. I mean, we're in arguably the largest hype cycle of my career. Which is not short at this point, or it's longer than I care to admit.
But yeah, I think constantly finding yourself usually set between thought leaders who are, "hey, this is some nirvana" or doomerism on the other side--
Simon: There's a position right in the middle which very few people are positioned in, which is surprising to me. Like, the middle is great. Everyone should come and hang out here with me.
Joe: Yes, yes, well, that's definitely our goal, to get more people there. Maybe then, really quick, just share your origin story. How did you come to occupy this position in the middle? You know, what was your path pre-AI, and then what led you here?
Simon: Honestly, all of my influence in the AI industry comes from the fact that I have a blog and I frequently update my blog. Like, basically I'm treating it like it's 2005 back at the peak of tech blogging before everyone moved to short form media and then LinkedIn and TikTok and all of those kinds of things.
So I have a blog and I update my blog with what I'm thinking about and what I'm discovering and what I'm finding out is interesting. And I've been doing that, yes, since 2002, originally for web development and CSS and JavaScript and then Python and I co-created the Django web framework quite early in my career. So a lot of my blog was talking about Django related stuff.
And then about three years ago it was Stable Diffusion, actually, that was the thing that got me into this, because I teamed up with Andy Baio and we were looking at Stable Diffusion and they released the training data. And so we went and dug into the training data and discovered that they'd scraped it all from news websites and Pinterest and places, which we thought was interesting. You know, we thought people might want to know that these models are being trained on all of this unlicensed scraped data.
So we built a little explorer tool that people could use to dig into the training data and wrote a few things about it. And then ChatGPT came out, I think two months after that. So I was already thinking about that space, and I'd been playing with GPT-3 prior to ChatGPT for about a year and a half, and it was really interesting, and it was very difficult to get anyone else to care about it, because the way GPT-3 worked was it was a text completion thing.
You had to type "A JavaScript function that does X, Y and Z is:" with the colon, and then it would write you the JavaScript function. And that was just weird, right? It was a weird way of interacting with this technology. So I could tell that there was something there. But the UI for it, the way you interacted, just didn't quite fit most people's brains.
And all ChatGPT was, was GPT-3 with a chat interface baked over the top. Like, it was basically a little UI hack which opened everything up, because once people could actually talk to the thing in a conversation, what it could do became so obvious. So I was in the right place at the right time, in that I'd already been building up a little bit of material around GPT-3 and how it works.
And then ChatGPT came along and I was ready to start writing about that as well. And then really I've just been following and riding the wave since then, writing up the new models as they come along. I've got my ridiculous Pelican Riding a Bicycle benchmark that I've been using, which some people appear to take seriously and I certainly don't myself.
And yeah, I've been doing an annual write up. At the end of each year, I do a write up of the things that we've learned about LLMs in the previous year. I've got the monthly newsletter, the weekly newsletter, all of that kind of stuff.
So, yeah it's just an endless deluge of interesting things. So the challenge for me is always carving out time to do actual work as opposed to being distracted by Mistral. Mistral Medium 3.5 came out two hours ago.
Joe: Yes.
Simon: So, you know, every day, every day something new happens.
Joe: Yeah, I mean, we were originally scheduled to record this podcast late last week. I had some network troubles, and I was just thinking, in the four or five days that have passed since that original day, I mean, Poolside models came out yesterday.
Simon: Mhm. DeepSeek V4, I think, maybe came in the last four days.
Joe: Yeah, it's incredible even at this point how quickly things are still changing. The firm's focus, and I think a lot of our audience's focus, is building better software. And even the name of the podcast, High Leverage, just comes from-- It predated LLMs, but just generally speaking, as software engineers, how can we gain more leverage out of better practices and techniques?
And I assume your end of year post from last year had some pretty heady thoughts on that, specifically around software development and applying AI to it.
Simon: I mean, the key thing was last year was the year that it became indisputable that these things are amazing at code. You could just about argue at the end of 2024 that the code they wrote wasn't very good and was full of bugs and maybe it could slow you down if you didn't know what you were doing with it. That's gone now.
And that's because both Anthropic and OpenAI spent basically all of 2025 training for code. Like Claude Code came out in February of that year. It quickly became apparent if you want people to spend $200 a month on your product, code is the thing that they will pay for. And then there was that moment in November where it was Claude Opus 4.5 and GPT 5.1 came out at the same time basically. And they were that tipping point where the coding agents got good.
Up until then, coding agents were hit and miss. Sometimes they'd be useful, sometimes they weren't. I'd say that November was the point when a coding agent can now be a reliable partner. You can use it and know that you're going to get working code 9 out of 10 times, which is pretty great. Now you can use it as a daily driver. And so many of my peers are saying that, yeah, 70, 80% of the code they produce is now written by coding agents for them. They'd have laughed at that idea a year ago. And now it's not even surprising to say that.
Joe: Yeah, yeah, I think it's kind of amazing, both the paradigm-- Because Claude Code launched just over a year ago now. You know, both the interactive paradigm shift from IDE-based interactions or autocomplete or whatever with these tools, and to your point, also with code, where it was like, okay, this certainly means I don't have to type out for loops or look up function definitions as much anymore, to, yeah, these new-- Not even the newest models anymore, but the models from late last year, to: I can give it something much higher level.
You know, my kind of experience: one of our portfolio companies, exe.dev, it's a hosting sandbox, a new kind of cloud for these workloads. But I was using their agent over Christmas break. Actually, they'd just soft launched and I was just trying to provide some barely valuable user feedback.
But it occurred to me that I was working on some side app that I'd been working on for like two, three weeks, adding feature after feature after feature after feature. And I was like, wait, I've never actually looked at the code that this thing has put out. Like, it's still working.
Simon: Uh-huh.
Joe: Which was dramatically not-- I mean, I hadn't even been paying attention. I was like, "wait, this is doing a lot. I guess I should go look at the code. Maybe? I don't know."
Simon: That's shocking. And the fact that that wasn't-- You hadn't even really, it hadn't crossed your mind because we're already so baked into the thing where they write code and the code is good and sometimes we might go and look at it. But it's fascinating, the economics of it. The fact that the bit--
It used to be that you'd figure out what you wanted to build and then you'd hand it over to your engineering team and you'd wait two to four weeks for them to turn around with something that you can try. And now it's two to four hours. It's really astonishing.
Joe: Yeah. And so my mind then kind of went to, and has been basically fully preoccupied with ever since-- And something I'd definitely like to get into today is understanding that there's this new atomic primitive where, like you said, 8 or 9 times out of 10, I can ask it this relatively high-level thing and it produces code that works.
If that primitive now exists, how far can we go with that? That's the first question. I know it's certainly farther than we've ever gone before. And how do we get there? And what are the kind of practices, what are the new practices and principles that come into play?
And I guess just to kick it over to you, one of the things I think you've been doing, which I appreciate, is trying to distinguish between vibe coding and, I think "agentic engineering" is the term.
Simon: Yes, those are the two terms that I've been using. Weirdly though, those things have started to blur for me already, which is quite upsetting, because I thought we had a very clear delineation where vibe coding is the thing: you're not looking at the code at all. You might not even know how to program. You might be a non-programmer who asks for a thing and gets a thing. And if the thing works, then great. And if it doesn't, tell it that it doesn't work and cross your fingers.
But at no point are you really caring about the code quality or any of those additional constraints. And my take on vibe coding was: it's fantastic, provided you understand when it can be used and when it can't. Like, a personal tool where, if there's a bug, it only hurts you? Go ahead. Right? If you're building software for other people, vibe coding is grossly irresponsible, because it's other people's information. Other people get hurt by your stupid bugs. You need a higher standard than that.
And then the contrast is like agentic engineering where you are a professional software engineer. You understand security and maintainability and operations things and performance and so forth. You're using these tools to the highest of your own ability. I'm finding the scope of challenges I can take on has gone up by a significant amount because I've got the support of these tools.
But I'm still leaning on my 25 years of experience as a software engineer. And the goal is to build high-quality production systems. If you're building lower-quality stuff faster, I think that's bad, right?
I want to build higher quality stuff faster. I want everything I'm building to be better in every way than it was before. The problem is that as the coding agents get more reliable, I'm not reviewing every line of code that they write anymore.
You know, even for my production-level stuff, I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it's just going to do it right. Like, it's not going to mess that up. You have it add automated tests, you have it add documentation, you know it's going to be good. But I'm not reviewing that code. And I've got that sort of feeling of guilt where I'm like, if I haven't reviewed the code, is it really responsible for me to use this in production?
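The pattern Simon describes really is well-trodden ground. A minimal sketch of it, using only Python's standard library -- the function name and table here are illustrative, not from any particular project -- comes down to running a query and serializing dict-shaped rows:

```python
import json
import sqlite3


def rows_to_json(conn, sql, params=()):
    """Run a read-only SQL query and serialize the result rows as JSON.

    Returns a string like {"rows": [{"col": value, ...}, ...]} -- the kind
    of payload a simple JSON API endpoint would emit.
    """
    conn.row_factory = sqlite3.Row  # rows become dict-like, keyed by column name
    rows = [dict(row) for row in conn.execute(sql, params)]
    return json.dumps({"rows": rows})


# Usage against a throwaway in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items (name) VALUES ('widget')")
print(rows_to_json(conn, "SELECT id, name FROM items"))
# prints {"rows": [{"id": 1, "name": "widget"}]}
```

A real endpoint would wrap this in a web framework handler and restrict the SQL it accepts, but the core transformation is this small, which is exactly why a coding agent rarely gets it wrong.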
The thing that really helps me was thinking back to when I've worked at larger organizations where I've been an engineering manager before and I've had my team building software. Other teams are building software that my team depends on. And if another team hands over something and says, "hey, this is the image resize service, here's how to use it to resize your images."
I'm not going to go and read every line of code that they wrote. I'm going to look at their documentation and I'm going to use it to resize some images. And then I'm going to start shipping my own features. And if I start running into problems where the image resizer thing appears to have bugs or the performance isn't good, that's when I might dig into their Git repositories and see what's going on.
But for the most part I treat that as a semi black box that I don't look at until I need to. I'm starting to treat the agents in the same way. And it still feels uncomfortable because human beings are accountable for what they do. Like a team can build a reputation. I'm like, "you know what, I trust that team over there. They built good software in the past. They're not going to build something rubbish because that affects their professional reputations."
Claude Code does not have a professional reputation. It can't take accountability for what it's done. But it's just proving-- Time and time again it's churning out straightforward things and doing them right in the style that I like.
Joe: Yeah.
Simon: So, yeah, that's a complicated thing, right?
Joe: Very complicated, yeah. Actually it's kind of funny because my experience with models is they're very happy to take quote, unquote, accountability for what you-- And you're like, "wait, did you just do this thing?" It's like, "oh, I'm really sorry, I did just do that thing." Haha.
Simon: Right. That tells me nothing.
Joe: Yeah.
Simon: And then of course Claude 4.7 came out and I've built trust in Claude 4.6, but now there's a new model. How long does it take me before I start trusting it not to make weird mistakes that I wasn't expecting?
Joe: Yeah. I do think that framing-- I was actually just literally thinking that in a large-scale human engineering org, there's some piece of load-bearing code. At some point, how many humans actually looked at that piece of code? Like, one wrote it, maybe another one, you know, plus-one'd the PR haphazardly, and then it shipped, and then probably neither of those people even works there anymore. Right?
Simon: The thing I started realizing, like this is a problem I have with side projects as well. Like the whole industry is facing this. It used to be if you found a GitHub repository with a hundred commits and a good readme and automated tests and stuff, you could be pretty sure that the person writing that had put a lot of care and attention into that project.
And now I can knock out a git repository with 100 commits and a beautiful readme and comprehensive tests of every line of code in half an hour. It looks identical to those projects that have had a great deal of care and attention. Maybe it is as good as them. I don't know. I can't tell from looking at it. Even for my own projects, I can't tell.
So I realized what I value more than the quality of the tests and documentation. I want somebody to have used the thing. If you've got a vibe coded thing which you have used every day for the past two weeks, that's much more valuable to me than something that you've just spat out and you've hardly even exercised.
Joe: Yeah, I think one of the things you called out and I think you were alluding to there is prior to the kind of Opus Codex moment, I do think there was this pretty clear heuristic between vibe coding and agentic engineering where it's like, well, if it's important then a human absolutely has to at least review it and approve it.
And I have a similar, I think, epiphany to you starting to use these things. I was like, "oh, these are good enough now that it's not going to make sense to gate every single thing that comes out of these into production systems on human review." Right?
Simon: Absolutely. Yeah. But I think you used the term "load bearing" earlier and that's super important.
Anything that I write that is security adjacent, I'm reviewing that. It is not responsible to outsource that entirely to the agents. And knowing what's security adjacent and what isn't is a skill that you develop through years of software engineering. So, yeah, none of this stuff is easy.
Joe: Yeah. One of the things I think is interesting-- The history of software development, at least one lens you could view it through that's helpful for me, is removing bottlenecks. Right? So one of the most important parts of cloud computing, besides, like, turning CapEx into OpEx or whatever, was that human engineers for the most part didn't have to wait for someone else in some other building to rack and stack a server to actually deploy to production.
Right? It was just like, well, now there's an API call, the server exists, you can ship the software. And that removed this big bottleneck and I think was part of a big explosion in developer productivity tools, you know, over the course of the last I guess pushing 20 years now, because now "how fast can your human developer move" became the bottleneck. Not like we need the network architects to open up some firewall ports and they're not going to get around to it for two weeks or something.
It strikes me that up until late last year, even with all the excitement around autocomplete and tools like Cursor or whatever, human review was actually this, like between development and production, this like bottleneck gateway rate limiter. And if we remove that, it's-- I'm curious, it feels like everything downstream potentially breaks or it needs to be rethought.
Simon: 100%. Right. The single biggest question in all of this becomes:
If you can go from producing 200 lines of code a day to 2000 lines of code a day, what else breaks? The entire software development life cycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn't.
And what does that mean for-- And it's not just the downstream stuff, it's the upstream stuff as well. If you're like product designers-- I saw there was a great talk by Jenny Wen, who's the design leader at Anthropic, where she's been saying we have all of these design processes that are based around the idea that you need to get the design right. Because if you hand it off to the engineers and they spend three months building the wrong thing, that's catastrophic.
So you need to do your-- There's this whole very extensive design process that you put in place, because that design results in expensive work. But if it doesn't take three months to build, maybe the design process can be a whole lot riskier, because the risk of you getting something wrong has been-- The cost, if you get something wrong, has been reduced so much.
Joe: Yeah, that's really fascinating, because it also applies to the whole discipline of product management. Like, good product management and specification and thinking things through exist because the cost of an engineer building the wrong thing, or building the less valuable thing, has been so high historically. Right?
One of the things, vibe coding in actual like professional software development-- It reminds me, extreme programming had this term called spikes. I don't know if you're familiar.
Simon: Oh yeah, I'm very spiky. I love spikes.
Joe: Yeah. Yeah. So I guess for maybe younger members of the audience, when humans wrote all the code, a spike was: "Okay, we have a working theory about something. Maybe the best way to figure it out is we take two days" -- which is a very small amount of human development time -- "to do a thing." And say: remove all the guardrails, don't worry about writing your tests, just hack a prototype out and see if it feels good.
And yeah, I think it's fascinating that now with this vibe coding it's like, well, you can do 10 different spikes in an hour and you don't even have to be an engineer to do that. Right?
Simon: I mean, this is where there's this idea of parallel agents, where you have more than one coding agent running at a time, which last year I thought was ludicrous, because if you're going to be reviewing what they're doing, the last thing you need is five of them running at once. I've actually started doing it now, much more, because it's great for things like spikes.
I can fire off a Claude Code on the web task to do a spike to explore: what would this look like? And have another one running in Codex over here. And they're just churning away while I'm getting real work done elsewhere. And then there's still a lot of mental overhead. There's a lot of cognitive load in reviewing what the spikes came up with.
But it means that if I've got any random idea at all, I can have something doing a little bit of a prototyping work on it to see if it's going to work or not.
Joe: Yeah, the parallel agents. So that's interesting, because I know you've kind of written and spoken on this a bit. But parallel agents-- Kind of historically with software development, it's all like, oh, it's deep focus and flow state. Which, by the way, I subscribe to. At my very first startup -- I mean, I was an acolyte of Joel Spolsky and it was very important to me -- all the developers had a room with a door so they could focus.
Simon: Yep. That thing where if you interrupt a programmer just to ask them a question, it takes them 15 minutes to an hour to spin back up on what they were doing because they have to load it all back into their head again.
Joe: Right, right. And so now contrast that with how does that compare and contrast to, like when you're working with five agents at once?
Simon: It's ridiculous.
I can walk the dog and write code now. Like I can fire up ChatGPT voice mode and tell it to write some Python and go back and forth with it with, while I'm out with the dog.
I get a lot of coding done while I'm like cooking dinner because my laptop's over here and every five minutes I pop over and run another prompt and I go back to other activities. I'm so much more interruptible than I used to be.
And I spent years developing my own mechanisms for being interrupted where I would do all of my work in GitHub issue comment threads. And so I'd constantly be writing an extra comment about what I'm doing next. So if I got interrupted, I could reread my comments, kind of like an engineer's lab notebook, and get back up to speed.
I don't do that at all anymore because the need to get really deep into it and hold all of that stuff in your head is so important when you're typing out a hundred lines of code. It's way less important when you're prompting the model with the next sort of architectural direction to go in. Yeah, but it's fascinating.
Like, I would never have thought that I'd be able to productively work on two or three projects simultaneously, which I can now do especially-- And the harder the project, the easier it is to parallelize because you can prompt something that takes 10 minutes for it to build the next segment of this thing, which gives you 10 minutes that you can go and spend on other things.
It's weird. It's a really weird way of working.
Joe: Yeah. One interesting area, and you already alluded to this, but one of the things I've been trying to wrap my head around is how much prior and, you know, I've been now-- Yeah, I've been investing now for, I don't know, almost eight years. But prior to that, you know, I spent the better part of, you know, 20, 25 years as a technologist.
I dropped out of a PhD in computer science. I wrote a lot of code in my career. And I find that really, really helps when using these tools, in terms of the kind of conversations I can have with them and the kind of directions I can point them in. How important do you think actual grounding in the underlying techniques is -- for yourself, or for future users -- to being successful with these?
Simon: It's such a difficult question, isn't it? Because I've got 25 years of experience, so all of that stuff is just there. And when I look at my conversations with the agents, it's very clear to me that this is moon language for the vast majority of human beings.
There are a whole bunch of reasons I'm not scared that my career as a software engineer is over now that computers can write their own code, partly because these things are amplifiers of existing experience. If you know what you're doing, you can run so much faster with them.
But yeah, at the same time, what that means for newcomers to programming, I'm probably the last person on earth who can credibly answer that because I'm 25 years out of being that newcomer. But I'm optimistic about it. I'm seeing indications that suggest that new programmers-- This is incredibly helpful there as well.
And I think back to-- I've coached people learning to program a bunch of times. And there's that first six months of absolute hell where first you have to get a development environment set up, which is a nightmare because it breaks all the time. If you forget a semicolon, you get a weird error message and you might spend half an hour battling with that error message to try and figure out where that semicolon is.
Ideally, you can have an experienced programming mentor who can look over your shoulder and say, here's the missing semicolon, but most people don't have that. That's solved. Right? If you've got a missing semicolon, you copy and paste the error message into ChatGPT and it tells you the answer every single time. Right?
So that initial friction, that miserable six month learning curve where you're fighting the thing all the time, that's been smoothed down a lot. And I love that because I know so many people who always wanted to learn to program but never did. And they assumed that they weren't smart enough to learn to program. And it wasn't that at all. It was that nobody warned them that there's six months of miserable tedium before you get to that sort of ray of light where you can actually get something to work. And it doesn't take you five hours of struggling against semicolons.
And most people got frustrated and quit. Right. And now those people are all learning to program. Like, I've talked to a whole bunch of people who are enthusiastically vibe coding as a starting point, but through vibe coding they're getting that reinforcement that what they're doing is working. They're learning more about code, they're starting on that journey.
The open question is, in three years time, will these new vibe coders be like software engineers who you would trust to build a production system, or will they still be just vibe coding things that don't have that sort of long term value? And I don't know the answer to that, but I think we'll find out within three years.
Joe: Yeah, yeah, I'm really fascinated. I think, like you, I continue to think, even with kind of the advancements in the last six months, that there's a place and a role for humans. If I zoom way out, the role of a software engineer has always been to sit between very messy, fuzzy and continuously changing kind of business requirements and deterministic things happening on silicon. Right?
Simon: And that will never be easy.
I'm constantly reminded as I work with these tools how hard the thing that we do is, like producing software is a ferociously difficult thing to do. And you could give me all of the AI tools in the world and it's still really difficult what we're trying to achieve here.
Joe: Yeah. So I think that role's still there. And I think the interesting challenge like you hit on is, and I think a small part of that role, certainly in areas like you said, security adjacent, is going to require the ability to like drop in and understand code and even if it's like little small pieces.
You know, I've been talking to-- I have in my network a lot of like very kind of senior tenured software engineers like yourself. And you know, some of them have existential questions like I've been doing this my whole career, what does this mean?
And I'm like, well I think you all are kind of like pre-atomic steel in that I think we're gonna have to find ways to train and mentor new people in the profession. But I think those probably will all ultimately fall short of like, oh, well, I had to do this for 25 years before AI. Haha.
Simon: That would be pretty awful. Wouldn't it be awful if in five years time the only useful programmers are the ones who've had 20 years of pre-AI experience?
Joe: Yes.
Simon: Right? That is not a good situation for us to be in.
Joe: Yeah, I don't think we will be. Scott Hanselman was on the show recently from Microsoft, and he and some of his colleagues wrote a really interesting article in the Communications of the ACM about potential new models of mentorship and how we could invest in upskilling junior engineers.
One of the things they mentioned which I thought was really interesting is tools like Claude Code or similar tools should have a mode where you kind of flip the AI into the, to go back to like pair programming, the notion of the driver versus the--
Simon: Oh, nice.
Joe: Yeah, right. So you could be in working with Claude and saying, hey, we need to add this capability to this app and Claude could be like, "cool. Yep, I know how to do that. We're going to do XYZ," or even if you're bug fixing. Right? Like, but you're the fingers on keyboard.
Simon: I'm fascinated. What would it be like to do pair programming with an agent where you're the one typing the code?
Joe: Exactly. Yeah.
Simon: That's a fascinating idea. I'm tempted to try that out and see how that works.
Joe: Yeah, I feel like there could be some very interesting training modes where you would still get the benefit of the, like you said I don't have to go three cubes over to bug some senior engineer to look for where the missing semicolon is. But I still have to physically go through the act of--
Simon: Oh, that's really nice because one of the best things about pair programming has always been that you've got someone else to look things up if you're like, oh, what's the regular expression syntax for X? They can be doing that while you're typing the code and the models will do that stuff flawlessly.
That's a super interesting thing. I'm a huge fan of Scott. I hadn't realized. I just found that ACM article. I'm going to have to read that one. That looks really interesting.
Joe: Yeah, it's really good. We had a really interesting chat. We'll link to the episode in the show notes.
Simon: I'll tell you one thing. So Matthew Yglesias, who's a political commentator, yesterday he tweeted, "five months in, I think I've decided I don't want to vibe code. I want professionally managed software companies to use AI coding systems to make more, better, cheaper software products that they sell to me for money."
And that feels about right to me. Like I can plumb in my house if I watch enough YouTube videos on plumbing. I would rather hire a plumber.
Joe: Haha. Yeah, let's talk about that because this is an area I've been in a lot of spirited discussions because there are a lot of people, very intelligent people by the way and making good points but who, you know, the kind of dots they've connected is like, "oh well, because Claude can build working software, like literally the software is dead."
I mean, I don't know, like if you go look at public market multiples right now on software just as a class, it's like just been hammered, depressed. Everyone's assuming these things are all going to zero. And that just hasn't made sense to me for a few reasons.
One is, at least if I have my old CTO, VP Eng hat on, I'm thinking if I'm increasing non-determinism somewhere in my system of generating software, anywhere I can reduce non-determinism, or increase determinism, is going to be valuable.
And so the higher-level tools I can give my agent to use, in terms of the comment you made earlier about image generation. Like, do I want to tell my agent, "well, I need a database, and here's the NVMe disk API. Just go write me a database, then we'll build that"? Right?
Simon: Right. Yeah. I've been thinking a lot recently about how, if I was going to build an issue tracker today, if I was going to build like GitHub Issues or Linear or something from scratch, the way I'd want to do it is I'd want to have an incredibly well designed core database schema. Like, put everything I have into coming up with the perfect schema that can represent issues and comments and labels and milestones and all of that kind of thing.
And then I turn that into a very robust API and vibe code all of the UI on top of it. And that's where I want to open it up to my customers. I want them to be able to say, "oh, we need a Kanban board with this particular thing." But it's all writing to that deterministic, stable, well defined, well backed up underlying API and data model.
And if you get the data model right, people can build all sorts of, they can have all of that end user customized flexibility on top of it without any of the risk that comes from them vibing up a database schema and getting it wrong, and now their data doesn't make any sense anymore.
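A minimal sketch of the split Simon describes: a small, deterministic core schema plus a stable API function that any vibe-coded UI writes through. Everything here (table names, columns, the `create_issue` function) is illustrative, not from the episode:

```python
import sqlite3

# A deliberately small, stable core schema: issues and comments only.
# In Simon's framing, this layer is designed carefully by hand and never
# vibe-coded; the UI on top of it can be generated freely.
SCHEMA = """
CREATE TABLE issues (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'open',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL REFERENCES issues(id),
    body TEXT NOT NULL
);
"""

def create_issue(db, title):
    """The stable, well-defined API layer: all writes go through here."""
    cur = db.execute("INSERT INTO issues (title) VALUES (?)", (title,))
    db.commit()
    return cur.lastrowid

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
issue_id = create_issue(db, "Add a Kanban board view")
```

The point of the design is that a generated Kanban board, or any other customer-specific UI, only ever calls functions like `create_issue` and never touches the tables directly, so a bad UI can't corrupt the data model.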
Joe: Yeah, yeah. I think one of the other things that's fascinating to me about agentic coding and engineering is what it should enable. I like to think there's this spectrum of possible software, and a huge chunk of it, I think, historically has not actually been economically viable to produce. Right?
Simon: 100%. If you build custom software for your local butcher shop, in the past it's probably not had a big enough market to pay for the development time to build it.
Joe: Yeah. Because with humans, I think, just in broad strokes, keeping the numbers simple, a kind of minimal thing is like two to three weeks of developer time. You're talking tens of thousands of dollars, just conservatively, and then maintaining it on an ongoing basis, low to mid hundreds of thousands of dollars a year. Which, yeah, is not the kind of margin the butcher shop has, historically speaking, at least.
Simon: Right.
Joe: And so when I hear people saying like, oh yeah, software is going to be dead because we're all going to do our own things, I think part of what you're actually hearing is: It's been really frustrating that the only economically viable way for this platform to exist is for it to be sufficiently horizontal and cross a sufficient number of categories that the TAM, you know, the addressable market is large enough to support it.
Simon: Yeah.
Joe: But I also believe what enterprises certainly get when they get software is well thought through and cohesive and working data layers, you know, security--
Simon: I just realized it's the thing I said earlier about how I only want to use your side project if you've used it for a few weeks. The enterprise version of that is I don't want a CRM unless at least two other giant enterprises have successfully used that CRM for six months.
Joe: Yes.
Simon: Like it's exactly the same mentality. Right? You want solutions that are proven to work before you take a risk on them.
Joe: Yeah, well, there's an interesting even-- Yeah, that post by Yglesias was great. I'm zooming out because you referenced-- I just read this last night in The Verge. There's an article by, I think his name is, Nilay Patel.
Simon: He's amazing. I'm a huge, huge fan of that guy. Yeah, he's the editor-in-chief, I think.
Joe: Yeah. The title which just struck me was The People Do Not Yearn for Automation.
Simon: Poetry, yeah. That is the best piece of commentary I've seen about the growing AI backlash. If you poll the general public on AI, those numbers are cratering. I think Nilay was saying they're less popular than ICE now.
Joe: It's incredibly negative. I think a lot of people don't realize if they're not paying attention. But yeah, I mean, it's touching or dipping below like 20% approval in some polls, which is wild.
Simon: Especially among Gen Z who are also the highest that-- The people who use it the most hate it the most. Which is so interesting. And yeah, Nilay's whole pitch on this is that the things that we get excited about as people with software brains, like the idea that we can automate all this stuff, that does not work for most people. Like the reason home automation has not taken off beyond the nerds is that most people just don't want to be able to clap and shut their blinds or whatever. That's not exciting to mainstream humanity.
Joe: Yeah, yeah. I've been working in spots just here and there, kind of iterating on a somewhat detailed Claude skill to basically do some robotic process automation on a couple different CRM tools. And it struck me at some point in the last few weeks, as I was working on this, that it's very similar to building a fairly detailed shell script back in the day, where you're like, "oh, I have to--"
And yeah, I've been curious, and part of the reason the article resonated is it struck me again, I was like, okay, well, I've been doing automation through code for decades, and I'm just used to thinking in this mode of: what are all the edge cases? What do I have to account for? And kind of keep building.
But I was like, I don't think, quote unquote, normal people think like this. Not because they're not capable, it's just not a thing they've had to do.
Simon: And it's a habit that you build up over years. Spotting those little frictions that you can fix and then fixing them. And once you've figured out how to do that, you can't understand why you would tolerate a friction in your life ever again if you could automate it away.
But, yeah, most people just, they don't have time in their lives for that mentality because they've got all sorts of other stuff going on.
Joe: Yeah. And so it's interesting, as I was thinking through that, that most people experience, I think, across broad human population with AI, is basically search. It's a better search, right. Where it's like, hey, I have a question I need answered. I'm going to ask you, you're going to both go through what's already in the weights.
Then you'll probably do some now with thinking models, you're going to do some search and you're going to like, synthesize it all. You're going to come back and you're going to present it to me in precisely the lens that I asked the question. Right? But that is a wildly different mode of operation than, yeah, let's sit down and automate some work.
Simon: Right. Or let's have it write me an entire piece of software from scratch in 10 minutes. And meanwhile, all of the rhetoric around this stuff is horrifying. Right. It's all, "oh, it's so dangerous, it might destroy the world and all of our software is going to be hacked and no one's going to have a job ever again."
Yeah. It's not surprising that AI has an image problem.
Joe: Yeah. Which is, I think, outside the scope of this conversation, but yeah, very much an image problem, and something maybe they're starting to take seriously. But the people at the large players, I think, should take it very seriously.
Simon: Very seriously. Like the AI data center backlash is about people not liking AI. Right? I think a great deal of that is people like, what is something I can campaign against in my life that pushes back against this. It's the construction of these giant buildings that suck up all of the energy and so forth. But that feels to me like that's the expression of that wider distaste at the moment.
Joe: Yeah. The data center is like, for a lot of people, the physical manifestation of AI.
Simon: Exactly, exactly.
Joe: You mentioned this early on because one of the things that I'm interested in getting your take on. I do think you're right. Part of the reason we had this watershed moment is because 12 to 16 or 18 months prior, people at the foundation labs decided that like, "oh, coding is going to be really important and so what do we need to do?"
And so you had at least three things. I mean, there's certainly the interface. Claude Code, the agentic interface, I think is critical. And I think the reasoning, tool use, RL, and the impact there is really fascinating.
Simon: That's what OpenAI and Anthropic spent 2025 doing. All of their compute budget went into reinforcement learning against simulated software. Like you fire up 10,000 virtual machines with Python interpreters and you generate code and you see if the code works and you vote, you thumbs up if the code works and you thumbs down if the code failed.
And some of the Chinese labs, I think it was Qwen, in one of the Qwen papers, talk about firing up 10,000 virtual machines to do this. Right? This is acknowledged, and I think this is why xAI and Gemini are behind: they didn't spend all of 2025 running reinforcement learning loops on code, which with hindsight is what they should have been doing.
And so you now hear that, like both of those companies are doubling down on the coding side of things, but they're 12 months behind on what they needed to get done, I think.
Joe: Yeah. Because reinforcement learning as a technique you could in theory apply across a lot of domains, but generating software has unusually clean reward signals.
Simon: It's the perfect fit for it. Yeah. Like did the code work? Yes or no? Did it, all of that-- Like you think about lawyers and the way you do reinforcement learning on law is that you have to put it through a trial and see if the judge and the jury agree or disagree. And so that's like a six month turnaround to find out if your AI generated text was the right thing or not. That is clearly not nearly as easy as a Python script.
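That binary "did the code work?" signal is simple enough to sketch. This is a toy illustration of the idea, not any lab's actual pipeline; real setups run this across thousands of sandboxed virtual machines:

```python
import subprocess
import sys
import tempfile

def reward(candidate_code: str, test_code: str) -> int:
    """Binary reward: 1 if the generated code passes its tests, else 0.
    A toy stand-in for the 'thumbs up / thumbs down' loop described above."""
    # Write the candidate solution plus the fixed tests to a temp file...
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + test_code)
        path = f.name
    # ...then execute it and reduce the outcome to a single bit.
    result = subprocess.run([sys.executable, path], capture_output=True)
    return 1 if result.returncode == 0 else 0

# Two hypothetical model outputs for the same task, and the fixed test.
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"
```

Compare this to the law example: here the "judge" is a Python interpreter that returns a verdict in milliseconds, which is exactly what makes code such a good fit for RL.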
Joe: Yeah, I find it fascinating. I think at the beginning, kind of immediately post the ChatGPT moment, there's a lot of talk, especially from maximalists, about lawyers going away or these other professions going away. To be clear, I don't think any professions are going away. Well, translators might be in a rougher spot.
Simon: Data entry is a very scary area for that. Yeah.
Joe: Yeah, but in some of the higher levels-- But yeah, I find it kind of amusing that it may actually be, you know, software engineering may actually be the field that is the most susceptible to, let's say change at least or disruption based on these things.
You mentioned Qwen. Yeah, there's some other models. There's Kimi, which Cursor's Composer is kind of famously built on. Like you said, Poolside. I haven't had a chance to play with it yet, but just yesterday, as of this recording, they released their open weight coding models.
Simon: I've been following the Chinese AI labs very closely for the past year and a half because-- Wow, wow, they put out some good stuff. And I think there's at least five competitive Chinese labs that are all putting out models that are like three to six months behind the Frontier closed models, which is an incredible achievement.
Joe: Yeah, yeah, amazing.
Simon: And some of them run on my laptop. Like right now my favourite local model is Qwen 3.6-27B, which runs in about 20 gigabytes of memory, so it'll run on a decently powered laptop. And it feels similar in capability to the leading frontier models from maybe six months to a year ago.
But it runs on my laptop. That shouldn't be possible. I thought I needed $50,000 worth of server GPUs to run a model that was as good as the best ones a year ago and now it's fitting on my MacBook Pro? It's astonishing. It's absolutely astonishing.
Joe: Yeah. And I'm curious to get your opinion. Having been around the industry for a couple decades now and having gone through a few innovation cycles, I'm always interested in what prior lessons can be applied here and what don't apply early on in this.
I continue to have a thesis that long term, particularly in enterprise software, enterprise computing, we're just necessarily going to be in a multimodal world of mixed models. So there's going to be a lot of value in enterprises being able to say, "okay, for this workload we're using this model, which is a foundation model. For this workload we're using this open weight model we're running in house," and dialing the data provenance and cost. First of all, I guess, do you agree, disagree? What's your perspective there?
Simon: So I think right now the big problem is that if you were to commit yourself entirely to one model vendor, what happens if some other model vendor discovers a new trick which means their models are better? And the answer seems to be: your model vendor will catch up. The three big labs, Anthropic, OpenAI and Gemini, leapfrog each other all the time. Gemini's fallen behind a little bit because, by all accounts, they haven't done the focus on code.
They're throwing everything at code at the moment, I'd be surprised if by the end of this year, Gemini's leading models weren't as good at coding as OpenAI and Anthropic's leading models. So then it comes down to things like data provenance. Do you want a local model because you're avoiding sending things through APIs to these cloud models? For most cases, I think that's a false optimization.
I think there's a huge amount of paranoia around that. But if you've got a well signed agreement with Anthropic that they don't train on your data and so forth, you can trust them to hold to that agreement. At the same time, I do a lot of work with journalists, and journalists sometimes have situations where they have to protect their sources and there can be a government subpoena to a data center going after a journalist's sources.
So journalists actually have a very credible reason to want to use local models for some of the stuff that they're working with. My own interest in local models has sort of been waxing and waning over time.
About a year ago, I lost interest entirely because they were just not nearly good enough to be worth using for anything. The models on my laptop were so puny in comparison to what I could get out of the big cloud providers that I just couldn't see myself using them for anything other than playing around. That's changed in the past six months.
Now the models I run on my laptop I can get real work done with, which is notable. Except it turns out real work now is throwing Claude Opus 4.7 at an incredibly difficult technical problem and letting it churn away for 10 minutes, and that, the local models can't do yet.
So I go back and forth on it, just as an individual who's a sort of enthusiast in this stuff. But yeah, I do feel like there's a lot to be said, especially as you spread out more from just the normal stuff that we're doing in these models. If you're doing data extraction from video, Gemini is pretty much the only player in town. You can feed Gemini an hour long video and it can answer questions about it and pull out structured data from it.
The audio models are getting stronger. A lot of what I do on the journalism side is try and extract content from PDF documents, which was almost impossible two years ago. And today, given the right combination of models and the right setup, I can get really, really good results out of it. But yeah, it's complicated. Right?
I do think locking yourself into a single model is very short sighted at this point.
And I would say that I build an abstraction layer in Python. I've got an open source library for talking to different models, so I'm already quite invested in the idea of these abstraction layers.
Joe: Yeah, I was actually going to follow up and ask what techniques are you using for yourself to enable you to use different models? What's your harness look like?
Simon: Well, I have my own tool called LLM. It's a command line tool and a Python package for talking to all sorts of different models. It's got a plugin system for talking to DeepSeek and Mistral and all of those kinds of things. And last year I used it very heavily. This year I've not been using it nearly as much because Claude Code and OpenAI Codex got so good.
Like the majority of things I want to do with a model from my laptop, I can do through Claude Code and Codex. But as I start wanting to do more with the local models-- You can plug Codex and Claude Code into a local model and it doesn't work very well, because they've got like 20,000-token system prompts that the local models aren't particularly good at handling.
I've been playing a little bit with Pi, which is a much more lightweight open source coding agent. And I've got a massive upgrade to mine coming out later today, actually, which redesigns it to be a better fit for working with the reasoning models and things like that.
There's also the whole OpenClaw thing, which I'm fascinated by. I don't trust it. But I feel like the new hello world of working with models is building your own little OpenClaw. We call them claws now, so I've started work on my own claw on top of my own LLM framework. That's a whole other side project that I'm getting into.
But yeah, so my day to day driver is still, it was Claude Code until a week ago, it's now OpenAI Codex for the most part. Their latest version is outstanding and I don't trust Anthropic's Claude Code pricing. They've been messing around with things around that, that they're not happy with.
Joe: Well, maybe that's a good segue, because something else I wanted to chat about is fascinating to me, since this has just kind of come up. Yeah, I mean, like you said, the Claude Code pricing, and whether it was an A/B test or whatever about whether or not Pro users would have access to Claude Code anymore. I think GitHub Copilot just put out--
Simon: GitHub Copilot and Windsurf as well actually had per request pricing where the cost for a prompt that might trigger a whole bunch of tool calls was the same no matter what the prompt was. And that made sense a year ago and today it's a terrible idea because maybe one prompt will run for 10 minutes. So both Windsurf and GitHub Copilot have switched to the same per token pricing that everyone else is using now.
Joe: Yeah, and I think it's interesting because there's always been some kind of chatter on whether the last few years of the foundation labs, and AI in general, have been like the early Uber era, when you could catch a black car across San Francisco for like $3 or something.
Simon: Yeah, it's funny, I feel there are two tensions pulling in different directions. The first tension is because of coding agents, heavy users use 100 times the tokens they used to. If you're a heavy user of Claude Code, you are burning vastly more compute than if you're a heavy user of ChatGPT like last year. And this is driving a lot of the price increases.
But the flip side is that the open weight models, especially the ones coming out of China, push pricing in the opposite direction. DeepSeek is, what, 20 times cheaper than Claude Opus, and on benchmarks it's not 20 times worse than Opus. DeepSeek, you can run it through their API, but you can also run it on your own hardware.
So hopefully those open weight models, the force that they have on the pricing helps counteract the obvious need for these companies who want to do IPOs to start actually making real revenue. But yeah, it's all very frothy on the pricing front.
Joe: Yeah, well, I don't think the timing was accidental, in that, you know, end of last year, these new models that you could start doing so much more with in terms of software engineering come out. And then it's really-- I mean, this is like April 2026. It's the month after all these finance teams did their Q1 review, looked at the token number, and went, "wait, what? What? Like, what's going on here?" Haha.
Simon: I mean, we just had two massive price hikes this week. So Opus 4.7 is priced the same as Opus 4.6, but the tokenizer is less efficient. It takes 1.4x the tokens for the same things, so it's effectively a sort of invisible 40% price bump.
Joe: Right.
Simon: And Anthropic will tell you it uses less reasoning tokens than the previous ones, but it's definitely more expensive. GPT 5.5 is double the price of GPT 5.4 over the API.
Like these are the most significant price hikes we've had since I started tracking the pricing of these things.
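The arithmetic behind that invisible bump is worth making concrete. A sketch with made-up numbers; only the 1.4x factor comes from the conversation:

```python
# Effective cost change when the per-token price stays flat but the new
# tokenizer emits more tokens for the same text.
price_per_token = 1.0                    # arbitrary units, unchanged
tokens_old = 1000                        # tokens for some fixed text, old tokenizer
tokens_new = round(tokens_old * 1.4)     # same text, new tokenizer

cost_old = tokens_old * price_per_token
cost_new = tokens_new * price_per_token
effective_increase = cost_new / cost_old - 1   # ~0.4, i.e. a ~40% price bump
```

The same per-token sticker price, multiplied by 40% more tokens per request, is what makes the hike hard to spot on a pricing page but easy to spot on a quarterly bill.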
Joe: Yeah. And so I think it will be interesting as this year plays out. For me, it feels like this is finally the moment where forward-thinking enterprises, obviously this is the future, these tools are going to be a critical part and they're going to be used a lot, but they're going to have to start measuring outcomes against input costs.
And I think we've been in this honeymoon period where more than any other new tech, at least in my career, the budget has just been wide open. Usually these new things you have to fight and claw for a little bit of budget to try out some new technology. And I feel like the last few years--
Simon: The one thing that's certain is that we're going to have to stop having scoreboards of how many tokens people have used. Haha. And like maybe don't do that if you don't want to blow your entire budget on wasteful token usage.
Joe: The first time I saw mentions in some media of a business employing a token maxing leaderboard, I just, I just immediately was like, I went "oh no." Hahaha.
Simon: There's no world in which that ends well.
Joe: Yeah, it's like, this is an executive who has never attempted to put a metric, a gameable metric, on a human software engineer before. And they're about to learn a very, very expensive lesson.
Well, this has been so much fun chatting. I could go on forever. You know, we'll have to check back in sometime in the future. Where's the best place for listeners, if we have the odd one or two listeners who don't already follow you, to find more of your work?
Simon: Basically I'm all on SimonWillison.net, so that's my blog, which links to all of my other stuff. I'm present on Bluesky, Mastodon, and Twitter.
I have a Substack, which is SimonW.substack.com, which is just my blog copied and pasted into a newsletter about once a week. But a lot of people subscribe to that and really appreciate it. I guess RSS feeds are not as widespread as they were when I was blogging back in the early 2000s.
Joe: Yeah.
Simon: And I have got RSS feeds. If you've got RSS, I've got a great feed for you.
Joe: Amazing. Well, thank you so much. It's been so much fun.
Simon: Thanks. This was a great conversation.