JUN 3, 2026

68 MIN

Ep. #11, Why Agents Need Computers with David Crawshaw

Guests: David Crawshaw

about the episode

On episode 11 of High Leverage, Joe Ruscio speaks with David Crawshaw about the shift from traditional developer infrastructure to agent-native computing. David traces lessons from founding Tailscale to now building exe.dev around the idea that “agents need a computer.” The conversation explores why VMs may be the right primitive for AI coding agents, why deployment friction becomes unacceptable when software can be generated in minutes, and how agent-driven development changes code review, production workflows, security, and the economics of personal software.

about the guests

David Crawshaw is a software engineer, entrepreneur, and infrastructure builder best known as the co-founding CTO of Tailscale. He has since co-founded exe.dev, where he is exploring agent-native infrastructure and new approaches to software development in the age of AI coding agents. Throughout his career, David has focused on simplifying complex systems and building tools that help developers move faster.

show notes

David Crawshaw’s Blog
Tailscale
exe.dev
WireGuard
Anthropic Claude
Claude Code
OpenAI Codex
Mistral AI
Docker
gVisor
Ubuntu
SQLite
Amazon EC2
AWS Lambda
Kubernetes
Okta
Playwright
Simon Willison's Blog
Code Review Has to Go (David's recent talk from DevGuild)

transcript

Joe Ruscio: All right, well thanks everyone for joining us for another episode of High Leverage. I'm doubly excited today. We have a great guest, a longtime friend and colleague of mine, David Crawshaw is joining us and also this is our first time recording the show in person.

David Crawshaw: This is the first time? I'm honored.

Joe: Yeah. We're going to talk about a lot of things you're doing now and have done, but usually what I like to do just kick things off, give the listeners some idea of your background, what brought you to your pursuits and what you're doing today. And so yeah, what's your origin story when it comes to computers?

David: Yeah, what's my origin story? It's fun to me, but also it's just my life. I'm glad to be doing this in person because you know, talking to you over Zoom, like am I actually talking to you or did you just send a Claude bot to talk to me? I never know what we're dealing with. Haha.

I've been programming since I was a kid, about the time I learned to read is when I started to program. And so I just enjoy it. It's a lot of fun. And you know, I started because my parents ran what in Australia is called a medical center, which here would be a primary care practice, I think. So they provided the facilities for a dozen doctors to work.

And my father was a general practitioner, primary care physician in US speak. And he enjoyed playing with computers and taught himself to program and wrote the medical record software, EMR I think you'd call it here, for the practice. And I learned to program playing with the medical record software. It was very easy stuff. It was a whole series of MS-DOS PCs.

Joe: That's interesting. A lot of times people are like "oh, I was writing video games and you're like, medical records."

David: I tried. Do you remember QBasic on MS-DOS shipped with that little gorilla game where you throw bananas?

Joe: Oh yes. Wow. Yes.

David: Yeah. So I did hack on that a bunch and I taught myself some BASIC doing that. But honestly I really couldn't improve on the game was my take. I thought it was perfect. It was perfect. It exploded. I was so happy.

Joe: What more could you build?

David: Yeah, that's right. I made the banana bigger. I made it go faster, stuff like that. But no, it was ideal. So instead my father worked in this programming language called Clipper. It's a derivative of dBASE III Plus that sort of lineage of database plus high level programming language which compiled down to a P-code. So in retrospect I could say it's a high level garbage collected language for building TUIs, which is really nice. You write five lines of code, you get a menu system.

Joe: Right, right. Use the keyboard to navigate around.

David: Yeah, exactly. This was, you know, the first version was pre mouse. Eventually we got a version that kind of sort of worked with mice and we never really used it. Anyway, there was a LAN of like 18 machines for all these doctors and the receptionists and the printers and such. And there was a Novell NetWare file server.

And so it was all file-share-based software. So the whole database was in a directory, and all the programs were starting and editing the directory and communicating over file locks on a local system. I know that now. I don't know.

It just sort of worked when I was doing it. And you know, then I learned PCL 5e which is one of the printer languages because we bought these fancy printers to print prescriptions and then we're like, oh, we should figure out how to use them.

Joe: How to actually do it.

David: Yeah, yeah. So it's in the family of PostScript and then we would send the PostScript like language to the printer and it would print. And sometimes it worked. It was a ton of fun. I learned a lot about computers and making them work.

We bought a magnetic card reader on vacation in Hong Kong once and took it home and it plugged into a PS/2 port like a keyboard. And you swipe a magnetic card through and it like types a bunch of characters in like a keyboard. And so we taught it to read the Medicare cards Australians use for healthcare. It was tons of fun.

And you know it was a small town and so like all the doctors knew each other and we would bring other doctors in to show them and they would take us to their practices and we see how their systems work. And these off the shelf systems were just so clunky and terrible and not optimized and oh it was so painful watching their receptionist try to check someone in or check them out.

And so I learned a lot about product design there. And like, how do we get it so the receptionist doesn't spend more than a minute checking in a patient who has an appointment? You know, that was a question. My father and I would sit down and he'd talk me through it, like what he's trying to do, optimize these flows in the business.

Joe: Early KPIs.

David: Yeah, that's right. And so, you know, it was sort of the ideal small business in which to learn programming. And so I had a ton of fun. And you know, those really were the formative lessons in product design for me too.

Because if you think about my career, you know eventually I ended up in the US. I ended up working at Google, these sorts of places, you know, big company stuff. You don't learn much about product working at one of those companies.

Joe: Yeah, certainly not at Google, famously. Haha.

David: Yeah, that's right. Yeah.

Joe: Yeah, well, yeah, well then. And when I think our paths first crossed, you left Google and founded a company, Tailscale which I would think many of the listeners are actually active users of.

David: Yeah, that's right. I started Tailscale with two co-founders. I was the co-founding CTO. I hacked on a lot of the original software. My co-founder Avery, we pulled a lot of commits into a git repository trying to solve problems. We started with the idea that we should try and write software that would be less terrible than some of the other software people were already using.

There are a lot of problems in computing that we thought we could simplify. And then we started talking to potential customers about their problems.

We eventually ran into a small Canadian bank who was an old friend of Avery's through some distant connection. And they had the problem that they had a whole bunch of .NET software that their bank was run on, and they'd hired an external auditor who did a phishing test, which they failed because everyone fails phishing tests. I fail a phishing test if you try, and I'm looking for them.

And so they gave the standard advice of you should put two factor on everything, which is good advice. You should put two factor on everything. Or switch to passkeys entirely and just use a factor that's good. But we didn't have those back then. This is in the old days. This was 2019.

Joe: Ancient times. Haha.

David: Yeah. And so we went looking, you know, we were effectively consulting with various people, trying to solve their problems. And we were like, well, this can't be the first bank that's had this problem. Let's go talk to others.

And, yeah, we found banks did have a solution for this. It was virtual desktop systems. It's like the old Citrix systems, where you put a computer in a computer and they're universally loathed by everyone who has to use them. And I honestly don't know why. I've never really understood it, but no one likes using a desktop in a desktop. I guess it feels a bit too much like remote hands operating at a distance.

And so we were sitting around thinking about the problem, what else we could do. It's like, well, it's too bad you can't just two factor onto the network. And then we're like, well, surely we can do that, right? With like, a VPN or something. So we downloaded all the VPNs, we tried to make it work, and we couldn't.

Joe: Yeah, couldn't do it with any of them.

David: And we're like, well, that's silly. Like, I could build that in a weekend, we said. And then the usual, you know, bravado of anyone starting a company.

Joe: Classic nerd snipe.

David: Yeah, that's right. And so we built it in the weekend. Avery found WireGuard, which I guess he'd heard about somehow, which I hadn't. He sent it to me. I looked at it. I was like, "oh, this is all written in Go, I know this."

And so I spent the weekend hacking up a WireGuard configuration management system that ran on a computer in my closet. And we sent it to them, and they tried it, and it was great.

Joe: And it worked?

David: It worked. And then we were like, wow, this is great. Let's keep building this. And we started building a traditional VPN. And then one day, I was trying to manage it at home. This was in my New York apartment and I was out of space for computers. I needed one more computer to act as the concentrator in our architecture design.

I was like, you know what, I'm just going to make it all peer to peer for now. And I know peer to peer networks don't work, but we can figure it out later. And I sent Avery and my other co-founder, David Carney an email saying like, I'm just going to make it peer to peer and we'll figure it out later. And Avery's like, yeah, that never works, but it's fine for now, we'll figure it out later.

And we both knew it wouldn't work. Right? And then it worked. And so that's how Tailscale today, where you can just download it, login, and your devices connect and you get direct connections came from. It was purely solving my personal problem there.

Joe: All right, fascinating. Yeah. Because Tailscale is today, amongst actually many things, I don't want to sell it short here, but it's definitely a distributed mesh VPN where everything happens point to point. Everything is basically 2FA on the network and user device tuples.

Yeah, just really incredible product. Universally loved. I'm always telling people, I think the most co-located, if you're word clouding or whatever you call it with Tailscale is magic.

David: And when there's a bug, with all the downsides of magic, but it almost always works. I use it every day. I love it still. You know, I'm a huge fan and I'm very happy with the state of the product. You know, it solves the problems I have in just the way I want it to.

It's one of those cases of both building to a customer's need and doing the exploratory product work of like, how does this fit in my universe and how do I solve this? And there's a lot of accidents like that in the design of new things.

Joe: Yeah. Taking that forward then I guess towards today, you know, having finished up your time at Tailscale, at some point in there, ChatGPT happens in late 23, you kind of finish up and you moved on from Tailscale and despite having like a ton of success there, you get the itch again and you want to start doing something. So what happened there?

David: Yeah, I mean I have to do something you know, it's the nature of things. Right?

I actually really think it's very hard to sit down as a solo programmer and build a thing of worth, I would say. And that was my first immediate thought is I'm just going to go build something. There are clearly spaces where it works. I mean the ones I admire most are the indie game devs who build a game from start to finish themselves. I've never tried that.

Maybe when I retire I will if the LLMs let me. But you know, I looked for a scope project like that and honestly the interesting problems that I like to solve require teams to work on them and that inherently narrows the scope of things I can do. I start immediately looking for companies and such like that. So I started looking for ideas in that space.

And then my co-founder Josh who I've known forever, is very into all things machine learning. He was doing machine learning back on the iPhone 4, which was a very constrained CPU and required a lot of effort, particularly vision work. But he mentioned Mixtral which is a model that had just come out of Mistral, which I think was the first MoE model you could download.

Joe: Yeah, at least one of the very first.

David: Yeah. So I downloaded it and I figured out llama.cpp and I got it running locally and it was a lot like talking to ChatGPT but it was in my hands. I had this Apple studio it was running on. I could pick it up and I'm like, oh, this is the thing I'm talking to.

Joe: Right, right. It's right here.

David: I know. And for some reason there's something very visceral about that moment of like, oh, I'm holding the model in my hand and like, you know, this is clearly an object which I can use to make computers do things. I like making computers do things.

And so at that point I was like, well, I have to put these to work somehow. And so ever since then I've been staring at models, thinking about what to do with them. And so Josh and I started building what is now called a coding agent harness.

Joe: Right, yeah, it was Sketch, right?

David: That's right. So we built this coding agent and hilariously, the nature of timing, we actually started before Claude Code came out.

Joe: Yeah, I was going to say this was I remember when you first started working on this because, I mean we didn't have words like harness or--

David: No, we didn't.

Joe: Or even, I mean "agent" is a broad word, yes, but not like "coding agent."

David: Yeah, that's right. I know we had this notion of-- We didn't even really have the word agent yet for the tool calling loops, you know that came a bit later. But yeah we were like we're going to have an LLM write code we said, and modify code we have is the thing we said.

And we spent a lot of time, you know, tool calling existed as a concept, we thought we'd make that work. And then we spend a lot of time on how do we integrate this into the software development lifecycle and how do we have multiple of these working in parallel?

And so we started building git based, container based machinery for managing all of this before Claude Code existed. And so we were focused on sort of two problems out from where we should have been.

But it was very educational and we learned a lot trying to get people to use it. So we learned several things very quickly. One of which is isolating your agent and running it independently is very useful if you're willing to do the work of learning how to use one of these systems because you can have a whole lot of them running and you can build an orchestrator for your workflow.

Two, that containers are not the right shaped thing to put these workflows in, which was a huge surprise to me actually.

Joe: And when you say containers you mean like Docker containers or?

David: The first version of Sketch literally used Docker locally. And so these were Docker. And you're right to ask what I mean by containers because container is not actually a real term. Right? It's a made up word that we throw around. It's some variation of cgroup permissions in the Linux kernel as a container. And what does that mean?

And then we tried moving from the pure Docker container to when we built a hosted version of this to use gVisor under the hood, which is a proper virtual machine though a very interesting one because it re implements the Linux kernel itself.

And what we learned there is that you know, although we would take it to companies and say hey, try running your test suite in this thing or having the agent run your test suite, that was the big magic, to make it all work. You need to give the agent a loop where it can see and do everything.

We got further with that but we quickly hit a wall where like people would say like oh my agent tried to use Docker Compose or something which doesn't work in gVisor for arcane reasons about how Docker Compose is implemented using exotic features of the Linux kernel.

And that's when we realized, oh, I see. Even almost all of a Linux kernel, which is gVisor, isn't enough. You need to give an agent just a computer. And so that's when we started saying, okay, they need full VMs and we could boot the VMs from a container image, but they need to, if they want to add a network interface, let them add a network interface. Let them do whatever they want to do. If they want to install Tailscale, let them do it.

And so that led us to VM infrastructure. And at this point we were like, well, you know, our agent is one of many coding agent harnesses themselves are relatively small things that people want to heavily customize. Is that really a product? And we said, no.

I mean, really, this underlying infrastructure that you build the agent on is the most interesting part. And that led us to where we are.

Joe: Yeah, well, because you can run, to your point, two things. I mean, one is you can run Claude Code on your laptop, you can run it on a cloud VM. And what I think is interesting is, yeah, the proliferation now there's things like PI.

David: I really like PI.

Joe: Yeah. Which, you know, it has occurred to me, even using Claude Code where it's like, this is a good paradigm, but there are things that I like about and things I don't like about it, and for certain workflows, it's good. And yeah, so I do think there's going to be a lot of these different harnesses.

David: Yeah. I mean, I very much have to distinguish these days between the Anthropic models, that is Opus, which is brilliant and probably still my preferred coding model.

Joe: 4.7 or whatever?

David: Yeah, no, it's 5.5 now. Maybe they're out the other one. Oh, no, 4.7's Opus. I'm sorry.

Joe: Yeah, 4.7 is Opus.

David: Yeah, Opus is great.

Joe: Just to firmly date when we recorded this podcast. Haha.

David: That's right, that's right. Yeah. It's not next week, it's this week. Yeah. The Codex models are actually really good these days and I use them sometimes. The models are terrific. And then there's the harnesses on top of them.

And if anything, the Claude Code harness, as it has been developed to be useful to more developers, has become less useful to me, because they built a lot of what is effectively safety machinery into it to try and stop it deleting your production database or emailing your birth certificate to someone or buying something on your credit card, all very reasonable things to try and solve.

But that machinery makes them worse at programming. The way I work with them is I put them in a box, I give them very specific things and I let them do whatever they want.

Joe: In the box?

David: In the box, that's right, with the things I gave them. And that gets better results. And so that cruder technique gives a better outcome. And so to achieve that with Claude Code today, the actual harness rather than the model is actually quite tricky. There's --dangerously-skip-permissions, but it still puts some restrictions in there, and it's always changing and it's always a surprise.

And so I don't use it very much anymore, which is a bit of a surprise to me, though I love the underlying model.

Joe: Yeah, well, and I guess this exploration and kind of these learnings, because this is along, you know the company you and Josh founded is exe.dev. Spelled e-x-e.dev.

David: Yeah, that's right, that's right. When we created it, I said E-x-e.dev because of, you know, MS-DOS, I used to type EXE a lot on the other programs, even though I didn't need to, I typed it anyway. And so I said "e-x-e.dev" but everyone else says "exy," and so, okay, let's go with that.

Joe: Yeah, it does roll off the tongue a little better. It does, yeah. So exe.dev is actually, I think the end result of that kind of exploration and learning. And so what actually today is exe.dev?

David: Yeah, it is virtual machines in the cloud is the simplest explanation which also doesn't capture any of its value.

It is an attempt to redo what a cloud provider looks like around both what is best for a developer and then what is best for sort of modern agent based development of software.

Joe: Okay. And I assume in many parts that overlaps and maybe in some parts it maybe does or doesn't diverge. But like, what is best? Because I guess one question would be, I think you just said like cloud based VMs and you're like, oh, well you know, EC2 launched in 2006. So like what's new or what's different about this?

David: Yeah, there's a few things that are different. Though, EC2 remains in my mind, it's the baseline from which I work. I say to myself, "I want EC2, but," and here are all the problems. And so the whole design of exe.dev is how do I make EC2 the thing I want? And there's a couple of like fundamentally underlying problems, which is EC2 has never been very much a developer centric thing.

Joe: Right. It was built for originally production, I mean, I guess test dev prod. But yeah, it was not like, "hey, do development here."

David: Yeah. And their customers are very large companies solving very large company problems. You know, the first thing you think about with EC2 is your IAM configuration and like, well, you know, it's permissions and access into complex systems--

Joe: Security groups.

David: That's right. Yeah, that's right. That's actually, that's exactly where you start. Right? Network security groups. And you set some stuff up and you almost certainly need something exotic for every EC2 instance you start by hand. Or you've written a lot of code configuration that you've checked into a git repository that starts your EC2 instances.

Joe: Right. Piles of terraform somewhere.

David: That's right. So you're never the, you know, the number of engineers who use AWS who actually directly use their APIs is extremely small. You use layers and layers written on top of their APIs, which, you know, you're not the customer.

Joe: Right, right. Yeah. Well and like I said classically it's for most people it's the deployment target. Right? And even that deployment is like, oh, it's the staging or CI or production. But it's like when I'm done working on my machine or my laptop, that's where I push to.

David: Yeah. And most people are working even a layer above it. Right? They're spinning up jobs on Kubernetes which are ultimately running on EC2 instances. Or they're running Lambda functions they're called, which is built on EC2 instances.

We're all using VMs at the bottom layer. That's what's there. But we use all these layers of abstraction above them. And there's reasons we built all of those layers of abstractions. And some of those reasons don't make sense because of the passage of time. And some of those reasons have become far more important to revisit because of agents.

And that's why I'm saying let's strip back all the layers of abstraction, go back to the first thing, rebuild it, try and make it as friendly as possible and see how far we can get without those abstractions.

Joe: What are some guiding examples of like what an agent needs? Well, first of all, which I always love and repeat a lot to people: Your agent needs a computer. That to me is the first. I mean I'm much happier asking my agent or harness to do whatever inside of its little box where whatever it does in there, I'm not concerned. As opposed to like letting it run wild on my laptop.

But yeah, so what at a high level does the agent need? I know you just recently wrote a post about this. We'll link in the show notes and I think maybe this is what you meant by passage of time, but there's even certain hardware just in terms of like disk latency and network throughput that have like dramatically changed in the last 20 years, right?

David: There are, yeah. And you know we can work through all of these things. There's a whole series of things here.

Joe: Well, let's hit the most important ones.

David: The most important one is agents are trained on developers. You know, they train agents on exactly the way we work. Right?

Joe: Yeah, because it's not just code but you mean like even like reinforcement learning.

David: Yes, exactly. Like they take those transcripts from subscriptions to Claude Code of what you're trying to build, how you build the various commits you're making and they use that to train the model. So what developers do ultimately is what the agents do and that's where they work best.

Developers work best pre agents on their laptop or their desktop, directly on a computer where things are relatively stable and extremely flexible and they can do lots of things. Then developers constrain themselves into various production environments and they take that hit for a variety of reasons.

And so you start saying to yourself, I'm going to build for AWS Lambda, which means I can only answer HTTP requests and I have 60 seconds so I can't build like a long-poll system or anything. So you start immediately cutting off options.

Joe: Right. Because the, to your point, I guess the higher level primitive or the platform, you know, whether that's like AWS Lambda or like a Heroku or something akin to that, by selecting that you inherently are agreeing to some trade offs they've already decided to make.

David: That's exactly it. You work to these constrained environments to get something back out. And with Lambda that's scale to zero and scaling up without a lot of operational overhead. So it's all scaling related, which is very traditionally reasonable.

The scale to zero, which was the original selling point and is the best selling point for writing small software, is an example of where we have other options these days. We're better at parking VMs and taking them offline in a way that wasn't reasonable back when EC2 was created.

EC2 was built back when the Linux kernel was not particularly good at isolation. And they designed what was the only reasonable thing to sell at the time, which was, "we're going to give you a precise slice of a computer. We're not going to over subscribe the computer or attempt to, even though that's ideal from a bin packing perspective because you'll get a terrible experience if you do."

And even then when they were trying to give you precise slices of a computer there was still a lot of issues with noisy neighbors on EC2. I think there's a famous story from one company where every time they would start an EC2 instance they would run a little bash loop that every second would print out the time and then they would look at it and see how far off a second the times were printing.

And if they're too far apart, the machine was oversubscribed, they'd turn the EC2 instance down and turn up another one.

Joe: Until they found one that was closer. Yeah, haha.
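
[The timing check David describes can be sketched roughly as follows. This is only an illustration of the idea, not the script that company ran: the function names, the 50 ms threshold, and the rule of flagging a host on a single late tick are all assumptions made for the example.]

```python
import time


def measure_tick_drift(ticks=3, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Sleep `interval` seconds `ticks` times and return how late each wake-up was."""
    drifts = []
    for _ in range(ticks):
        start = clock()
        sleep(interval)
        # On a quiet host this is close to zero; on a contended one it grows.
        drifts.append(clock() - start - interval)
    return drifts


def looks_oversubscribed(drifts, threshold=0.05):
    """Hypothetical rule: flag the host if any tick was more than `threshold` seconds late."""
    return max(drifts) > threshold


if __name__ == "__main__":
    drifts = measure_tick_drift()
    verdict = "replace this instance" if looks_oversubscribed(drifts) else "keep this instance"
    print([round(d, 4) for d in drifts], verdict)
```

[The `sleep` and `clock` parameters exist only so the check can be exercised with fake timers; on a real machine the defaults are used.]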

David: Yeah, and so back then we didn't have the technology to properly isolate or oversubscribe VMs in a safe manner. And since then lots of engineers, many of these cloud companies, have been building great machinery in the Linux kernel to let you do more exotic things with VMs, including take them offline cheaply and quickly.

And that is why, you know, that original thing that AWS Lambda promises of scaling to zero is far less interesting because an idle VM can basically be scaled to zero today.

Joe: Yeah.

David: So why not just use a computer? Why constrain yourself in an abstraction you don't need to?

Joe: Okay. And so now, if you can just use a computer-- I guess, maybe bringing it back, so what in your mind is the ideal environment for an agent or rather, for a computer? What kind of computer?

David: Right. The closest I can get to a good development environment for me, which today is basically an Ubuntu Linux machine. You know, I'd love to say my laptop, but macOS has always been, you know, it gives with one hand, and takes with the other. Haha.

Joe: And increasingly takes. To be honest, as a longtime Mac user, it's getting worse.

David: Yeah it's always been a challenging environment. I enjoy it because I used the BSDs back in the 90s and so the oddities of macOS are familiar to me, but they're still a pain. Every day I have to adjust something or other because I'm on a Mac. Just for example, all the lovely VM isolation machinery I was talking about from Linux isn't there.

Like, if you want to do any of that on a Mac, the first thing you do is you start a Linux VM and then you do it all inside the Linux VM and like that's not ideal. So yeah, an Ubuntu Linux VM is the closest I can think of to a computer I want. And it scales down which is great. So I don't need all those abstractions. The other thing I need, and this is because of agents--

I now write a lot more code than I used to. I have a lot more products that are mostly toys that I build for myself that are sometimes useful. And I have that because the work involved in writing a prototype app now is the same work as I would have done opening an Apple Note and adding it to a list of things I might want to build one day. That one sentence, my LLM will prototype based on that. And sometimes that prototype's even useful.

And so I have a lot of these things now. And so this other big problem I need is I need not just a lot of VMs, but I need them all not to cost me five bucks a month because my ideas aren't worth five bucks a month.

Joe: Haha. Right. Yeah. Well this to me has been one of the fascinating things and particularly in the last six months since Opus and Codex both arrived. And I think in part because of the degree of RL that Anthropic and OpenAI both did on those models in terms of coding. And so the things they can now produce from your point with a simple prompt that just wasn't possible before late 2025.

But yeah, what's fascinating to me is I just feel like, and understandably there's a lot of energy focused on what does this mean for this, what's actually like a pretty narrow band of software which is like enterprise software or the serious, even consumer, serious consumer software.

But what's even more fascinating to me is I think these now enable, there's this whole realm of personal software or whatever that just was not economically viable before this because the amount of time it would take to sit down and like, not only is it not worth you, you know, $5 a month to host it but it wouldn't have been worth the three or four days it would have taken to write it.

David: Absolutely. I have many, many working examples.

Joe: Or two weeks in my case. Haha.

David: But yeah, that's right. I mean, for me, it would have been three or four days to write and two weeks to figure out how to turn on the cloud's hosted service--

Joe: Right, haha, to deploy it somewhere.

David: For some reason my mind insists on forgetting all of these details between every time I try to deploy my blog. Like, where did I host it, how does it work?

Joe: Well, I guess that's going back to needing a new kind of computer. That actually is, because I'm remembering now you've made this point before that if it used to take two to three weeks to write the software, which for even a pretty trivial app, like internal tool, whatever--

David: If we're being honest, that's how long it takes, right? You say it takes a weekend and then you spend a week fixing it after the weekend.

Joe: Right. So, yeah, so if it takes a couple weeks to write the app, then if it takes like, I don't know, a day or two to deploy the app, like, oh, well, right? That's amortized. If it takes a prompt and like five minutes of the model spinning to create the first version of the app, then like two days to deploy it is not, it's not okay anymore.

David: That's absolutely right. And to some degree the agents help with the deployment as well. The last time I tried to deploy my blog, there was some YAML file that was broken and I'm never touching YAML again. I had an agent do it for me. It figured it out eventually, but I probably consumed as many tokens deploying the thing as I did building it originally.

Joe: Yeah. I was going to ask, is that a good use of tokens? Haha.

David: No, it's really not. And if, ideally, if you're building a toy, you're developing in prod, you're working right there and every token you have to spend on minutiae or pageantry around the actual programming is wasted context window, that means a dumber model for actually solving your problem. And this is just like programmers too, right?

We need to get all of the distractions out of the way so we can actually solve hard problems. And models are the same way. You need the thing to be as simple as possible. So you can spend your valuable context window wisely and you need a lot of them. And it needs to not cost a lot and look a lot like a computer.

Joe: Yeah, let's dig in because I think you just kind of offhand mentioned when you're making a toy app you're developing in prod, but I think there's some nuance there because I mean look you know, at Heavybit, full disclaimer, we're happy investors in exe.dev and, and honestly Tailscale before that. But putting that aside, like I'm like an avid exe.dev user daily.

And so that's the hat I'm gonna wear for the rest of this conversation, just as an exe.dev user who really has been enjoying it. But like historically when you would say like oh, I'm working in prod, what you meant was like well I'm working on the app on my laptop and I'm just like YOLO git pushing main on Heroku or something.

So it is going, I am like straight to prod, but I'm not technically developing in prod. But with exe.dev, that's different. You are developing in prod. Right? And so take us through like if you got one exe.dev VM, what does that actually look like?

David: Yeah, so most of my programs, the only user is me and so I start an exe.dev VM.

Joe: Yeah, like true personal software.

David: Yeah, that's right. Which I've always had a little bit of. But I've always had far more lists of personal software I would write rather than actual software.

Joe: Well for me I just always had personal Google spreadsheets. Haha. That's what my personal software was before this.

David: A lot of my software with great effort you could pour into a spreadsheet and I'm sure I would have done a couple of them as spreadsheets. They're more fun this way and they're much more capable.

So I start an exe.dev VM usually with a prompt to start the thing off, but sometimes I just start the VM. The VM starts with a coding agent running on it called Shelley, which is one we wrote. It's a web interface is the key thing. The key thing is it works on your phone. And it's right there when you start the thing. And you can set something running in Shelley, you can walk away, you can see what happened on your phone, you can do the next thing. The agent is right there on the VM with root on the machine.

Joe: Yeah. And it's interesting, sorry just because like working with Shelley because like web based-- So I was thinking about the story. It's sort of like a hybrid between kind of the what you're used to in like a ChatGPT style chat interface, but then like an agent harness interface.

You're not like in the shell and watching things just stream by, but you're not just in some dumb chat either. Like it's doing this. Like, hey, I'm doing this. Hey, I'm doing this. Hey, I'm doing this. So it's kind of interesting hybrid.

David: Yeah. We've tuned it for working dangerously. So it doesn't ever ask you whether it can do something because that's an appalling distraction from work. It wastes time for the agent.

Joe: Well, and this is where being in the VM, like it's kind of like form follows function where you want it to be able to act dangerously. But it's like it's in the VM, it's in prod, so it's fine.

David: Yeah, that's right. If it wants to go edit the systemd startup config, let it, that's fine. If it wants to rearrange the network, good luck. I mean it might take you off the network, it might go terribly wrong.

Joe: Haha. It's never tried to do that for me.

David: But there you go. That's because you didn't ask it to. It's pretty good at that. And then it has some extra supporting machinery that's very easy to do on a VM that's a little bit harder for like when you download on your laptop.

Like it has the equivalent of the Playwright CLI built into it. So it drives Chromium whenever you do web stuff, takes screenshots, looks at the screenshots, says, "oh, I got my CSS wrong." Fixes it a lot like a real programmer.

Joe: Yeah.

David: Fun stuff like that.

Joe: Yeah. Which I mean sounds minimal but at least for me it has been pretty transformative in that when you're actually-- The harness is in production and not just in production like where the code is, but to your point.

So if you're building any kind of web based application, the experience I've felt it's just for me has been the best thing I've found because I can be out in the field using my personal software that I've written with Shelley and I'll notice a bug or something and I'm like, oh, this is annoying. And I'll just flip to the web and say, hey, Shelley, this isn't working right. And Shelley will just go, oh, hang on. Well, let me check that out and load up the app itself, look at the system logs, like, do everything an engineer would do and say, oh, I found it, I'm going to fix it.

And then I just, you know, hit refresh on my phone and I'm back off to using it. Like the bug has been fixed.

David: Yep, exactly. And it has the, you know, in all the ways that makes it sound scary for writing enterprise software or large collaborative pieces of, "Oh, it has the production database right there." From a development perspective, "Oh, it has the production database right there." Right? It's not using fake data. It goes and looks at the table filled with real data and says, oh, it doesn't work because of an edge case of the data set.

Joe: Yes.

David: Just like a human does when we take a copy of the prod DB and do this.

Joe: Right. Because like you said, you can kind of ask it to when you drop into a fresh VM for really any kind of setup you want. But sort of by default, if you're like, hey, I just want to write this personal side project web app, it'll just use like a SQLite database locally.

David: Yep, it's all you need. You can build a lot on a SQLite database. I have blog posts on this out there in the world, if you want to know how far you can take a SQLite database. You probably shouldn't take it as far as I do, but you can do a lot.

And of course an exe.dev feature that we intend to launch very soon, that we've done the classic thing we do of we've done all the hard work and then haven't put the UI on top, which is that on these VMs, we are regularly snapshotting a history of the disk. And so the whole point here is if you do accidentally delete the prod database, just rewind the VM to the last snapshot and there's your prod database.

And, you know, this is our very big hammer answer to the question of what happens when your agent goes rogue and things go wrong. And it turns out it's a surprisingly effective hammer, and I'm very, very excited to get that in everyone's hands because it's useful.
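
As a rough illustration of the rewind idea, and not a description of how exe.dev does it (David describes snapshots of the whole VM disk, while this sketch works at the application level on a single SQLite database), Python's standard `sqlite3` backup call can copy a database aside and copy it back after a mistake. The `snapshot` and `rewind` helpers and the `groceries` table are hypothetical names chosen for the example.

```python
import sqlite3


def snapshot(live: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the live database into a separate in-memory database."""
    snap = sqlite3.connect(":memory:")
    live.backup(snap)  # copies every page of `live` into `snap`
    return snap


def rewind(live: sqlite3.Connection, snap: sqlite3.Connection) -> None:
    """Overwrite the live database with the contents of the snapshot."""
    snap.backup(live)


# A toy "prod" database (in memory here, so the sketch has no side effects).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE groceries (item TEXT)")
live.execute("INSERT INTO groceries VALUES ('milk')")
live.commit()

snap = snapshot(live)

# The accident: something drops the table.
live.execute("DROP TABLE groceries")
live.commit()

# The big hammer: go back to the last snapshot.
rewind(live, snap)
rows = live.execute("SELECT item FROM groceries").fetchall()
```

A whole-disk snapshot covers much more than this (config files, installed packages, the code itself), which is the point of doing it at the VM level.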

Joe: Yeah, that's interesting. I'm just actually thinking now I have a few apps running on exe.dev where Shelley and I have already built our own--

David: Did I just sign up for support requests from you--

Joe: Yeah. Haha.

David: Where I have to ask you for a snap for-- All right, we'll just have to ship the feature. Haha.

Joe: Yeah, yeah.

David: Again, going back to the "EC2, but" story, right? I'm writing this personal software. I'm using it myself, or I'm sharing it with my team, or I'm sharing it with some colleagues, or I've written software that I share with a couple of vendors. What do I need?

I need the network security groups on the AWS instance to make sense. And it makes sense in a 2026 way. Which means when I go to the website for the VM, it presents me with my login screen for exe.dev and I log in and I can access it and no one else can.

And there's a share button and I click share and then I get a Google Docs style sharing link to send to people and then they can log in and use it. Like, why would I insist on dumping VMs onto the Internet raw, building my own auth system, usernames and passwords, into every personal app? That's the context window you're throwing away.

Joe: Yeah, yeah. The way I've thought about it, there's this kind of spectrum of software. At the extreme end, there's literally just prototypes, where it's either, you know, something you used to personally spike out, or where it's barely more than a mock, or you one-shot prompt it and you just throw the thing away. Next to that is what we've been talking about, the personal SaaS, or personal software rather. But the next thing after that is what I've been thinking of as group chat software.

David: Yes.

Joe: Which is like, okay, this is an app that's not just for me personally, it's for me and the three people in this group chat.

David: Yeah. I see some people trying to get OpenClaw to do that. And you know, there's actually some work in OpenClaw on making it happen. It's very impressive. You know, they're very much at the edge of that.

And you know we play around with Slack integrations for our apps and things like this. Like, what if you just talk. You know, what if you could have a Slack channel to chat with Shelley, for example? These sorts of things.

Joe: Yeah. But even just like to your point, like if I have a piece of software that I've built with, with Shelley and it's in an exe.dev VM and I want-- There's like three or four people or like family software. Right?

It's like I want a grocery shopping, meal planning thing I'm sharing with my partner. You know, it's just a click of, like you said, it's like Google share, because it shouldn't be like, "oh hey, because I've integrated with Okta or whatever, you're going to get an email to do 2FA," right? Haha.

David: I mean that's what you want in your corporate environment. Right? Like everyone logs in with Okta and all the apps are just there. But yeah, I'm not going to make my wife and kids log into Okta. No, that's, that's not a recipe for anyone's happiness.

Joe: No, exactly.

David: There's nothing wrong with Okta except you know, it's just, it's not family friendly software.

Joe: Yeah.

David: In fact we have Okta support. It's a thing some of our customers--

Joe: Yeah. Well in terms of software existing on a spectrum, I think it's interesting to find these kind of white spaces where it's like oh we sort of historically hacked this by doing this. Or the trade offs of doing it didn't used to be as large but now that these new kinds of software are possible to exist, and at scale. Right?

David: Yeah.

Joe: Like everyone can now create these new kinds of software.

David: Yeah. And there's nothing radically new in what I'm saying. To someone who's an engineer in the field who's been deep in all of these systems, I would say yes, I start a VM and I put an IAM proxy in front of it immediately, automatically. And like IAM proxies have been around for 20 years or 15 at least.

You know, I've not invented anything new there. But very few people know what an IAM proxy is. But everyone needs one. And so this is just a case of, you know, the fundamental defaults are wrong in all of this and the defaults matter because the defaults are how I'm building most of my software now.

And so we're just shipping these features that people need and we're building it into the VMs and we have a whole set to come. We've just started doing integrations in a big way. So again Slack is coming soon for example. But we have a GitHub integration and the whole purpose of our integrations is you use OAuth or similar style token management on our side and we provide HTTP proxies to your VMs to let them use those resources without actually giving them the secrets.

And those secrets then can't be exfilled by an agent out of your VM. You know, they can't check them into GitHub and push them or something like that because they don't have them. They work through a proxy. And so, this is us trying to make the act of like you know, I said at the beginning, I want to create a box, put what the agent needs in the box and then let it run loose. It doesn't actually need the secrets, it needs access to services using the secrets.

Joe: It needs working access to the service.

David: Yeah, that's right. So we're building that machinery and integrations around the VMs, because otherwise everyone is going to have to build that themselves. Everything we're building is just the things I need to use to write computer programs in 2026.

Joe: Yeah. And so then does that work too? Even if you were-- Honestly this is something I've been meaning to plan and try out is I've mostly just been having so much fun like making software in it. But you can get a harness in there, you can get it access to a whole bunch of different MCP servers or whatever.

David: Yeah, that's right. It's the same basic idea. In fact, the generic version of this is you create an HTTP proxy integration and you say what HTTP headers to add on the way out. And that lets you type in, you know, you could get a Stripe key, for example, and type it into the HTTP header. And now it can work with stripe without--

Joe: Without actually exposing that to the agent.

David: Right. Without it checking the key into GitHub or you know emailing it to someone.
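
A minimal sketch of the header-injection idea David describes, written as an assumption about the general shape and not as exe.dev's actual implementation: the proxy holds the secret, the agent's request arrives without it, and the proxy adds the configured headers on the way out. `ProxyIntegration`, `outbound_request`, the example URL, and the placeholder key are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProxyIntegration:
    """Hypothetical integration record, stored by the proxy outside the VM."""
    upstream: str                                        # base URL of the real service
    inject_headers: dict = field(default_factory=dict)   # secrets live here only


def outbound_request(integration, path, agent_headers):
    """Return the (url, headers) pair the proxy would send upstream.

    Any header the agent supplied under a name the proxy owns is dropped,
    so the agent can neither read nor override the injected secret.
    """
    owned = {name.lower() for name in integration.inject_headers}
    headers = {k: v for k, v in agent_headers.items() if k.lower() not in owned}
    headers.update(integration.inject_headers)
    url = integration.upstream.rstrip("/") + "/" + path.lstrip("/")
    return url, headers


# The agent inside the VM only ever talks to the proxy and never sees the key.
payments = ProxyIntegration(
    upstream="https://api.example.com",
    inject_headers={"Authorization": "Bearer sk_test_placeholder"},
)
url, headers = outbound_request(
    payments,
    "/v1/charges",
    {"authorization": "Bearer agent-guess", "Accept": "application/json"},
)
```

Because the key exists only in the proxy's configuration, there is nothing inside the VM for an agent to commit to a repository or email to someone.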

Joe: That's cool. Okay. I guess this is just an example though of how working with agents and models requires like a fundamental rethink of what the underlying infrastructure primitives actually are and how they work and how they interop and there's like a new set of constraints and a new set of capabilities.

David: Yeah, there really are. I mean, you know--

Where did Sketch come from really? It came from Josh and I confronting the fact that 30 something years of programming, the rules have changed, all the fundamentals are different now. And we had to start from scratch. And it's been a very humbling process throwing away all of my best practices, all of my hard won knowledge and saying I have to try again from the beginning. Does this make sense in the world anymore? And we're getting there.

Joe: Right. Real first principles.

David: Yeah, that's right. It's painful. I mean, I didn't learn programming in school, but it feels like I went back to school, except there's no teacher, no one knows what we're doing. And so we've had to figure it out as we go. And exe.dev is very much just like, this is what I need to build the layers on top of. And we're making it available to everyone.

Joe: Okay, well, people should definitely check it out. I enjoy it. That comment though leaves me, I guess, maybe kind of one last topic to explore maybe a bit, because I'm curious your perspective.

I don't know in 10 or 15 years how we'll all talk about the Christmas break of 2025. But everyone went over break and they're like, okay, I've got a little time and I've heard of these new models and we've been through a few iterations of models and they're always a little better. So I'll try out these new ones.

And I think many of us had the same kind of experience of like, "oh, well, this is an actual like step change now." And it just so happened for me, over break, I was like, "oh, exe.dev actually just kind of soft launched, dark launched, whatever, I should start to play around."

And I actually was like, I was making sure my Claude Code harness was fully set up because I was just thinking of exe.dev as like a deployment target. And I went and logged in to my first VM and I saw the Shelley tab and just was like, oh, I should check that out. And Shelley was just like, what do you want to build? And I was like, oh, that's interesting.

And I just told her what I wanted to start. And so it started building and the thing just kind of worked, which surprised me. It just loaded up there and it worked. I was like, that's interesting. So I said, oh, I'll add something to it and add.

And it was a few days later with multiple sessions where all of a sudden, I had this very strange feeling where, going back to what you're saying, like having programmed really for almost about 30, 32 years. And I realized, I have a working piece of software that I've been iterating on for a few days and I actually haven't looked at any of the code yet.

David: Isn't that shocking? That's so hard to believe. But like it's clearly solving your problem. Right? It's a good program. Yeah.

Joe: Yeah. Well, then I kept going, and I kind of at some point had this even weirder epiphany where I was like, huh. At least for this piece of software, I'm actually never going to look at this code.

David: Yeah.

Joe: Like, ever.

David: Yeah. Isn't that amazing? It's incredible. What do I think about this?

So the biggest problem trying to talk about this stuff is people immediately jump too far and they start saying ridiculous things about how, you know, it's the end of software engineering, we don't need programmers anymore, et cetera, et cetera. And obviously we're not saying that, right?

Joe: No, not even remotely.

David: I'm busy trying to hire programmers. Why would I be doing that? No, what's going on here is the set of programs you could write by never looking at the source code went from basically zero to some larger set.

Joe: Some larger set. Yeah. Yeah. And to your point, it is a polarizing topic for some reason. Yeah, I'm not, I know you're not one of those. But for a lot of people, they feel a need to kind of plant a flag either on, like, software engineering is solved or some nonsense like that, or the other side where they're like, well, none of this works.

David: Yeah. And clearly neither of those camps is correct today.

Joe: Or will be tomorrow.

David: Or will be tomorrow. Exactly. I know if you can't figure out how to hire software engineers, you just haven't found a business model that needs software yet. Otherwise you'd be desperately trying and you'd find great software engineers are hard to find.

Joe: Yeah. Getting harder to find every day, it turns out.

David: Exactly. That's not how markets work, when you don't need something. Right?

Joe: Haha. When something's no longer needed, it doesn't get harder to find.

David: Exactly. Yeah. That's a big hint. Yeah. But I know I have programs--

Joe: Like, you can tell this isn't necessary anymore because it's getting really expensive. Haha!

David: I know. I have written programs like this that have had significant impacts on our business that have changed which vendors we choose. And it's really shocking to me that I can do that without ever looking at their source code.

And it's kind of wonderful in a sense, for me, because while I enjoy every element of our craft, at the core, what I really love is making computers do things. Because here was the world before, and now here's an object in the world that does a thing, and this is great. As a programmer, I love not having to read the code.

Joe: Yeah. Just last week Jensen Huang, CEO of Nvidia was being interviewed and made the point, which I've been very clumsily kind of trying to say. And you know, I guess like a world class CEO, he just had this very succinct point. He was like, basically the purpose of a job is different from the task of a job.

David: Yes. Oh, that's exactly true.

Joe: Right? And yes, the task of software engineering historically involved a whole lot of time of being hunched over a keyboard, like a physical analog device and manipulating it to get, to your point, like making computers do things, something you've conceived of, out into the actual computer.

David: I know and sometimes you can get so deep into the weeds you can become completely detached from the thing you're trying to make the computer do. Before you know it, you're writing Kubernetes config files and like what does this program even do? Who knows?

Joe: What did, how did I get here? Where did it start? Where am I?

David: It's any of those. There's a joy in those things.

There's a joy in the craft of doing things, but there's a great joy in solving a problem too.

Joe: Yeah, well, and there are also levels. I mean this is another aspect I do find kind of fascinating. Like yourself, myself, I had Simon Willison on the podcast recently and we were talking about this. There is an aspect of working on these applications, these pieces of software and like you said-- Yeah, it's not even just personal software.

I mean I've built some tools that we use here at work too with it, but working on it, even though I'm not looking at the code, I am interacting with, you know, Shelley or Claude Code from an engineering mindset.

David: Yeah.

Joe: So I'm not speaking at this high level, like "oh, do this." Or like you know, the classic meme, "make the design pop." I'm like, "hey, I think the database table, like, this is how I would like the relations to work in the database because I have some sense of what's best practices."

And honestly sometimes the agent's like, "oh, that's a good idea. But have you thought about this?" And we can have a quick back and forth. Or I'm like, "hey, I think like the debounce delay on this like fuzzy search has to be a little higher."

So I am conversing often with it, like I'm still crafting software and making software level decisions often. But I'm just not looking up arcane syntax that I've forgotten.

David: Yeah, I completely agree with that. I have a couple of friends who I've watched, who are not programmers, who I've watched try to write software and it's fascinating because sometimes they actually pull it off and build a useful thing and like that to me is verging on the miraculous. It's pretty rare though.

And often what happens is it would be very easy for me to restate their request in a way that would get them the result they want, but they are missing a lot of knowledge. There is an old joke about this, right? About the consultant who gets called in to fix the supercomputer and spends a couple of days and then like replaces a valve and sends a bill for $50,000 and they ask for it to be itemized.

He sends back an itemized list. And, you know, it's $50 for the valve and then, you know, $49,950 for knowing which valve to change.

Joe: Which valve to change. Yeah.

David: Haha. And so there's definitely a lot of that going on. And some of that superior models will improve on. Right? Like if I'm acting as a translation layer with a lot of knowledge between a person and their context and the machine and its context, then that is an object that an intelligence can be built to replace. And some of that can be done, only some of it.

Joe: I guess to your point earlier too is the more widespread this becomes, the more training data around non-software engineers trying to build software will emerge in terms of like when they're asking for X, they actually mean Y or that's the output.

David: Yeah. And more intelligent models too. I think it's not very hard as a software engineer to push a model to failure by just giving it a more sophisticated task. Like a huge part of what I do, and I assume you do, when writing software with models, is take a task and break it up into component pieces to have it done.

Joe: Oh yeah.

David: And as a model gets smarter, the size of the component piece can get larger. But it would take me not 20 minutes to increase the size to a point where a model collapses and produces rubbish.

Joe: Yeah, I guess that's the other interesting thing with using these, that separate-- So there is an aspect of like oh, having all the software engineering experience and knowing, but there's just another aspect because-- I was actually experimenting with like Claude Cowork, which is a pretty thin wrapper around Claude Code, but designed for non programmers and in theory to help them automate their daily like knowledge work.

David: Good idea.

Joe: Yeah, yeah. And I have worked with it to do that. And it struck me while I was doing this, I was like, oh, I've been writing shell scripts and automating toil my entire career. And so I drop into this with the same mindset of like, okay, well let's get this small unit of work done and let's make sure it works. Okay. And then let's add another piece and like, oh, that's not, you know, and let's kind of keep iterating until that works. Let's keep iterating--

Like I do wonder, going back to what Simon and I were talking about-- There was a blog post recently that said the-- We'll put it in the show notes. Basically something like, "the humans do not yearn for automation." And was making the point that there's this like, this kind of like software brain affliction which I know you have. I have. But like most normal people without this affliction, like sitting down, like I spent hours in Cowork automating this and it was a semi complicated workflow with lots of different tools and documents and whatever, but still I was like, I don't see the average person sitting down to do that.

David: No, I agree. That's a job. Right? And it's a job that requires a lot of understanding of how a computer works and what it has access to how it functions.

Joe: And what the failure modes are.

David: And just like compilers take writing Assembly away from me, I have disassembled the output of many programs in my career and had to stare at broken Assembly in one form or another, usually for performance these days.

Joe: Hexdump.

David: Yeah. Or a disassembler that almost works. LLMs are amazing disassemblers by the way. They've changed the game.

Joe: Yeah. This is a fascinating, completely unrelated side note, but I'd seen something recently about like basically stripping debug symbols is no longer like, don't worry about it. LLMs are so good at disassembling.

David: Yeah. If you leave the debug symbols in, they'll just rewrite your program wholesale. It's incredible. So I know, I watched a friend take a hexdump out of Wireshark as a screenshot, paste it into ChatGPT and say so what protocol is this? It's like, oh, that's clearly VNC embedded in something else. And it was correct and like it pulled it apart and showed all the details. I was like, wow, I used to have to do that myself.

Joe: Wow. Wild.

David: Yeah.

So I think you're right that the act of breaking a larger task down to a smaller task is not a self contained thing you do in a box. You do it with full knowledge of the world, either side, both the machines that can solve the problem and the shape of the world. And what you do as part of that is you change the task to adjust for the world around it. And it's going to be a while before models have enough understanding of the universe in which they're operating to be able to do that very sophisticated work.

I'm not going to go saying it's impossible but there's a very steep hill to climb and it's tall.

Joe: Yeah. Well, coming back, one of the things that kind of I've been thinking a lot about. So it's now possible as we've been talking about for some classes of software, some pieces of software that you can ask an agent and iteratively work with the agent to build and it's a working piece of software.

I mean I think my most complicated app on exe.dev, I just asked Shelley last-- which actually I didn't even know if this would work, but it did. You can ask Shelley to actually just go look at the history of all the work and sessions.

David: Very useful.

Joe: Yeah, it's a super useful feature I didn't know about.

David: Yeah.

Joe: And it's like 20,000 lines of code and we've done like 30 different database schema migrations and we've been working on it for almost three months now. I'm still using it and there's been a total of 16 hours of my time spread over three months and my typical prompt is 50 words.

David: Wow, this is fun. I haven't thought of doing this.

Joe: You should check it out. You could ask. It does tabular display and everything on the data. But coming back to it. So I think there actually absolutely is--

And by the way, we have some people who are going to be coming to, or hosting, a conference tomorrow that you're actually speaking at. We'll also put that in the show notes, because there'll be some great sessions there people can follow and watch.

But there are people using these techniques now in actual working enterprise software companies. And I think there's like some just clear obvious places where if, like, oh, there's a, I don't know, Sentry error for a given API route, like handing that to an agent, like literally just automatically handing that to an agent, it gets routed to an agent, the agent can look at-- API routes tend to in many cases be fairly decoupled from other ones. It's just like messing around with data in the database.

It can identify the problem, it can merge a patch, it can test it and it can ship it out. And so given that that's now possible and people are doing it, I've been struck by, for all of IT history, like humans deciding to push to ship something to production or to put code in production that they have reviewed, either they wrote or they reviewed or both and are accountable for, for some non trivial amount of working software in production that is going away.

David: Yeah, it is.

Joe: And that is going to remove like basically a bottleneck. Like the volume of changes that can hit prod is going to now go up dramatically.

David: The volume has already gone up.

Joe: Well that's true, it already has gone up.

David: What's become clear in the volume going up is how fragile our current processes are for handling code. So this is all about code review, which is actually the thing I'm talking about tomorrow. And the problem is very clear, which is programmers have a certain amount of bandwidth every day for writing code and for reviewing code.

Joe: And for eating and sleeping and everything. Haha.

David: Oh, if you have to, I mean, you know the priorities. Priorities, Joe. Haha. It's coding.

Joe: Coding and reviewing. Yeah.

David: There's a reason I had to start another company. But you could see this like, you know, it used to be, and it's actually really hard for me to remember this. It used to be that it would take a day to get my code reviewed on average.

Joe: In my first job you had to schedule them with like three people and it took weeks to get your code reviewed.

David: Yeah. I mean I've worked on teams where it took weeks to get code reviews and I would just call those dysfunctional teams.

Joe: Yes, yeah.

David: But you know, a very functional team, a day turnaround on a code review was pretty common.

Joe: Oh yeah, I would think even for a very functional team like that person who wants to review your code, they have their own code to write and so a day is pretty good actually.

David: That's right. So we were already at our review limit before LLMs showed up, and it was already a minor issue, I would say, on teams, that we could have improved productivity by reducing review loops. That was the state of the world. And then what we did was we made it much faster to write. Whole subsets of problems get written really quickly. And we created two problems in the process of doing that.

One is we just created more of that natural review load. But the second problem is we introduced a second review phase because any programmer who I want to work with who's going to send me code to review is going to at least read it beforehand. I didn't used to have to read it because I wrote it.

Joe: Right, right. Like the definition of writing code is also reviewing it.

David: Yes, that's right. And you know, it was common for me on very functional teams to do a quick review of my code again one second before I send it out just to look for obvious mistakes I'd made.

Joe: Right. You walk away, you sleep on it, you get a coffee. Checking your work.

David: Exactly. Yeah. So but you know, it was easy to review the thing I'd just written because I knew how it all worked. Now I have an agent write code to a spec I created. I have to review it, make sure it makes sense. And then in the old software development lifecycle I'm meant to send it to a second human to read.

And so for every piece of code we've written we are now in the state of we've doubled the amount of code review we have to do. But also we've made it much easier to write code. So we've doubled the review process in the natural state and then increased the total load.

Joe: Well, "much easier," as I'm sure most people listening know, is like dramatically understated. Like, it's orders of magnitude.

David: I know, there's a natural doubling in the structure of the process and then probably a 10x somewhere else.

Joe: 10x, 100x, 1000x, it's hard to quantify.
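The arithmetic behind "a natural doubling and then probably a 10x" can be sketched in a few lines. The numbers here are purely illustrative assumptions (one change per day before, ten after), and `review_load` is a hypothetical helper, not something either speaker describes using.

```python
def review_load(changes_per_day: float, reviews_per_change: int) -> float:
    """Total review passes needed per day."""
    return changes_per_day * reviews_per_change


# Assumed baseline: one hand-written change per day, read once by a peer.
before = review_load(changes_per_day=1, reviews_per_change=1)

# Assumed agent workflow: the author reads the agent's output, then a peer
# reads it again (2 passes), and volume is 10x (illustrative figure only).
after = review_load(changes_per_day=10, reviews_per_change=2)

growth = after / before
print(f"review load grew {growth:.0f}x")
```

Under these assumptions the two effects multiply rather than add, which is why a team that was already at its review limit gets overwhelmed.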

David: And if you work in developer tools this can actually be hard to remember but to some point we changed programming around our processes. Like we build tools to make code review work better. Like we changed programs.

But fundamentally there are physical technical limits of the machines we're trying to make do things, programming results from those technical limits. And we as programmers have to build processes working around those technical limits. The technical limits have fundamentally changed because we have models writing code now. This is a thing in the world that is different and it's an object in the physical universe we have to deal with and so our processes have to change.

Joe: Well, and I think also we can have models, and again I think there's a ton of nuance that exceeds the scope of today's conversation. But like there are certain cases and scenarios, not all of them, but where models can write code in production.

And if models can write code in production, I mean the thing that, you know, you mentioned developer tools, I mean at Heavybit, all we do is invest in infrastructure, developer tools, devsecops, everything to do though with like building, operating, writing, running, securing code ultimately in production.

And if the human review, and to your point like writing and review was this like rate limiter just because of the physical limits of a human being as to how many changes could go into production, then-- Like the whole pipeline, infrastructure, everything has these--

Every single system has design constraints both explicit and implicit. And it has this implicit design constraint of like any reasonable thing is not going to scale faster than the next. This is kind of famously why organizations like Google or Facebook always have to build everything in house because they're these like unique companies at their scale where like all of the tools intended for an average typical company and industry don't work at their scale.

David: Absolutely.

Joe: But now if you fully embrace these new techniques and like every-- Those tools don't work for anyone anymore.

David: Yeah, we're in a lot of trouble.

Joe: Yeah, a lot of trouble. Haha.

David: Like at exe.dev, what we've done is we keep the team small and it's a very high trust team and I can rely on anyone on the team who says I'm going to go solve a problem. I'm like, "okay, it's okay. They've got it under control. I don't have to think about it now."

And so our process is we use agents to write code, we review the code and then we push it to production and deploy. And there's no other human in the loop. There's just the one person driving the agent who does the work. And this leads to extreme situations. Like yesterday a colleague of mine was showing me how when people talk about bugs in our Discord, which they do every day, he grabs the Discord link to the message and he drops it into Shelley and types "please fix." And hits enter.

And then on his phone he has a list, he looks at the code, and he hits okay and it pushes to main. And then we have continuous deployment that deploys it. And we were talking about it because the one missing link is that the bot then doesn't respond on Discord and say we fixed it. We always forget to tell people we fixed the bugs we fixed. And we were like, oh, that's bad marketing. We should probably tell them we've actually done the work.

But the only human intensive element of that whole process is the very kind user who sent us the bug report and the person on the team who had to read the code before they pressed the button. And so code review is now all of the work for that workflow.

And obviously we're building new features right now, which involves sitting around and discussing the design for hours beforehand. And obviously that work still exists. And then there's a lot of guiding agents to do it, but for huge amounts of programming now, it's all just code review. That's all of the work. So what we need to do is figure out where we can review less.

And there are some obvious cases. Anywhere in your system where you have a real, a true working undo button, and that unfortunately means your system doesn't have the ability to exfil data to the public. Because you don't actually get an undo button on that, unfortunately.

But anywhere you can, yes, your agent should be writing code in production and doing it without you in the loop because you can undo it later and that's more productive than you gating it.
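The rule of thumb above (skip the human gate only where there is a true undo and no path for data to leak) can be written down as a small policy check. This is a minimal sketch under stated assumptions: the `Change` type, its field names, and the example changes are all hypothetical, and a real system would need to establish those two facts rather than take them as booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    description: str
    reversible: bool       # assumption: a true, working undo exists
    can_expose_data: bool  # assumption: could leak data outside the system


def needs_human_review(change: Change) -> bool:
    """Gate on a human whenever the change cannot be fully undone.

    A data leak has no undo button, so it always requires review.
    """
    return change.can_expose_data or not change.reversible


# Hypothetical examples.
css_fix = Change("fix button alignment", reversible=True, can_expose_data=False)
migration = Change("drop legacy column", reversible=False, can_expose_data=False)
share_link = Change("make report URL public", reversible=True, can_expose_data=True)

for change in (css_fix, migration, share_link):
    verdict = "human review" if needs_human_review(change) else "auto-deploy"
    print(f"{change.description}: {verdict}")
```

Only the first change would be auto-deployed here. The third looks reversible on paper, but once data has been exposed the rollback does not take it back.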

Joe: Yeah, yeah. I mean, I think there's going to be a whole discipline emerge around assessing-- Because I think even, you know, for a code base of any reasonable scale and sophistication, it's not like a single thing, it's composed of all these different pieces that, you know, like specifically according to Conway's Law, tend to like mirror your organization and so very human centric shape usually.

But like some of those pieces, like you said, will be stateless and undos are easy and changes are idempotent either way. And there'll be other pieces where it's-- To your point, it's like, oh, we don't get to undo that.

David: Yeah, I don't want to scoop my colleague who's writing a blog post on this topic, who's trying to create a term for it, but there is a whole category of agent you can build and I think this is actually very important that small agent harnesses are very easy to write and probably every programmer should write at least one. And I actually think it's now a thing that you do to solve some problems.

Joe: It's going to be like building your own lightsaber as a Jedi. Like you have to write, you have to make your own harness. Haha.

David: Yes. It's going to be next to Compilers 101 in a CS course or something like that. But it turns out for some of these problems where you want to completely automate systems that run and do interesting code work without you in the loop, what you can do is you can write a harness that is constrained in what it can do so that it can't exfil data and has certain tools available to it.

And so this is a whole category of programming that should-- So we've deployed several at work. We have a great one that every time a support request comes in, it does a very deep dive into all of our logs and all of our source code and our previous discussions with the person and such.

And it produces a potential answer that sits right there for when you go to read the support email. There's like, well, here's my analysis and here's a potential solution. And you know, it's very important we structured this thing in a way where it can't exfil any data. And so we had to write a custom agent to do this.

But 75% of the support emails that come in, the answer is sitting right there next to the email. Our job is to review that and send it.

Joe: Yeah, I guess that's interesting. I hadn't even thought, if you are fully customizing the harness. I mean, part of what you can do is just pick and choose what tools you give it. And for that particular, whatever the particular scenario is, you can choose to not give it the tools that could cause damage.
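The idea of picking and choosing tools can be sketched as a harness that only dispatches to an explicit allowlist. This is an illustration under assumptions, not exe.dev's implementation: `Harness`, `search_logs`, `draft_reply`, and the canned log lines are all hypothetical, and the model that would decide which tool to call is left out entirely.

```python
from typing import Callable, Dict


class ToolNotAllowed(Exception):
    """Raised when the model asks for a tool the harness never registered."""


class Harness:
    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        # Copy so the allowlist cannot be changed from outside afterwards.
        self._tools = dict(tools)

    def call(self, name: str, arg: str) -> str:
        if name not in self._tools:
            raise ToolNotAllowed(name)
        return self._tools[name](arg)


def search_logs(query: str) -> str:
    # Hypothetical read-only tool over canned data.
    logs = ["vm-1 boot ok", "vm-2 disk full", "vm-2 restart"]
    return "\n".join(line for line in logs if query in line)


def draft_reply(text: str) -> str:
    # Produces a draft for a human to review; nothing is sent anywhere.
    return f"DRAFT (needs human review): {text}"


# The support agent gets read and draft tools, and no way to send anything out.
support_harness = Harness({"search_logs": search_logs, "draft_reply": draft_reply})

print(support_harness.call("search_logs", "vm-2"))
try:
    support_harness.call("send_email", "customer@example.com")
except ToolNotAllowed as exc:
    print(f"blocked tool: {exc}")
```

The design choice is that safety comes from what is absent: a tool that was never registered cannot be invoked, whatever the model has been persuaded to ask for.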

David: Yeah. And I think you were talking about Simon earlier. He has the lethal trifecta, right? And designing agents that can solve a problem without being vulnerable to that is extraordinarily difficult, but very valuable when you can do it.

And you can have a huge effect on your company when you do. And so one of the things we're trying to do with exe.dev is build tools into the VMs to enable you to build these sorts of agents, because we think every company needs a dozen and they need to be custom to your company. I don't think you can buy these off the shelf.

Joe: Yeah, yeah. Well, to your point, the lethal trifecta, this notion, if I remember correctly, that if you have three things, if you have an agent that has access to tools and internal private data and also can consume instructions from the Internet, you're cooked. Haha.

David: Yes. You're cooked. Yeah, that's right. And I think the lethal trifecta actually looks a lot like the CAP Theorem. It's one of these sort of immutable laws, like there's no solution to it. What you can do is you can solve a whole set of problems within the constraints of it.

Joe: Right.

David: And each problem is subtle and hard to build, just like the databases that fight the CAP Theorem.
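The lethal trifecta discussed above lends itself to a simple design-time check. Joe paraphrases the three legs from memory; Simon Willison's write-up phrases them as access to private data, exposure to untrusted content, and the ability to communicate externally, and the sketch below follows that wording. The `AgentCapabilities` type and the two example agents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    reads_private_data: bool       # e.g. logs, source code, customer history
    reads_untrusted_content: bool  # e.g. inbound email, web pages
    can_send_data_out: bool        # e.g. HTTP requests, outbound messages


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three legs are present at once."""
    return (
        caps.reads_private_data
        and caps.reads_untrusted_content
        and caps.can_send_data_out
    )


# Hypothetical support agent: reads private logs and untrusted email,
# but can only leave a draft for a human, so one leg is missing.
support_agent = AgentCapabilities(True, True, False)

# Hypothetical general-purpose agent with every capability switched on.
general_agent = AgentCapabilities(True, True, True)

print(has_lethal_trifecta(support_agent))
print(has_lethal_trifecta(general_agent))
```

As with the CAP Theorem comparison, the check does not solve anything by itself. It only makes explicit which leg a given design gives up.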

Joe: Okay. Well, David, I really appreciate the time. This is an awesome conversation. I could keep going probably forever.

And so people can find your current work, it's just exe.dev. I'll put that in the show notes. But yeah, it was so much fun doing this, doing it in person. This is great. We'll have to do it again sometime.

David: Yeah. Thank you. This was wonderful.
