High Leverage
51 MIN

Ep. #10, The Human Brain in Software Development with Steve Krouse

about the episode

On episode 10 of High Leverage, Joe Ruscio sits down with Steve Krouse to discuss the rapidly evolving relationship between AI and programming. Steve shares lessons from building Val Town at the center of the AI tooling wave, why he believes better abstractions will define the future of software, and how engineers can avoid becoming passive operators in an increasingly agent-driven world.

Steve Krouse is the founder and CEO of Val Town, a platform focused on making serverless programming more approachable, collaborative, and AI-native. Prior to founding Val Town, Steve worked across developer tools, education, and programming environments, including running coding programs for students and building communities around the future of software development. He is also known for his essays and commentary on AI, abstraction, and the evolving role of humans in programming.

transcript

Joe Ruscio: All right, everybody, welcome back. Thanks for joining us for another episode of High Leverage. I'm joined today by Steve Krouse, CEO and founder of Val Town. Hey, Steve.

Steve Krouse: Hey, Joe. Excited to be here.

Joe: Yeah. Well, I've wanted to have you on. You've been writing a lot of really interesting essays kind of on the whole moment we all find ourselves in. And I've been talking with a lot of people that want to get your kind of broad opinions on a lot of things to do with agentic engineering, future of code, future of humans, like all the above.

A good place to start, though usually I do like, what's your origin story with computing? What brought you to Val Town and what are you excited about these days?

Steve: Awesome. Yeah. So much exciting stuff to talk about. To quickly address your point about how I've been writing all these blogs on the Internet: I feel like I'm constantly getting surprised and constantly being wrong, and in that spirit I'm excited to talk about where I've been surprised and lay out some firmer predictions that, you know, we'll see how many of them hold up.

Joe: Yeah. Strong opinions, loosely held.

Steve: Yeah. There you go. It's nice to be crisp about what you believe so that when you're wrong, you learn. If you're muddy in the first place, then it's harder to learn.

Joe: Yeah.

Steve: So my story is, if anyone knows what a logo kit is or the Logo programming language--

Joe: Yeah.

Steve: Seymour Papert and Alan Kay. That's like a quick way to describe me, but the longer version is that I grew up as a kid who loved computers, but I didn't love school, didn't love math. And then I went to this after-school computer science program that taught Logo and then Scheme and Haskell, and I fell in love with computer science and got good at math through that.

It was only later in college, when I read Bret Victor's essay that pointed me to Seymour Papert's book Mindstorms, that I realized it was kind of all on purpose. Seymour Papert designed Logo to make kids fall in love with math and get good at math and become smarter through growing up in Mathland, as he called it.

That's a brief way to describe how I got bitten by what I later learned is the developer tools bug. For me it was about the power of programming languages and programming environments to change the way you think, and even change who you are as a person.

Joe: Power of abstractions. And then how is that made manifest in Val Town? Give us a quick rundown. Val Town is relevant to that, right?

Steve: Yes, yes. So I spent the last 10 years doing developer tools related things. I worked at Looker, which is kind of like a developer tool. It's a business intelligence tool. There were some cool abstractions there.

I started an after-school program teaching kids to code, inspired by the one I went to when I was a kid. And I built programming language environments for kids, which was great. Then I had this podcast called the Future of Coding Podcast, which turned into this Slack community that still lives on beyond me, which is amazing. They renamed to the Feeling of Computing podcast.

Joe: Oh, interesting. Well, we'll come back to that.

Steve: Yeah, yeah, super interesting to see what the community's done without me. And then I worked on trying to start a Dev Tools company. I was inspired by a lot of other Dev Tools companies and the work I do with kids.

It turns out kids didn't want kid programming languages, they just wanted adult programming languages to be accessible to kids. And so in a lot of ways that's what Val Town is. It's a developer tool that is learnable by non-programmers and children. That was always the dream of Val Town.

Joe: Remind me, when did you start that?

Steve: Val Town started July 2022. So six months before ChatGPT launched.

Joe: Right. Yeah, yeah. So impressively prescient, I would think.

Steve: Oh no, the opposite. Like, totally blindsided by AI.

Joe: That was a polite way to say good timing, serendipity. Haha.

Steve: I guess. I don't know, to be honest, it feels like it's like terrible timing to start a developer tool. We can get into this, but I feel like Val Town is-- we wrote one blog about this like a year and a half ago, about how Val Town has just been fast following all of the ways that programming has been changing.

Like pretty much as soon as I built the first version of Val Town, developers were asking for Copilot-like completions in our web based text editor. So we had to build that and then people wanted something like ChatGPT, so we had to build that. And then Claude Artifacts came out so we had to like fast follow that.

Joe: Oh, interesting. Yeah.

Steve: So it feels like, I guess like what we were saying, you know, we only talked for like five seconds before we started recording, but one of the things you said is, you know, things are moving very quickly.

I feel like five or ten years ago I heard the phrase, describing the singularity, that it would be like a year's worth of progress in a month. And that phrase has been rattling around in my head, because that's what's happening now, and developer tools is at the white-hot center of it. Like, it's just so fast.

Joe: Yeah. Well, I do think it's kind of fascinating in hindsight over the last, you know, pushing three and a half years now, right, from the ChatGPT moment. I mean, there's always been this ongoing, almost existential dread or concern about what effect automation is going to have on human employment and the human condition. And it seems like it has settled.

I mean there's, there's like a class of tasks where it's like, okay, this is a fairly automatable task, like language translation, like you're translating documents. The mayor of San Jose was on a podcast recently talking about, like they're trying to find really productive uses of the tech. And one of the things they do, they used to pay human translators like at city hall meetings or whatever.

And they could afford to pay one in Spanish and maybe one other language. And now they're not paying humans anything, and they have 20 or 30, whatever language you want, they have real-time translation. So if you're an ESL resident of San Jose and you want to participate in the government, you can.

And so there are certain tasks like that, and history is just full of automation taking some kind of task where it's like, okay, this was basically a very focused thing, and it got automated away. But looking past that, I've found it almost ironic that there's a whole lot of higher-level work that I think has yet to really see impacts.

I mean, people are using the tools, but software engineering, three and a half years in, turns out to be the place that's seeing the most impact, I'd argue. And I was like, oh, wouldn't it be kind of ironic if it turns out the only people really automated out of a job are the engineers, the software engineers, the developers. Haha.

That would be a bit of a self-own. But I want to get into it, actually, because to your point, you've been fast following with Val Town for three years now. Like, you're right at the edge. What's the part of this that you feel is most interesting and real? And then in contrast, what's the part that you feel is most overhyped and silly?

Steve: Okay, yeah, yeah great question. I feel like I maybe should take a step back and give a more prosaic description of what Val Town is because I don't think I did a great job of that.

Joe: Oh yeah, yeah, go ahead.

Steve: Val Town at its simplest is a website, you know, where we have roughly a text box, a CodeMirror text box, where you can write code and we'll run it for you on various triggers, like an incoming HTTP request. That's a web server. Cron, email-based triggers, and you can just hit the run button and we'll run your code on our servers in a scalable way.

So it's like a functions as a service platform. Our original tagline was "If GitHub Gists could run, and AWS Lambda were fun." So that's the basics of our platform. We want to bring programming to the web era and our current surfaces, you know we recommend using our website.
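
[Illustration: a minimal sketch of one of these triggered functions, in the export-a-fetch-handler shape Val Town uses for HTTP vals. Treat the details as approximate rather than the platform's exact API.]

    // An HTTP-triggered function: the platform runs this on every incoming request.
    export default async function (req: Request): Promise<Response> {
      const name = new URL(req.url).searchParams.get("name") ?? "world";
      // Returning a Response is the whole job; hosting and scaling are the platform's.
      return Response.json({ greeting: `Hello, ${name}!` });
    }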

We also have an AI agent on our website, which I can get into. That agent is built on top of our MCP server. So if you'd rather use Claude Code and MCP to deploy to Val Town, we have that. And we also have a CLI for people who prefer to have their agents use a CLI, which I also want to talk about. I think that's an interesting pattern.

So that's Val Town, prosaically. Back to your question about, I guess you're saying, what's the realest-- Like, what's the most exciting thing about AI and what's overhyped?

Joe: Yeah. And I guess, I mean I was asking kind of specifically around software engineering, but we can go as narrow or broad as you'd like.

Steve: The thing that feels most real, where I'm spending most of my thinking, what I constantly remind myself about, is the bitter lesson: constantly seeing the work I did last quarter or last year get washed away by it. It's something I constantly feel.

Like, that feels very real: models continuing to get better and washing away random little tricks. I guess that's another way to say Opus 4.5, Opus 4.6.

Joe: Yeah.

Steve: It is like significantly better than what we had before and it like really was a step change and enabled a lot.

Joe: Yeah, and I think I've been seeing, I mean, it's anecdotal, but a lot of people running experiments, and to the extent that your harnesses, not yours specifically but one's harnesses, have little tricks and things built for the prior generation of models, they're often harmful and limiting to the new models.

Steve: Yeah, definitely.

Joe: You know, we started corresponding a bit because I've been fascinated by, and started writing just a little bit about, a similar kind of inflection point. If you look at, you know, the latest models at the point of this recording being Opus 4.6 and Codex. Mythos is not quite out yet, so it'll be interesting for future listeners to view this through that lens, wherever that lands.

But I was struck, you know, in the last few months and around the holidays by the types and pieces of code and even like working applications that the currently-latest models could string together without too much like extra handholding or prompting, which I don't think was the case prior. Right? Where you could, you could see them as accelerants like I had been viewing for a couple years of, "well, of course engineers are going to use this technology, human engineers, and it's going to make them faster and they're not going to have to write for loops or remember like arcane syntax."

But generally speaking this is a force multiplier and it's only in I guess the last six months where it's interesting to me that there's potentially the beginnings now of like, oh, there are some fully automatable software engineering tasks. So I guess the question is like what percentage of software engineering is fully automatable? Maybe that's another way to look at it.

Steve: Yeah. I think I roughly agree. Automatable is an interesting word. Automating software engineering tasks has a very long history. You know like every time we write a compiler, that's automating software engineering tasks. Or like IDEs that can like do refactors for you or Dependabot.

Building tools that use automation in the job of software engineering, that's very old but very precise. Like precise, brittle automations. You're talking about being able to delegate like fuzzy tasks that you can only delegate to a human.

Joe: Yeah. Or historically could only delegate to a human, or "fuzzier tasks," maybe, is the right way to put it. What fascinates me is, so Val Town, like you said, "Lambda but fun." Right? And so that to me indicates, like, oh, if I need, and I'm sure it does more than this, but if I need some function to run on, probably, a webhook. Right?

Or something that takes an incoming request body and does something with it, ships it off somewhere else, or renders it, or whatever. You could probably describe to one of your agents a high-level, semi-fuzzy task, and it's like, "Cool. Here's the function you need, and it's all set up to run in Val Town, and you can take a quick look."

So that is generally working pretty well. And then if you zoom way out, I do think it's interesting. You know, you've got things like the totally unrelated but similarly named Gas Town. Right?

Steve: Yes.

Joe: Where-- Although I guess also a Steve. Steve Yegge. That's kind of weird.

Steve: Yes.

Joe: But you know, Steve Yegge-- which I think serves a really interesting purpose, in that on that side he's like, let's see literally how far we can push this. Right? Which in some ways reminds me, this just popped into my head, but Minecraft servers kind of famously just keep generating additional landscape as you go further and further out.

And I remember there was someone who said, like, I just kept walking to see what happens when you force the Minecraft server to just keep going. Because at some point you hit the memory limits of the underlying process running the Minecraft server, and they had recorded it, and it starts to degrade in really weird and funky ways.

And I think stuff like Gas Town is providing this very useful function where it's like, "oh, if we just really push this, where do you end up?" And it's like, well, you end up with, I don't know, pushing a million lines of TypeScript to have an issue tracker or something, which works.

But also, I think you'd stop short and say, well, that's maybe not like, ideally automated software engineering. So somewhere between the function-- I guess that's what interests me. And I'm not sure. I'm curious about your opinion.

Somewhere between, like, oh, I can spit out a function that's tight and works, and here's this monstrosity, and no offense, I'm not picking on Gas Town, but here's this large thing that's probably not maintainable. Right? Where do you think that spectrum is? Where do you think is the right place for humans to pop in or work? What do you think is the future of humans in software development and software engineering?

Steve: Yes. Great question. So almost a year ago, I gave a talk called The Role of the Human Brain in Software Development.

Joe: Ooh, that sounds relevant.

Steve: And I start with making fun of Steve Yegge. I should probably email him an apology and check. I hope he's okay with us using him as, like, a reference point that's like an extreme. I wonder what other examples of this would be. I don't know, I feel like Nathan Fielder in comedy is like an example.

It's fun when someone takes something to an extreme, and then we can all point at that person.

Joe: Well, I think it's valuable when someone takes something to an extreme. Like, you discover something. Right?

Steve: Super valuable.

Joe: Yeah.

Steve: And like, they end up making a fool of themselves sometimes in ways that are useful to us. It's a very brave, vulnerable thing for Steve Yegge to be doing. And it provides a very, very valuable service to the rest of us when someone is pushing boundaries like that.

So I start with pointing out some of his predictions, some of which kind of came true. I rewatched that talk maybe four months ago, and I was like, yeah, this talk mostly holds up. But there are some things Steve Yegge predicted where I was like, "no, that's not going to happen," that totally are happening.

Like, he talks about how we're going to have agents managing agents, and I was like, that's so dumb. And yet sometimes I go to Claude Code and say, like, I give it four pieces of instruction. And then I say, "actually those are pretty scoped. Spin up four agents to do those and let me know what comes back." And that's super.

And then I go off and do something else, I come back and I have this amazing report that I couldn't have gotten any other way. So he was totally right and I was wrong on the agents managing agents thing.

I think there's like a deeper thing that maybe I'll go into later. Like, I have this thing against the word "management," which I really go into a lot in my talk.

Joe: Yeah. And by the way, I've had a similar experience, where there were strong opinions loosely held 18 or 24 months ago that are now not held, which, like you were saying at the beginning of the show, I think is important.

I do fundamentally believe there's a role for humans in software engineering and large-scale software development. And I should qualify: for significant systems, systems that are going to have significant reliability and functional requirements, some number of humans are going to be required to make sure the fuzzy business requirements are actually met. Right? And I think you've written about this too. Like, English is an astoundingly imprecise language. Right?

Steve: Yes.

Joe: And machine code is very precise, the stuff that runs on the processors. Right? And so like bridging that gap has always been, and I think to some degree will remain a human concern or job. But where the line sits on that spectrum I think is interesting.

Steve: I think you and I mostly agree, and we definitely agree that today that's the case. I think given what I've said so far in this conversation, it might surprise you and others to hear that I feel like eventually what you said might not be true.

Like, I don't see anything in principle that prevents AI from being as smart as the smartest human technical minds that exist. So, like, if you're going to trust, I don't know-- I feel like Mitchell Hashimoto is a good example of a genius CTO kind of guy.

Like there's no reason to think that Opus 7 or Opus 11, I don't know-- And you know, is it 10 years, 20 years, 100 years from now? That's up for debate. But I think there's no reason in principle to think that that's not possible to reach that level of intelligence in a different computational substrate than neurons in a brain.

And then it's less clear if what you say is true. Are we on the same page about that?

Joe: Yeah, yeah, no, I do, on a long enough time horizon, I think there's some interesting questions there. You know, I think what we're still figuring out right now and you know, I think with a lot of the smartest companies and-- "Software factories" I think is a good term and maybe we'll come up with a better one.

But a lot of teams right now, a lot of our portfolio companies, a lot of founders of other companies I'm talking to are, you know, taking the first steps to like, "Okay, of our software development process that our business depends on what pieces of it can we try to more fully automate."

And a lot of people are doing the kind of thing where like, oh, if a bug report or even just like a report comes in from, I don't know, Sentry or Datadog, like "we've had a, you know, 500 error." Is that something that can be dropped straight into an agentic process that can then kind of do the research and do a PR? Because a lot of times those tend to be like, oh, it's like a three line fix. Right?
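
[Illustration: a rough sketch of the flow Joe describes, an error-tracker webhook handing the report to a coding agent that opens a PR. `runCodingAgent` and the payload shape are hypothetical stand-ins, not a real Sentry/Datadog or agent API.]

    // Receive an alert webhook and hand it to a coding agent as a scoped task.
    Deno.serve(async (req: Request) => {
      const event = await req.json(); // alert payload; exact shape assumed
      const task =
        `We saw this 500 error in production:\n${JSON.stringify(event, null, 2)}\n` +
        `Find the root cause and open a draft PR with a fix and a regression test.`;
      await runCodingAgent(task); // stand-in for invoking Claude Code, Codex, etc.
      return new Response("queued", { status: 202 });
    });

    // Hypothetical helper; in practice this might shell out to an agent CLI.
    declare function runCodingAgent(task: string): Promise<void>;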

Steve: Yeah, I like that. And I wish we did more of that at my company. I'm getting some internal resistance to it. But I will say that I hate the phrase "software factory," for some reason it really bugs me. Haha. But what you just described, I want.

Joe: Yeah, I assume it kind of comes from, you know, thinking of factories as things with assembly lines, or that are automated-- Which is interesting, because there are some, philosophically speaking, parallels between the conversation we're having and the actual introduction of the assembly line into factories a hundred-something years ago.

Steve: I'm actually really glad that I listened to the Acquired episode on Hermès just last week, because I think that's a really interesting touch point. Because they resisted factories and automation. They're, like, the only luxury brand where each product, like a Birkin bag, is built by a single artisan. And the way that they've scaled is that they've had to build training schools to train artisans.

And they believe that if you put more than 250 people working on bags in a physical structure, it's too much like a factory. And so they cap it at 250, and they build two new workplaces, craftsman shops, per year, physically around the world, so that they don't, you know, get that. So in some ways Hermès is an interesting counterpoint, more of a craftsman vibe, that I hope we retain in software engineering.

Joe: Yeah, that's an interesting point. You know, one of the things I think will happen, and that we've seen, that being a great example, is what happened in the industrialization of other kinds of production. Like slow food or master craftsmen. Like, there are still people doing hand leather work and hand metal work. These master craftsmen, right? I suspect we're going to see--

And you mentioned, actually, I always use Mitchell as an example of this, and Ghostty is a great example. Like, I still think there will be pieces of open source software like Ghostty built by people like that, just for the love of the game. Right? And for a burning need to have a thing not only exist, but to have that thing exist and be beautiful and be perfect by their definition.

So in that sense I don't believe that like open source software is going away for like pieces of software like Ghostty. But one of the things that struck me, I haven't seen a lot of people talking about is I do think there's this long tail--

If you look at when GitHub came up 20 years ago, a big part of the initial push was people who were Rubyists with RubyGems, or later JavaScript people writing npm packages. I mean, I still have some on my GitHub page, like old Ruby gems or whatever I wrote.

Like there's this long tail of open source that I suspect is going away. And what impact does that have for software developers?

Steve: Yeah, yeah, yeah. In some sense I agree with you and we're on the same page. Like you know, what happened to Stack Overflow is crazy to me. That was not on my bingo card that Stack Overflow would just totally disappear.

Joe: Yeah.

Steve: And Val Town, our original bet was on making the social network of code where people reuse each other's functions and remix each other's code and that totally, you know, mostly has not paid off at all. People would rather just regenerate code from scratch, exactly tailored to their needs than use someone else's code.

Except--and this gets back to your Ghostty, Mitchell Hashimoto point I think--except for amazing abstractions. And I think that's maybe one place, I don't know if we disagree, but like, that's one of my theses that I'm trying to champion.

Joe: What's that? Around abstractions?

Steve:

I'm trying to champion the importance of abstraction and how abstraction is only going to get more important.

So I think to your point about, like, maybe there's a long tail of open source that's going to be less relevant because you could just regenerate the code from scratch. Like, you don't need a library that's only 50 lines. You could just inline those 50 lines, right?

Joe: Yeah. It feels like it would be a kind of rough heuristic, but there'd be some heuristic where it's like "if this open source project is less than n thousand lines, like it's not going to need to exist moving forward," unless there's something orthogonal, which is maybe what you're getting at. What would that abstraction mean?

Steve: So I think libghostty was a good example. Another example that's close to my heart, that I can probably speak more intelligently to, is CodeMirror. But there are a lot of examples. We could talk about functional reactive programming. We talked about Stripe as a set of abstractions.

But anyways, let's talk about CodeMirror. CodeMirror is, I think it's on version six now. Maybe version five. Like Mitchell Hashimoto, Marijn Haverbeke is a genius. Like, my co-founder Tom thinks of him as this utter genius who's obsessed with this one problem. Oh, other folks my co-founder Tom is obsessed with, or thinks very highly of, are the Zero folks, the sync engine people, Rocicorp.

So to your point about the craftsperson who is obsessed with this problem and builds this wonderful abstraction that's beautiful. When you described it, you gave the vibe that a person's soul is in it. And that's a Hermès kind of idea, that someone just cares so much and there's this human passion in it, and that's part of why it's valuable. I think that is partially true.

But as LLMs get smarter, we will be able to create better abstractions. Like, right now the only people who can make these amazing, game-changing new abstractions are uber geniuses who are obsessed with something. But one of the things that I think nobody is realizing is that as LLMs get smarter and smarter, that ability will be democratized.

I think we've all been tricked by the current dial-up era we're in. We're all so concerned about these little tokens and the expensiveness of the tokens. Like, nobody's seeing the Netflix coming, because none of us realize how much further things are going to go, how quickly.

Joe: Yeah.

Steve: And so we're all thinking, like, "oh"-- I feel like I was tricked. We were all thinking, how much shitty code can we output and get away with in a safe way? Like, how many thousands of lines of bad code can we get away with in a safe way? And to me that's the wrong question.

Once we have Opus 7 or Opus 11 or whatever, it'll be so smart it will write fewer lines of amazing code. It'll be as smart as, or smarter than, the best human programmers. And the best human programmers write less code that's more abstract in a precise way. And it's more beautiful code.

Joe: Yeah.

Steve: It's only now that it seems like more AI means more bad code.

Joe: Yeah. Yeah, that's an interesting point. Two things on that. And I want to be careful here, because functional code and beautiful code-- Well, I would argue beautiful code is always functional. Maybe that's a whole separate disagreement.

Steve: Sure.

Joe: But I'd say beautiful code is always functional. And it's like, you know, form follows function, if you're an architecture student. But functional code does not always have to be beautiful. For sure. And there are many cases where it's just cheaper to make something function.

You wrote recently on the experience of using AI to actually write, like write an essay. Right? And this is something that I've had recent experience with as well.

And honestly, of all people, Ben Affleck, like a month or whatever ago, had a really interesting podcast where I think he very succinctly made the point that these things are trained on the output of all humanity, which is a normal distribution. Maybe not literally a normal distribution, but it is a distribution. And so what it's trained to do is give you the average, the median output, which is not that good.

And I do think it'll be a more difficult task to get to a point where these things can output something that's actually beautiful and crafted, versus just ruthlessly functional. They're going to get so good at being just ruthlessly functional and correct. But I wonder about beautiful.

I don't know, what are your thoughts there?

Steve: Yeah, I think I might be more AI-pilled than you on this one. I guess I have two points for you in return. One point is that the Ben Affleck analysis, that it's trained on all human text and so it's regression to the mean, I think is missing the point of what next-token prediction is about.

I think next-token prediction isn't training models to be the most average of all humans. It's doing a more complicated thing, which is that in order to predict the next token in some string of text well at all, the only way is to somehow be incredibly smart, or, like, to produce intelligence.

And so, what I should say is, I'm reading this book called What Is Intelligence?, this new MIT Press book, where he makes this point that I really, really love: that before LLMs were out, the people in the field of AI knew that next-token prediction was equivalent to intelligence.

Like, for example, if your job is to predict the next token in the phrase "the only way to turn my computer off is," or, you know, "if a train is going 100 miles an hour," next-token prediction is equivalent to just intelligence.

The only thing we know of that can predict the end of any sentence is something that is just deeply intelligent. The fact that it's predicting the next token is just a way to simplify the problem into a form we can train on, that we can hill-climb on.
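
[Illustration: a toy sketch of why "predict the next token" is the whole generation story -- generation is just the same prediction applied repeatedly. `Model` is a stand-in for a real network, not any particular API.]

    // A model maps a token context to a probability for each vocabulary entry.
    type Model = (context: number[]) => number[];

    // Greedy autoregressive generation: repeatedly ask "what comes next?"
    function generate(model: Model, prompt: number[], maxNewTokens: number): number[] {
      const tokens = [...prompt];
      for (let i = 0; i < maxNewTokens; i++) {
        const probs = model(tokens);
        tokens.push(probs.indexOf(Math.max(...probs))); // most likely next token
      }
      return tokens;
    }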

Joe: Yeah, that makes sense. And look, I think it's probably also fair that writing, just literally writing in English as a human about any number of topics and their intersections, is an infinitely larger and more complicated solution space than code. Which, you know, the whole point of code is to be more precise and to add some more limits.

So I think there's a fair thing there too. It's like, well maybe the models can get good enough at the code.

Steve: Yeah. I don't know if code or English is more open ended. I get the vibe that they're equivalent.

Joe: Yeah.

Steve: But anyways, I want to make another point.

I get the sense that one of the biggest bottlenecks we have in model improvements is good tests that measure what matters and that measurement of intelligence is this incredibly hard problem. It's actually one of the hardest problems around.

Joe: Right.

Steve: And, like, the Turing Test is kind of the closest we have. And RL, I guess, is trying to get at that, too. But to me, if you're using RL to make models better at programming, I feel like it'll also make models better, just smarter. And smarter things write better, too.

Joe: Yeah.

Steve: But I would like to see us come up with better evals for writing. So the title of the blog post you're referencing was called Steve Eval.

Joe: Right.

Steve: I was working on an eval for, like, how to tell if something is written by me in my tone. If it's, like, good writing. I want to see more things like that. It seems like we should be able to do that.

Joe: Yeah. One of the reasons, I think a lot of people agree that these models have gotten so good at software development is because for many, many cases, the eval is pretty-- You know, we used to write unit tests by hand. Right? Those are evals. Like, is the software correct? And so you were trying to write an eval for writing.

Steve: Yeah, I was trying to write an eval for my own writing. Well, the thing I was doing was, like, writing with AI, and it was producing garbage writing. I'd give it all sorts of feedback, and it produced garbage writing. I'd be like, you're not doing all the things I said. And in this process of giving the AI feedback about its terrible job writing in my voice, I was like, "oh, I should automate this."

I was like, "oh, I need to, in order to automate this, I need an objective eval so it can grade its own writing and take its own feedback and, in a loop, try to improve its writing." That's where the Genesis app project came from.

Joe: How well did that end up working? How satisfied are you with where you're at on that?

Steve: I mostly decided to swear it off for, like, three reasons. One, it never got very good. Two, people seem super upset when they detect any whiff of AI writing. So I just want to get ahead of it and not anger people.

Joe: Yeah.

Steve: I want to be able, in good conscience, to say, hey, I wrote none of those sentences.

Joe: Right.

Steve: And then three, writing is thinking. And there was a laziness I was starting to inculcate in myself. And doing that thinking myself feels healthier for my brain.

Joe: Yeah, well, I think, yeah. Similar to you, even with attempts to use it, I've never been really happy with the outputs, no matter what. And, you know, I do find it's a really useful tool for helping me think through narrative structure.

I think pre-LLMs, when I'd do a talk, my kind of mode would be to spend an obscene amount of time generating a huge amount of slides and then combing through and just cutting and cutting and cutting, until I'm like, oh, this was an expensive but very effective way to find the story I actually want to tell, because it emerged from all of the stuff.

Steve: Yeah.

Joe: So I find it can be helpful like that.

Steve: Yeah, I find it's a great sounding board. Like, nobody can just bug a human all the time with everything they want to throw out there, like, how does this land? And it's a good "how does this land" detector.

Joe: Yes. Yeah, yeah, it's very good at that.

Steve: But I will say, the one success I had with my Steve Eval project, which I think I wrote about in the article, was an unexpected thing that happened when I put it in a Ralph Wiggum loop with itself to generate the Steve Eval.

The project was even that lazy, where I told it to find my writing, make a list of all the things that it thinks identify it. Now go pull another thing I've written from the Internet and grade it, and see if the grade is good. Now pull something that someone else wrote from the Internet, grade that, see if the grade is bad, and just keep iterating until, you know, out of a hundred, only articles by me are above 90 and everybody else is below 80, except for, like, my favorite writers, like Paul Graham and Simon Willison. They can be close to me.
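
[Illustration: a compressed sketch of that self-grading loop. `llm` and the fetch helpers are hypothetical stand-ins; the thresholds are the ones Steve mentions.]

    declare function llm(prompt: string): Promise<string>; // hypothetical model call
    declare function fetchArticleByMe(): Promise<string>; // hypothetical corpus helpers
    declare function fetchArticleByOther(): Promise<string>;

    // Iterate the rubric until it separates "written by Steve" from everyone else.
    async function refineRubric(rubric: string, maxRounds: number): Promise<string> {
      for (let round = 0; round < maxRounds; round++) {
        const mine = Number(await llm(`Score 0-100 with this rubric:\n${rubric}\n---\n${await fetchArticleByMe()}`));
        const other = Number(await llm(`Score 0-100 with this rubric:\n${rubric}\n---\n${await fetchArticleByOther()}`));
        if (mine > 90 && other < 80) break; // the separation the loop is asked for
        rubric = await llm(
          `My article scored ${mine}; someone else's scored ${other}. ` +
          `Revise the rubric so my writing lands above 90 and others below 80:\n${rubric}`,
        );
      }
      return rubric;
    }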

Joe: Yeah.

Steve: And I added some rules to Steve Eval myself, and one of them was very specific about how I use not em dashes but en dashes, and the spacing around them. And I came back after a couple of hours and it had removed that rule, the one that said if em dashes are used in the wrong way, subtract 20 points, it's definitely not me. It had found an article on the Internet that I wrote that broke that rule. It turns out that one was ghostwritten for me by someone. Haha.

Joe: Okay. Haha. So not written by you.

Steve: Yeah. Which is a sign, like, it didn't come up with that rule. But I guess what it exposed is, apparently I have this watermark in my writing that I didn't even realize was a pretty unique watermark. I don't know what it's useful for, but it was a fun finding.

Joe: Yeah. I always love talking, especially at this moment, about some of these more philosophical things. But you're a founder, you're steering a company through a period of time where, like you said, we're maybe approaching the singularity. Maybe things are just changing really fast.

As you look forward, knowing at least what you know today, what do you think are the important things for a founder like yourself to be focused on? Like, what kind of things are you building at Val Town? And are there specific areas where you're like, well, we're not going to build there because the models are just going to get better in six months?

Steve: Yeah.

Joe: And I know you don't have a crystal ball, but what's your belief system today?

Steve: Yeah, in some sense, that's the most important question of all. Like, where can I put my energy and my team's energy that won't get washed away and we won't get Sherlocked out of, like--

Where is a place to do work that has durable value? It almost sounds like that's like the question of the economy. Everyone is freaking out and just trying to figure out where is a durable place of value that won't get washed away.

And an obvious one is anything that's vaguely bitter-lesson territory, one-off tricks. And I'm getting much more comfortable using that as a reply. So, like, when customers come to me and say-- It's happened more and more that customers ask for things and I say, we should not build that. We should just wait. By the time we build that, the models will have eroded it.

Joe: Yeah. Because maybe that's an easier question. What's a good example of something like that?

Steve: So that one isn't in my product, it's in an application of my product. One of my customers is a financial firm, and they're ingesting all these reports from their portfolio companies. And they want analysis of these reports. I guess you guys are a financial firm that gets reports from your companies, so maybe this is something you've tried to do.

Joe: Yeah, yeah. Relevant. Haha.

Steve: So our initial demos were great. Like, we threw in a couple reports, like, on ChatGPT, the website, and asked it to give us analysis. And it did a pretty good job. And then we were trying to automate that flow in Val Town. So an email would come in, it would trigger some code on Val Town, it would spin up a ChatGPT agent and write the analysis.

All was going well until the firm was like, all right, great. Now let's just keep adding more of these reports. Let's put the whole history of every report we've ever, every scrap of information we know about this company, let's put into this agent so it has access to it.

And then, you know, the context window broke and everything was a nightmare. And we were, like, coming up with all sorts of custom systems to manage the context window, and it was like, whoa, what are we doing here, guys?
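
[Illustration: the shape of that pipeline as a Val Town-style email handler. The handler signature is approximate, and `loadPreviousReports`, `analyzeWithLLM`, and `sendToTeam` are hypothetical stand-ins; the point is the failure mode -- `history` grows until it no longer fits a context window.]

    export default async function (email: { subject: string; text: string }) {
      // Every report ever received, plus the new one: this is what eventually breaks.
      const history = [...(await loadPreviousReports()), email.text];
      const analysis = await analyzeWithLLM(
        `Analyze the latest portfolio-company report in light of all prior reports:\n` +
          history.join("\n---\n"),
      );
      await sendToTeam(`Analysis: ${email.subject}`, analysis);
    }

    declare function loadPreviousReports(): Promise<string[]>; // hypothetical
    declare function analyzeWithLLM(prompt: string): Promise<string>; // hypothetical
    declare function sendToTeam(subject: string, body: string): Promise<void>; // hypothetical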

Joe: Right. Right. You know, it's interesting because there's at this moment, I think, even an increasing number of startups kind of focused on how can we be the context layer for X. Right? Or how could we solve that specific problem? What excites you the most about where we're at right now and where things seem to be heading?

Steve:

Well, my life's mission is spreading the joy of programming, helping people get smarter through programming. And it seems like AI is going to be a huge boon for that in theory.

Sometimes I feel like, in the talk that I referenced, that I gave a year ago, there are maybe two paths that I can sketch out. One path, you can envision the characters of Wall-E, like, lying back in a recliner, just watching videos. And they just totally atrophy.

Alan Kay talks about this. Like, if you think, oh, children when they grow up are just going to drive to work, and then sit at a desk, and then drive home, then why should we have kids walk around and play? We should just put kids in cars when they're young and have them drive everywhere.

Like, the atrophying of muscles and human brains, and laziness, and having everything automated for you: that's one path, and obviously I'm not a fan of it. The other path, which excites me a lot, is everyone getting so much smarter and building so much more. And to me the exciting thing is that humans can see further and do more in a very embodied, comprehensive way.

Joe: Yeah.

Steve: And like Tony Stark and Iron Man is maybe like the best vision, or Minority Report. Like these are like some of the best visions. I think it was the same special effects guy maybe who designed both of those.

Joe: Yeah. The gestural interfaces. It's interesting. I was recently having a conversation with Scott Hanselman for a podcast episode, and he wrote an article with some of his colleagues in the Communications of the ACM exploring, like, what is this going to mean for enterprises? Like, how is the next generation of software engineers going to be trained?

And you know, there's a number of things they go into. But one that really struck me as interesting was this idea that tools like, you know, Claude or whatever other agentic interfaces should really have a mode you can go into that's like more of a training mode where it's like they won't generate the code, but almost like if you ever did pair programming. Right?

Where it's like, make the human the driver. And like, okay, you can talk with Claude but like you're going to actually write the code or it's going to say, I found the bug, it's in this file, but you need to figure out where in the file it is. Right?

Steve: Totally. Totally. The only person I've seen actually do this in practice is Geoffrey Litt, who's amazing. He's been tweeting about this a little bit, and he was showing me how he works at Notion: when the AI writes code, part of its job is to write a quiz for him about the code, and then grade him on his understanding of the code it wrote for him. Haha.

Joe: Yeah, yeah.

Steve: Which, at least it's something.

Joe: Yeah. Well, I think there's going to have to be some set of practices or techniques, probably "all of the above." But we're very much in the early days, I think. And it's understandable, like, we're still figuring out the Wall-E side. But yeah, I think there are some missing techniques we're going to need to avoid that outcome.

Steve: Yes. And for me, like, because I was teaching my brother, I like still have a hand in teaching people. It seems like there's this line where if you're on one side of it, if you don't have like the critical thinking skills to learn from AI, you almost become dumber. Like, my brother, it seemed like he was becoming dumber with AI.

But on the other side of that line, where I think you and I probably are like, hey, I'm like learning so fast using AI. Like, I'm like, I know what it feels like to know something and I know what it feels like to be like, kind of confused and be fuzzy and like, dig down deeper with the AI to actually get me to fully know something. And getting people across that line--

Joe: Yeah. It's a similar line of inquiry, I think, to what you would historically have, but instead of googling around, reading a bunch of Stack Overflow posts, reading a bunch of blog posts, half of which are outdated, and kind of synthesizing your--

You just go straight at it, like, wait, why did this work like this? Or how does this work?

Steve: Exactly.

Joe: Yeah.

Steve: But if you're on the other side of that line, you just ask the AI to write the essay for you for history class and you just copy and paste. Or if you're, you know, an engaged learner in your history class, oh my God, you're going to write a better essay than you ever could have written before AI.

Joe: Yeah, in way less time.

Steve: But I think the thing that's missing and hopefully better AI or you know, in theory, better AI will solve this is theory of mind. Like, for me, what education is about, and this is where the Socratic method is so good, is a teacher has to like, inquire into the state of the student's mind through questions. That's why Socratic questioning is so great. And then see the gaps and help them connect it.

That's like what a really great teacher does. And AI is woefully bad at that. Like, for example, an example I constantly come back to is when a vibe coder has this really complicated app, thousands of lines. It's confusing to them, it's confusing to me.

Joe: I'd say literally everyone is confused by it. Yes.

Steve: Yeah. And I'm like talking to them about some bug, and it just becomes clear to me in conversation that they have no understanding of the client server database architecture. No understanding that there's three computers in three logically distinct spaces that need to communicate over protocols.

And in my experience teaching smart people this topic one on one, I can't do it any faster than an hour. Like, for most people it takes 10, 20, 30 hours to teach client-server-database architecture. But I don't see any AI system, no Opus 4.6 on Lovable chats, detecting that its user doesn't understand this core insight and then spending the requisite 20 hours to get it across.
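
[Illustration: the "three computers" insight in code form -- a sketch, with the server written as a generic fetch handler and `queryDatabase` standing in for a real driver.]

    // Computer #1, the browser: talks to the server over HTTP.
    async function loadTodos(): Promise<string[]> {
      const res = await fetch("https://example.com/api/todos");
      return res.json();
    }

    // Computer #2, the server: talks to the database over its own wire protocol.
    async function handleRequest(_req: Request): Promise<Response> {
      const rows = await queryDatabase("SELECT title FROM todos"); // computer #3 answers
      return Response.json(rows);
    }

    // Stand-in for a real database client (the third, logically distinct machine).
    declare function queryDatabase(sql: string): Promise<string[]>;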

Joe: Yeah, that does get at what I think is potentially interesting: a lot of the people pushing these things at the forefront are incredibly talented programmers. I mean, far more than I. I like to think that at least at one point in my career I was a semi-talented programmer. There are people much more talented than I ever was, you know, pushing this along.

But even in my use, I'm at some level aware that when I'm talking to the AI or the agent and I'm describing what I want, it's being informed by all of this ambient knowledge of software engineering, things like a client-server architecture. I understand that there's a SQLite database underpinning this silly little thing.

So I'm very directional, and I'm never super prescriptive, but I'm like, hey, it seems like we should do X, or something like that. What do you think? And that, I think, drives it. Whereas, yeah, I do wonder, if you're wielding it with no concept of those things, there must be some delta in the output. At least today. Maybe that gets to your point, maybe Opus 11 takes care of that.

Steve: I think one thing to circle back to, that I wanted to address earlier: you asked me a question about the million-lines-of-code-to-make-a-bug-tracker topic, and how that seems like it went off the rails, and you wrote an essay about write-only code, which kind of spawned this conversation.

To me, I'm very skeptical of lots and lots of lines of code that nobody understands. To me, we as an industry are a little bit confused about that. But as AI continues to get better and the code it writes continues to get better, we'll become less confused about that because it doesn't really serve anyone.

Joe: Yeah, I mean, I think I agree with you on that. And I think the only question to me is at what point the understanding happens, because historically it's just been very straightforward. It's understood as it's created, for the most part. Or, like, most of the behaviors are understood as it's created. There are always the emergent behaviors that are like, oh, that's interesting.

But if there are ways to make this paradigm, like create code that's functional, maybe not beautiful, but functional, at a rate that's faster than you can employ humans to understand it on the way in, I think at least for classes of code, we'll find a way to enable that where like code will hit production that people don't yet understand. And maybe we're in some agreement.

I do agree that I don't think you can have an ever-growing system where it's just, "well, no one understands that now. No one ever will." And so I don't know if that means people are going to have to do stints where they spend a week just spelunking around the code base, incredible-voyage style, with Claude as their co-pilot. Haha. But yeah, we're going to have to address that for sure.

Steve: Yeah. I'm glad you used the phrase like "emergent behavior" because that's like an exact phrase that I used in a recent essay I wrote.

My thesis on this is that I think there's a way to understand this new wave of AI that fits into the same narrative that we've had before.

Like, we keep getting more abstract programming languages, and we have compilers, and that's all great. Where it goes too far is when people say, ah, English is the new programming language, it's the next level of abstraction. To me that takes it way too far. English is a terrible programming language. Not precise at all.

Joe: Like, literally the worst. We'd be better off programming in French or Spanish than English.

Steve: Yeah. So precise formalisms are really, really useful. That's one point I make in my essay that I really like. I have these great Bertrand Russell and Dijkstra quotes in that essay that I just love championing and trotting out. Like, I should put them on a flag, on my shirt. Like, formalisms are great.

And maybe one way to really let people know what I'm talking about: E equals mc squared. At no point are we thinking that agents should be coming up with new science, like an E equals mc squared that humans don't understand. The whole point of E equals mc squared is that it's a super precise, super abstract equation that human brains can use to think about the relationships between quantities.

We want to continue doing that. And abstractions like CodeMirror let you do that. And hopefully as AI gets better we're going to have AI write those things. But Bret Victor has this great essay called Up and Down the Ladder of Abstraction where he is trying to teach explicitly this move of thinking of a problem at a certain level of abstraction and then going up a level and thinking about things at that level and then going back down a level.

And he uses the phrase "emergent behaviors": when you're at a certain level of abstraction, you see different behaviors that emerge at other layers of abstraction. Joel Spolsky calls this a leaky abstraction. I think those are the same thing. And your point about how when you write code, when you write JavaScript and you understand it, you understand it in a way, but you don't understand all possible emergent behaviors that could arise from either a lower level of abstraction, a memory leak, whatever, or interactions with other components of the system that you just didn't model out, because the system didn't expose those to you.

Where I'm driving towards is that the way out, the way to square the circle and continue to have human comprehension rule the day (or agent comprehension, when the agents are smart enough), the way for systems to continue to be comprehensible and orderly, to not be a mess of things and still move very quickly, is through better and better abstraction.

And at each level, the abstraction continues to be precise. It's not English, it continues to be precise, but it's abstract. Like, we should think of a precise abstraction like E equals mc squared or functional reactive programming or React or Next.js or CodeMirror or libghostty as these huge levers that let us get incredible leverage and move incredibly quickly. But comprehensibly and precisely.

We can have our cake and eat it too. We can move fast, we can have incredible leverage and we can comprehend. That's the point of abstraction.

Joe: And we can understand that.

Steve: And we can understand. Yeah, that's the point of abstraction.

Joe: Well, that seems like a good goal and a good place to end. Thank you so much. This has been a great conversation.

Steve: Thank you. Thank you so much. So fun.