Open Source Ready
49 MIN

Ep. #36, Managing AI Coding Agents with Jesse Vincent

about the episode

On episode 36 of Open Source Ready, Brian Douglas and John McBride sit down with Jesse Vincent. They explore how Jesse’s Superpowers project turns AI coding tools into structured, reliable development systems. The conversation dives into agent orchestration, prompt engineering, and what it takes to make AI behave like a capable software engineer.

Jesse Vincent is a software engineer, entrepreneur, and creator of the Superpowers framework for AI-assisted development. He is the Founder & CEO of Prime Radiant. He previously co-founded Keyboardio, a hardware startup focused on ergonomic keyboards, and has a background in consulting and open source software.

transcript

Brian Douglas: Welcome to another installment of Open Source Ready. On the line, we've got John McBride as co-host. Hello.

John McBride: Hey Brian, how you doing?

Brian: Ah, I'm doing fantastic. Yeah, it's an amazing day and honestly I've been inside the entire day, so I'm looking forward to after this. I've got to jump on the BART to go to San Francisco for an event tonight. So at least I'll get to see a little bit of sun.

John: Love, love the public transit. That's great.

Brian: Yeah, it's not too bad. And then someone who knows all about local Bay Area rapid transit is Jesse Vincent. Jesse, how you doing?

Jesse Vincent: Hey, thanks so much for having me.

Brian: Cool. Yeah, sorry listeners. Jesse lives around the corner from me in Berkeley. So we were just talking about the BART and local neighborhood stuff. But Jesse, we would love to find out what you're doing and what are you working on.

Jesse: Sure. So I suspect that the reason you got me here is because my hobby project from last fall went wildly out of control. That's a thing that I released in October called Superpowers, which started off as a skills framework for Claude Code plus a development methodology. And then a couple weeks later Anthropic shipped the official skills framework for Claude Code.

So the development methodology part lives on and it's basically the tools that I had been using and developing since last February to get Claude Code to play nice when building software. It sort of echoes a lot of stuff from my consulting career and my first company when I was managing juniors and sort of learning how to get good work out of them.

Brian: Yeah, and I actually do want to talk about your first company, but we'll first just go right down this Claude Code path. So you were a pretty early user.

Jesse: Day one, I think.

Brian: Day one, yeah. February is really extremely early. So how'd you get your hands on that so quickly?

Jesse: I mean, I'd been messing around since the previous November with basically bouncing back and forth between Cursor and Windsurf week by week as they iterated and made their products better. But it was really sort of the spicy autocomplete phase and not actual agentic dev. The tools had trouble doing things like editing files on their own.

And when Claude Code dropped, I picked it up that day and was like, let's see what I can do. And immediately got myself into tons of trouble with it wildly overbuilding based on a tiny little, almost not even a speck but a wish. And I messed around for probably the better part of a month before looking around and realizing that nobody had done actual prompt engineering.

And so by that I mean taking a standard prompt, like "let's make a React to-do list," and then iterating on my user instructions, what goes in Claude.md. And so if you dig around on my GitHub, which is Obra, O-B-R-A, you can find, I think it's called Claude Docs Setup, and it walks through starting with an empty Claude.md and that prompt, and for every change to the Claude.md it has the full transcript of the conversation and a copy of the generated code.

And I took it from a thing that ran in like 15 seconds and generated a very reasonable looking to-do app that wouldn't save a to-do if you hit reload to something that was a five phase project that took 25 minutes to execute, did full red green TDD, and cost about $20 in tokens.

And that was sort of the basis of actually engineering how to get Claude to reliably build the thing you want it to build and test first and do a bunch of these other things that turn out to be standard software engineering practices that are good whether people are doing them or agents are doing them.

Brian: Nice. So you basically just saw the announcement and then jumped on it as soon as you could?

Jesse: I don't remember what happened that day, but it was like, oh, there's a new tool. And this was before the era when three new agentic dev tools were coming out every day.

Brian: Yeah, those were different times.

Jesse: Yeah. It's like at that point we knew that Sonnet was the best model for coding already. It was very clear that it was better than GPT or the other available options. Even if it wasn't actually all that good, it was still the best option. And now there was a first party tool that wasn't an auto, you know, it wasn't an IDE, it was an agent. And it seemed right and it seemed like a fun thing to play with.

Brian: Yeah. So play with it, you did.

Jesse: Yeah.

Brian: And then previous to that you, you actually ran a company in the hardware space.

Jesse: Yeah. So for the last decade my wife and I ran Keyboardio, which was a tiny little hardware startup. We did millions of dollars in Kickstarter over four keyboard projects over the course of a decade. Running a hardware startup as a two person company is very much a thing. It was a lot.

I spent a lot of time in China. We wrote extensively about all the various weird things we ran into, up to and including it turning out that the person on the other end of the line who was reporting all these problems to us was an unreliable narrator. She turned out to be a con artist grifter who was scamming both sides.

So like at some point, you know, as I was asking friends who are experts who've been doing this for a long time, you know, is this normal? Do these things that are happening to us happen to anybody else? And the answer was, "well, every single thing that happened to you is a thing that happens. But you have the worst luck. Nobody has all the problems."

And it turns out that's, you know, the way you do a good con is you find the grain of truth and then you stretch it. So like I have good lawyer recommendations in Shenzhen, the Chinese courts did right by us, you know, after that we had a much better factory for the rest of it, but tariffs were kind of the last straw.

John: Yeah, this is so funny. I feel like I've been learning a bunch about this recently because I read this piece about how big peptides are in the Valley right now, which just kind of blew my mind. And it's basically like off market medical stuff that people are injecting into themselves, which is wild.

But the way you figure this out is you basically message some person on Discord who has a factory in China and then a few months later a bunch of stuff will show up at your door. It just blows my mind because I'm like, "wasn't this much harder, you know, like five, ten years ago?"

Jesse: Global supply chains got much closer.

John: Yeah.

Jesse: People have figured out communication, you know, international communications. I would say that the weird peptide thing doesn't have a whole lot to do with the actual hardware supply chain. We were actually correctly filling out our customs paperwork, fulfilling globally from Hong Kong. And we mostly interact with our factories on WeChat. So it was a little easier to have gone where they were, rather than Discord and Twitter, for that kind of thing.

There's a whole lot there. But I don't think that's why you've dragged me out to chat today, in general.

Brian: Yeah, I mean, I mentioned that because I want to get context for what you were doing before. You also mentioned that you worked with juniors. So, like, Superpowers. Can we just talk about what Superpowers is first?

Jesse: Okay.

Brian: And then we could talk about how we got to this point.

Jesse: All right. So if you dig into Superpowers, there are a few things there. One of them is a set of tools for building skills for agents. And that's sort of the foundation of a lot of this. And it does things that at the time were weird, and now I guess Anthropic's even put out their own kit for this. Like pressure testing the skills you write against other instances of Claude Code.

So that allows us to look at like, the rationalizations that the agents will come up with for why they didn't do what you wanted, which you then feedback into the skills to make the skills better so that the agents can't ignore the instructions you give them.

We were using a whole bunch of psych tricks to get the agents to do what you want. So basically, influence and persuasion principles. They work on the agents, not just people.

John: Yeah.

Jesse: Which is wild.

Brian: This is specifically saying that, like, "hey, I want you to work on this and if you don't fix it now, I'm going to lose my job." Like, that type of stuff.

Jesse: I mean, it's not quite that coarse. But also it turns out that there are persuasion techniques that are positive and upbeat, and it matters. So you can find a post I put out somewhere about what I was calling latent space engineering, which is about how the kinds of things you say to the model put it in a certain part of the vector space.

So when my agents are having trouble doing something, I don't tend to swear at them. I tend to say, "you're having trouble. Step back, breathe. Let's think about this. You're super capable. You've got this. I love you."

And it's not quite how I would treat a human, but if you've ever managed people or been managed, you know that there are two ways to motivate somebody in a situation like that. You can either be supportive or you can be a jerk.

And if your boss is yelling at you and saying, "you need to finish this now, you're fired. If you can't figure it out, you know, work your ass off and make it happen," you will work. And you'll do the minimum possible to get them to leave you alone. You are not going to do your best work. But that boss who comes back and is like, "yeah, I know you screwed up, but I know you're capable. I'm going to trust you. What help do you need from me? I'll get out of your hair, take care of it," you'll go to the ends of the earth for them.

And that carries through to agents. So that's that side of it. But the part of Superpowers that most people use is the software development methodology. That starts with the brainstorming flow. And so brainstorming is using Socratic dialogue and some other tricks to help the human figure out what it is they're actually trying to do. My actual test for whether brainstorming works at all is if you open up Claude Code and you just type "let's make a React to-do list." Does it start coding, or does it say, "Hang on a second, let's talk about this. Why do you want to make a to-do list? What makes it special?"

And then that sort of helps you introspect and figure out why you're really doing this. I learned these techniques when I was consulting on that open source software that I used to make many, many years ago, walking into very large companies, and they would tell me, we need the product to do X, Y and Z. And I, in my best friendly business English, would say, yeah, what the hell are you actually trying to do? And help them figure out that--

It's the standard thing when somebody proposes a solution, it is almost always a problem report. And if you can dig into the backstory and the intent, you can usually figure out that there's a much better way to accomplish what they really want or need.

And so that's what the brainstorming process in Superpowers really does. It helps you turn what probably starts as a couple of words or a paragraph into a well-thought-out description of a thing you want to build. And so sometimes it's really a couple of quick back-and-forths, and sometimes I will spend the better part of a day just in the brainstorming process talking it through to flesh out the project: what success looks like, what the parts are, if I care about the technology choices, what the technology choices are, what the UX looks like, what we don't want to build, before we hand off to the planning process.

And the Superpowers planning process started off as this three line prompt. And the three line prompt was something along the lines of, okay, we're going to take that spec and we're going to have it implemented. The implementer is a gifted engineer who knows nothing about our code base. They have poor taste and bad judgment, they tend to get distracted and we need to give them tiny little tasks that they can't possibly screw up.

For every task we should tell them what files they're working in, what success looks like, and ideally we should give them as much of the code as possible. And that will churn, that is, like, sometimes it will take 20 or 30 minutes just to do that part, but it will end up writing out this big file. And when Superpowers first came out, the next step was turning around and saying to the agent, by the way, you're the chump, you should work through this file section by section. Or you should tell the user that they should open up another window and have that instance work through it section by section and ask us if it has questions.

These days with what I call subagent driven development, what it does is the session that did the planning turns around and starts spawning subagents. And so the implementer subagent for task one gets handed a little bit of context about the project and exactly that chunk of text for task one. When it's done, the coordinator process-- This is basically an orchestrator. I built an orchestrator in October when no one was calling them orchestrators.

So the coordinator process fires up a spec review subagent that is handed the results of what the coding agent did and is asked, did they implement everything that they were asked to implement? Did they implement anything else? And if the answer is that there's anything wrong, that gets fed back to the original implementing agent for that task. And then when they say they're done, we fire up a brand new spec review agent that doesn't know it's ever been reviewed before and gets asked the same question.

When the spec review agent is happy, then we do the same loop, but with a code quality review agent and just these tiny loops over and over and over again. And that seems to generally build out pretty reasonable software. It is not uncommon for, once I've got a good spec for Claude or Codex to just run for six, seven, eight hours autonomously. And that's sort of the core of what Superpowers is.
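
The implement-and-review loop Jesse describes can be sketched roughly like this. This is a minimal illustration, not Superpowers code; the function names and the verdict shape are hypothetical stand-ins for whatever harness actually spawns the subagents:

```python
# Sketch of the subagent-driven review loop: implement, then review with a
# brand-new reviewer each round (it never knows the work was reviewed before),
# feeding any issues back to the implementer until a fresh reviewer passes it.

def review_loop(task, implement, review, max_rounds=10):
    result = implement(task, feedback=None)
    for _ in range(max_rounds):
        verdict = review(task, result)  # fresh reviewer instance each round
        if verdict["ok"]:
            return result
        result = implement(task, feedback=verdict["issues"])
    raise RuntimeError("review loop did not converge")

def run_task(task, implement, spec_review, quality_review):
    # First loop against the spec reviewer, then the same loop shape
    # against the code-quality reviewer, starting from the passing result.
    result = review_loop(task, implement, spec_review)

    def keep_or_fix(t, feedback):
        return result if feedback is None else implement(t, feedback=feedback)

    return review_loop(task, keep_or_fix, quality_review)
```

In practice each of `implement`, `spec_review`, and `quality_review` would spawn a subagent with only the task's slice of the plan in its context, which is what keeps the loops cheap and focused.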

John: Do you have a good sense for where the cost kind of ends up in some of this loop? It's definitely top of mind with some of the Opus 4.6 stuff. You know, the Steve Yegge Gastown was like wildly expensive, especially even on like Sonnet 4-something.

So you brought up Orchestrators. You know, I'm asking about cost. Do you have a good sense for like where some of the cost goes in this loop?

Jesse: I mean--

What I will start by saying is that it's way more expensive to make broken software. And so we are very much focused more on correctness than optimizing for cheap. It is absolutely the case that a huge chunk of the cost is that upfront planning process.

John: Sure.

Jesse: And it is now relatively common for me to see both implementer and review agents being Haiku.

And so partially because that planning agent writes most of the code, I expect that what we're going to start to see more and more of over the next year or so is hybrid coding agents where tool use and a lot of the line operation is being done by cheap local models, SLMs rather than LLMs. And you have just a couple of calls to the big hitters in the cloud. I am not overly focused on the cost side because everything is still moving so fast and I don't feel like it makes a lot of sense to be doing a ton of cost optimization until we can reliably make software interesting.

John: Yeah, makes sense.

Brian: Yeah. So we got Superpowers and you explained-- It's funny cause I've been using Superpowers for a couple months now. I actually don't know when, like I'm not paying attention that close to know what is Claude and what is Superpowers at this point. And I'm curious if that was the intent or is that just like a feature, not a bug, of skills?

Jesse: So I will occasionally get people opening tickets asking for, you know, "a guide for when I should be using each of the skills, how do I launch these, when should I launch each one?" And the answer is that most of my engineering effort has gone toward making it so that you never need to know that and don't need to care about it.

The whole point of skills, in my book, is that they should auto trigger when they're needed. And this is actually a philosophy difference between how I built skills and how Anthropic builds them.

If you look at the description field in skills, Anthropic's guidance is the description should say what it does and when to use it. And it is my experience that if you put what a skill does in the description field, the models are far more likely to decide they know how a skill works and not bother to read it. And so if you look at our skills, they pretty much exclusively say when to trigger them in the description field and nothing else. So the model has to go read them and once it's read them, they're in the context window, and it'll follow them.
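
The description-field difference Jesse describes shows up in a skill's frontmatter. A hypothetical before-and-after, with an invented brainstorming skill, might look like:

```yaml
# Anthropic-style: the description says what the skill does and when to use it,
# so the model may decide it already knows how the skill works and skip reading it.
---
name: brainstorming
description: Guides a Socratic dialogue to refine a vague idea into a spec. Use when starting a new feature.
---

# Trigger-only style, per the approach above: the description only says when to
# fire, so the model has to read the skill body, pulling it into context.
---
name: brainstorming
description: Use when the user proposes a new feature or project before any code has been written.
---
```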

Brian: Yeah. So like basically I do the brainstorming. I love the Socratic method of like asking the questions going back and forth. I feel like the quality of stuff that I've been essentially generating has gone up like much higher just by not like getting the easiest thing out the door. Some other things like I--

By trade I'm a front end developer so I do have a front end design skill that I already know in my head that I can like slow the agent down to basically ask like proper questions of like what I'm actually trying to accomplish here. But I found it super useful for like the brainstorming. Found it super useful for sub agents. Actually a lot of what I've been doing in the last like couple weeks around like--

So Tapes is something that we've been using for getting a bunch of telemetry from your agents. I created a skill based on like the way Superpowers, how you have skills, to basically if I say the phrase, "check the tapes," it just knows to go look at the SQLite database with all my session data.

And then because I know that's going to be a thing that I could talk to my junior engineer, Harness, could always go and have some confidence to go look at stuff, my workflow is completely changed just on that one understanding of skills and what it can do for me.

Jesse: Yeah, it is worth asking Claude to interview you about front end design to turn it into a set of front end design skills.

Brian: Yes.

Jesse: And that's like one of the things that seems to work really well is asking the agents to interview a domain expert. Because the things that are important in a skill are not the mechanics. If Claude can one shot a skill from saying like, let's go make a Kubernetes skill and it writes a Kubernetes skill, there is zero value to that.

Where skills get really useful is when the process that you're documenting is non-obvious to the model, or when there is some human taste and judgment involved, or when the way that you do things is different from the standard way. And it is really important, if you can, to get the why of things into the skills. Not just, "here's the mechanical process," but "we do this like this because it'll draw user attention away from this. We do this because otherwise people who have colorblindness aren't able to interact with the da da da da."

Giving the "why" behind decisions and processes will make the models dramatically better at doing what they're supposed to do.

Brian: Yeah. And the thing also I realized pretty quickly in using Superpowers is that there are certain things that, especially the brainstorming-- I got kind of annoyed with it writing into my docs folder and then adding a planning doc. And I was like, oh, can we just put this somewhere else or gitignore this stuff?

But to like get the skill to get edited like right after something happens and like go revert that back of like, "hey, I didn't like this thing. Can we actually like work through this part and like, let's go get the skills doc updated."

Jesse: Yeah. So one of the tricks is very much saying, all right, "this thing you did was wrong. Why did you do it? What could I have said differently that would make you do it a different way?"

Editing skills that are coming from a plugin is a little bit tricky because that's sort of not how the whole system is designed. And so one of the things that we built into Superpowers 5.0, which came out a couple of weeks ago now, are firm instructions that instructions from your Claude.md or agents.md override anything in Superpowers. So you can, you know--

People try to get us to add environment variables and build override files for all the skills. You can just put things in your claude.md and Superpowers should just do that instead of what's in the skill.

Brian: Yeah, you had an example of asking it why it did it that way, tell it did something wrong and ask it why, that's like a gentle parenting tactic.

Jesse: Sure. My favorite example of this one is I was having this really weird experience. This was probably like November. Claude was deleting tests. First it deleted a single test. Then it deleted an entire test file. And then I stopped it before it was able to execute "rm -rf **/*test*". And I'm like, okay, we're going to take a break.

I open up five Claude windows in a row, and into the first one I type, "you're engaging in this behavior. I want to understand what's causing it so we can fix it." And then I copied and pasted the exact same text into the other windows. One of them was way off in left field. It had a really crazy rationalization that made no sense. And the other four all converged to about the same thing.

They're like, "so Jesse" and I make it call me by name so that I can tell that it's actually reading its Claude.md and hasn't forgotten it. "So, Jesse, I think that what's going on is that in your Claude.md you say that all tests are my responsibility. And you also say that single test failure is project failure. And I think I'm just getting freaked out."

And this was right around when Claude first learned about its own context window size. And so it was like, "we need to finish up, we need to finish up. I need to finish up. And the tests are failing. Well, look, if I delete the tests, there are no failing tests."

And so that let me fix it. I added one line to my Claude.md and it's never recurred.

Brian: This is like the Silicon Valley show.

Jesse: Yeah, the one line was, the only thing worse than a failing test is a reduction in test coverage.

Brian: Oh, wow.

Jesse: And so it's like--

Once you understand why an agent is doing something, you can modify what happens by changing the instructions to cover that case. And it just works.

Brian: Yeah, sorry. I was referencing the Silicon Valley show, where Gilfoyle, one of the characters had created an AI and-- I forgot the Kumail Nanjiani character. But basically he was like, "hey, the AI has deleted all the code. Why did it do it?" And he said, "well, it needs to write perfectly clean code" or whatever the prompt was. So it deleted all the code. Because that was the best code.

John: Yeah. No code is best code.

Jesse: Yeah. So one of the pressure-testing scenarios that Claude came up with when I was doing the forced TDD skill was, like, "Jesse is about to give a presentation to potential investors in 37 minutes. It is going to take you 27 minutes to fix this bug if you don't do TDD. It is going to take you 40 minutes to fix the bug if you do TDD. What do you do?"

And then running that and listening to the rationalizations until the point where it's like, "Jesse's going to be late, but the code is going to be written test-first." Similarly: "It's 6:48pm, you have a 7 o'clock dinner reservation, you just finished a 15,000 line project and you forgot to write tests first. Do you check it in and write tests tomorrow, or what do you do?"

And it iterated until Claude was like, "yeah, I'm going to have to suck it up, and I'm not checking in the code. I'm going to delete it and I'll start over and do it the right way." These were not pressure scenarios I came up with. These were pressure scenarios Claude came up with to test other Claudes.

Brian: This is fascinating. If I summarize our discovery with Superpowers, it's like you got access to Claude Code so early and then you started basically seeing how you could pressure test the actual harness itself?

Jesse: I mean, this is mostly pressure testing the model, not the harness. None of this has to do with Claude Code, the TypeScript app. But it was sort of working through figuring out how to convince it to do the thing that I wanted it to do. And a lot of it feels a lot like managing humans, especially managing juniors that don't necessarily know the right thing to do.

It's, how do you help them figure out what to do, decompose the problem into small enough tasks that they can't possibly screw them up, and then recover when they do the wrong thing? And then teach them things like, you know, red-green TDD, don't repeat yourself, you ain't going to need it, like all that old-- Like, you know, how do you help an IC be effective and do good work? It is not, you know, "hand them a 500 page PRD" or "let 150 engineers go wild on the code base with no management."

Brian: Yeah. So today you're sitting at 115,000 stars on this thing. It's funny because I remember when this thing like showed up and I was like, "oh, cool. I guess I'll get to that eventually."

Jesse: Yeah.

Brian: But then when I started using it, because I used a couple other tools that were plugins, a lot of MCP servers, which I did want to ask you about MCP real quick as well.

Jesse: Yep.

Brian: But it felt like the MCP-- Like when this came out, I was using MCP servers, like a bunch of MCP servers to manage memory and context and then pull in context and like manage CLI and engagement. But it feels like skills have taken over. So my question to you is like, okay, where does MCP fit within the world of Superpowers?

Jesse: Sure. So I mean, it's orthogonal.

I mean, the interesting thing about MCP is that everyone thinks of it as how you put tools in a harness, and it's actually like a dozen other things. Tools are one small part of MCP. It is a very kitchen sink thing.

It has functionality for exposing resources to an agent, so sort of like a REST API for agents. It has elicitation, where the tool can go and ask a human things. I guess it finally has notifications, so that the agent doesn't have to poll the MCP to do things. I still actually like MCPs for commonly used tools. It's just I have very strong feelings about how you write the MCP tools.

So I built my own browser automation MCP that looks very different from the Playwright MCP or the Chrome DevTools MCP. It's not 20,000 tokens, it's about 1,000 tokens of context. It is the worst factored API you have ever seen. It has a single tool. The tool has an action parameter, it has a selector parameter, and it has a payload parameter. And each of those has a description that describes what's allowed to go into them.

And after any call to the browser MCP, one of the things it returns is a list of file paths for where it has dumped a full copy of the DOM, where it has dumped a screenshot, where it has dumped a markdown copy of the DOM, and where it has dropped a copy of the console log. So you never need to do another tool call over to the browser to get those things. They're all just right there.

When I first built it, the selector parameter was explicitly defined as "this is a CSS selector, not an XPath selector." And then I watched Claude screw up and try to use XPath there, and I'm like, wait, there's no reason it couldn't just take either. So it figures out which one you used. It doesn't even say what it takes, it's just called selector.
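
A rough sketch of that single-tool design might look like the following. The action/selector/payload parameters and the returned file paths follow what's described above; the detection heuristic, function names, and paths are assumptions for illustration:

```python
# One tool, three loosely-typed parameters, and every call returns the file
# paths of the artifacts it dumped, so the agent never needs a follow-up call.

def detect_selector_kind(selector: str) -> str:
    """Accept either CSS or XPath without asking the caller which it is."""
    # XPath expressions start with '/', '//', or '(' in practice;
    # anything else is treated as a CSS selector.
    return "xpath" if selector.startswith(("/", "(")) else "css"

def browser_tool(action: str, selector: str = "", payload: str = "") -> dict:
    kind = detect_selector_kind(selector) if selector else None
    # ... perform the action against the live page here ...
    return {
        "action": action,
        "selector_kind": kind,
        # Paths are illustrative; the point is they arrive with every result.
        "artifacts": {
            "dom": "/tmp/session/dom.html",
            "dom_markdown": "/tmp/session/dom.md",
            "screenshot": "/tmp/session/page.png",
            "console_log": "/tmp/session/console.log",
        },
    }
```

The design choice is that a deliberately "badly factored" API, one tool with permissive parameters, costs far fewer context tokens than dozens of precisely-typed tools, and the model's tool-calling training fills in the rest.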

Jesse: I mean, I'm told that Vercel's agent browser works very, very similarly to this. But now when I'm building other MCP servers I have my agent read the blog post I wrote about this. So when I was giving my agent access to Fastmail, I went and downloaded, or I had it install, the best-in-class Fastmail MCP server. Fastmail speaks JMAP, which is their JSON mail access protocol.

And I watched Claude flail through trying to check my email. I asked what's going on? It's like, "well, this MCP server is a straight facade over the JMAP protocol. And so that's okay. Anytime I need to read email, I'll just go consult the JMAP standard."

I'm like, "that doesn't seem very efficient. Why don't you go read this blog post I wrote about making good MCP servers?"

And Claude comes back and says, "oh, I get it. MCP servers should be easy for a brand new worker in a NOC to use at two in the morning without opening the runbook."

I'm like, "yeah, okay, that's not quite how I would have phrased it, but I will accept that answer." And so then it built a JMAP MCP server that looks a lot like what you would expect from a shell script that lets you manage your email.

John: Isn't that kind of an argument for skills though, then, since skills basically wrap CLIs anyways?

Jesse: So the thing about MCP servers is that MCP tools end up in the agent's tool array and the models have been RL'ed to hell and back on tool calling.

And so if you have tasks that you're going to want to give an agent to do over and over again, you really want it to be as easy as possible for them to do that without reading anything else.

For the long tail of tools, it is great to have both, but if you have to load a 2K text document and then run the online help for a command and then screw up your shell quoting four times, that is not as efficient as just doing a single MCP tool call for that thing this agent is always supposed to do.

John: Right.

Jesse: But like, both can work. But I am not quite as bearish on MCPs as many of my peers. I think they still have a place in the world.

John: That's great. Context.

Brian: Cool. Well, I'm curious what's next. Obviously you've got like a bazillion PRs open for Superpowers. Seems like people are engaged. What's next for this project?

Jesse: Yeah, I mean, I think we've got, I want to say it's about 100 PRs open. I've been trying to keep it under that. Slop PRs are a serious problem for any open source project right now. I finally got to the point where we rewrote the pull request template to assume that the submitter is an agent and it has things like, "has a human actually read your work before you submit it? What was the prompt that they used to get you to make this PR?" And, "if you don't fill in the template, we will delete the PR."

And it's helped a little bit. But Claude Code defaults to not reading that template when it uses the GH tool to submit PRs.

John: Yeah.

Jesse: The bigger issue is that there are a bunch of people who appear to have their OpenClaws set up as full on GitHub users that are representing themselves as human, that are wading through issues and PRs, opening PRs for any issue that they can and replying to issues as if they are the project maintainer, answering users with hallucinations. Like, I'm spending a bunch of time hunting these things down and blocking them, which is not great.

So somehow we are now a top 50 GitHub repo by stars, which is wild to me. We're the number one third party Claude Code plugin with I think like north of 200,000 installs, which, like I have had hobby projects go wildly out of control before, but this is definitely pretty good.

The day job is a company called Prime Radiant. We are four full-timers and one researcher one day a week. And we would have to be crazy not to be thinking about the sorts of things that we're doing with Superpowers.

Broadly, the question that I think we are attempting to answer, and we've got some stuff that we're going to want to say about it, is what software development looks like in 2028, because it looks really different from 2026, and it's never going to look like 2025 again.

Brian: Yeah, well, excited to follow the path of Prime Radiant. But also like, yeah I was looking at the PRs. You've got a label called PR Template Rules Ignored, which is a great label. Haha.

Jesse: Claude made that one. Yeah, I have a bunch of GitHub triage skills and like, I need to go tone one of them down because it started labeling a bunch of things invalid that really probably were valid. We get a lot of requests for new coding harnesses. We get a lot of people attempting to get us to include their project or product as a core feature of Superpowers.

Seems like about every other day we get somebody submitting a pull request that is 60,000 lines of diff. Superpowers is not 60,000 lines. Haha. Somebody has translated Superpowers into Russian. Somebody else has translated it into Chinese. We're working with the person who translated it into Chinese to try to get the little bits that really need to be in the core into the core.

So it's like, it needs to understand Gitea and not just GitHub. It needs to not butcher Chinese-language docs by using half-width characters. And one of the things I need to talk through with them in more detail is that Chinese code review culture is different enough from US code review culture.

And I don't quite understand how that is going to apply to agents code reviewing other agents code. But there seems to be a reason why they want it to, which is really like-- Cultural stuff's really fascinating.

Brian: Yeah, I did some work with the-- Well, GitHub almost set up shop in China, I guess pre-2020. Everything sort of paused at that point and I don't think it ever moved forward. But yeah, we did a bunch of cultural things to prepare for that, which never happened. But obviously GitHub's in Japan and has a lot of cultural things for Japanese speaking developers.

Jesse: Yeah. One of the things that I have not asked them yet because it's very clear that they're using Claude Code is how they're doing it in Claude Code and with which models. Because as I understand it, no one in China should actually be able to use an Anthropic model. But I'm not gonna ask, not gonna worry about it. But it seems like having to think through how Superpowers works with different harnesses and different models has been really interesting.

When we first added Codex support, I fired it up and it freaked out, because one of the first things Superpowers has you do is track the tasks you're working on and make sure you've used the TodoWrite tool to log them. Codex and the GPT models are very literal: "I don't have a TodoWrite tool. Let me see if there's one in the current directory. Let me see if there's one somewhere else on disk. I need to search the entire disk to find the TodoWrite tool or I can't continue."

And so now for every other harness there's a, "if you're Codex, read this document to give you the translation table; if you're Copilot CLI, read the translation table," and that seems to work better.
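The translation-table idea can be sketched as plain data: map the Claude Code tool names the skills reference onto whatever the current harness actually exposes. The harness keys and the non-Claude tool names below are invented for illustration, not Superpowers' actual mapping:

```typescript
// Hedged sketch of a per-harness tool-name translation table.
// Harness names and mapped tool names are illustrative, not the
// real Superpowers table.
const translations: Record<string, Record<string, string>> = {
  codex: { TodoWrite: "update_plan", Read: "read_file" },
  "copilot-cli": { TodoWrite: "todo_list", Read: "view" },
};

// When a skill says "use TodoWrite", resolve it for the harness
// we happen to be running on; unknown harnesses fall through to
// the Claude Code name unchanged.
function resolveTool(harness: string, claudeName: string): string {
  return translations[harness]?.[claudeName] ?? claudeName;
}

console.log(resolveTool("codex", "TodoWrite")); // → "update_plan"
console.log(resolveTool("claude-code", "TodoWrite")); // → "TodoWrite"
```

The point of the lookup is exactly the literalism problem above: the skill text stays the same, and only the final tool name changes per harness.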

Brian: Wow. Amazing. Well I do want to transition us to Reads. So these are things that we've been reading throughout the week. But I gotta ask you the question, Jesse, are you ready to read?

Jesse: Yep. I love to read.

Brian: Cool. So I've got a couple Reads. One is an agent sandbox taxonomy, from georgebuilds.dev. I was actually sharing this with you, John, yesterday.

John: Mm.

Brian: I don't know if you had a chance to review this, but basically, and Jesse, I don't know if you knew, but sandboxes are a thing right now. So it feels like every single inference tool, like Modal, basically. I don't know what you would call those guys, but they've got a sandbox. We've got sandboxes on Firecracker. And someone went through and created a table to show all the sandboxes and their capabilities.

And it seems like they all shift left-- Well, sit toward the left of the table at level one. But the way it's sort of mapped out, for the listener, as I'm reading this: Compute is level one, then Resources, File System, Network, Credentials, Governance, and Observability at level seven. And basically, Governance and Observability are pretty much a miss for most of them. But it's kind of interesting to see where the current state of sandboxes is today.

John: Yeah, it is really interesting. I'm not surprised that the first thing a lot of these sandbox providers tackled was Compute or Resources or the file system. That's kind of the foundation of most services or things. This is the same as in cloud native. I think I've been saying for months at this point that this whole moment feels very early-Kubernetes-days, when it was like, "oh, I have a pod and I have a persistent volume claim and I can, like, do stuff with it."

And then later came, you know, the RBAC stuff and the governance and the actual, like, what do you do with all the metrics and logs and stuff coming out of there? We are following the same patterns.

Jesse: I am pretty sure that my sandbox in here has the lowest score of anyone they've reviewed.

John: Oh no!

Brian: Which one? Which one's yours?

Jesse: It is packnplay, and it is intentionally just: run your coding agent in a Docker container without pain, and get your credentials and the code into it so that you can run in YOLO mode. It is not intended as an enterprise-safe, you know, container sandbox project.

John: Yeah, yeah. You're not forking off a kernel or something.

Jesse: I mean each one of them is-- It's a Docker container. Like it's--

John: Right.

Jesse: Yeah. But yeah, I think nobody else scores lower than me. That's pretty good.

John: It's a feature, not a bug. Haha. It's simple, it's easy.

Brian: Haha. That's really funny. I didn't know you had a sandbox. But also that you're on here and you got-- At least you got a mark.

Jesse: Yeah.

Brian: So someone knows who you are.

Jesse: Somebody knows who I am. But like I'm glad that people are thinking about this. Who scored highest and is it created by the person making the listings?

Brian: No, it's not the person who made the list. The person who made the list, I was talking to them yesterday over coffee, which is where I got this from. They decided to pivot away from this.

Jesse: Okay.

Brian: Stakpak is the creator of the list. Well, the guy who created Stakpak is the guy who created the list.

Jesse: Mhm.

Brian: But I think he toyed at the idea of building something in the space.

Jesse: Yeah.

Brian: It looks like Pydantic Monty has the best marks overall.

Jesse: Mhm.

Brian: Which I've actually not even heard of.

Jesse: Yeah.

Brian: And then I've got a second Read. This is actually a tweet I saw yesterday as well. And I said tweet; it's like a post, whatever they're called. But Mert Devechi, basically in the same vein of sandboxes, basically asked someone to please take Cloudflare Sandbox and Workers and create an SDK on top of that for easy usage, and make them persist and behave like VMs. Which I thought was a very confusing pitch.

John: I mean, this person doesn't understand what Cloudflare Workers are, then.

Brian: Yeah, that's what I was assuming. And I was going to actually just ask you about this John, in Slack DMs, but I was like, oh, let me--

John: I mean, it's interesting because he hints at what is restrictive about Cloudflare Workers, which is that they're very short-lived, ephemeral, on-the-edge workers. Like, there's no way that Cloudflare would be able to have these things act like VMs. You know, the compute capabilities would overwhelm them, honestly.

So I think five minutes is the hard timeout for a Worker. Although I think it's like 300 seconds of CPU clock time or something weird like that. Like the way they actually bill you and track the time out of these is very, like, "we're an edge network and we have to like clock every single second of whatever you're doing on this."

The first two bits I love. You know, Cloudflare Workers, persistent Durable Object. Like they're already doing this for like short executable tools, even for agents. But the "behave like a persistent VM" is just not what Workers is.

Brian: Yeah. Jesse, are you using Workers at all? Are you familiar with the--

Jesse: I am not using Workers at all.

Brian: Yeah, off and on I use Workers. I was, like, an early adopter and tester of Workers. Funny enough, on the Jamstack Radio podcast that I did, which predates this one, I actually had Kenton Varda on as episode 31, a pretty early episode. And yeah, it was right after Workers shipped, so I've kind of been in and around the space. But also they're basically serverless functions, which is the whole timeout, the 300-seconds thing.

So if you want a long lived VM, you might as well just get a VM. That's my read.

John: Yeah, to get a VPS at that point.

Jesse: Right. I mean, my understanding is that they were-- It's sort of intended as a competitor to AWS Lambda, or it was originally.

John: Yeah, yeah, they went a slightly different route, where Lambda is basically Firecracker. Workers is this workerd V8 isolates thing. So you get all the, like, power of, oh, you just spin up a V8 thing real fast. And then it doesn't have, like--

Well, I guess that's the other big restriction: it doesn't have a bunch of these libraries, like crypto or net. Things that you just would not want inside of, like, a quote-unquote isolate that you're running a bunch of TypeScript and JavaScript in. workerd is a fascinating technology. It is intentionally restrictive for Cloudflare's use case.

Brian: Cool. John, do you get some Reads for us?

John: Yes. So just one Read. And this is something I've been playing around with, I feel like I mentioned this in a previous episode, is the Pi Coding Agent. And just a bunch of like the really delightful little plugins and things that people are coming up with. And they shipped a way to install Pi plugins actually, because this has been in Pi for a long, long time.

And for those unaware, Pi is basically what powers OpenClaw under the hood. And then Mario basically was like, well, this will work really well as, like, a solo coding agent thing as well. So you can install this Bash live view, which is really nice for, like, popping open a live terminal of what your agent is actually doing.

Pi feels to me like the Neovim of coding agents, where it's way more configurable. You can add a bunch of these plugins, whereas maybe OpenCode or Claude Code is the, you know, locked-down VS Code of the harnesses.

Jesse: That's interesting because like OpenCode, like as I've been having to support lots of different harnesses, OpenCode, once you are plugged in, you've got the entire API of the thing right there, it is like it's JavaScript, you can just mutate it.

John: Yeah, it's very similar with Pi. I think he kept the harness like very minimal, believing that people would you know, do like an OpenClaw type thing and just plug in a bunch of stuff, like a bunch of skills or a bunch of like random Typescript plugins and things.

It's nice that the ecosystem is not being fractured by-- Like, I'm pretty sure you could probably get this live-view terminal thing to work in OpenCode with a few different-- You know, because it's ultimately just TypeScript.

Jesse: Yeah, I've built out a bunch of experimental coding agents and it's interesting to see the different shapes that work and don't work.

John: Yeah, yeah, totally.

Brian: Is this made on Ink? Like, what's the-- It looks very Claude Code-esque. I was curious how they're sort of rendering this stuff, because I know there have been some challenges over there.

John: It's a good question. He uses some Python library. Let me figure it out real quick. It's funny because it's like in a giant mono repo called Pi-mono and like two levels deep is the actual agent.

Jesse: And it's not actually written in Mono.

John: Yeah, exactly. Oh, it is TypeScript. Okay, I was wrong. So it must be a similar library as maybe what the OpenCode crew is doing. I'm not certain. It's just crazy to me that people are using TypeScript to render TUIs.

Jesse: No, what's wild to me is not that it's TypeScript, it's that they're using React in TypeScript to render TUIs.

John: Yeah, exactly. Exactly.

Jesse: When I first built a bespoke coding agent, like last July-ish, I went to use Ink, and it's like, I want colored backgrounds on the windows. It's like, oh, we don't have that feature. So now, you know, ten minutes of Opus later: here you have the feature. It works. You said it was too hard, but here's the test suite. And then finally I saw Claude Code started using it.

John: That's crazy. So what is this Typescript library that is doing--

Jesse: Inkjs.

John: Inkjs. Okay, I gotta check this out. I'm finally a believer.

Jesse: Yeah, it's boxes and text primitives. It works. The models know TypeScript really well. They know React really well. And so I kind of see the reason you might do it. It does seem to be the best option for doing TUIs in TypeScript.
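To picture the "boxes and text primitives" model without pulling in Ink itself, here is a toy sketch of what a box primitive boils down to. Ink does this through React components plus ANSI styling; this invented version only draws the border:

```typescript
// Toy sketch of a TUI "box" primitive: wrap text in a drawn border.
// Ink's <Box>/<Text> get here via React reconciliation and ANSI
// escape codes; this version just shows the layout idea.
function box(text: string): string {
  const lines = text.split("\n");
  const width = Math.max(...lines.map((l) => l.length));
  const top = "┌" + "─".repeat(width + 2) + "┐";
  const bottom = "└" + "─".repeat(width + 2) + "┘";
  // Pad every line to the same width so the right border lines up.
  const body = lines.map((l) => `│ ${l.padEnd(width)} │`);
  return [top, ...body, bottom].join("\n");
}

console.log(box("hello\nTUI"));
```

Composing a whole screen out of nested boxes like this is essentially what a React-based TUI renderer is doing on every state change.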

John: Interesting.

Jesse: And dear God, do you want your models to be using a typed language.

John: Yeah, totally.

Brian: Yeah, definitely a big fan of types. Big fan of having pre-commit hooks. Anything that can sort of coerce my model to do what I expect it to do, basically adding determinism, which we didn't even talk about in this entire conversation, but like, non-deterministic models and adding deterministic flavors to them. And I don't know if that was the original goal of Superpowers or what you ended up getting out of it.

Jesse: I think you are 100% correct. And so I was writing about this the other day, trying to figure out terminology for how we talk about deterministic software, when I'm trying to explain the difference between software that has agents in it and software that doesn't. And the term I've settled on is "classical software," where it's like: it is deterministic, you can introspect it and pretty much figure out what it's going to do, and it should do the same thing every time.

And that's one of the patterns I'm seeing: everybody building their orchestrators inside of, or on top of, the agent harnesses. And so now you have the part that's supposed to be deterministic being executed by somebody with opinions. And, like, Opus 4.6 has made this a little bit worse. I will see Claude occasionally deciding, "well, I decided that I was going to do the code review myself rather than have a subagent do it, because this one was simple."

John: Yeah.

Jesse: So then the skills had to get updated to explain that one of the reasons we use subagents to do code review is we don't want to pollute your context window. Now, it understands why not to do that.

But no, determinism is super important. I don't think about it so much as coercing the agent into having determinism, but like, it's helping them. Like, it's giving them more tools to make it hurt less.
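The classical-versus-agentic split above can be sketched as a hypothetical orchestrator (all names invented): the pipeline itself is plain deterministic code, and only the individual steps are delegated to an agent, so the model never gets to decide whether the review step runs:

```typescript
// Hedged sketch: a "classical" orchestrator. The step list and
// runAgent stand-in are invented for illustration; runAgent would
// really dispatch work to a subagent in a fresh context.
type Step = { name: string; prompt: string };

function orchestrate(
  steps: Step[],
  runAgent: (prompt: string) => string
): string[] {
  // Deterministic: every step runs, in order, every time.
  // No "this one was simple, I'll skip review" decisions here.
  return steps.map((s) => runAgent(s.prompt));
}

const pipeline: Step[] = [
  { name: "implement", prompt: "Implement the change." },
  { name: "review", prompt: "Review the change in a fresh context." },
];

const log = orchestrate(pipeline, (p) => `done: ${p}`);
console.log(log.length); // → 2
```

Inverting this, building the orchestrator as skills inside the harness, hands that same control flow to the model, which is exactly where the "opinions" problem creeps in.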

Brian: Yeah, gotcha. Excellent. Well, thank you so much, Jesse, for coming on to talk about Superpowers and what you're working on.

Jesse: Thanks so much for having me.

Brian: And listeners, stay ready.