1. Library
  2. Podcasts
  3. Open Source Ready
  4. Ep. #39, Agents Take the Wheel with Zach Smith
Open Source Ready
43 MIN

Ep. #39, Agents Take the Wheel with Zach Smith

light mode
about the episode

On episode 39 of Open Source Ready, Brian Douglas and John McBride speak with Zach Smith, creator of Kplane, about rethinking Kubernetes for an AI-driven future. Zach explains how virtualized control planes could enable isolated cluster experiences at massive scale, while the conversation explores developer experience, AI tooling, and the possibility that future AI agents may each require their own cloud.

Zach Smith is an engineer and infrastructure builder with more than a decade of experience creating developer platforms and cloud-native tooling. After beginning his career in the Cloud Foundry ecosystem and contributing to Kubernetes multi-cluster projects, he founded Kplane, an open source effort focused on virtualizing Kubernetes control planes and enabling infrastructure at unprecedented scale.

transcript

Brian Douglas: Welcome to another episode of Open Source Ready. John McBride, welcome back. How are you doing?

John McBride: I'm doing good. I'm hot. There's a heat wave here on the east coast, but I'm excited.

Brian: Okay, thanks for clarifying that heat wave because when you said you're hot, you know, it is Hot John Summer, I guess. Or is it a hot IPO Summer?

John: Hot IPO John Summer. We declared it here first. Welcome. Haha!

Brian: Excellent. Well, yeah, a lot of hot things happening. AI, IPOs, SpaceX. But also we've got a hot guest, Zach Smith. How you doing?

Zach Smith: Hey, good to be here with Brian and John. It's a hot Friday in San Francisco as well.

Brian: Yeah, I was going to ask, how's the Mission holding up today? I think this might be the last hot day for a bit for us.

Zach: Yeah, I say hot, but it's, you know, 75 degrees. Most people would not complain.

Brian: Okay, not bad.

Zach: But it's hot for us here in the Mission.

Brian: Yeah, I was in the Mission last night at one of my infamous founder dinners. You've attended some of these with me, Zach. But it was nice to get the breeze in last night. It really cooled off.

But speaking of the weather, let's actually stop talking about the weather. Let's talk about what you're working on. We know of you for this little project you kind of started a few months back called Kplane. But before we get the Kplane, can we just find out who Zach is? Like what's your background? Why are you here?

Zach: Yeah, so I've been building developer tools, developer platforms for a little over a decade now, mostly on the Kubernetes cloud native side. So I actually started my career working with a tool called Cloud Foundry, which was a big kind of microservices containerization platform used by a lot of the Fortune 500.

And then as that phased out, everything transitioned into Kubernetes and the rest is history. So I've been pretty deep in Kubernetes, most recently working on things like the multi-cluster runtime with the Kubernetes SIG community. And then, yeah, Kplane figuring out how we can virtualize the Kubernetes API server.

Brian: Nice. Yeah. And then yeah, can we just talk about Kplane at this point? Because it's an open source project today. I'm pretty confident our listeners probably have a lot of interest in knowing what this thing is.

Zach: Yeah, so there's a couple of you know, similar or adjacent projects, one of which is called KCP. KCP is a kube-like API server, heavily influenced by the API server, mostly for internal developer teams to kind of stand up consistent API that their developers are used to and scale them horizontally.

There's also folks like vCluster that have been kind of at the scene for a few years now, I think they're Series B or Series C at this point, that are getting lots of success in the enterprise and GPU spaces as well. Kplane is kind of a mix between those where, you know, the world kind of went, if you rewind five to 10 years, we had these massive Kubernetes clusters. Everybody was doing multi tenancy on them. Every customer gets their own namespace.

And this pattern is actually very common. I would assume that 80% of the multi tenancy workloads look very similar to that. Then you get customers who graduate out of that and they kind of slingshot toward the other end, which is everybody gets their own Kubernetes cluster. The problem is that you end up having this Kubernetes tax where you have to deploy the control plane, manage the control plane upgrades. You're spending cost on compute and memory and those sorts of things.

Kplane is kind of the middle of the road between both of those extremes where we say, actually we can give you the control plane in a virtualized manner where you don't have to replicate resources for every single cluster that you want to have. And instead you can share a single group that is also distributed across regions or zones, but have hundreds or thousands or tens of thousands of clusters from that.

And so yeah, Kplane is kind of the first step virtualizing the Kubernetes control plane. And then the stuff on the multi-cluster runtime is kind of the other side of it of how can you deploy one operator to reconcile resources across many clusters at scale.

John: Yeah, I think this is really fascinating work from you know, when I was also at Cloud Foundry, then to VMware working on Tanzu and that was deeply integrated with cluster API, which is kind of this whole Kubernetes native way to declare resources, machines and you know, et cetera.

And I think with some of the groundwork here-- So I'm curious, you know, the, the deeper history on some of the multi-cluster stuff that you've been working with the SIGs, like how's that been going? Like what's the history with that as far as it relates to, maybe where some of our listeners are at, deeper in the enterprise. Right?

Zach: Yeah. So the multi-cluster runtime is relatively newer. It is getting a lot of traction. You know, if you don't do this and you've got let's say tens or hundreds of clusters, you know, you have to deploy that operator every single time and you have to of course manage the lifecycle of it.

Then you get to the question of like well is that operator being fully utilized? Like what is the utilization of that operator? Is it 5%, 10%? And you've got to like rightsize it per cluster where instead what people are realizing is that actually this kind of management plane model where it's like a centralized management plane, some region out there, but then you've got this global distribution of clusters that are really just like replicas or edge locations for your kind of core set of resources.

You can just deploy this multi-cluster operator once and have it watch all of those at the same time. And so I think people are saying okay, we can actually save a lot of just pure compute and resources by doing this model. And it also simplifies developers life cycle where all the logs, all that the reconciliation is in one place.

You're not fanning out, you're not just pushing them to the edge, you're actually reconciling them centrally and then the edge just gets the correct status and state that it should have. I think it's definitely up and coming. Lots of folks that are building large scale clusters are on top of it.

So if you look at kind of the contributor list for that project, you'll see folks from Nvidia, vCluster and a lot of different other enterprise companies that are all investing in it.

John: Totally. Yeah. This is really interesting because I actually was thinking about a lot of this stuff with a uh, recent Coinbase incident. And this was actually one of my reads where, you know, it's easy to point fingers. We don't really know the volumes and the trading volumes and all that stuff, but essentially they had this huge outage where a bunch of their stuff was like in one region.

And I think even one node, like correct me if I'm wrong Brian but essentially, you know, that one thing goes down, that one region goes down and like the whole trading platform goes down. And I was thinking about this because I was like, well, how hard would it be to virtualize some of the resources across multiple regions, even having like failovers and stuff?

Is this a kind of similar realm where Kplane plays in or that you think about?

Zach: You know, I think it depends, right? I think a lot of companies still have to have like one management plane or at least like an active passive management plane. And I think for that layer, like it typically does make sense to leverage like the bigger clouds, like the EKS, the GKS of the world, just given the reliability that they bring. But yeah, you should definitely have multiple copies of it, I think where--

And not saying that the Kplane or the vCluster or the KCP model isn't reliable enough, but it's hard to compete of course with those nines of availability. I think it just comes down to that active, active architecture that a lot of companies just aren't doing. Especially if you just have one centralized management plane.

John: Yeah, this is something I feel like I have these like kind of crisis of faith moments on where I'm like oh, Kubernetes is great. I love Kubernetes. I built so much my career on Kubernetes, but sometimes I really just want like the breeze and the ease of like a Cloudflare Worker architecture where it's just like, it's a very stateless thing.

You know, granted a lot of the Kubernetes stuff is very stateless. Maybe control planes end up being where like a lot of the most, most of that state ends up living, especially in a multi cluster architecture. But yeah, I find this technology so fascinating because it's so cool that you can virtualize a bunch of those bits to then kind of ease some of that pain.

Zach: Yeah. And it's funny that you mentioned that, like, before the Kplane project, I was actually kind of going down the rabbit hole on why are people not doing global Kubernetes clusters. And it turns out that with Raft, the consensus algorithm used in etcd, it starts to break when you have higher latencies around the world.

And so if you have, let's say a normal Kubernetes cluster with four nodes and one of the nodes goes down, the pods shift over and everything is typically fine. There might be a little bit of downtime in between, depending on, of course, the replicas that you have inside that cluster.

But if you were to stretch that across a region, or say, let's actually do global footprint of nodes, but still one cluster, things start to break. And a lot of people have tried this. And what's interesting is that there's actually a couple new algorithms that have been proposed, so, like Fast Raft and Hierarchical Raft, which would potentially allow you to have this global Kubernetes cluster that could handle a whole region going down. Where today, you know, with all the big clouds like EKS and GKS, you have to build that, you know, cluster in simply one region.

John: Yeah.

Zach: So interesting stuff, but unfortunately, bigger fish to fry.

John: Yeah.

Brian: I was just going to ask, why is Google or Amazon not really already doing this already? Like, these hyperscalers, they're all built on Kubernetes. Is this kind of out of sight, out of mind, or do they have a different game that they're playing?

Zach: I think it's also one of those things around--

If you mess something up and you have one cluster, you're now down globally.

Brian: Yeah.

Zach: Right? So it was one of those, like, it would be nice and simple to have a global Kubernetes cluster, but if you push one small config change, you're taking out everything. And so instead people go, actually, the regional model is kind of a good safety boundary where updates are pushed to one region first. Everything looks good, they get pushed to the next region.

It provides you some level of confidence before you push that single config change that takes out all the Windows servers in the world and flights stop. Right?

John: Yeah. They also, this is maybe a cynical take, but they don't want to compete with themselves by, you know, having these really great Kubernetes offerings that bring in lots of money because, you know, Kubernetes uses a lot of compute. They do have serverless platforms like Lambda. And I'm thinking of like Google's cloud run and app engine and stuff, which probably have Kubernetes components, but also have probably more stateless just on raw compute kind of components and stuff.

So it's best not to compete with your own products.

Zach: Yep, that's true.

Brian: Yeah. So sorry, when did you start Kplane? What's the timeframe we're looking at?

Zach: Yeah, so a few months back, I was working with some old friends. Their company is called Datum, building kind of an open alternative to Cloudflare and it's all open source, so feel free to take a look. Their model is, you know, global edge locations. So I think they're up to like 20, maybe they'll be at 40 before we push live.

Each of those edge locations is a Kubernetes cluster. And so similar pattern of single management plane, you know, reconcile the resources and then push them out to the edge for networking primitives. And so the problem that we had was we want every customer who signs up to have the Kubernetes experience of, oh, I have my own API server, I can deploy my resources, I know what a config map is, I know what a secret is. And everything's very familiar, especially to people that are in that infra role at these bigger companies.

And so instead of building your own custom API and saying like, here's the API definitions, go learn this, go earn our new SDK. It was actually, "no, hey, everything's the Kubernetes API. Just use Kubectl, look at the CRDs, you can see what we expose to you. It shows you the specs, it shows you how to use them, et cetera".

And so that was the kind of user experience that we were after. Now we didn't have Kaplane and for some reasons we decided to not go forward with KCP. And so the problem was, yeah, do we want to spend, you know, 20, 30 bucks a month per project in just underlying resources for every new customer that signs up.

And if you have 10,000 of those or a million of those, especially in the free tier, it gets expensive really quick. And so that was where, yeah, we took the first stab at like, how do we virtualize that API server. Definitely the earlier versions, so I think we got it down to like 20, 30 megabytes of overhead per project, which is still a lot better than vanilla. Right?

We were thinking about multiple pods and deployments that need 500 megabytes of RAM just to boot up. And so ended up taking that, investing it into its own kind of project where we could say, how far can we take this and how can we make this a real primitive that, you know, companies can leverage? And we so went super deep.

Weeks and weeks of profiling over and over, getting it down to, you know, one go routine and refactoring the informal libraries and the caches to make sure that they can actually do things the right way without breaking the Kubernetes contract.

And so I think something that we really care about is how close we can stay to what people expect in Kubernetes without making a bunch of our own opinions.

John: Yeah, you had mentioned Cloud Foundry and that being kind of a beginning part of the journey. I find that so fascinating because, in my opinion, Cloud Foundry had probably one of the best developer experiences, you know, of the early cloud era, like being able to CF push something and then like the whole build packs stuff. And granted, you know, the opposite side, the operator side, was an absolute nightmare with Bosch and having to upgrade Cloud Foundry is just like a weeks long task. Haha!

How do you, how do you envision that? You know, and you just briefly mentioned also that developer contract, that developer experience. Like, do you see a related, you know, middle ground where maybe we get closer to a CF push experience with something like a Kplane?

Sometimes I feel that way with KinD, where I'm like, KinD is closer where at least I can have this locally and I can sort of start to iterate on things, but it still ends up being a little nasty as far as like KinD of breaks some of the contracts of how a real cluster works.

Zach: Yeah, I'm so glad that you said the CF push experience because I also think it was one of the best experiences for a developer who's even coming into their career being like, what do I have to do? What is a build and how do I make sure that the image is correct and what does it mean to install packages for the image before it boots up and all these things.

It was like, no, just CF push and you're done. And it would work like 90% of the time unless you have some very weird stuff going on in your container. But yeah, to your point, I think KinD is, I think one of the closer Kubernetes similar experiences where it's like, hey, you're working locally, you want to spin up a Cluster, you run KinD and you get one.

And so we actually based the Kplane CLI very closely to the experience of KinD. we actually do use some of KinD's internals around kind of the worker nodes and like management plane. I think we actually added support for like K3s as well. But yeah, I think that's the goal, right?

It's so easy for people to be like, we have this whole new user experience, but then you have to convince them of why that's better than everything they've been using today. And so rather than doing that, we're like, no, let's just go with what people like and what they know, which is KinD if you're a Kubernetes developer. And then when you use Kplane, it should feel native and normal.

John: Yeah, I've been out of the Kubernetes, I guess the core Kubernetes product development cycle for a little while. But is this something that could enable like a whole clustering, cluster testing kind of thing?

I remember on Tanzu, we really struggled with this because we were building a Kubernetes product. We wanted to be able to test it in a Kubernetes native way and that was really hard. So even trying to like run KinD inside of a cluster was kind of a nightmare.

Zach: Yeah, I think that's a great point. I think CI and testing doesn't get a ton of love or innovation in that space. I mean, I could say the last like 10 operator repos, you know, the tests take 15, 20 minutes to finish and so you're stuck waiting there for that PR.

We did take a stab at this as well. It's called Kplane Test which is kind of similar to Env test. So Envtest is like the normal way to test your controllers where it gives you kind of this embedded etcd and control plane. So KCP has an embeddable etcd that we leveraged.

And then Kplane is the ability to say, actually how can we spin up multiple clusters instantly and then have that kind of interaction happen. And so, yeah, that is what we think is like the whole testing experience of how do we faster and easier for developers.

But I think a lot of people are still stuck on like GitHub Action spins up a KinD cluster, waits two minutes that it installs all of its operators and controllers and CRDs. And that's kind of the state of the world right now.

John: Very cool. Wow.

Brian: Yeah. So I was going to ask about production. I guess you have a use case for production grade Kplane. I was curious, is it ready for primetime? Like if folks are listening they're like ah, this sounds interesting. Like is this something they should be shipping?

Zach: Yeah, you know we are dogfooding everything we're building. So in our Kplane cloud, every organization, every project is also a control plane and those can't go down otherwise you can't do anything inside the cloud.

So you know, I think a lot of it is yes, like continuously test, continuously dog food and get that feedback loop. You know I think a lot of where we're at is just pushing the limits. So I think right now what we're seeing is that etcd as a backend, you can run into a ceiling pretty quick. Etcd has an 8 gigabyte limit. The API server caches everything in memory. So as your etcd fills up, your API server is also filling up. And when you've got a thousand of them running in one box, you're going to hit some ceilings.

And so what we're looking at is how can we scale a single API server replica to a million Kubernetes clusters? And so with that, you know, you have to swap out etcd. There are internal projects at companies like Google where I used to work, where there was like, can we try like leveraging Spanner as a backend for Kubernetes or Postgres?

I think some folks in the open with Kine have been doing that. Kine is just an etcd shim. Right? So Kubernetes actually just speaks to this etcd shim which then translates it into you know, Postgres or SQL. But yeah, so that's kind of where we're focused.

You know we've kind of proven out the, the base functionality, the base like isolation story, all of it is working as expected. We have some customers that are running this in production and others that are onboarding.

So it's definitely in its early days, but we're seeing lots of good signal and yeah, focused on how can we continue to scale this and make it work. You know, when let's say you got one big customer that comes in like what happens if you you know, the noisy neighbor problem, like how do you help automatically migrate them to their own dedicated API server with zero downtime while starting at kind of that shared multi-tendency model. So things like that is really where we're focused.

John: Yeah, you mentioned something there about Kubernetes really being a thin shim or I guess Kine being a fin shim for etcd, which you know, is that whole Raft based distributed key value store originally created I believe by CoreOS way back in the day, you know, and then Red Hat acquires CoreOS and a lot of those people were at Heptio as well. Heptio acquired by VMware. VMware acquired by Broadcom.

You know, the story goes on and on and on. I wonder, really ever since I was working on upstream Kubernetes, the like maintainability and sustainability of these things. And I'm curious from your perspective, this is a topic that comes up all the time in the podcast, like how maintainable and sustainable are these like really critical, like down at the bedrock of Kubernetes and down at the bedrock of like cloud infrastructure-- How maintainable and sustainable are these projects?

Zach: Yeah, I mean that's a great question. And it's definitely not easy. Like you wish it was, you know, your own packages and your own libraries that when you write them no one else is going to change them. In this case it's not. And so if you look at a couple of projects in this space, you often have to fork Kubernetes.

And so we tried our hardest to avoid this. It actually got pretty close. The challenge is that yeah, eventually we needed to fork it and make a couple of changes. So that's where it becomes the standard model of okay, now we have to maintain this fork, we have to maintain it across versions.

And I think for our perspective, yeah, it's just like how do we make sure that we're making the right changes in that fork so that we're not going to shoot ourselves on the foot in six months. But yeah, that's going to be a big effort this year is how do we put that on autopilot where we don't have to sit there and just bring in 2,000 changes every six months. Because that's a nightmare.

And so I think that's where hopefully AI can help with that as well. Of maybe the AI can judge, you know, the complexity of a change and you know, based on our changes and like compare them and see it, you know, it will merge cleanly. Like there might be something there of like let the AI like automatically do these syncs periodically.

John: Right. Haha. The non-forking Kubernetes bit is so funny because that was like a cardinal sin at AWS, like the EKS team was-- I mean I think that was like a charter, but, you know, surprise, surprise, they did and they had to.

And I don't think any of them would admit this, but they basically had like a rolling fork that would kind of backfill a bunch of stuff eventually upstream. But I think there were just too many things that they had to basically keep kind of on a rolling forward maintainability kind of track.

And they had the resources for that and everything, but a lot would still get upstream. So it's very funny because that is definitely the cycle. That's the cycle you find yourself in with Kubernetes.

Zach: Yeah.

I think pre-AI, yeah absolutely not. Like, good luck maintaining a fork yourself. But I think now we have the ability to understand the risks and hopefully speed up parts of that cycle. And so it becomes much more reasonable for a small team. It's definitely still not easy, but it's at least like an option to consider.

John: Yeah.

Brian: Yeah. I'll share an anecdote and I'd love to get your take on this based on this like post-AI world. We were in London for like a couple weeks and I was staying in west London in this like town called Hammersmith. And I learned from like a local news, like the Hammersmith Bridge is like a bazillion years old and it's like basically falling down.

So like out of the 32 bridges, there's like five or six bridges in London that are basically decrepit. And the reason is because the bridges were never made for semi trucks. They were only made for like horse carriages. So like everything needs to be retrofitted to be modernized to like, like we've had semi trucks for years, but now it's just like prolific how many like new grocery stores and every single department store that needs like semi trucks to deliver things.

So like they're, they have this like epidemic right now for bridges where they have to shut down. Entire bridge traffic gets diverted for like a year and it's super painful. And as you're talking about this like post AI world and like what you're building with Kplane, I immediately go to like this story of bridges are basically falling down in London.

We also watched GitHub in Q1, Q2 of this year. Like it's still like basically falling down on a regular basis, lots of degraded surfaces. And it's just the world wasn't really built for this current agent-first experiment. So I was curious, do you feel Kplane's like really fit for this new world?

Zach: Yeah.

What I admire about the Kubernetes maintainers is that they've held the bar so high and so consistently for so long. The challenge though, as you said, is that at some point you have to come to the realization that what you have needs to change.

And so that was our perspective of like, if there's billions or trillions of agents and they each need their own cloud, what does that look like? Nothing today can support that. Like, you know, the fact that it cost you 70 bucks a month for a single cluster on EKS, like that doesn't scale. Right?

And so that's where we're saying, okay, how small can we get these Kubernetes clusters? Because if you can give one to every single agent, the story gets very compelling. I think right now everyone's focused on like every agent gets their own VM, then agent can then attach storage and you know, has its own git file system. We're seeing a bunch of players like, you know, there's 100 new sandbox companies every single week.

Brian: Yeah, well, it get its own credit card as well. Haha.

Zach: Yeah, exactly. Getting its own credit card. And so that's where we say, okay, the infrastructure layer that has powered what we know is the cloud today across pretty much every big cloud provider has been Kubernetes. And so I think we need to make that the first primitive for agents. But to do that, you have to make it small. It has to be super dense, ephemeral, instant spin up, spin down.

And so I think that's where a lot of our focus is for our cloud offering is like, how can we give agents a first class experience for what does it mean to have its own cloud?

Brian: Yeah, powerful.

Zach: And no dig on the, you know, the VM guys. Like, we are probably going to end up using some of them for our own compute.

Brian: Yeah.

Zach: But you know, it's just one of those interesting things of like, I feel like we're seeing a lot of people have to like solve the problem that was solved 10 years ago of like, you're orchestrating a million VMs and you have your own custom implementation of how to do that.

And then everyone realized like, wait, Kubernetes like solved this very well for like every big company that had massive infrastructure. And so I think, you know, I think people will come to that realization.

And then hopefully it's not as scary as it was five years ago where you had to like learn all this yourself and like memorize the APIs where it's like, no, actually AI is really good at Kubernetes. Like, it has a massive corpus of data that it's been trained on from like a decade plus. I think that it just becomes a very natural fit.

John: Yeah. Amen. I love the KubeVirt. I think it's KubeVirt project, which is kind of like the full circle back to Kubernetes managing fleets of VMs. Haha. And yeah, you could plug that into like a hyperscaler cloud provider or your metal boxes or whatever.

But yeah, just the corpus of information that you get deterministically out of Kubernetes as well is very powerful for agents. Like, agents know how to use kubectl, know how to like interface with the resources and all the bits and bobs that make Kubernetes work. So can get good information out of there faster than I would be able to. So it's quite a powerful operator tool in that.

Zach: Yeah, there's a good model there as well, around like zero trust. As much as I am not a big security guy, but, you know, if you have a bunch of agents on your team and your developers are interacting with like a Kubernetes cluster, now you've got like hundreds of agents.

It can be kind of scary to give that agent access to your Kubernetes environment where like another option is actually like spin it up its own control plane, put the APIs that you have in there and then just replicate those back over to the management plane. So like, the agent is just interacting with what it believes to be its own cloud, but it has no access to like the core critical system.

And then of course you just have some simple replicator controller that goes, okay, agent did some work. Let me replicate that back over to the core management plane with policies and resources in place to just be like, don't give the agent core access, give it its own access and then you can evaluate things kind of on the fly.

It's kind of a pattern I think I've seen come up more recently of like, just give it its own control plane.

John: Yeah.

Brian: Well, Zach, I do want to transition us to Reads. Thank you for the conversation on Kplane. I think it's going to actually relate to a lot of our rRads as well. But folks, if you haven't already checked it out, Kplane's on GitHub. Also Kplane.dev if this is all interesting. I don't know if there's a sign up, I guess you could star the repo and start there.

Cool. So question for you, Zach. Are you ready to read?

Zach: Yeah.

Brian: So one one Read I actually wanted to talk to based on the last thing he had mentioned around these like sandboxes-- Prior to sandboxes, we had this world of like frameworks like the Langchains, the crew AIs and a former Llama Index employee, actually the CEO of Llama Index as well, he had a blog post that he had wrote around their transition from building Python frameworks to build AI and agents to now essentially what Laurie Voss also shared because he's not at a new company called Arise.

The focus is now on harness management and harness building. And I think all these like folks are eagerly transitioning into something that's going to have more teeth and like moving forward beyond 2026. So I put a link in the show notes, but it kind of goes through that and like how they had a pretty good time with like the LangGraphs.

But it's very clear that folks are realizing that there's something new cooking and it's been around for like a year. They're harnesses. Like ClaudeCode is an example, but I'll pause there. John do you have any thoughts on this note from Laurie?

John: I do. I think it's a pretty bad take. It's not that like harnesses are going to take over everything. I agree in the fact that, you know, maybe a company who is trying to like just get AI bot to go read some data lake thing to then tell some executives, like, here's how the revenue is doing. I don't know, you probably don't need a whole LangChain pieces of that.

But there's bits and pieces of LangGraph and the like, composability with tools from like SDKs and libraries that is really powerful. Like ultimately it's an SDK and it's a whole like code paradigm to go in and like build an agent and stuff. And that's honestly way more powerful than I think you would ever get from just the harness.

Or they'll, you know, start offering these things as like bits and pieces that can just plug in, you know, like I've been using PI for a while now and PI has a great ecosystem of plugins that you can then like tool into the whole like turns that it does with the inference provider.

So you know, maybe not everything is absolute, but I'd be shocked to find if, you know, nobody wanted SDK-based things. Maybe the real hot take here is that these are just bad businesses, haha, and that, you know, an SDK is was never something that was going to like, generate a ton of revenue.

So harnesses integrated deeply with the inference providers. Sure. The inference providers then get revenue based on, you know, tokens that you spend. I'll pause there. Haha.

Brian: Yeah. Zach, are you using any LangChain today?

Zach: No, I am not the biggest fan of like that vertical of tools. I'm more of a do it yourself, like, and do things incrementally as you need to. I think a thing that you see a lot is like you adopt something like that that's like still very early and it's a lot of opinions that you have to figure out if you need.

And then what happens if they don't offer something? Like, do you have to like upstream it? Do you just like, find like the wedge and like, you know, customize it yourself? And so I've been just in the boat of like, do it yourself. You know, there's lots of very lightweight control loops that you can just build yourself. Yeah, so I'm more of the roll your own, do it yourself rather than take something like a LangChain and try to like, figure out their opinions.

Brian: Cool. I'll move on to the next, the next read I've got, which is a paper for Meta-Harness. You know, this is actually, it's interesting because, like, if you want to kind of find out where the industry is headed, there's a weekly Wednesday paper reading that hosted by the Latent Space Podcast.

And I feel like every paper that gets shared there, maybe most papers that get shared there is like this like huge revelation for the entire industry and everyone writes a blog post on it. This is one of them. But actually I really enjoyed this one, which is about Meta-Harnesses. So this is the, so the harness, the concept of like having--

I'll go to Claude Code because Claude Code's probably the most like known harness to kind of like really shaped up like what future harnesses look like or the current state of harnesses look like. But the idea here is you have a Meta -Harness to basically instead of like leading into skills or leading into fine tuning models, what they were able to do is this lean into fine tuning the harness, which kind of leans into what Laurie was kind of talking through about the future, where he see the future going.

And this is a paper built on that. And I'm going to go ahead and go on a limb and say like a lot of folks read that paper and have written blog posts in response to it. But I've been reading this all day, all week pretty much. And I'm like trying to digest it for nuggets and wisdom and I'll take a step back and like, I spent a lot of time trying to fine tune a model, specifically the low parameter models.

And when I came to conclusions is like skills you get, you'll get the best bang for your buck today. It'll be cheaper just to write a skill. But their take is like if you just start adding to your system prompt and adding to your harness, you could probably also get some pretty good mileage for very bespoke things.

And they go through a couple different examples of like Haiku and it outperforming like basically double the performance just using base Haiku with a Claude Code which yeah, I'd love to experiment with this, probably experiment with this in this weekend, but that's pretty much all I got for the read.

Zach: Yeah, I think I agree with that take. There's still a lot of alpha in a good system prompt, right? Like take the same model, but someone that has a really good system prompt will likely get better results. I think we just saw yesterday the leak of Claude's latest model. Its system prompt is like 112,000 tokens or something like that. It's massive.

John: Wow.

Zach: And so I think that's interesting. I think another piece that's interesting is what you said earlier of fine tuning your own model is kind of still expensive and it takes a little while, maybe a few hours depending on which algorithm you're using. But I'm wondering if the trend there will continue growing where the fine tuning becomes something that could happen in five minutes or 15 minutes and cost you pennies.

Where it's like, wait, if you fine tuned it on, let's say a specific code repo, will the accuracy go up? Do you even need to have all that effort in the system prompt if it already has weights that are adjusted to the latest changes in the repo?

I think that's what keeps me up at night is, you know, which one is going to win? Is it going to be people that like fine tune the latest Chinese model because it's better at coding, or is it going to be people that are like focusing on the harnesses and the system prompts and all of those types of things?

John: That's a good question.

Brian: I lean towards folks who don't mind rolling their sleeves up and really touching the tools. Yeah, I think it's like a good mechanic is probably going to have the best longevity of a car, or going to get the best juice out of the car long term. I think it comes down with that.

Like if you're not a mechanic you're probably gonna reach for the easier solution or probably just go with foundational stuff but maybe bring back like let's open up the hood and look at this stuff and try to reverse engineer things again.

John: Yeah, I find this interesting because a lot of the harnesses even still today are ultimately like a feedback loop goes out to the inference provider, comes back, those conversations turn, and then it presents a few tools to the model that it can use like a read tool, a search tool, a bash tool, you know, et cetera.

And it, it always seems so, so arbitrary and kind of broken whenever I see like you know, Opus 4. 8 or whatever Fable, heck, you know, using ripgrep, I'm just like ah, like that's a little, that's not going to be great for you. I already know and I can see you trying to use ripgrep in this very like kind of crazy way and like that's not going to give you a ton of code that I know that you probably want to go find.

It'd be better if there was like some very AI focused chunking tool or search tool or something that could you know, expand the horizon a little beyond just like a regular expression.

So my hot take here is actually the Astral team being acquired by OpenAI, you know, basically landing on the Codex team, I would just love to see Codex more deeply integrated with UV and like little Python capabilities and things because that would be a powerhouse like really the Meta-Harness of the harness I guess itself or the model inside of the harness or whatever being able to like morph and mutate and like create this kind of liquid tooling that is just Python based to be able to to do more powerful, interesting things.

Yeah, the future is going to be wild. Haha. We'll see.

Brian: Excellent. John you got a read for us.

John: Yeah, I mentioned the Coinbase one. I have another one that I especially thought Zach would find interesting which is this thing that Apple announced at WWDC which is Apple Container machines. And this is something Apple I think has been working on a long time. The whole like idea of an Apple container is basically like their native approach to lightweight Linux containers natively on macOS.

And now they have kind of the whole idea of essentially like a fast, lightweight, persistent Linux box. And I believe what it's plugging into after kind of digging into it is actually Apple's virtualization framework under the hood which is their kind of made native hypervisor on macOS, which is a little different than how some of these others have approached it.

Like I know Orb Stack built kind of their own native vm. I don't know what Lima is doing. Obviously Docker has its own like virtual machine which is quite heavy and stuff. So I think this is awesome.

I think it's very cool that Apple is is kind of dipping its toes a little bit more into the like native Linux capabilities that I guess macOS is Linux anyways but more Linux, right? Haha. Zach, did you did you see this come about?

Zach: I did not. But this is, I think this is pretty cool. My first thought goes, can we get containers that are still running like OS X and macOS? Because in the testing world it's actually so hard to get remote Macs.

John: Oh my gosh. Yeah.

Zach: Still to this day. And I'm just like can we solve that through this? Maybe that's an angle because like you said if it's running on their virtualization technology maybe it will be super easy to just have a bunch of Mac containers now.

John: I would love that as well. I think it's so sad that OS X Serve or whatever that thing from like the 2000 and tens is gone. Maybe it could be Tim Apple or I guess whoever is now over here. If you're listening, please give us macOS as a container that we can run somewhere.

Brian: Yeah, like the new CEO is John because I think it was like Johnny Apple was going to be the new pun. Haha.

But I originally went to like this whole like the run on the Mac Minis with OpenClaw and that sort of clustered everyone, literally a cluster, that everyone was put together for spinning up their agents and yeah the sort of like the run on compute, but also if you could virtualize that stuff.

I think it's bright for Apple, I think bright in the sense that they're taking a step back into the world that they kind of ignored with the prosumer stuff or the focus on the prosumer kind of ignored all this like server things.

I worked at GitHub and I specifically remember when we sourced a bunch of like macOS machines to do macOS runners for GitHub actions and like the screenshots or the actual pictures of the actual room with the trash can Macs, like that's literally how it was solved is like we just had like a cluster of trash can Macs in a closet. And that was how you could get macOS, runners or GitHub actions back in what, 2019.

John: Wow. That's wild. The other crazy thing about this to me is it's all written in Swift. Server side Swift, which I know it's been kind of an up and coming thing, you know, not just for like your apps on an iPhone. But yeah, pretty wild how some people have been saying it's performing and I tried out a little bit. You know, nothing too heavyweight, but Swift, it's happening.

Zach: That's cool. Yeah. I mean, apparently people are saying that it's faster than Docker. So I'd be curious if people are going to actually start switching over and saying, like, we're just going to run this instead of Docker from here on out.

There's also probably like a cost angle there too. Like, you know, if you get big enough, you have to start paying for Docker Hub on all your developer machines because you have a team that's too large. Like, I wonder if this is like an angle to be like, actually, we don't have to do that anymore. We're just going to use containerization.

John: I think that's been the case for a little while. Like, I remember when I was at Amazon and they were working on like Nerdctl and a bunch of those other, like, I forget what-- There were a few of these that were like, well, we're not going to pay for this, so let's just build it ourselves based on ContainerD.

Brian: Yeah, Build versus buy is a real thing, especially in the world of agents and being able to take control. Or let your agents take the wheel.

Zach: Let your agents take the wheel.

John: Yeah. Agents Take the Wheel. Show title. Haha!

Brian: Excellent. Well Zach, thanks so much for the conversation and going through some Reads with us. Folks, again, if you have not checked out Kplane at this point, while you're doing your commute and hanging out in the BART train, check it out. Kplane-dev on GitHub, Kplane.dev on the Internet. And listeners, stay ready.