Open Source Ready
53 MIN

Ep. #22, AI and Container Security with Benji Kalman of Root

about the episode

In episode 22 of Open Source Ready, Brian and John sit down with Benji Kalman, co-founder of Root, to explore the intersection of AI, software development, and security. They unpack "vibe coding," its impact on API proliferation, and the hidden costs of increased technical debt. Learn why a security vulnerability is just a bug with a purpose, and discover how AI agents can be used not just to write code, but to automatically find and remediate vulnerabilities in open-source containers.

Benji Kalman is the VP of Engineering and a co-founder at Root, a cybersecurity company focused on agentic vulnerability remediation for open-source software. With a background that includes working at Snyk, he has deep expertise in application security. At Root, Benji is leading the charge to use AI agents to automatically patch vulnerabilities in containers, making open source safer for everyone.

transcript

Brian Douglas: Welcome to another installment of Open Source Ready.

John, we are back for another episode. Are you excited?

John McBride: We're back. Yeah, I'm excited.

Brian: I've actually been super heads down and building a bunch of AI stuff.

I've been getting really deep into AI and Runloop containers. So Runloop's a company built specifically to run containers at scale.

But the thing that's been coming up a lot with the team is security.

What are we doing about these tokens getting passed around? How do we let users and customers leverage this stuff?

And I had no idea. I'm literally just shooting from the hip on all this stuff.

So I'm really excited to talk to Benji, who's our guest.

Benji, you're coming from Tel Aviv. Do you want to say hello?

Benji Kalman: Thanks for having me. It's lovely to be here.

Brian: Yeah, pleasure. Yeah. So we got connected through mutual connections and really looking forward to hearing your expertise on this space.

But do you want to give us a quick intro of who you are and what you do?

Benji: Yeah. So hi, I'm Benji, and currently I'm the VP Engineering and one of the co-founders at a company called Root.

What we're basically doing is what we call agentic vulnerability remediation. What we're trying to do there is patch all of open source and provide safe versions of open source packages to anyone who wants to use them. The idea is that people can use these open source packages, especially those running in containers, which people feel less comfortable working with and fixing, entirely securely.

And all of this without moving to any proprietary software or anything else. And that's our mission.

Brian: Excellent. Yeah. And you've got a bit of a background on this because I do know you from previously working at Snyk as well.

But I did want to actually talk about an article that really just came out. It was around vibe coding and API explosion.

So you actually wrote this article. So do you want to catch us up on this?

Benji: Thanks. So, yeah, that was a really interesting piece that I think we pulled together.

And I think what's so interesting about that is because it came out a month ago, I'd say that a lot of my thoughts have probably changed a little bit in terms of what was written there.

Just because everything is moving so fast in this field that it becomes really difficult to keep up with everything.

But the main thesis of what we're talking about there is that vibe coding is breaking down all of the traditional rules that we used to have, basically the boundaries that used to govern how engineers, or anyone, could add code, and specifically new APIs or anything else, into existing software.

And the idea here was how do we find the right balance between allowing people to be creative and go wild and add this stuff versus how do we then track and control for all of these different changes coming into our software?

And how do we make sure that we're not introducing a huge amount of technical debt that we then need to deal with?

With the number of APIs and the inability to necessarily track them, you are also introducing a whole bunch of security threats as well, as you're basically creating more and more attack surface.

So that's the main thesis of what we were talking about there.

Brian: John, I don't know if you caught up with the article as well, but I know your day job involves API gateways.

I don't know if you have any additions to this conversation.

John: Yeah, it is a wild world right now with the velocity of changes that can happen inside of APIs, and really these are just the interfaces that companies may be providing to clients or whomever to go and use capabilities.

One of the classic examples at Zuplo we talk about is just weather APIs and how everybody's phone connects to some weather API to determine where you are and then get that data.

And the velocity at which you can start delivering more and more stuff with AI is interesting.

And something we think about a lot is just the cost that people shipping these new things at a higher velocity may actually incur. So it's really technical debt, I guess is what it is.

So Benji, I'm curious how you would recommend people keep their technical debt low.

Are there technologies that we have today? I mean, is Root doing anything in this space?

People are vibe coding, people are shipping stuff. What can we do to keep the technical debt low?

Benji: I think there's a couple of things here.

So first of all, let's talk about the positives, which is that we're reducing the levels of friction and the requirements that people needed in order to be able to add features to their products.

And that's a great thing because it means that we can move fast and cut through things and we can experiment. And we should really embrace that and be really, really happy that this is something that we have been given the ability to do.

But I think that then, with great power comes great responsibility. And the first thing is just a mindset, which is, do I clean up after myself?

So, if anybody can add an API, do you let them immediately add that API and go straight to prod with it, or are you doing some levels of checks and verification on what's going through?

Because what happens inside a business is, oh, somebody really wants to add a feature for a user. Somebody comes along now and says, I can add this in five seconds.

There's no architectural review that it needs to go through. It's not limited by any of these human processes that we had.

And then that gets added as an experiment and then because it's successful, there's pressure on the business, let's just deliver that quickly and just continue on.

So some of this is really about, even though we no longer have those limitations that used to constrain us, how do we add those limitations to ourselves responsibly as engineers and as practitioners and as businesses?

I think that's one of the main things to think through.

The other really technological thing is how are we adding these APIs? What is the mindset that goes into vibe coding?

Because for instance, there's vibe coding and then there's vibe coding. So you can do it well and you can do it less well.

So are people just blindly giving prompts to Cursor or to Claude and seeing what happens?

Or is there a thoughtful process, a design process, even a vibe-coding-powered one, that goes into a spec that you create alongside Claude or whatever else before you begin adding that API?

Is that spec then peer reviewed by someone else to sort of approve that before you begin the implementation stage?

Are you adding tests alongside what it is that you're doing? And are there humans in the loop here?

So I think there's a whole bunch of things we can do around vibe coding and best practices that are coming in that are really, really important just on that level.

And the technological stuff, API visibility, the tools exist out there for us to have visibility over APIs and what's being used.

These are all the classic tools that have existed for a long time. It's worthwhile for us to use them and to make sure we do. But I really think that if you've gotten to the point where you're only monitoring things at the very end, when you have a thousand APIs out there, a lot of things have gone wrong in your engineering process before that. And I think that's where I would recommend people put a lot of their effort.

John: Yeah, it's fascinating because I mean a lot of these things you're talking about are really classic engineering org problems and people problems that existed long before AI tools were a thing.

Having a good engineering process to review a spec, review some docs, have good scaffolding and stuff to actually be able to test the end result, validate it, make sure that things are, you know, in a good state.

One thing recently that I, I forget who I was talking to, but they were talking about Google and how Google has what they would call a better kitchen than a lot of people.

And the kitchen where you make your food, if you have this really amazing kitchen, then it's going to output some really great stuff.

And thinking about Google's kitchen or the place that they actually have process, tooling, environments, all this stuff, to enable engineers, enable product teams to go and ship stuff in a sane way, even before the AI Revolution and all this stuff and now with all the Gemini stuff that they have capabilities in, you know, the kitchen just continues to get better and better.

So, yeah, 100% on that. I think the process and the people problems really haven't changed a lot. It's maybe now just more painful with AI.

Brian: I think that conversation came up in our last episode with Chad. Talking about continuous AI.

John: Oh, that's right. Was he talking about the kitchen?

Brian: He was talking about the kitchen. So folks, if you haven't listened to that episode yet, definitely go back and listen to that one.

But I will say, to further this analogy, we're now at a place where we all have this industrial-sized kitchen, because AI gives us the tools to do things that we could not do before.

I can't build infrastructure because that's never been my job, but I've always worked with people who've done it.

So I know enough to tell the AI what to do, but I still have not trained myself how to hold a knife properly or how to sterilize the counter efficiently or keep everything up to code.

And I think what you were hinting at, and I'd be curious because you're the VP of Engineering in your current role, is there a culture you have to ingrain in the team, like, "okay, we do engineering things, hopefully we still do these things"?

At what point do we bring in the reviews and then we start talking through some of the things that are happening or looking at vulnerabilities that are live in the wild and how do we apply AI to that?

Benji: So I'd say for us, we're an AI company on both ends: we develop a lot with AI, and we also provide an AI solution.

So we hit this on both ends of the journey here. So first of all, as a VP of Engineering, well, first of all, obviously the scariest thing for my team is when I start adding code.

So they need to do everything that they can, to make sure that that doesn't happen. Because obviously I am the least capable person of doing that for my very, very skilled team.

And I think coming from that point of view, the things that I'm there to do is to help provide the team with the boundaries of what we want to be doing in terms of a good engineering process.

And there's a couple of things here which I think touch into AI and some of them are processes and some of these are really technical implementations of how you go about and do this.

So, in terms of technical implementation, some simple stuff to do is basically be like, well, we're going to make sure that we have a single source of truth repo, where we have our cursor rules or our Claude rules, and we're all sharing the same rules that we're working from.

And those can be added to by anybody, in terms of anybody inside the team if there's additional rules that need to be added, but that they're shared amongst everybody in terms of what we're doing.

This is the kind of stuff that lived in documentation up until now. And to be honest, it's actually a little bit easier having it inside a repo, as opposed to having it in a Notion or a Confluence document.

Because one thing AI is better at than humans is actually reading things. And as anyone who's done any management knows, getting humans to read documentation is a little bit more difficult.

So having that single source of truth is really, really important when doing this. And then it's about, how are you setting up your coding agent?

So are you giving it access to all of the different repositories that it needs to have access to?

So does it just have access to the single repo, or can you provide it with context on all of the different parts of your system so it's able to provide the best type of solution to that?

And if your system's too big, are you creating documentation and spec, also AI powered, to explain to your coding agent what it should be doing, how the various bits of your system piece together, so it knows to suggest things in the best possible way.

And then I think all of this is just basically setting up the context for how you're going to be doing your prompting and everything else.

And then you get into the actual processes of this, which is really, as I alluded to to begin with, you want to break this down into several different stages.

You want to break down vibe coding into a brainstorming stage, where you're basically brainstorming, you're coming up with ideas, you've been given a feature, so you want to try and understand what's the best way of doing this.

And you're basically trying to be a product manager with your AI agent and create with it the best possible spec that you can to then begin that implementation.

And I mean, that should be something which you spend a couple of hours on, you know, even just sort of piecing that together and building that and then making sure that you have a process which says, "Okay, and then I want someone else to review this."

And that could be the product manager, it could be another engineer, it could be me.

You know, it can be anybody, to make sure that you have the right review process for the spec before you then begin the implementation, and then the stage which says: write the tests for all of this.

Make sure that you're actually testing everything that needs to happen before we write a single line of code. What are the test scenarios?

So if I were to get a PR, the very first thing I would look at is a test, right? Because that actually explains to me what's the logical thing that's supposed to be happening.

So having those tests and those unit tests, you know, lets you verify that what's going to happen is going to be the right thing.

And then spitting it out and having it actually try and do what you're doing and then even after that running tests with it.

And again, one of the great things that an AI agent can do, we do a lot of work with containers and everything else.

We're using Kaniko to build containers and do all this different stuff. And to be honest, it's quite annoying to spin that up myself. And it gets to the very edge, Brian, as you were saying, the edge of my own technical ability.

But if I can just have an agent do that for me, test the literal feature that I'm about to add locally in its own environment, and then learn from that and improve all of these things, and you turn these into regular processes that you have your people follow, then at the end of the day you get much better results from the vibe coding that you do.

So that's how we're trying to tackle this. And it's also something that we're talking about with other engineering teams and trying to explain to them how we're being successful with these tools.

And a lot of it just comes down to: The AI is magic. Great. What is the right process to turn that magic into something which is most useful to you?
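
For illustration only, the "single source of truth" rules repo Benji mentions could be kept in sync with something as small as the sketch below; the file names, paths, and layout here are assumptions for the example, not a description of Root's actual setup.

```python
# Illustrative sketch: copy one shared agent-rules file into every project checkout,
# so Cursor- or Claude-style coding agents all read the same rules.
# SHARED_RULES and PROJECTS_ROOT are assumed paths; adjust them to your own layout.
import shutil
from pathlib import Path

SHARED_RULES = Path("rules-repo") / "AGENTS.md"   # the single source of truth
PROJECTS_ROOT = Path.home() / "work"              # wherever your checkouts live

def sync_rules() -> None:
    """Copy the shared rules file into every git checkout under PROJECTS_ROOT."""
    for project in PROJECTS_ROOT.iterdir():
        if (project / ".git").exists():
            target = project / "AGENTS.md"
            shutil.copyfile(SHARED_RULES, target)
            print(f"updated {target}")

if __name__ == "__main__":
    sync_rules()
```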

Brian: Yeah, and I was listening to a podcast today about students at Berkeley High School using AI and how it's been hit or miss for teachers.

But then they also went to UC Berkeley and investigated, and found that college professors are having a great time with AI.

Because the difference is, when you know how to do it yourself, it's an accelerant; it amplifies developers.

That's an experience that's just going to continue to 10x people. But when you come with no information and you have no edge cases, then you don't know what you don't know.

That's where people are really running into the weird vibe coding issues and security vulnerabilities.

And I think, the way you sort of broke it out is really a clear playbook of you really need to sit down and think about what you're doing first.

And if you don't know what you're doing first, start reading the books and start traversing documentation as a human first.

Because if you don't know how to interact with the agent, then you're gonna basically give yourself a foot gun.

John: Maybe it's time we call it something else. Not vibe coding.

I mean all that. Having context, having awareness of what you're doing and what you're actually gonna be accomplishing.

I think this was another thing Chad was saying in the last episode: "What are you trying to do?"

Like, what are you actually trying to do? And if you're in the kitchen with a bunch of sharp knives and you don't know, then you're going to cut a finger off or something or I guess foot gun.

But like Benji was saying as well, a lot of this whole process sounds to me an awful lot like, well, it's not really waterfall, it's not really agile, but it sounds like some new other thing, some engineering process that I think elevates the AI experience.

Vibe coding just gets such a weird rap in my mind because I think about the airplane game that that guy Levelsio on X was making? And it was super broken.

But it was a wildly successful product because he had all the marketing and all the reach.

But for real engineering teams that need to maintain software over long periods of time, I just wonder if we call it something else.

Brian: There's a subreddit now called Context Engineering. And I feel like that's the best term that I've seen so far.

And I'm curious, Benji, if you've seen this as well, where you provide all the context and that's what you were explaining is like, get that context of the docs and know where things are, have your rules.

And what I've been doing a ton of is like, the rules will point to the external link. So then the agent can go ask permission to say, "hey, can I go search for this link?"

So you're giving it some gates to interact with. But if it doesn't need that link, you can say, "no, continue working."

But have you heard the term context engineering yet?

Benji: I've not heard context engineering, but I think it does describe very well what we're talking about, because more and more it's becoming about context management. That is what I was describing earlier as well: if the code base gets too big, you need to begin scaling down, creating smaller documents which describe the code base so the agent doesn't have to look at everything itself.

That's context management as well in terms of making sure that you're restricting what goes in so it doesn't go off in some crazy loop.

But also because that really hampers its ability to actually do proper output if the context is too large.

And maybe with AI, the arms race between the different models, the context will just get bigger and bigger. But I still think that you're never going to have a context which is the entire Internet or every known piece of information. There are still going to be advantages to expressly telling it what it does or doesn't know, or at the very least being able to have that question and answer session at the beginning to get to the point where you understand what it should or shouldn't know, and then only provide that context in the second stage.

That's an important thing. And I don't think it's different from how we always engineered. And I think you're right, John.

When you came to a task as a human, you had a huge amount of context on all of the different things that can be done and you're given this feature that needs to be implemented in the system.

I've seen engineers get lost because they wonder where in the system should I begin? And there's too many different things and I don't know where to go.

So as engineers we've always context managed and said "oh, it makes the most sense that I should add it in this feature in this service, with this storage. And that makes sense because I know that there's another API that looks similar, etc."

So we did that when we implemented this and now it's about making sure that we implement those same best practices inside the AI world.

And I think when people get overexcited about AI and they do think it's just magic rather than a physical thing which exists in the real material world, they forget that these best practices are still going to be relevant.

They can be adjusted for sure, but it's not like we've reached a point where you can simply say to the agent, "Here's the thing, just go ahead and do it, zero additional context," because it just won't work.

Brian: So I wanted to go back to our original topic about the security part too as well.

And I've got a very specific use case where I'm building a bunch of stuff. So I run DevRel for an AI coding agent called Continue.

And I'm building a bunch of skunkworks side projects to expand their product. I'm not a full-time engineer. All I do is create tech debt.

But I'm very cautious about that where I don't want to build stuff that's going to put the team at risk of like, "hey, Brian shipped something. It's working. We've got a hundred thousand users on it. But it turns out there was a security vulnerability where the token was passed and in a weird spot."

And we spent last episode talking about the Code Rabbit issue that came out a few weeks ago. But I guess it was originally disclosed six months ago. That was the story.

So I'm constantly thinking through, okay, "hey model, here's the code I'm about to ship, please do a hardcore security review," where I'm trusting the model to, like, "okay, you search the web, you do the thing. I'm trusting you to check the boxes and make sure things are covered."

But I imagine there's a better way, and it's probably "do something," so insert some blank there.

So, I'm curious, Benji, how do you make sure your team's up to speed in what they're doing and not just sort of shooting from the hip for almost everything when it comes to AI?

Benji: So I don't think there's anything magical here, and that's from a lot of years dealing with security, both offensive and defensive, in the security world.

There's nothing magic about security that's different from bugs. A vulnerability is just a bug which has some use to somebody else in the world. So when I think of security and best practices and everything else, the very first thing that I'm thinking about is that this is just code quality, just like anything else out there.

So the best practices that we're implementing around code quality apply in the same way: I wouldn't want to introduce a bug that happens to be a security bug, and I'm trying to protect against that.

So it's the same thing that I'm doing around security implementation as well.

And I think as a business, we put a higher premium on security. And as somebody who sells security product, I'm very thankful for that.

But the reason that we do that is because of the risks that are involved and what they could mean.

But when we're actually implementing as engineers, my approach to this is the same as it would be for all quality, which is that you need to ensure the quality levels at every single stage of what we're doing.

And I think that the process that we described helps out with that quite a bit.

And then the existing tools that we have out there, be they SAST tools and all the other types of tools that we're used to, I think they still have their place; run them and use them to get some indication. But I think those results today then get fed into an agent, to say, "Hey, Snyk found these vulnerabilities inside your code base. Can you verify whether or not this is true?"

And I do think we are reaching the place where all of the old SaaS companies are going to be replaced. Claude Code and Cursor are going to eat their lunch completely because there's no reason why AI context wouldn't be able to identify these things far better.

It's not yet primed to do that, it's not yet taught to do that. So I think you just combine the two elements in order to make sure that this happens.

But I think you would do the exact same thing when you run it through a linter. I mean, that's what we're doing now, right?

Like, Cursor or Claude is running your linter and then reacting to what's going on there.

So it should be running your security checks and reacting to what's going on. And you should keep a human in the loop to make sure.

But ultimately, I'll put it this way, the main security risk is almost always going to be the port that you leave open. It's not going to be the XSS that you necessarily have.

That's where the main security risk is, and pointing your attention towards your infrastructure, making sure that your infrastructure is as safe as possible and your tokens are as safe as possible, is still the best practice and the safest way to go.
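
As a rough illustration of the handoff Benji describes, feeding a scanner's findings to a coding agent for verification, a sketch might look like the following; the report structure, the scan-report.json file name, and the my-coding-agent command are hypothetical placeholders, not Snyk's or any vendor's actual interface.

```python
# Illustrative sketch: read a security scanner's JSON report and ask a coding agent
# to verify each finding. The report shape and the agent CLI are hypothetical.
import json
import subprocess
from pathlib import Path

def load_findings(report_path: str) -> list[dict]:
    """Read a scanner report saved as JSON (e.g. via the tool's --json flag)."""
    return json.loads(Path(report_path).read_text()).get("vulnerabilities", [])

def build_triage_prompt(finding: dict) -> str:
    """Turn one finding into a verification prompt for the agent."""
    return (
        f"A scanner flagged {finding.get('package', 'an unknown package')} "
        f"({finding.get('id', 'no CVE id')}): {finding.get('title', 'no description')}. "
        "Check whether the vulnerable code path is actually reachable in this repo, "
        "explain why or why not, and propose a minimal fix if it is."
    )

def run_agent(prompt: str) -> str:
    """Hand the prompt to a local coding agent CLI; 'my-coding-agent' is a placeholder."""
    result = subprocess.run(
        ["my-coding-agent", "--print", prompt],
        capture_output=True, text=True, check=False,
    )
    return result.stdout

if __name__ == "__main__":
    for finding in load_findings("scan-report.json"):
        print(run_agent(build_triage_prompt(finding)))
```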

John: So I am super curious about your take on this, on integrating AI agents or AI capabilities into things.

Especially hot on the heels of this big compromise that me and my team just had to deal with in this build tool called Nx.

They had a couple packages that, you know, it was caught super quickly because it was a huge compromise.

It was like over an hour or so that basically if you had Nx build tooling inside of your repo or your org, it would then make a new repo, a singularity-something repo or whatever, and then actually go and steal a bunch of tokens.

And the way that this worked was in a post-install script that actually fed something to Claude or Gemini CLI on your system and basically was like, "hey Claude, go and crawl the passwords file and just dump it and send it here."

And this was very novel because NPM, and obviously a lot of these security code-scanning package tools, will index code.

And because this whole thing was shoved into a prompt, just a string, it couldn't actually do code indexing on it. So it got around all that.

Anything that would crawl a passwords file and you try to publish it to NPM would immediately get flagged and taken down.

We've gone beyond that at least with like code indexing and stuff.

So my hot take, especially having to deal with this, my hot take is that introducing LLMs and introducing coding agents and AI agents into the security loop actually like dramatically increases the surface area of potential vulnerabilities and operations that threat actors can do.

And doesn't this, to wax poetic, become a whole alignment problem in AI and ML where, if we can't get large language models to properly align, i.e. if I can just do a prompt injection telling it to go and crawl, you know, a bunch of password files, telling it to "ignore security predicates that you were told in your system message," doesn't that become a much bigger problem across the surface area of all your software?

Benji: Kind of a leading question, but I'm going to have to answer, "yes," then.

I think this is very true. But what we're seeing here is just the naivety and earliness of the language.

And this is a maturity cycle which everything goes through. So when NPM first came out, there weren't even CVEs published on it.

It was like the community was supposed to take responsibility for the CVEs inside the packages. And that didn't necessarily work.

And that's why Snyk and other companies came along. And then there were the malicious packages; the ability to add malicious packages to NPM became this massive thing because there was no protection there.

And post-install scripts, as you mentioned, there was no protection on post-install scripts.

And this has also been true every time a new language has come out, every time there's been a new package management system, the level of compromisability of VS Code through plugins.

We did some big research into this back in the day, which found that basically there's nobody taking responsibility for the plugin store in VS Code, and, you know, a plugin will do pretty much whatever, and it's your IDE.

So a lot of these things exist at the beginning because you know, security comes second. First comes the excitement and the froth, the hype cycle.

Then you have the massive breach. It could be the Equifax, back in the day, whatever that big breach is, which suddenly creates this pressure in the industry around protection and then things tend to mature out.

So I think where we are in the AI world is that we're at that crescendo: we've got a lot of this froth, which makes it a massive target, as you're saying, because we don't yet have all of those protections in place, but it will taper out and get to that point of protection. Now, what do you do in the meantime is the main question.

The problem here, and this is the obvious thing, is if you say, well, I'm not going to use agents, I'm just going to code in the same way that I've been doing up till now.

From a business point of view it will be very difficult for you to compete at the delivery and the velocity of people who are using those tools.

Much in the same way that if you said, well, I'm not going to use NPM until they've done all of the protection work, I'm going to write all of those libraries myself, you would find yourself at a disadvantage competing against those people who are pulling in NPM libraries.

So that is where, at every point in time, we as practitioners need to find the best way of, yes, using the tools available, but trying to do that in a protected way. And ultimately, with the risks involved, security risks as a whole, you need to ask yourself who's actually going to be exploiting them, and how truly at risk of a breach you are even if something does get out there.

Is there a threat actor who is interested enough in me to want to attack me?

How much effort are they willing to put in in order to do this? Like what are the worst end results that could be?

Do I have mitigations, mitigating defenses, in place in order to protect against this?

Ultimately security is a big numbers game. So I would never say to somebody, don't use a new tool which could push you forward a lot just because you're afraid of the security impact. Put mitigations in place, do all of these different things, but the likelihood is you aren't going to be breached. And get cyber insurance for those cases when you are.

John: Yeah, that's really good advice. I think that's the weird middle ground we're in right now.

It reminds me of like malpractice insurance. Like a lot of doctors get malpractice insurance, but you know, the ones that really need malpractice insurance are like ER docs, people doing trauma surgery, orthopedic surgeons, because those can go very wrong and often there's like a swath of lawyers who are ready to come after you.

Maybe the analogy here is, you know, it's the nation states ready to come after you for these high value targets. Yeah, that's really good advice actually.

Brian, do you and Continue have a direction you've been thinking about moving with some of this? I know, talking with Chad last week, that the container security stuff was top of mind.

Brian: Yeah, we're thinking about how we interact.

So the way Continue works is that it originally started as like you run it local with Ollama on your machine.

So everything goes through your machine, you control it. And it's actually been super, super nice with the enterprise customers where we don't need access to your information.

It's on your team to manage that across whatever. So that's been super helpful.

So now everything else we do we interact with that sort of mindset of "okay, what actually comes to Continue and what actually doesn't come to Continue."

And we've been able to build a product around that. So not to make this a podcast all about Continue, but if you want a local-first, open source coding agent, definitely check it out.

I think it's been useful because folks at large enterprises are now running Continue at scale inside of their network and behind their firewall and it's working pretty well.

So as an alternative for folks who don't want to send data, send keys, tokens and stuff like that, that's how it works.

And the thing I've been working on is like a container agent that runs in GitHub.

So what my struggle is really is I need to build a container that's sandboxed, that only has access to what I need, but doesn't need the whole code base, doesn't need all your private information.

With another caveat of like, once you install this thing onto a repo, this is what we're adding and this is what we have access to.

So that's what I'll work on later today. But yeah, it's a fascinating problem right now because we have a bit of a land grab when it comes to all these review bots, all these AI agents that are now moving out of your editor and your terminal.

I think you had mentioned Claude Code a few times. They have a really good security action that does have extra context around security and CVEs, to be able to run alongside your PR review at the time of the PR review.

So yeah, there are a lot of SaaS companies that are probably going to be consolidating moving forward or expanding their footprint. So it'll be really interesting in the next few months and I think actually, probably a few weeks, to be quite honest, to see what happens.

John: Yeah. This whole conversation reminds me of an announcement I recently saw from Anthropic.

They've put in like extra guardrails to prevent Sonnet and Opus from telling you how to build a nuclear weapon, which just got this podcast on a list or something.

But that was a legitimate threat. Or I guess the concern that a lot of these AI safety researchers have is that you could use these large language models which have consumed the entirety of human written existence, including stuff that could legitimately enable you to build a nuclear weapon, a bioweapon, something like this.

But now Anthropic is saying that they've built in additional safety measures. I guess you could say I'm like ML-researcher-curious.

I don't have a background in any of this, but I'm so curious how they did this because it seems very close to problems like this where the individual developer obviously doesn't want to get accidentally pwned by running Claude or some LLM or something that has access to tools and mechanisms to run legitimate software on your computer.

Just like Anthropic wouldn't want it to accidentally tell you how to, like, you know, build a nuclear weapon or something.

Benji, I want to loop it back to Root and what you all are doing with containers and all this.

Have y'all thought about the container space and how that correlates with AI, and how this whole space could be solved for with AI?

Benji: So we look at containers as the unit at which we try and run protection, specifically in the open source components involved there.

So what we do is users give us their image. We analyze that image, we find the vulnerabilities inside the packages. You know, that's something which like everyone can do now, that's trite.

But what we then do is design an upgrade plan for that based on the least amount of breaking changes across all of the vulnerabilities involved.

And that involves upgrading to the, you know, if you're using Debian Bookworm, you know, and Debian Bookworm have the fix, we'll do that.

But if they don't, what we'll then do is we'll see that you're using a package which is vulnerable to a vulnerability. Debian haven't released a fix for that.

An agent will then begin running, looking for any fix for that vulnerability basically anywhere in the world.

You know, it could be upstream, maybe Ubuntu have done it. The agent will find that and then try and apply that fix to the version that's being run inside that container.

And then, if successful, we'll create that new package and install it inside the container.

And thanks to agents being language agnostic, we're able to do this across all the different languages and all the different ecosystems that we support.

And you're able to then create basically hardened images for peanuts for people.

And this is something which there are companies out there, I don't know, like the Chainguards of the world or even like the Red Hats, who are charging hundreds of thousands of dollars for creating these secure images because there are people who need them.

But with the AI out there, you can do this very easily, for far lower costs, and make it much more available to a lot of people out there.

And I think that's the first thing that we're doing in the container world. But we're also just looking in general at the other types of stuff that we could do now that we have the container and we're playing with it, and we have agents that are smart enough: how else would you make this container more hardened and how else would you protect it?

Which isn't just, you know, fixing vulnerabilities, all of this while trying to deal with the context and the limitation of: don't break the container, and let users still be able to use their code and run it.

So this is how we're tackling this problem specifically in security in the container world.

And then in terms of like agents running on containers, I mean I think that's obviously the way forward and Docker are pushing this and everything and all of this.

But we also do that. The way that we run our fleet, basically, is spinning up instances of Argo to orchestrate each node that we need in order to have an individual agent run on it.

That way no agent knows what the others are doing, they don't share context or anything else, they can't poison each other, and we just manage the whole fleet through that, using the classic Kubernetes, Argo, Karpenter type of infrastructure to basically manage this whole game.
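
To make the remediation flow Benji outlines a bit more concrete, here is a rough, purely illustrative sketch of that decision logic; every helper below is a stub standing in for real scanning, distro metadata, and build steps, and none of it reflects Root's actual implementation.

```python
# Illustrative decision flow: prefer the distro's own fixed version, otherwise look for a
# fix elsewhere and backport it, then rebuild and verify the container still works.
# All helpers are stubs for the sake of a runnable example.
from dataclasses import dataclass

@dataclass
class Finding:
    package: str      # e.g. "openssl"
    installed: str    # version currently in the image
    cve: str          # e.g. "CVE-2025-0001"

def scan_image(image: str) -> list[Finding]:
    """Stub: a real scanner would return the vulnerable packages in the image."""
    return [Finding("openssl", "3.0.11", "CVE-2025-0001")]

def distro_fixed_version(finding: Finding) -> str | None:
    """Stub: ask the base distro (e.g. Debian Bookworm) whether it ships a fixed version."""
    return None

def find_fix_elsewhere(finding: Finding) -> str | None:
    """Stub: an agent would search upstream commits, other distros' patches, etc."""
    return "upstream commit abc123"

def plan_remediation(image: str) -> list[str]:
    """Build an upgrade plan that minimizes breaking changes across all findings."""
    plan = []
    for f in scan_image(image):
        fixed = distro_fixed_version(f)
        if fixed:
            plan.append(f"upgrade {f.package} {f.installed} -> {fixed}")
        elif (patch := find_fix_elsewhere(f)):
            plan.append(f"backport {patch} onto {f.package} {f.installed}")
        else:
            plan.append(f"no fix found for {f.cve}; flag for human review")
    return plan

if __name__ == "__main__":
    # After planning, a real pipeline would rebuild the image (e.g. with Kaniko)
    # and smoke test it so the container keeps working for its users.
    print(plan_remediation("registry.example.com/app:latest"))
```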

John: Yeah, that is actually really fascinating.

I worked at AWS years ago on this project called Bottlerocket, which is a container based operating system. And we thought about trying to tackle this problem.

This was before AI, and the way we thought about doing this was using Amazon's internal ginormous build system called Brazil, which is basically like they went and invented an NPM and Gradle before those were a thing.

And so we thought, we get flagged for a CVE in these containers, we automatically go out to Gradle to try to build a new patch for that software or something.

And I mean it was an untenable solution because we literally would have had to boil the ocean. But with AI, I definitely see that vision.

With AI, it has semantic reasoning on like these fixes that it can go out and get right. Am I understanding that correctly?

Benji: Yeah, yeah, that's basically it.

I mean backporting has always been this nightmare, right? It's just dirty work.

John, I'll posit to you two things. It was a lot of work and it was work that you didn't want to do.

John: Yeah, yeah.

Benji: Where is the glory in backporting hundreds of fixes onto old packages?

Not many people want to sit down and do that every day, all day. So I mean that's part of an issue.

But hey, AI doesn't mind. AI is our friend.

And that is a place where you can begin solving these engineering problems where there was a solution, it was just slightly too complex for if-then logic of what to do, but not out of the realm of possibility.

And if given enough time, we as humans could have done it, but we didn't want to.

So now there's something else out there which can do it.

We still need to be in the loop. This is very clear. You can't just let it run wild; it will do some wonky stuff if you don't keep an eye on it.

But getting that kind of complete build system, Amazon aren't the only people. Google do this.

A lot of the big banks do this, have their own internal repositories where they're building all of this, but they're just spending thousands of man hours on it.

And now you could do it. You know, anyone can have that in their back pocket, basically.

John: Yeah, I can personally validate this because after I left Amazon and they were doing a bunch of stuff with their AI infrastructure, I saw something that Andy Jassy, the CEO at Amazon, said about how they'd written hundreds of thousands of lines of code, updating all these things from Java 18 or whatever up to better standards.

And I was like, I know exactly what that patch looks like. It's in Brazil and it's not that exciting, but it's a perfect use case for AI just like you described.

You just let it get in there and like go make those little tweaks and those patches to these things that would have been like very boring for any Amazon team and probably would not have led to a lot of promotions. So nobody wanted to go do it.

So yeah, I can validate that. Yeah, your approach makes a ton of sense.

Brian: Cool. Benji, real quick, if the listeners want to stay educated on security and containers, where would you point them to?

Benji: I think a lot of the information that I'm gathering now is from the main sources.

So like you know, looking at everything that Anthropic and Google and everyone else is dropping is really important.

They're dropping tons of new information all the time, keeping up to date.

The other thing is there's some really, really good stuff now on Medium, a lot of different blogs out there that I'm subscribing to and reading through their system.

The Pragmatic Engineer is a great blog. I don't know if you guys know it, it's an amazing blog historically, but he's doing some really great work now also in AI and best practices and everything around that.

And then finally, you know, just keeping it loose on Twitter and basically seeing what happens. And also asking AI a lot about updates in AI and what's the latest news.

Brian: Cool. Yeah, well, we appreciate that. And at this point we're going to transition to Reads.

So, Benji, question to you. Are you ready to read?

Benji: Yeah, let's do it.

Brian: So, John, you had some reads, some articles. Do you want to catch us up on what you've been reading?

John: Yeah. Two spicy ones.

First was that Google supposedly is only going to allow verified apps and verified developers on Android, which is a massive departure from Android's more open ecosystem compared to an iOS ecosystem where you could sideload apps.

There's like whole forks of Android, purpose built for security around this whole idea of being able to sideload a bunch of stuff.

A little wild. I am super curious how this is going to impact the open source ecosystem around Android. It really was an "OG" sort of project.

I would put it in the same camp as like Chrome and Kubernetes. Kind of wild that it's going that direction though. Are either of you on Android?

Benji: I am on Android, yeah.

I'd love to say it's for all of the beautiful reasons that you just said, but I'm more on Android because it's cheaper. It does the job. So, I mean, that's a good one for me.

I hadn't heard that. And I do think that is wild, because I think that, in general, with Google's positioning now, Google's maturing as a company, as we're kind of seeing, and maybe reacting to various lawsuits and stuff against it and, you know, potential breakups.

They're beginning to put additional protections in place around what they see as their IP and trying to make additional dollars where they haven't made dollars before, because, you know, otherwise what's the point?

Like, they're going to charge people for being, you know, Android developers and everything else.

So I think this is a way for them potentially to increase revenue. I don't think it's a good thing for the community at large.

It might be a good thing for Google to do from a business point of view, but as an open source practitioner, that's wild.

John: Yeah, I had the exact same take. It really was probably monetarily driven, just given that they seem to be facing competitors from every direction on the ads and search business.

I have it on good word in the Kubernetes community that, internal to GCP and Google's whole cloud division, them open sourcing Kubernetes the way that they did and really just giving it away is, in hindsight, seen as a huge mistake, and they wish they could retcon that. But I doubt they can or ever would be able to, given how the CNCF has taken that over.

Brian, do you have any thoughts?

Brian: Yeah, I mean I actually had Android up until last year.

I would take my Android internationally because Google Fi was like the best option, and then my provider for my iPhone got better, so I didn't need two phones.

But yeah, I'm also surprised, because Android has always been a place where you have lower friction to get things deployed if you wanted to get involved in developing.

Actually my first attempt at developing was building Android apps. I failed miserably.

Never actually deployed anything and shipped anything there. But that's what got me really interested back when I was in college.

But yeah, it's sad to see this change but also we're seeing a lot of different changes as these big conglomerates, big clouds, they're course correcting and like we're seeing this with even smaller companies who are trying to do a land grab and like close the ecosystem.

I actually just wanted to share a Reddit post, and I think I was trying to grab the social card, because I was like, "oh, it'd be great to have the social card of the Reddit post, to have that screenshot to share."

And it blocked me because I was like scraping it directly from Reddit. It was like, "oh, you can't scrape Reddit."

John: Oh wow.

Brian: And I'm like okay, this is interesting.

I get why they're doing this because of financial reasons. But that amount of friction's like, "oh, maybe I just don't share Reddit posts anymore."

John: Wild.

Benji: I think that's the risk that they're running. You know, when everything's free, people don't mind it.

If you have to start paying for stuff, people will be more judicious about where they're going to spend those dollars. And I think that will lead to this additional tightening of the market that we were sort of talking about because people are going to have to choose where they want to spend that money and what they're willing to give up on.

John: Yeah, totally. Well, speaking of the market, I'll go on to my next one, which is about the US government, and we almost always try to steer away from politics on this podcast, but here we are.

The US government is taking a 10% stake in Intel. This is a personally crazy saga for me because I worked at VMware when Pat Gelsinger was there.

He went to Intel, he was huge in making the CHIPS Act happen, and then got kind of, you know, volun-told to leave, and now there's some other people at the helm.

And Intel's longer saga in the public zeitgeist is kind of a failure, I guess.

You know, if I was going to go build a new PC today, I definitely would not pick an Intel chip, personally.

I know some people would, but it's a little wild. Yeah, I saw some people calling this a bailout that, you know, Intel is too big to fail.

So, government is going to go take a stake, give them a bunch of money. A little crazy.

Benji: Yeah. I don't want to weigh in on American politics either, to be honest.

John: There you go.

Benji: But I think in terms just of Intel, if we look at Intel and we look at Nvidia and everything else, I think it's just really interesting to look at where Intel have ended up and what's happened there.

And I think we see these cycles in the market as well. And I wouldn't count Intel out because look at where Microsoft was only a few years ago and look where they've rallied to and come back to.

But they look like they're going to be in the wilderness for a bit. And it looks like they made some bad bets and unfortunately they made some bad bets on hardware.

We're all software engineers. It's much easier for us.

But when you mess up hardware, it's a lot, lot more difficult and it takes a lot more time and money to course correct.

You know, we can just scrap our code and start rewriting from scratch. That's not the same with chips.

And I think that's where they're finding themselves now.

John: They really needed the Arc line of GPUs to be a success, not only for the consumer market but really enterprise to go and compete with Nvidia. And it was unfortunately not that great.

Brian: So I have a coworker who's from Korea and he was explaining how the Korean government has their hands in almost every single thing when it comes to the tech.

I guess if I had a speculation, it's like America is just basically following suit with how everything else operates outside of America.

But that's also a naive take as well because there's a lot of nuance to that discussion and conversation. But I could also just say it's interesting, and definitely something to pay attention to moving forward.

John: Yeah, plus one.

Benji: Unfortunately I'm an economist as well.

South Korea was founded with government intervention. There are no companies which weren't involved in it.

That was how the Tigers basically, you know, back in the 60s and everything, rose to prominence.

So South Korea and Japan for instance have always played in this way.

America's always played in a slightly different way. It's been very successful for them.

I think it's probably still going to be a successful makeup overall.

But it's not like there's not been bailouts before for companies inside America. I don't think it's a massive change of pattern.

I think it's just that at this point in time, Intel needs some help, and potentially that's what it's getting.

John: Yeah, I don't know this but I'd be very curious to understand if any American tech company like this had needed this bailout before and how that went in the past.

Because it seems that maybe the Silicon Valley Bank thing is a close comparison. You know that was a year or two ago.

Brian: Yeah, the government didn't quite take ownership, but the automakers had a bunch of issues back during the Obama era.

Yeah, there was a bit of a bailout and a bit of oversight that was instituted, but I don't think ownership was actually passed over.

And I could be wrong in that. I know I could ask ChatGPT real quick to catch me up on that but I'll digress and not go down a whole rabbit hole there.

John: Yeah, yeah, we're not economists, listeners. Be sure to remember that.

But Brian, you had a few picks as well, didn't you?

Brian: Yeah, I've got two quick ones. The Stack Overflow survey results came out this week.

So I think it's around 41% of folks are using AI. I think the majority of the use cases are software engineering. So not surprised.

I think it's very relevant and folks are using AI to solve problems. And yeah, it's definitely worth looking at the survey results.

It's an upward trend on people adopting AI. So the conversation around security is definitely, definitely a valid one.

I guess the real question is Stack Overflow itself. They're doing the survey results, but just checking in: when was the last time we looked at Stack Overflow?

John: Recently, actually. I mean, it was only for the most esoteric thing ever. So I was like, "well, I'm going into the weeds."

Brian: Yeah, I gotta get some historical records.

Which, I think maybe last summer I dropped a Stack Overflow post into Copilot to unblock myself because I was like, "all right, I know I can get to this."

And you always gotta go through all the back and forth and you don't quite get the answers but then you get the marked answer that's not the answer.

But then the one that's not the marked answer is the one you need. So I did do that last summer, but yeah, it's been a while since I needed to leverage Stack Overflow.

John: One thing I would love is if the Stack Overflow ecosystem of websites--because there's actually so many.

There's a mathematics one and literature and just so many of these different communities.

I would actually love if they competed with Reddit because it seems that Reddit and more social communities are going the direction of, I don't know what you would call it, maybe "AI-ification."

And I would love if there was more human-centric, community-based stuff in the Overflow ecosystem of websites.

But I don't know if that makes sense for their business. Probably not, but every once in a while I'll browse the literature and science fiction.

Are they Stack Overflows? What do they call themselves?

Brian: Stack Exchange?

John: Yeah, yeah, yeah, it's fun to browse those.

Brian: Excellent. And then, relevant to the conversation we were having earlier, I wrote a blog post yesterday about how pre-commit hooks are back.

And it was super relevant to what you were mentioning earlier, Benji, about rules and setting up the context.

I got super annoyed with the AI model just constantly breaking linting and making choices that I had already made decisions about, like, we've got the rule.

And sometimes I'm finding you have so much context. I think Sonnet just announced their million-token context window, so you can get a million context tokens, specifically in Sonnet, but that's a lot of context.

So when the compacting happens, usually it forgets parts of the pieces that are important.

So now I have a pre-commit hook that I set up, and I hate pre-commit hooks.

I hate when I commit and I've got an error or I got a message, I'm like "no verify."

But now in Python I've written a script for pre-commit hooks. Like everything I'm doing has a pre-commit hook so that the agent will just stop and look at the error and fix it before we go to PR.
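
For anyone curious, a minimal sketch of that kind of hook, a Python script saved as .git/hooks/pre-commit that fails loudly so the agent has to stop and fix the errors before committing, could look something like this; the ruff and pytest commands are assumptions here, so swap in whatever checks your project actually runs.

```python
#!/usr/bin/env python3
# Minimal pre-commit hook sketch: save as .git/hooks/pre-commit and make it executable.
# The lint/test commands below are assumptions; replace them with your project's own.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],           # linting
    ["pytest", "-q", "--maxfail=1"],  # fast test pass
]

def main() -> int:
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Print the failure so a human, or a coding agent watching the output,
            # sees exactly what to fix before the commit is allowed through.
            print(f"pre-commit check failed: {' '.join(cmd)}")
            print(result.stdout)
            print(result.stderr, file=sys.stderr)
            return result.returncode
    return 0

if __name__ == "__main__":
    sys.exit(main())
```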

Benji: I think that makes a lot of sense and I think that, again, it's like a new use for an old tool, but just one that makes a lot of sense in this context.

The pre-commit hook used to be there for us to remember to do various things; now, it's reminding the agent that it needs to do all of these different steps.

And I mean I think if there's a simple way of doing it, you can prompt and re-prompt it to try and get it to remember to lint every time or, "please stop writing these giant commit messages. I just want a small commit message, please."

But if there's already a simple tool out there which does that, we should use those simple tools. We shouldn't be using AI or AI techniques for the sake of it. Like there's no good reason to do that.

Brian: Yeah, I did see Scott Hanselman recently did a post around how we're using AI to basically build scripts and bash scripting and stuff like that where there's already a proven model for us to like build scripts to make things work and have a process and things like having a runbook.

But we're over-engineering these SaaS applications with AI to do things that we already solved 10, 15, 20 years ago.

Yeah, so it's interesting to take a step back and be like, "oh, I've seen this before, I know how to apply my knowledge to this problem."

But the scary part is like there are new college grads and folks that are junior who just don't know what they don't know and they might over engineer something because AI got them into that place.

Benji: Well, I mean this goes right down to that idea of limitations.

If you tell a human to create a cache library, they're going to download a cache library. If you tell AI, "I need a cache library," they're going to create a cache library from scratch. They're going to write 8,000 lines of code in order to do that. This is basically where human intelligence versus AI intelligence differs, and we need to be in the loop to keep it on track.

Brian: Yeah, 100%. Well with that, Benji, thanks so much for staying up late with us and chatting about so many different topics.

Folks, definitely find Benji online and on X; all his information will be in the show notes. And, listeners, stay ready.