
Ep. #19, Kubernetes at Scale with Josh Rosso of Reddit
In episode 19 of Open Source Ready, Brian and John speak with Josh Rosso, Principal Engineer at Reddit and author of Production Kubernetes. From his early days at CoreOS and Heptio to running Reddit’s massive compute platform, Josh shares insights into managing Kubernetes at internet scale, the business realities of open source, and the risks smaller OSS projects face. Lastly, they dive into AI’s growing role in engineering and the challenges of keeping the internet human.
Josh Rosso is a Principal Engineer at Reddit, where he leads the compute platform team responsible for everything from the Linux kernel to Kubernetes orchestration. With a career spanning CoreOS, Heptio, and VMware Tanzu, Josh has been at the forefront of building and running production-ready distributed systems. He is also the author of Production Kubernetes.
- Josh Rosso on LinkedIn
- Production Kubernetes (Book)
- CoreOS
- Heptio
- VMware Tanzu
- Kubernetes
- Kubernetes Cluster API
- NATS.io
- etcd
- CNCF (Cloud Native Computing Foundation)
- Fly.io “You’re All Nuts” Article
- Yoav Shoham – Agent-Oriented Programming Paper
- Apple Research – “The Illusion of Thinking” Paper
- Communicating Sequential Processes (Book)
- A Philosophy of Software Design (Book)
- NixOS
Transcript
John McBride: And welcome back to another episode of Open Source Ready.
Here again with Brian, as always. How are you doing?
Brian Douglas: I'm doing fantastic. We've got overcast, it's the time of year where San Francisco's basically Seattle, so yeah.
Enjoying some overcast and sprinkles.
John: And you're about a week in at your new gig at Continue.dev. How is that going?
Brian: It's going great. The first day, I think I did a talk at GitHub about AI native coding assistance, and then I did like three videos last week.
So I hit the ground running on the whole DevRel, developer experience thing, super excited to be back in the saddle doing this type of stuff.
John: Very nice. Well, I'm excited for our guest today.
This is somebody actually who I would consider a mentor, kind of introduced me personally into the open source ecosystem way back in the day on Kubernetes.
He's the author of the book "Production Kubernetes", worked at Heptio, VMware doing Tanzu Community Edition stuff back in the day with me.
I'm excited to bring on Josh Rosso. Josh, how are you doing?
Josh: Doing great, John. Thanks so much for the intro.
John: Yeah.
Josh: To tag on to the weather that Brian mentioned, I'm in Colorado where it is excessively sunny, as it typically is, same as John.
And we're in that cool time of season where Colorado gets very dry, obviously in the summer, and things kind of brown out, but we've gotten so much rain and the snow has melted, so everything is green and lush.
It's like prime season for us, so. Yeah, excited for the time of year and excited to be talking to you all.
John: Yeah, it has been really nice actually, which makes me want to go touch grass at some point today.
But besides touching grass, Josh, what have you been up to recently? What are you working on that is, I guess, your current thing?
Josh: Yeah, absolutely. So maybe just like a tiny bit on my background.
I've done a lot of things in the Kubernetes space for a while now I would say.
I don't talk too much about this gig, but I worked at an ESB company, an Enterprise Service Bus company called MuleSoft, and we were experimenting with Kubernetes stuff and Docker stuff back when it was, I think, pre-1.0, and very, very early.
And that got me really excited about it. So I joined a company called CoreOS, and CoreOS built a bunch of key tech, like Container Linux and etcd, which is at the core of Kubernetes, and a bunch of really important open source stuff.
We got to really see the evolution of where distributed systems were headed, especially in open source, which was super, super exciting because, you know, a lot of companies aren't like Meta or Google, who are just going to build their orchestrators from the ground up.
Like a lot of us are kind of building on this, if you want to call it, commodity layer. But anyways, CoreOS was very instrumental at building that commodity layer.
Then like you said, John, joined Heptio, got acquired by VMware. John and I were coworkers for a while at VMware, which was really cool.
We built a bunch of Kubernetes-oriented stuff into the vSphere ESXi stack. We did something called Tanzu.
And yeah, now I'm a Principal Engineer at Reddit. And yeah, the cool thing about being at Reddit just to like add some color, I'm in charge of the compute platform in particular.
So the way I describe that is everything from the Linux kernel at the lowest level, up to the orchestration layer, which is all Kubernetes-based.
And the reason I went to Reddit, or at least one of the main reasons was, I'd argue they're one of the bigger at-scale users of open source tech, and especially Kubernetes.
John: Yeah.
Josh: So it's been so cool to see this thing that I have watched building up for a really long time, and to see it at production scale, in use, hosting everything from storage systems to the serving stack of one of the world's biggest websites.
So, I don't know, it feels like a bit of a capstoney type role for me, which has been really fun.
Brian: It is literally the front page of the internet.
Josh: Yeah, exactly.
John: That's partly why I wanted to have this conversation: we haven't yet really talked to somebody who wields open source software the way you do, or the compute platform does, or really Reddit does.
Can you give us an idea of like what that scale actually is?
Like number of clusters, number of instances, you know, we can always beep something out if we need to.
Josh: Yeah, yeah, absolutely.
So these numbers aren't exact for Reddit, but just to give you an idea of the scale that Reddit's at.
Like you can think of it in the realm of a hundredish clusters, and in the realm of many, many thousands and thousands of nodes, right?
And the complexity for our scale is not just like a numbers game, but largely making sure that we're building infrastructure that services the different types of workloads, or profiles of workloads that Reddit has.
So I think what's really interesting about the distributed system space are things like, what does it mean to run Kafka fully on Kubernetes, and how does that differ from running serving workloads and things that probably are a lot easier to evict and move around without much notice at all, right?
So yeah, the problem space and scale there is really interesting.
I would say it's a pretty big scale thing, and then the biggest complexity comes into like the workloads themselves and their lifecycle expectations in making that work.
John: Interesting. So it's the kind of scale that, you know, I remember us kind of talking about at VMware that some of our customers were going to be using on top of this open source technology.
You know, I feel like we have this conversation a lot when we get together and get a beer or something, but like what are the challenges that you see in that like open source model?
This always comes up in conversations with people, and I've always wanted your take on it, be it around like the business model of open source or the licensing.
Reddit and you have such a unique perspective on like actually utilizing that technology.
Like where are the failure modes, where does it go wrong?
Josh: Yeah, it's tough.
Like John and I, coming from Pivotal and VMware, have both spent a lot of time in the vendor software space around open source.
And I actually think that's quite a bit different than the end user space.
What I'm getting at here is, when you're using open source tech at a certain scale, things inherently will go wrong with that open source tech, and you will get to the tattered edges and end up at these different impasses where you have to say, will I contribute changes upstream? Will I build something new? Will I fork and maintain the thing? And all these options have trade-offs, right?
Some of them allow you to go really, really wicked fast and solve your business needs, but carry a bunch of technical debt indefinitely.
Other ones require you to show up to 27 committee meetings and maybe get the thing changed six years from now, right?
And it's really, really tough, and the reason I bring up the vendor versus end user perspective and why that's especially tough is, one of the benefits of being in the vendor position I have found, is that the business value add to open source involvement and contribution is a lot more clear and easy to justify.
So if you are Red Hat, and you've built a lot of OpenShift on top of, let's say, something like Cluster API, well, being involved in that core ecosystem, and helping that project move forward is critical to the thing that you're selling your end customer, right?
John: Right.
Josh: And at Reddit, we have similar problems.
We build a lot of stuff on top of Cluster API, we have a whole KubeCon talk about this thing we built called the Achilles Framework, which uses Cluster API under the hood.
And at the end of the day, we hit tons of issues or limitations with this tech, and it's harder to make the business case for, we're going to pull engineers away from business problems to work on this open source tech in their day to day.
It's not impossible, and I think Reddit's getting better at it, we're getting more involved in some upstream projects for open source projects, but it truly is a complicated business case to make, at least so far in my experience.
John: Yeah, that is a funny like journey, working in the vendor space, you know, shipping cloud products essentially for the enterprises and whatever.
And then being at AWS, which is very similar.
And then Brian and I were at OpenSauced, where I kind of flipped that script, shipping a bunch of Kubernetes platforms that essentially were in service of running OpenSauced and some of the AI stuff we had, not so much like contributing it back to then go and sell that cloud product and technology.
And even in, like, quote unquote, open source AI and a lot of these things, like we shipped this thing, vLLM, which is an open source inference engine, and we were hitting a lot of rough edges on that and being like, I don't have any time to go and contribute that back.
I don't know if we have engineering resources to go and like fix these things that maybe are edge cases, or we can kind of start to, you know, bubble gum and duct tape around.
It's an interesting dichotomy. I mean, where do you try to push Reddit leadership? Do you find a delicate balance, or push one way or another?
Josh: You know, I think the good news is that our leadership does really get it, especially in the infrastructure domain.
So I don't think the sales pitch is particularly hard, but you know how all businesses run, and this is not just a Reddit problem.
At the end of the day, you're here to help the bottom line in some way, and when certain fires crop up, and certain business cases need to be solved, it becomes a prioritization exercise.
And when you're looking at, you know, making a change to upstream Cluster API, versus making sure the website doesn't go down, or, if we're at VMware, that the vSphere engine doesn't have like a really key bug or whatever it might be.
It's going to be hard to win that battle.
So I think it just comes down to really trying to set up a culture around open source involvement, and just know that overnight, you're probably not just going to be able to say, my engineers spend 50% of their time working on open source. But I think as key issues come up, as projects drift in directions that might not benefit you, you start to use that as, it sounds a little weird, but like ammunition for your case around, yeah, we really should start investing here. We are one of the big at-scale users and so on.
John: Yeah.
Josh: Here's actually a really interesting perspective that I sometimes think about, right?
You look at like some of the biggest websites in the world, right?
And you've got everything related to Google, you've got Google, YouTube, so on.
You've also got the Meta related stuff. So you've got Facebook, you've got Instagram and so on.
Now, Meta and Google, way bigger scale than Reddit. Not trying to compare apples to oranges here, but Reddit is in that top 10 somewhere with them.
So at their scale, they're building internal platforms, things like Borg.
Meta has, I can't remember if they call it Tupperware or Twine now, but something in that realm that does the internal orchestration.
They have engineering teams who can build this from the ground up.
And that's really good for them because they can move really fast in a lot of ways.
They don't have to make changes with a committee of 20 representatives from different companies that all agree on it.
They can just solve the business needs. Now, you look at shops that are smaller than them.
So I'm thinking like Snapchat, Pinterest, Airbnb, Reddit, and then the even smaller shops than that.
I think you're at a point where you can't fund an end-to-end orchestration platform development project from the ground up.
Open source is largely what allows your medium to large-size company to scale and be able to solve a lot of key problems.
But at the same time, you're again in an ecosystem where the general purpose solution needs to solve problems for more than just your business, right?
So the TLDR I guess I'd give is, I think there's a good business case for wanting a seat at the table, even if you're an end user.
And whether that's through CNCF, whether that's through being in a SIG and so on, there are really good ways that you can justify that.
And then you just have to do some defense.
As fires pop up, as other priorities come up, you need to constantly remind leadership and others why this is important, not just for the project but for the business, because the business is standing on the shoulders of said project.
Brian: So you mentioned medium-sized companies, and like sort of like the layer below, like the Metas.
Recently we had this NATS thing happen with Synadia, and I'm curious about your take on smaller companies that are up and coming, trying to put something out there, and creating the table but not quite having anybody who wants to come to the table to help support it.
Josh: Yeah, yeah, it's tough.
I see it as like an existential risk for any project that you adopt or get involved in.
You know, whether it's the NATS stuff, whether it is Redis, right?
Like there's just so many different ways that things can go wrong in regards to the thing that you're adopting and potentially building your stack on top of.
I'd be actually really curious on y'all's thoughts of how all that played out.
The best thing that we've been able to do to kind of protect against this at Reddit is we have kind of a principle where we think of everything we build and offer in our platform as something we always wrap a moat around.
The idea being that we have an API surface area that would, at least somewhat, allow us to change the underpinning.
So whether that is a message broker, a cluster, a network that you need, right?
And under the hood, if that's being satisfied by Crossplane or Terraform, or whatever infra automation tooling, homegrown stuff we've built, we actually have an API surface that will allow us to shift off that over time.
That being said, like if we were fully built on NATS and needed to shift off it overnight, I don't think it would be that easy of a story.
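To make that moat idea concrete, here's a minimal Go sketch of the pattern Josh describes, with entirely hypothetical names: the platform exposes a small, stable interface, and the backing technology can change behind it without callers noticing.

```go
package platform

import "context"

// MessageBus is the only surface area product teams program against.
// Everything behind it is swappable.
type MessageBus interface {
	Publish(ctx context.Context, subject string, payload []byte) error
}

// natsBus satisfies the interface today; a kafkaBus or a homegrown
// broker could replace it later without callers changing a line.
type natsBus struct {
	// would wrap a real broker client, e.g. a *nats.Conn
}

func (b *natsBus) Publish(ctx context.Context, subject string, payload []byte) error {
	// hand the message to the underlying broker here
	return nil
}

// New returns whatever implementation currently sits behind the moat.
func New() MessageBus { return &natsBus{} }
```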
So yeah, I'd actually be curious on y'all's take of how that kind of progressed and so on, 'cause I don't have a really strong opinion.
Brian: Yeah, I mean I lived it while my time during CNCF, and it just really turned into like, the project was very complicated.
There was like a lot of hopes and dreams on getting this project, like having more adoption and folks at the table.
And this turned out there wasn't as much of a priority to do that.
And then there were also some like closed doors to be able to like invite other people to the room to chat.
So where it came to a head, and I think a lot of this is public as well, the CNCF has done a really good job of trying to push the conversation into the public.
So hats off to them, but also there's some other stuff that's like questionable.
But I guess what I'm getting at is, if there had been more of that sort of API layer, the control plane layer, inside of NATS, I think we probably could have seen the ecosystem evolve like what we're seeing with Kubernetes, where I'm not deploying my own clusters, I'm using the click-ops stuff.
Like, let's just be real at this point in my career. We've gotten to a place where I don't have to go drop down to like Linux or Kubernetes, and I think NATS needs a bit of that.
So I think hopefully moving forward, there are folks who are now invested in the future of that. And you mentioned etcd, this is another one. I was doing a lot of project health stuff when I was at the CNCF.
And that's a project that, I'll say it out loud, is at risk of not enough people paying attention to it, where either the industry moves to something else, or folks come back to the well and see where all the holes are and start plugging those.
John: I think about this sometimes whenever I go to a KubeCon, I'm like, this is huge, this is insane how big this thing is.
And like all the stuff that is just kind of congealed around Kubernetes and the cloud native ecosystem, I guess.
And then I hear and have started to get involved in like the next wave of some of these open source things, like model context protocol, I mentioned VLLM, and people having these conversations around like do we start a foundation?
Like do we actually go and try to build that table like NATS was trying to do?
And I almost just wonder if that model is a little broken where it's like we're going to sell courses and certifications, we're going to have a big ass event that tens of thousands of people go to, and we sell tickets for, and creates a whole economy of open source that then can go back and support companies supporting the open source work.
And I just try to wrap my head around how that evolved, and I'm just like, man, is the model itself broken? Was it a fraught effort, you know, using the NATS example, to try to go build that table that could then support a whole company, I guess, shipping something into the open source?
I can't remember who said this, but people are pretty skeptical of open source today because of that kind of like economic rug pull that seems to always be inevitable. You know, you're either like getting into an ecosystem that's so vendor locked in that it's like you kind of have no choice but you're just going to eat that cost now. Or you're investing in an open source ecosystem where maintainers disappear, or there's that rug pull.
I don't know, I don't know how to feel about it these days to land the plane on that, I guess.
Josh: It is a wildly hard problem to solve, like truly like, you know, whether it's we're looking at CNCF or Apache, or any foundation, and any people who are trying to solve this problem, it is massively, massively challenging.
And I wish I had really strong insight into how to make it better.
But I guess from the pure engineering standpoint and the adoption of it, you know, the best thing I try to keep in mind, especially when, you know, working at Reddit or different shops is just that I think it was Craig McLuckie who used to say, "free as in puppy, not free as in beer," right?
And try to really think of open source adoption as a little bit closer to free as in puppy, in that like, you're with this thing, and should the maintainers disappear, or the project change its licensing, like you will be on the hook for finding resolution in that for the sake of your business.
And it's tough. I think at the end of the day, at least with where we're at today, you just inherit some amount of risk when you adopt some open source tech.
And going back to our earlier convo, hopefully you also make that open source tech a little bit better, and chip away at it and help the ecosystem, and to your point, Brian, come to the table, right?
But if we're being honest, like all open source adoption has inherent risk with it, especially for the projects that aren't as big as like Kubernetes.
Right? The sub-projects of the ecosystem, so.
John: Yeah, totally. I think that's incredible perspective. I'm going to use that, free as in puppy. Not as in beer.
'Cause yeah, I mean you adopt this dog, you're going to be taking care of it, right?
Josh: Exactly.
John: One of the things, you know, that we've talked about before over beer and whatnot that I definitely wanted to get your take on, as it seems to be just ever-evolving.
In our show notes, I linked something, and listeners will have it in our notes, it was this article from fly.io that kind of made the rounds that was titled, "You're All Nuts".
It was kind of a rant around, you know, somebody who's been in the software ecosystem and building for a long time, their adoption and skepticism, and then an inevitable, I guess, embrace of AI, and using AI for software development.
And Josh, I've always, again, like I've considered you a mentor, and I'm very curious to your hot takes on AI these days, and you know, maybe even if you're willing, what Reddit does along AI development, and how you enable your developers to use it or not use it.
Josh: I'm so sorry to disappoint, but I think you're going to find my takes lukewarm rather than hot.
If I can actually start by asking you two a quick question, then I promise I won't dodge it and actually give you some of my thoughts.
John: Yeah.
Josh: The agentic space that's been evolving, MCP, trying to like encapsulate it, I don't understand it fully, so I want to just ask in terms of things I understand.
Is the rough idea for the trajectory of in-editor support basically something similar to LSP, where that LSP can basically go out and talk to models based on the context of where you're at?
Is that in the ballpark, or maybe you could enlighten me on where my gaps are real quick?
John: That would be a component if you were building a full like end to end experience.
Like if I was going to go do this in Neovim, you know, I could definitely just like, you know, grab the chunk around, you know, the text inside of, you know, wherever the LSP had gotten context from, and go and query that from the language server protocol.
But then also feed that into an LLM context window, and then also use model context protocol to go and, you know, do anything else, really, be it use an external service, utilize tools on your machine, really anything.
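As a rough sketch of the pipeline John describes, assuming hypothetical interfaces for each piece (none of these are real library APIs): the LSP supplies local code context, MCP supplies external context, and the model completes.

```go
package assist

import (
	"context"
	"fmt"
)

// Hypothetical, narrowed-down views of the three pieces.
type LSPClient interface {
	// ContextAround returns the code the language server knows
	// about near the cursor position.
	ContextAround(ctx context.Context, file string, line int) (string, error)
}

type MCPTool interface {
	Call(ctx context.Context, name string, args map[string]any) (string, error)
}

type LLM interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// Suggest stitches the pieces together into a single prompt.
func Suggest(ctx context.Context, lsp LSPClient, tool MCPTool, model LLM, file string, line int) (string, error) {
	code, err := lsp.ContextAround(ctx, file, line)
	if err != nil {
		return "", err
	}
	// External context via MCP; the tool name is illustrative only.
	docs, _ := tool.Call(ctx, "fetch_docs", map[string]any{"file": file})
	prompt := fmt.Sprintf("Code context:\n%s\n\nRelevant docs:\n%s\n\nSuggest the next edit.", code, docs)
	return model.Complete(ctx, prompt)
}
```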
So, I mean, what's your perspective, Brian? Because now you're in deep with kind of the editor ecosystem.
Brian: Yeah, I'm in the belly of the beast at this point.
So I did a video last week on LinkedIn and a few other places where I kind of walked through the idea that, at Continue, where I work my day job, we have Amplified.dev.
The idea is that we amplify developers, not replace them or automate 'em out of a job.
And where I'm thinking of the context where, yes, when you generate Ruby on Rails code, it's going to be awful, because like a lot of the folks that are training models, it's going to be on React code, it's going to be around like Python.
Maybe if you're lucky, some Go code. So the model is just not really ready for prime time.
It's like, this hit-Tab-and-see-everything-generate thing. But the way it works is you have the foundational models, like Codestral, or you have like Llama or Sonnet.
And then on top of that, to make it better, you can now augment the model: you can do direct indexing, you can do MCP that talks directly to API servers and has more context there.
So what I've been doing is I've been building like specs and test first, and then letting the AI work within the bounds of that, which I think is the, what was the article?
"Are You Nuts" from the fly.io that we have linked in the show notes?
John: "You're All Nuts".
Brian: Oh, "You're All Nuts", yeah. Like I appreciate the article 'cause these are folks I looked up to when I was coming up as an engineer, and I see them seeing the world pass them by, or maybe morph into something new.
And I think these are folks that are going to help get us grounded, 'cause I don't think AI is like Web3, I think it's going to stick around. So, which is my take.
Like I think we're probably going to have to like start talking to the bots and the agents. But like what's next? I don't know.
John: Yeah, one good perspective somebody recently gave me around that exactly was that like, unless one day Anthropic is like, "oops, sorry, Claude is actually just a bunch of people, I don't know, offshore" or something, which recently did happen with a company, right?
And you know, that's obviously not the case with Claude. It's generating way too much, way too quickly.
But you know, it's around, and it's tools that I think are going to get more and more deeply integrated.
I mean, it's funny being in the belly of the beast now too, 'cause on this product at Zuplo that I work on, it's mostly going to be around AI enablement, building MCP servers on top of people's APIs so that they're essentially AI-ready, and just deeper and deeper integration with more and more of these things.
It's funny, Josh, you mentioned MuleSoft. I actually think they're one of our competitors, so.
Josh: Oh, interesting, yeah, yeah.
John: Tech is very small.
Josh: They've pivoted out of just enterprise service buses, of course. So, okay, this all makes a ton of sense.
My opinions on what I'm about to say with AI usage, and LLMs, and coding does not reflect Reddit's, to be super clear.
Honestly, we're still early on that journey. This is just kind of my take from using it a little bit and seeing some coworkers and friends use it.
On the, "You're All Nuts" argument, I think it's pushing back in a good way actually.
At the end of the day, the narrative of like, you know, it's bad at this thing here, or it hallucinates this thing here.
Like sure, yeah, there's gaps in this technology, and it is not going to be perfect.
And I think we overrotate on trying to find every corner case where we can prove it wrong, and it does a bad job, and then we just use that as a, you know, stake in the ground of this will never work.
You know, obviously, I don't know, like in regards to our trajectory with AI, I have no idea if we're approaching some maximum that is going to peter out, or if we're just super early.
But regardless of what it is, like the jury's still out on this one.
We're going to see how it evolves, and I think that at the end of the day, you would be naive to not be trying to use LLMs in your day-to-day workflow, or use AI in your day-to-day workflow. It's so obvious it's something that you should be trying to integrate.
I think the like most novel argument that is a little bit trickier for me is I think in this article they call it "But the Plagiarism", specifically around what you train on, the licensing available to it and so on, right?
So like, here's an interesting example. So John, you have or still have a blog, I believe, that you write technical content to, and I do as well.
And I actually am totally fine with AI training on my stuff, but it's interesting that I don't really have a say in it, right?
Like the best parallel I can draw here is, in search, you would do like a robots.txt file to control some like indexing stuff on your content. I believe that's how it works at least.
And it's funny that there's no control there for AI. Like I think you should be able to opt out of being trained on and being used in certain regards.
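For reference, robots.txt-style opt-outs do now exist for some AI crawlers, though honoring them is entirely voluntary; the user-agent tokens below are ones the respective vendors have published.

```
# robots.txt — opt out of AI training crawlers that choose to honor it
User-agent: GPTBot           # OpenAI's training crawler
Disallow: /

User-agent: Google-Extended  # Google's AI-training control token
Disallow: /

User-agent: CCBot            # Common Crawl, a common training data source
Disallow: /
```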
And the reason I kind of think that's important too is I worry that more websites will start putting their stuff behind a paywall or a subscription wall, or a "you have to be signed up" wall, right?
Because that will be their mechanism for protecting against AI.
I really like the internet being open. Like I like being able to click on a link and have no commitment to consume the thing. It's one of the things I love about this crazy place that is the internet.
And I do worry that the incentives will flip a bit if there's no control, and dare I say, sorry to use the L word, legislation maybe around what is allowed to be trained on and what controls authors have.
I think that's a pretty interesting thing. Yeah. But my lukewarm take, John, is, it seems great, it's progressing in an interesting direction.
If you're a software engineer, you should be actively using it and learning how to take advantage of it at this point.
John: Yeah, I mean, let's get into that, around the verifiable nature of just the web, and that direction it's kind of moving, because I know this was a topic you wanted to bring up.
It does seem like that's the way the open web is moving, is that things are more, I don't know, you got to accept cookies and banners at the least.
You have to sign up, pay for a subscription, et cetera.
You're in a fascinating position at Reddit, obviously a place that has locked down things in the past.
Where do you see this going? Like, where's the worst case? Where's the best case?
Josh: Yeah, and again, this won't represent Reddit's product direction by any means.
But just to kind of be clear on my own personal preferences, and I think Reddit does a good job of this and it's kind of why I love the website and the company.
I think that as AI gets more and more integrated into everything from searching to content creation, my theory is we're going to start salivating over the opportunity to be guaranteed that we're consuming human discourse rather than AI-generated bits, or even things that are AI-assisted that help kind of work in advertisements and other intentions that you aren't super aware of.
An interesting example that I can give you all is I am renovating a basement in my house right now, and I have no idea what I'm doing. I'm an absolute idiot, like.
John: Are you doing it?
Josh: I'm doing it, yeah.
John: Wow.
Josh: Which is a huge mistake, but it's fun, right?
So as you go through this, like if you search, "I want to know about the benefits of a mini-split HVAC system" on Google, your first result is no longer an article from somebody.
It's an AI summary of a bunch of stuff, and then from there it's some sponsored ads, and then you might actually get into some content, right?
So I think the key thing is a lot of times when I'm looking up information about stuff of the shape, I really want to know that it's John and Brian arguing over HVAC trade-offs, rather than John and some like arbitrary content that was potentially generated outside of the human mind.
John: Yeah. Well, and this is why so many people would put like, you know, "best music from 2008 Reddit" in Google. Cause you would just pop up a great thread about music from 2008, right?
Josh: Yep. And the question, I think, also becomes like, if we do head down this path, do we need to really head into a more nuanced take of human verification?
Like CAPTCHA is obviously one of the things that we've used for a long time.
But I don't know, is CAPTCHA going to withstand all of the AI progression? Maybe, I don't know the tech deeply enough, but it worries me a little bit whether CAPTCHA will be able to hold the line.
Brian: Yeah, 'cause this world's moving so quick, and I think people are getting responses to all the questions like within weeks and not even months.
So I did Google it, and found out llms.txt exists. I think it only just started getting adoption in February of this year.
So like we're on the cutting edge of this world right now. I don't know how many of these LLMs respect that, so I can't speak on that.
I haven't got that far into the research.
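For the curious, llms.txt is a proposed convention: a markdown file served at the site root that gives models a curated map of your content. A minimal, hypothetical example (names and URLs are illustrative):

```
# Example Blog

> A personal blog about distributed systems and Kubernetes.

## Posts

- [Running Kafka on Kubernetes](https://example.com/posts/kafka-k8s.md): trade-offs at scale
- [Cluster API in production](https://example.com/posts/capi.md): lessons learned
```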
But I did actually chat with, there's a YC company that I think is either currently in YC, or maybe it was the last batch. Like, we all know SEO, and you can write this sort of "hey, Reddit versus Digg, tell me what are the pros and cons?" article.
And then it's going to show up, and like Digg is going to be number one, and then sell.
But I guess what I'm getting at is like there's these like standard SEO articles you write so that we show up on the top.
There are now standard articles that you write that are like listicles of like top coding assistants. Here's the list.
And those are written to be trained inside LLMs to be sort of crawled and captured by Perplexity, or whatever Gemini thing that they're doing now in Google.
So like the world is shifting very quickly on us.
Josh: Yeah, the llms.txt thing is really interesting, yeah. I had no idea that that was the direction it was progressing. That's really cool.
Brian: Yeah, and Cloudflare has a feature as well that I should probably mention: if you wanted to block your blog from being ingested into LLMs, there's a feature for that.
I haven't used it, but it was something they shipped last November.
Josh: That's really cool. Another interesting thought, and if you can't tell, my like thinking on this has not gotten too deep.
It's mostly just of like, how do I like to consume the internet, and what will I miss if AI takes over a large portion of content creation and discovery, right?
Another interesting thing on the flip side is, as we get smarter about human verification, a maybe obvious approach we could take is some type of biometric, physical verification.
But one of the beauties of the internet, at least in sites like Reddit and other sites is the ability to be anonymous as well.
So that opens up a really interesting can of worms of like, sure, I can verify John's a human at a physical human level, at least I think.
But you know, that does come with like a really big trade off around privacy and other related concerns.
So it's going to be a really interesting thing to see evolve.
John: There was something I heard about last time I was in San Francisco a couple weeks ago that I just thought was so silly, but is kind of in this realm of like human verification that you're talking about.
It's a Sam Altman company, it's Sam Altman from OpenAI and Y Combinator. It's called Worldcoin, and it's a cryptocurrency, it's a blockchain.
But what you do to go get, I guess, a wallet, or, you know, I'm not a crypto person, but essentially you go and get your iris scanned at one of these like kiosk things around the world somewhere.
And then that gives you some kind of verification on their blockchain that, like, yes, your iris has been scanned, you're a human, you can then go do stuff with Worldcoin, you know, on Web3 or whatever.
And that would be sort of the chain of trust starting at your iris scan.
And in theory, people's iris scans are so unique that it would create that hash that would be, you know, verifiable 10, 15, 20 years from now if I needed to rebuild my wallet, I guess, I don't know.
I'm not exactly sure how this would work.
It's just, it sounds so complicated to build that verifiable chain of trust of humanity.
And if Sam Altman's thinking about this, then clearly it's something, yeah, that could be a problem going into the future that needs to be solved, right?
Josh: Yeah, it's super interesting. And I'd ask the same question too is like, is there any way to achieve being anonymous still in a model of that shape, right?
And honestly, like some people don't want to be anonymous.
Like some people probably want things to be tied to their identity as a human and relate to them and so on.
But you know, I think we should keep in mind that a lot of people have benefited from, you know, sharing life experiences, and being in, you know, quote unquote, safe spaces that they can like talk to other people pretty openly about things they're experiencing on the internet because they were under the banner of being anonymous, right?
And I think that's really important to a lot of people who utilize the internet.
And I'm curious if we head down this path, will there be any way to maintain that option at least in cases that it is desirable, right?
John: Yeah. Well I mean even tying this back into the open source ecosystem and community, like this was kind of a core tenet for a long time, right, in like the '80s and '90s.
And sure, people could, you know, have their, I guess, GPG keys that get used to verify your git commits and stuff.
But more or less, you know, you weren't really sure who was who inside of a git commit log.
Maybe that's very different now where people's GitHubs are closely, you know, tied to their real identities.
Maybe they even have, I don't know, like other chains of trust, where you got together with a bunch of people to sign keys together.
This was actually a problem space Brian and I were thinking about at OpenSauced, because we already were seeing that it's just like, man, like how do you know that, you know, Jia Tan isn't some North Korean threat actor trying to hack XZ and SSH, right?
Like who is Jia Tan actually? How do we actually verify in that chain of trust who this person is, let alone that they're a threat actor or where they're from, or that they're a bot, or that, you know, this was Cursor committing this into the repo or something, right?
Brian: Yeah, you would think with the tools we have, and this is why I'm thinking AI is great, 'cause it's leveraging, like it's 10Xing myself, my 10X engineering experience.
But there are other things too, like I mentioned, writing tests and using a spec, and keeping your boundaries.
But also we've got SSH keys as well, where you could totally just be like, hey, it's cool that you generated this, but who's signing off on this before it comes into my open source repo?
And like a lot of CNCF projects have good hygiene around this. A lot of open source projects don't, and I think it's going to be on a lot of us, if we're going to accept that AI's here to stay, to all sort of rise above, in a rising-tide-raises-all-boats type of situation.
It's a world that we're going to have to really lean into and like share and tell more often than like previously.
John: This actually makes me think of, you know, we recently talked to Thorsten Ball at Sourcegraph about Amp, and you know, their AI agent that they're building.
And I forget who this is, but one of his coworkers at Sourcegraph has been running Amp with like hundreds of sub-agents trying to build a compiler.
I don't know exactly for what, but I see this pop up every once in a while online, and just letting it run for weeks basically.
Just seeing like where it will get on like a really difficult problem space with basically unfettered access to, yeah, like committing into the log, to running tests, to using compute, to like spinning off other subagents.
Like that whole space is very interesting. I'm curious where it'll get.
But I mean, it's when it starts to touch some of these things that maybe I feel like I have expertise in that it starts to hurt. And that, you know, maybe lands the plane on "You're All Nuts" and that article from Fly.
That's I think where it kind of, you know, stung for a lot of people, where it was this person saying like, you know, you should try to adopt, and like you were saying, Josh, utilize some of these things, it ends up being very personal, right?
Josh: Absolutely. Less on the personal side and more on the technical side of this agent thing that Sourcegraph is trying out, I think an interesting thing is that we're really focused right now, at least it seems like, on the actual development-of-code side.
I am actually a little bit more interested in where LLMs will find their place at the runtime level.
So like here's an interesting parallel. I don't want LLMs replacing all of the code in a Boeing jet anytime soon that I'm flying on.
However, we are seeing LLMs replace, at least as I understand it, large swaths of C++ code in things like fully self-driving vehicles.
I think Tesla has bragged about doing this and stuff like that, right?
So the interesting thing is like over time, you train, you gain information, and you potentially replace code bases because the model can help make really informed capable decisions.
Well, at the runtime level in these distributed systems, we're making really interesting decisions all the time.
Where are LLMs going to be in terms of making scheduling decisions?
Like one of the things that my team spends tons of time on is making nuanced eviction, autoscaling, and bin packing decisions.
'Cause at the end of the day, like us packing stuff efficiently is many millions of dollars on the table, right?
So to wrap it in a bow, like I do really wonder as we progress and we train models more, like what are chunks of runtime level code in my world and distributed systems, that we might actually delegate complex decisions to LLMs, and just replace some of our code wholesale?
That's a really exciting, I think, fun thing to experiment with.
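As a purely speculative sketch of what delegating such a decision could look like, with a hypothetical model client and a deterministic fallback; this is not anything Reddit actually runs.

```go
package scheduler

import (
	"context"
	"encoding/json"
	"fmt"
)

// ModelClient is a hypothetical LLM interface; any provider SDK could sit behind it.
type ModelClient interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// Node is a simplified view of the cluster state the model reasons over.
type Node struct {
	Name   string  `json:"name"`
	CPUUse float64 `json:"cpu_use"` // fraction of allocatable CPU in use
}

// PickEvictionTarget asks the model which node to drain for bin packing,
// validates the answer, and falls back to a plain heuristic if the model
// returns something unusable.
func PickEvictionTarget(ctx context.Context, m ModelClient, nodes []Node) (string, error) {
	if len(nodes) == 0 {
		return "", fmt.Errorf("no nodes")
	}

	state, _ := json.Marshal(nodes)
	prompt := fmt.Sprintf(
		"Given these nodes %s, reply with only the name of the most underutilized node to drain.",
		state)

	answer, err := m.Complete(ctx, prompt)
	if err == nil {
		for _, n := range nodes {
			if n.Name == answer {
				return answer, nil // model named a real node; accept it
			}
		}
	}

	// Fallback heuristic: the least-utilized node wins.
	best := nodes[0]
	for _, n := range nodes[1:] {
		if n.CPUUse < best.CPUUse {
			best = n
		}
	}
	return best.Name, nil
}
```

The key design point is that the model is advisory: its answer is validated against known state, and a boring heuristic keeps the system safe when the model misbehaves.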
John: Yeah, that is fascinating. I've had the same thoughts around like, you know, nuanced decisions that an LLM could make by just like consuming a bunch of stuff from tool calls that it makes on a platform.
You know, maybe a scheduling or bin packing decision, or even something simple as like, oh I see there's new requests coming in.
Maybe I should like, I don't know, start scaling things up just a little bit. See how that goes. See where requests continue to pile in.
It's an interesting space, yeah. Well with that being said, Josh, I wanted to ask you if you are ready for Reads.
Josh: Yes.
Brian: I can jump in, John, and mention my read.
I was sitting in on a talk with Justin Bowen, who's currently working on, you know how in the Rails framework they've got these things like Action Mailer and Active Record?
So he is working on Active Agent, and it's like a separate Ruby gem that's not quite integrated into the big monolith.
But it was cool to kind of see what the Rails community is doing. And he brought this one paper, which is about agent-oriented programming.
And I was like, oh what a great thing for him to create. Turns out he didn't, it was like a '90s paper out of Stanford from, is it Yoav Shoham?
And it's an interesting paper that actually instigated a lot of this AGI stuff that we're sort of seeing, and folks attempting to do agentic programming, but also agentic framework type stuff, which I thought was really interesting, 'cause I don't have a CS degree, like I sort of catch up, I sort of pivoted my career into writing code full-time.
So I love when I find stuff like this that influenced the stuff we're doing today, to kind of catch up.
So it's agent-oriented programming, so similar to object-oriented programming.
And the idea is, object-oriented programming has these dumb objects that are like classes and attributes.
And with agents, you actually have this sort of framework of reasoning, which is not meant to always be autonomous. So if anybody's read the books like "I, Robot", and who's the guy?
John: Isaac Asimov.
Brian: Yeah, Asimov, yeah. So like basically the sort of the rules of robots and bots, it all falls within that sort of confines in that framework.
So, and this is actually a really good paper if you think about it, 'cause like can I go automate or have an agent to go book me a flight and have my entire vacation set up with like what to do next.
That agent really just has the confines of like, oh cool, you're probably going to do a United flight 'cause you're flying out of San Francisco, and you're probably going to want to sleep somewhere.
So like here's a bed, and it's king because you're over six feet.
So there's different parameters that you put into place that make it look like it's magic, but if you read the paper, none of this that we're doing is magic. It's just really clever programming.
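A loose Go sketch of the distinction Brian is drawing, not Shoham's formal AGENT-0 notation: an object is just state plus methods, while an agent carries beliefs and decides whether to take on commitments.

```go
package aop

// A plain object: state plus methods, no stance toward the world.
type Flight struct {
	Origin, Dest string
	Price        float64
}

// An agent in Shoham's sense carries mental state: beliefs about
// the world and commitments it has accepted from other agents.
type TravelAgent struct {
	Beliefs     map[string]string // e.g. "traveler_height": "over six feet"
	Commitments []string          // obligations the agent has taken on
}

// Request is a speech act: given its beliefs, the agent decides
// whether to commit to the action rather than blindly executing it.
func (a *TravelAgent) Request(action string) bool {
	if action == "book_double_bed" && a.Beliefs["traveler_height"] == "over six feet" {
		return false // declines: conflicts with what it believes the traveler needs
	}
	a.Commitments = append(a.Commitments, action)
	return true
}
```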
John: Yeah, it's so fascinating to me. This was the '90s, and they were already thinking about some of these things.
And even the deeper history of like neural networks and machine learning, like there was that whole AI winter when they just assumed that things that we're doing today, like transformers, and neural networks, and some of the underlying algorithms and things, just assumed it wasn't going to be possible until people went and made those breakthroughs.
And then like, you know, in the early 2000s, it was like, all you need is, oh gosh, what is that paper from Google?
Attention is all you need, or all you need is attention, I'm forgetting exactly what it's called, but you know, revolutionary, it created transformers, and now we have GPTs and now we have LLMs.
And that was something that we could have tackled like 10, 15 years ago, in theory.
I saw another really fascinating talk, I have to pull it up somewhere or go search the internet for it, but it was something somebody at Microsoft was talking about how they quantized a small LLM down to like one or two bits.
So it was very, very small, like basically a one-or-the-other switch LLM, which aren't very efficient but get you some of the way there.
Very similar to like a 1 billion parameter model, or one that's been quantized down to four bits, very small.
And they got it to run in a Windows 98 virtual machine, which tells them basically that there was no real technological barrier, at least on the software side, that was preventing this, you know, way back when we were running Windows 95 and 98.
Anyway, long thought, but yeah, just the history of this is fascinating.
Josh: Yeah.
John: All right, well I will give some Reads here.
So I have three, even though I didn't really put them in the show notes.
The first one, and astute listeners will know, I've mentioned this before. Josh gave a great talk at a KubeCon a couple years ago about deploying Kubernetes with Nix.
And essentially it was how he goes about building Kubernetes systems, I think on your home lab. Was this hell or Hades?
Josh: Hades. Named after the video game.
John: Oh, okay, so not not your personal hell?
Josh: No, although it could be.
John: Josh, why don't you give us a real quick, you know, not to put you on the spot, but I'd love a real quick overview of what this talk was and how you think about it even today.
Josh: Yeah, I mean at the end of the day--
Reproducibility in infrastructure is a really important thing to be able to achieve. And it is an amazing ideal to be able to say that I can link everything from the source's state all the way to the end artifact, and be able to rebuild that down the chain all the way.
Especially when you're working with systems at scale, the smallest little tweak matters.
Like we had an outage at Reddit that's in our public blog, around a small change that happened with iptables, just a character changing somewhere between two versions of a patched kernel update, right, that caused massive outages and degradation.
And I think what's really cool about Nix, whether it's the right approach or not, more so the concept, is that being able to really express things in ways where you can drive guarantees around reproducibility is a really, really cool concept, right?
So at the end of the day, this talk and the associated blog post is all about what if we use Nix to set up everything from literally our hypervisor on our bare metal, all the way up to our Kubernetes cluster, all the way up to our container images that ran on it.
That being said, I don't think I'd advocate anyone use this tooling to do it ground-up today.
I think the bar is kind of high to make that successful.
But it's, again, the thought experiment of like what does it look like to truly be able to reproduce everything from the ground up? I think that's a really cool concept.
John: I don't know, I would actually argue that some of the other tooling that gets you there, things like TUF, or Sigstore, or, I mean, oh my gosh, when I was at AWS on Bottlerocket, like trying to reach that reproducibility using Fedora paradigms, and then all the other stuff we bolted on.
I mean, so many times people would just be like, can we just do this in like, you know, normal functional Nix? And the answer was always no.
Josh: And I think we're operating under a bit of an illusion, honestly, of reproducibility.
Like there's so many Dockerfiles out there in the world, and people don't realize that every time they run docker build, or whatever the command is, they're just rolling the dice.
Whatever the package repository has at that given time, like it's just going to be that thing, and they don't even realize until something terrible happens that they might not be able to tell the difference between Dockerfile A and Dockerfile B, because it's pretty opaque to them what differences happened in build A versus build B.
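To illustrate the dice roll with a trivial example (the digest below is a placeholder, not a real one):

```dockerfile
# Floating: every build pulls whatever the registry and package
# index happen to have right now
FROM ubuntu:latest
RUN apt-get update && apt-get install -y curl

# Pinned alternative: reference the base image by immutable digest
# FROM ubuntu@sha256:<digest>
```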
So, and there's a lot of tools and things in the ecosystem that help here, layer analysis and so on.
So I'm just more calling out that Nix is kind of interesting 'cause it's a very pure way to look at it, which is an interesting thought experiment.
John: He said it. I'm sorry, Brian, that this is becoming a Nix podcast.
Brian: Yeah, no worries. This is like, what, the third episode in a row, and I'm just over here twiddling my thumbs.
Josh: Surprisingly, I'm not like wildly fanboyish about Nix, to be super clear.
I'm just more obsessed with how cool the concept is. And I do use it for my workstation and stuff.
It's fun to play around with, but I don't use it in serious production capacities today.
John: Yeah, yeah, makes sense. My other Read, I'll make the next two very quick.
There was a paper that Apple released around WWDC, called "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity".
And I just thought this thing was so fascinating. I obviously don't understand a ton of the math and the PhD-level stuff going on in here, but they basically had a bunch of these games, like the Tower of Hanoi, and various complexity measures around these little games that they could get reasoning models to play.
And in their, I guess, assertion, reasoning models really aren't reasoning as much as they are just like memorizing and understanding patterns in their underlying neural networks.
Very fascinating. I think it made quite the buzz. So yeah, listeners will have that in the show notes.
My other read this week is "Crime and Punishment", which my wife has been getting me to read.
And I actually quite like it. It's very dark. Have you read it, anyone?
Brian: I have not read it, no.
Josh: It's on my list of like classics that I want to pick up and read. I have not read it yet.
John: I don't know what it is about this weird specific genre that I really like.
It's like "Wuthering Heights", and this kind of like dark gothic classic literature, and he has all these like inner thoughts.
It just goes on these long rambles about like the meaning of things.
And it's definitely the kind of, you know, man versus God kind of narrative that I really like.
So I don't know what that says about me, but who knows?
Brian: No comment.
Josh: I have two reads for you now.
John: Yes, go.
Josh: All right, so I pulled them off my bookshelf live since you all were talking about like great books and literature that you've read.
So there's two books that I really, really enjoy, and if you haven't checked them out and you work in like, especially like coding and Go, and the distributed system space, I think they're both really good.
Number one is I got a copy of "Communicating Sequential Processes" off of Amazon a couple years ago, and this book is so awesome.
It, as far as I understand it, largely inspired the concept of goroutines and channels in the Go programming language.
John: Oh.
Josh: And even if you're not a math nerd, like it has equations and stuff, but nothing so complex that it's unapproachable for someone with a little bit less of a math background.
John: Yeah.
Josh: So "Communicating Sequential Processes" is my first one.
And then my other one, I think I maybe recommended this book to you at some point, John, "A Philosophy of Software Design".
John: Oh yeah.
Josh: John Ousterhout, probably saying it wrong. Really good book.
I think you can tell John spent a little bit of time drinking the Go Kool-Aid, but if you're like, you know, newer to programming, or even if you've been programming for a while, I think it's a very pragmatic approach instead of ideals around what it means to design software.
So those are my two.
John: Amazing, I'm definitely going to check that out. I love a good philosophical software book.
Josh: Yeah.
John: But Josh, I want to let you go. I know you're short on time.
Thank you so much for spending this time with us.
And remember, listeners, stay ready.