
Ep. #49, From Containers to Unikernels with Felipe Huici of Unikraft
On episode 49 of The Kubelist Podcast, Marc Campbell and Benjie De Groot sit down with Felipe Huici to explore how unikernels are reshaping modern cloud infrastructure. They discuss virtualization, containers, and how Unikraft enables millisecond startup times and massive workload density. The conversation dives deep into performance engineering, Kubernetes integration, and the infrastructure challenges emerging in the AI era.
Felipe Huici is the CEO & Co-Founder of Unikraft and a long-time systems researcher specializing in virtualization, operating systems, and cloud infrastructure. With a background in academic research and deep involvement in the Linux and Xen ecosystems, Felipe focuses on building high-performance, secure, and minimal execution environments for modern cloud workloads.
Transcript
Benjie De Groot: All right, welcome back to another episode of The Kubelist Podcast. Today, I'm here with Marc as always and excited to have Felipe Huici from Unikraft here to talk to us about unikernels and some of the work that's been going on at Unikraft.
Welcome Felipe.
Felipe Huici: Thank you Marc and Benjie, it's a pleasure. Thanks for inviting me.
Benjie: Well, just to dive right in here, Felipe, tell us about, so you're the co-founder and CEO of Unikraft. I know that there's an open source component and obviously a company component to that. So maybe just tell us real quick what is Unikraft?
Felipe: Unikraft is a new cloud platform with scalability and efficiency as first-class citizens. We think that the way the cloud was built was a little bit patchwork-y. Obviously it's super powerful, but in the age of AI, where scale means you always add a few zeros to everything you do, just throwing money at the problem, warm pooling everything, and bringing up even more instances all the time is not going to scale forever.
So we're trying to tackle specifically that problem. That's sort of where we ended up. That's not where we started. So at some point during this interview we can rewind a little bit.
Marc Campbell: Yeah, I think we will. And I'm actually really curious to hear that whole journey. And I think that's why we're here to talk.
But I would love to hear kind of your background. How did you get into cloud native open source? Talk a little bit about your journey as an engineer, going through and learning tech.
Felipe: Yeah, sure. So I have a research background actually. So I spent the better part of eight, nine years doing research. Came out of UCL in London with a PhD and then randomly landed in Germany to work for the Japanese company NEC because they had a big research lab of all places in Heidelberg, Germany. And we started doing research on performance topics.
How to do software packet processing on x86 hardware at 10, 40 gigabits per second, back when that was fast. Some of the precursor work that eventually led into Intel DPDK, if you've heard of that framework. And then we started getting into, "okay, that's great, how do we put that in a virtual machine? Because we want isolation. And how do we make the virtual machine as lean and fast as possible?"
That became known as network function virtualization eventually. And from there we said, okay, not just software packet processing, but let's take any Linux application, put it in a virtual machine and try to make it as lean as possible.
We're trying to break this myth of virtual machines are big and heavy and chunky and very resource heavy because ultimately virtual machines are the only thing that provide the strong isolation you need in public clouds.
So that's sort of where the journey kind of started.
Marc: And what year was that?
Felipe: The research probably started back in 2009. Then leading up to, we created the open source project that, you know, we'll get into that, but that started in 2018.
Marc: Okay. And so the motivation was VMs are great for security and isolation, but they're slow, they're heavy on resources. And around the same time, clearly, Docker and containerization and cgroups and everything were coming out.
Do you think that you were complementary to that, like, parallel to that, solving the problem in a slightly different way? I mean, I know that's going to be a lot of what we talk about here today, but I'm kind of curious, as you were going through this and you were watching the work that Docker was doing in containerization in general, there's clearly overlap in the problem space. Maybe less in isolation on the Docker side, but also, I'd love to hear about the tech. Do you think there was overlap in some of the tech here, too?
Felipe: So the tech does not all overlap. They're both isolation primitives, where one is further up the stack and the other one's-- But I don't want to get too geeky about that, but when Docker came out, it was described back then as the silver bullet. You can get very strong isolation, you can get very lightweight processes. Everything starts up immediately in milliseconds.
So let's throw away virtual machines and everything that came before, because the world is containers, right? Of course, anybody who was a security expert knew that that was false, because as it turned out, you know how many problems and CVEs and things do containers have? So, in fact, in the public cloud these days, if you go and deploy a container, it'll almost always run within a virtual machine underneath for isolation.
But back then, when they first came out, this wasn't the general understanding. And so we, with our research hats on, were writing papers to say, hey, my VM is more secure and lighter than your container, as long as you're clever about what you put in your virtual machine and making it as lightweight as possible. Trying to say that the fact that a VM is chunky is not fundamental, it's just how you built that particular VM.
But what is fundamental is that it'll give you the best possible security isolation that you could have on the public cloud. And containers definitely don't have that. Now, containers are very, very good at build environments, reproducible environments, things like that, which I think is where we landed many years later today, where containers are really good for builds and development, but when you go to production, you use VMs.
Benjie: So, Felipe, wait, let's go back a second. We have a pretty technical audience, and so I think this is a great opportunity. I've actually done a talk on this, but I'd love to hear your perspective.
Let's go back to 2004, 2005, I think that's when we added the ability to add hypervisors at the CPU level. Let's talk about why virtual machines are hardware isolation versus containers, which are cgroups and all this other stuff. Because I believe it was like 2005, 2006 when they added stuff to CPUs, and Intel did this?
Felipe: Yeah. So arguably it started a little bit before, but the first hypervisor, arguably, was the Xen hypervisor. There have been hypervisor-like technologies that predate all of that stuff, but Xen is the first of what we'd call a hypervisor today. Right.
Benjie: Well we'll leave Sun Microsystems out of this for now. Maybe we'll go back and touch on that. But post Sun, talk to us about virtual machines.
Felipe: Yeah, so the Xen hypervisor came along and yes, the CPU started having all of these virtualization hardware primitives to properly isolate the memory of eventually the different virtual machines that would come on top.
And one property of a hypervisor, obviously, is that it runs on the bare metal. Basically the machine starts and it takes over. But also, hypervisors tend to have a relatively thin trusted computing base, and generally speaking in security, the more code that you're running and sharing, the bigger the chances of having a vulnerability in there.
So the hypervisor not only leverages some of these virtualization hardware features of, well, I would say modern CPUs, though they've been around for a long time, but in addition it provides the thinnest possible layer that gives those memory and CPU primitives that the virtual machines on top need to actually run.
Whereas with containers, there's an entire Linux kernel, roughly speaking, that is shared across all the containers running on top. So the layer that all the containers are sitting on is much, much bigger than the layer that a virtual machine will be sitting on underneath.
And then of course the hypervisors are using these virtualization hardware primitives to make sure that you cannot, from one VM, go over to the next one and sort of break that isolation boundary.
Benjie: Right. And I just looked it up. VMware 1.0 was 1999, so even earlier than I said. And so this hardware layer for virtualization, the hypervisor layer, becomes kind of mainstream 2005, 2006 with like VT-x support, right? That's when it starts, I want to say. Is that right?
Felipe: Yeah. And arguably also with AWS adopting the Xen hypervisor and public services and multi-tenancy and the explosion of the cloud, basically based on that primitive.
Benjie: Right. And for those that don't know, I want to say 2006 is when AWS really started. I think it was like S3 was the first product for years. But there were other data centers, there was like Linode and Rackspace and all these other ones. And I don't think they were leveraging hypervisors too much, but I bet they were doing it before AWS was.
So this hardware isolation layer. So at the CPU level, just to be crystal clear about this, there is slicing and dicing of system resources for bare metal machines at the hypervisor level. And that gives you this essentially hardware firewall between services. Now there has been over the years various CVEs around virtualization, but comparatively speaking to containers, it's almost nil.
And so you're in school and doing a doctorate and in 2009 you start working at NEC. How did you start thinking about, why did you start thinking about hypervisor? Why did you start thinking about the virtual machine issues?
Felipe: Yeah, it came about in sort of a roundabout way. I was at UCL, and UCL has very close links to Cambridge. So some of the Cambridge professors hang out at UCL pubs in London and vice versa. And Cambridge was sort of the birthplace of the Xen hypervisor. And so we were regularly in touch with the Xen community, which holds these events called the Xen Summits that still exist to this day.
And we were starting to get involved in that and we were starting to say, okay, we had a research project where we wanted to build these sort of tiny virtualized routers. And the Xen community said, hey look, we have this sort of tiny reference virtualized operating system that runs on top of Xen. You could grab it and you could slap your software packet processing application on top, mash it together and then that's a very tiny, very lean, very fast virtual machine that's kind of running on top of the Xen hypervisor.
And that specialized virtual machine eventually became known as a unikernel, meaning you have the strong primitive of a virtual machine, because it is a virtual machine, but inside it's really, really small, tiny, and lean. So it's almost like all the nice things that come with containers in terms of being lightweight, but without any of the downsides in terms of security problems.
Marc: So let's jump all the way to the present right now. And so now you're the co-founder and CEO of Unikraft and you're creating a company, I would assume here, around the open source unikernel product a little bit. Talk a little bit about what Unikraft does as a business.
Felipe: Yeah, so Unikraft provides a cloud platform that you can install either on bare metal servers or EC2 instances. And what that allows you to do is run any workload, which you define with a Dockerfile, so you can run anything, and it will cold start in about 10 milliseconds or so. So almost instantly.
And because that's the case, obviously it means that whenever that workload is not doing any useful work, not getting any requests, you can put it to sleep. Take a web server as a dumb example, right? It'll pop up. As long as requests are coming, it'll answer. When there's no request, it'll go to sleep. But because it can start so quickly, what we can do is we can start it up again just in time.
So we'll wait for the next request to come in, we'll buffer the request, we'll wake the now sleeping virtual machine up, we'll release that request, it'll go to the virtual machine that's now up and running, and then it'll answer, and that entire loop happens in under 10 milliseconds. So that users of the platform think everything is running all the time, but in fact you get a lot of efficiencies from the fact that a lot of the workloads most of the time are idle, so you'll just immediately put them to sleep.
And then what that leads to is, if you have 1 million users as your user base but only 100 active, now you can dimension your infra for those 100 active ones, and the rest you can sort of bin pack onto just a few servers.
So you can imagine, instead of having a row of racks of servers, just a few servers will do the job. So to the outside world it looks like you have massive infra, but under the hood you don't need that much power to run your service.
Marc: So would the idea then be to tie this into like, okay, I want to run it on AWS on EC2. So I tie it into an auto scaling group and I have like kind of two different signals because auto scaling groups on AWS take a little bit longer to spin up. So they're like, there's a couple kind of ready. Because then you can actually spin up the workloads within 10 milliseconds as the bursting of traffic starts coming in.
Felipe: Yeah, that's right. So you could have some sort of base one or two or three EC2 instances, and within that you get a big, big multiplier. Right? Because you can bin pack a lot of things, as I was saying. But if you happen to get a massive spike that totally overruns even that big multiplier, yeah, within, let's say, 10, 20 seconds, you can spin up a fleet of more EC2 instances with a platform on it, and then you get orders of magnitude more capacity in a few seconds in that case, where you get a massive, unexpected spike.
Marc: And you said that I can define that workload basically in a standard Dockerfile? It's not like a custom proprietary format or anything, right?
Felipe: No, they're literally Dockerfiles. Yes. And what we do is, of course, as I said, it doesn't make sense to deploy an actual container, because you'd be putting a container inside a VM, and that's inefficient. So what we do with the Dockerfile is treat it as a recipe.
We have a look at it to extract the file system and the binary that you want to run, and then we automatically turn that into a filesystem and a virtual machine, all in OCI format. So to the outside world, it looks and quacks and breathes like a container, or in the Kubernetes world, like a pod, but under the hood, it's actually a microVM.
Marc: So you said pod, and maybe this question makes no sense at all. Forgive me, but most of us run Kubernetes to actually schedule and orchestrate all of our containers. So my Kubernetes might have ContainerD or CRI-O or something like this as the runtime. Does this interface with, like, I have EKS to run everything, and I have, you know, auto scaling there for the node groups. I can actually then run the Unikraft technology inside that EKS cluster to have it dynamically create pods and scale up deployments?
Felipe: Yeah, that's right. So we have an integration that's done by a kubelet, which basically makes Unikraft, the platform, pop up as a node. As far as the cluster is concerned, there's just one more node. It just happens that that node can maybe put 100,000 pods on that single node.
And the one thing that we struggled with for a while was, obviously the scheduler in Kubernetes is not millisecond scale. Right? But our platform is millisecond scale, and we didn't want to lose those semantics. So what we ended up doing is: you go and deploy pods, and that takes however long it takes. But thereafter you get the semantics of scale to zero, where our instances come up and down whenever there's no traffic going to them, and that happens in milliseconds.
As far as Kubernetes is concerned, the kubelet will report that everything is available and running, because in effect it is. If you send a request, sure, we'll wake it up in time, but it'll reply. Right.
And so what that allows us to do is out of loop from Kubernetes, things will have these millisecond semantics. So you get the best from both worlds, the millisecond semantics of the platform, but in a Kubernetes transparent world.
Marc: That's cool. The first part of that description reminded me of the virtual kubelet, which was like the similar kind of concept about, like, report unlimited capacity and it can schedule anything. But then you're taking it one step further. And when you spin down a pod, you let that kubelet still continue to act like that pod's available, so Kubernetes thinks that can route traffic there. Like DNS, everything there is like, we're good. Like, continue to route traffic there and it's oblivious to the fact that it's actually missing at the other end.
Felipe: Pretty much. Nothing to see here. If you went to the dashboard, it's all green lights. But then if you went to the metrics that the platform itself is putting out under the hood, you would see these dynamics of things going to standby and waking up and going to scale to zero and waking up, et cetera.
Marc: Cool. Yeah. I mean, this makes sense. I don't know why everybody doesn't use this in EKS right now. We're just wasting money on these static node groups, right?
Felipe: Yeah. I mean, we're not very good at awareness, I'll give you that. So it's our fault, not yours.
Marc: No, I mean, this is actually cool. Is that kubelet, the Kubernetes integration, where you see a lot of the current usage and adoption? Or just for bursting SaaS is kind of the use case that you're building for right now?
Felipe: Yes, it's a good question. It's not where we started, because the Kubernetes integration is something that we developed last year, so we didn't have it to start with. And basically the platform itself exposes a REST API and there's various ways to go at it, there's a Go SDK, et cetera. But then the kubernetes stuff came on top. So it's not where we started.
Where we started was a Kubernetes-less world, with use cases ranging from people who need massive numbers of Postgres databases, think millions of them, that you'd want to scale to zero and run on little infra. So Prisma was our first customer, all the way up to anything that's functions, obviously, where you have ephemerality, large scale, et cetera.
And then when the AI wave started to take over, a lot of those same principles apply. Because in the AI world, take agents for instance, or code generated by agents: huge scale, things pop up, you don't know when they'll just run for a little bit and then go back down.
But you don't want warm pools of everything to run tens of millions of agents when they're not always running. And another one has been headless browsers. Same thing, big chunky monsters that agents need all the time. And because each one consumes 8 to 16 gigs of memory at least, and they have to stay running the whole time because they take too long to start, there's only so many you can fit on a server.
So very quickly having a fleet of headless browsers gets really, really expensive. So there's a whole world out there that's living on burning through a lot of money and warm pools, etc. Which I fully, fully get why that is so. But that's the problem we're trying to tackle.
Benjie: So with these unikernels it's basically, okay, I have seemingly infinite number of nodes and pods and resources for my Kubernetes cluster. I install this into bare metal machines or EC2 machines? Like what's the requirement to have Unikraft on a machine? Like what do I need to do there?
Felipe: Yeah, so on metal, we don't yet support ARM, just because we haven't cross-compiled, more or less. So if you're on bare metal, it has to be x86, Intel or AMD. We don't care which provider. Right? It could be a metal instance from AWS, it could be Hetzner, it could be Vultr, anything will do.
A server is a server, whether it's bare metal or a virtual machine; in the latter case it runs under essentially nested virtualization, where you have the virtual machine below us, our platform on top, and the virtual machines on top of that. And we had to do a lot of tricky engineering to make sure that those layers do not overly affect the performance.
So where on bare metal, things will start in 10 milliseconds, maybe in EC2, maybe things will start in 30, 40, 50 milliseconds, but still reasonable.
Benjie: Is there any special kernel that I need to have for my EC2 instances to try this? Do I need Ubuntu? What's the underlying OS need to be?
Felipe: Yeah, we use a Debian distribution. Some customers ask for Ubuntu; that's also something that we can provide. So there's nothing special. The kernel, I think, is v6.15 or one of the newer ones, so that's relatively standard. Obviously KVM is enabled; that's the hypervisor.
Benjie: So KVM is basically the requirement for this?
Felipe: KVM is basically the requirement. And then for what's called the virtual machine monitor-- so the user space agent, let's say, that talks to the hypervisor and says, please create this VM, list VMs, et cetera-- we use Firecracker, although we have a pretty big fork of it by now.
Marc: Where's the line between open source and commercial? How much of this can I just get to the website, download the thing from GitHub and run it and like, where would I have to talk to you and pay you money?
Felipe: Sure, yeah. So the open source part is the unikernel part. So the images that are small and tiny, that's all open source, and you can run them on your local laptop on top of QEMU and Firecracker and various things. Also all of the tooling of the platform is open source, from the CLI to the SDKs, all that stuff is also open source.
But if you wanted to deploy on our platform or even on your own sort of AWS, EKS or ECS or EC2, whatever it is, then you have to come to us. You could create an AMI out of the unikernel OSS and it would run.
The thing that you'll soon discover, which is something we discovered back when we started the startup, is if you just take a unikernel as an AMI and you dump it on AWS and you start it, it'll still take 10 seconds to start because it's not just about the image, it's the control plane, it's the proxies in between, it's all the other infra.
So when I say we take a bare metal box EC2 instance and we install our platform, what I mean by the platform is we actually had to apply the same principles we applied to Unikernels to all of these other components. So we actually built the controller from scratch that would also be millisecond ready, that could scale to lots and lots of instances on a box.
The proxy had to be reactive. Firecracker we took because it's built for scale, but we had to enhance its functionality, et cetera, et cetera. And it's only when all those components come together that the platform has these sort of dynamics that I was mentioning.
Marc: That makes sense. I mean, look, I think it's great the tech is open source, you can look at it. But like when you get to the place where you want zero performance degradation and massive cost savings on your infrastructure. Yeah, you should get paid a little bit for that work that you're actually providing there. I get that. It makes sense.
Felipe: Yeah. And frankly, when you go production on infra, it's good that there's a commercial entity there, just in case, right?
Marc: Oh, for sure. Is unikernel just-- What's the license on it? Is it part of any foundation?
Felipe: It's a BSD license, so pretty easy to use commercially or otherwise. And it's a Linux foundation project.
Marc: Okay, cool.
Benjie: So the stuff that I'm buying from you, you talked about the control plane, you talked about the proxies, like you made the point earlier, you said, "hey, this kubelet thinks that the pod is still there." So obviously you're doing some level of caching proxy stuff there for requests going to these various pods. That stuff is commercial. And that's what I'm paying you for, essentially?
Felipe: Yeah, that's correct.
Benjie: Okay.
Felipe: And basically that's the controller that does large scale, fast VM lifecycle management. And also the system is fairly heavily based on what are called snapshots, which is essentially: whenever a running VM is about to go to sleep and scale to zero, we take a memory snapshot, a picture of the memory, and save that. And what that allows you to do is, when it wakes back up, you can wake it up in the exact same state it left. In other words, the virtual machine won't even know it went to sleep.
Marc: So stateful workloads, everything is totally fine in this.
Felipe: Exactly.
Benjie: Right. You mentioned Postgres was a big use case for you guys. Did I hear you say thousands of Postgresses on one machine? Did I hear you say that?
Felipe: Yeah. So one of our clients, I think they're running upwards of like 50,000 scale-to-zero Postgres databases on a box. They're always pushing us for more and more and more, because obviously it depends, again, on this active-to-inactive user ratio that you have in your user base. And usually the bigger you get, the better it gets: the more inactive users you have relative to active users, the more you can take advantage of these scale-to-zero dynamics and bin pack more and more.
So I think yesterday we were having one of our field engineers benchmark an internal box, because we removed a bunch of bottlenecks and we no longer knew where the density limit was. And he got it to like one point something million scale-to-zero virtual machines on a box. So that's something that we should probably do a write-up about.
Benjie: You can't see this, folks, but my chin just dropped. One point something million? That's ridiculous. I mean, the inefficiencies in cloud are something that, you know, I work on all the time, and so does Marc. So this is really powerful then. Sounds a little too good to be true to me. So I'm skeptical.
So tell me, what did you write all this in? I'm assuming there's some hardcore C. You're pretty cool, so there's some Rust, I'm guessing. Tell us whatever is appropriate about what the actual tech is underneath here, what you wrote it in, what some of the challenges were.
Felipe: Sure. It's not a state secret. And I was going to have you guess, but you guessed. It's written in C. And of course, once we're outside operating system circles, people are like, why did you write it in C when there's Rust out there? And our answer usually is: because if you really, really know what you're doing, big asterisk, C will go faster and you have more control when you're doing C.
But of course you've got to be very careful with what you're doing. Right? So if you don't know what you're doing, by far, go with Rust. And Rust we obviously also do, because Firecracker is written in Rust and we have a big fork of it. So our core developers are C people plus Rust people. And then all the tooling around it, all the integrations, the CLI integration, the Kubernetes integration, all that stuff is Go, because the CNCF landscape world is written in Go.
Marc: How many of those core developers do you have? How do you go about finding these folks passionate to work on, I mean, there's probably a ton of them, right, that want to work on C. But how do you stay connected into that group?
Felipe: Yeah, so that's a good question. It has to do with our research roots and the fact that we have a lot of, let's call them tendrils to the top technical universities in Europe. And so we kept that alive. We do hackathons, things of that sort. Some of the professors know us, and at some point it becomes a sort of domino effect, let's call it, where people start to know us and learn about us.
Also, the fact that we're pushing the boundary of engineering, of what's possible with adding a zero to whatever's possible, that attracts a certain profile of people. But yeah, I agree. Obviously the funnel of how many C programmers, low-level C programmers are out there, is not that big.
Marc: Right? And you know, you see over the past couple of years, a lot of engineering teams are moving to Cursor, to Claude Code and Codegen for everything. It's one thing when you're writing a SaaS application in TypeScript, probably a totally different world when you're at this performance level and you're like using these languages and this tech, like, are you finding that you're able to like use AI Codegen a lot or not yet, really successfully?
Felipe: I love the question because it's something we debate internally and externally, regularly. And regularly also because the goalposts move so fast.
What was true two months ago is not true now.
And so obviously our natural instinct is caution: we have a lot of low-level systems programmers who may be reticent to use it, because it may introduce a low-level race bug that is really hard to figure out, and then it'll affect infra, which is pretty bad. So generally speaking, we had done internal surveys, and the programmers who were further up the stack, the field engineers and the tooling team that does everything in Go, were using it more often.
And as you went down the stack less and less and less, partly because of what I said, about potential bugs and risk, et cetera, but also because the amount of code out there you can train on at that low level is definitely less than at the top of the funnel.
However, definitely with Claude 4, 4.5 and the stuff that came out within the last month, our head of low-level engineering is always checking that out. And very recently, like this week, he started saying, "guys, there was a bug that I've been chasing. I threw Claude at it and for the first time it figured out what it kind of was."
So we're not at the level where we're going to tell it, "code this for us." But if something is a little bit broken, you know, can you debug this? Can you find it? Yeah, that's starting to happen even at that low level.
Marc: But you see the advances in the models and the tooling around it getting, like it's getting deeper and deeper and more capable for even the stack that you're working with.
Felipe: Yeah, for sure. And even for us, it's helping us debug. Can you imagine what it does at the top? Because what we're doing is not only low level, it's very performance critical. So it being functional is only half the equation. And arguably the really tough part is it being functional and being very high performance.
And that's where vibe coding this would be really hard, too.
Marc: Yeah, that makes sense. I mean, sometimes vibe coding or AI can generate massively inefficient code and we just are like, "you know whatever. I'll throw a couple more nodes in my Kubernetes cluster and be done with it. It's better than me spending the time," but we're counting on your tech to make it so I don't have to actually spend that extra infrastructure costs.
And so if you're doing the same thing at the other end, it's not going to work out.
Felipe: Yeah, it's also good--
Benjie: Well, okay, I have to interject here, guys. First off, it is January 28, 2026. We're recording this because people listen to this who knows when. So I think you just referred to Opus 4.5. I guess that came out like a month or two ago now. So this is a very different conversation, I'm sure, in six months. So that's the first thing. I just want everyone to know that.
The other thing I was going to say here that's interesting. I had a conversation recently about builders versus artisans, basically, and I think that this is a really good example of that, where some people just want to, like you said, Marc, "hey, I want this functionality. I don't care what looks like underneath the hood."
And then there's the people that, like, want to craft this beautiful code. I would make the argument that 97% of products these days probably don't need the artisanship, but this one does. So this is really cool. You guys actually do. And it's interesting that you're saying that, at least at this moment, from a debugging perspective, there is real value add there.
I would think in my head that C is one of the oldest languages, not the oldest, but it's the grandfather. It's definitely older than Rust and that there would be a decent corpus of at least open source C code out there. But I wonder, like, it feels like the Anthropics of the world would have to train specifically on code with limited comments and like I have no idea why someone in 1972 used this like Assembly hack to do this memory efficient thing.
Like, but, in the back of my brain there's like, "well maybe actually it would be good at C because this is the foundation of so many things." But like you said, it comes down to functionality is not the hard part of what you guys are doing at Unikraft. It's performance.
Felipe: Yeah, and let me put an asterisk on what I said. It has a big corpus of C code, and if all you care about is functionality, for sure. What we're doing is low-level C programming with a very specific performance bent. And it's a lot of low-level, operating-system-type, kernel, virtualization, stack programming.
And when you start scoping it down to that, there isn't that much out there. Of course, there's the Linux kernel, the FreeBSD kernel, the Xen hypervisor, Firecracker. I'm not saying there's nothing out there, but comparatively speaking, if you go up the stack and you ask how many GitHub projects are built on top of NodeJS or JavaScript or whatever, obviously we're talking orders of magnitude of difference.
But even with that, my point is it's not at the level where it can code a core feature that we need to code, but it's already helping us debug and do some adjacent functions, certainly unit tests and things of that sort.
Benjie: Yeah, it's really interesting. It's really interesting. Again, I'd love to have you back on here in six months and be like, how about now? We'll see.
Okay, so Unikraft is ridiculously high performance. I think the other thing that Marc said a second ago that's really interesting is you're enabling the rest of us to write slop. So thank you.
Marc: I avoided the word "slop" when I said it. I almost said it, but I didn't actually say the word slop.
Benjie: I mean, it's slop.
Marc: Yeah, okay, fair.
Benjie: But I mean this is hopefully a way to real efficiencies here where the environmental impact, let alone just the cost impact, would be mitigated by this. So there's a lot of angles here of why this is really interesting, what you guys are doing.
What would you say the top three challenges from an application perspective are right now that you guys are like trying to figure out?
Felipe: Yeah, some of the things that were tough were, okay, we need these millisecond end-to-end semantics so you can spin things up and down transparently, so you can bin pack, you can get density, et cetera, et cetera. And so we had to go component by component and optimize each of them.
And once we were done, we started obviously discovering that the applications themselves sometimes take a while to start. You know, Node or Java or whatever takes a second to start, two seconds to start or even a build environment. You need to download things before it starts.
And obviously it would have been an impossible task to optimize the applications themselves. Right? And you can ask application developers, please, please, please be nice and don't initialize 16 gigs of memory just because you might need it, but then you don't use it. Right?
And there's literally lots of fun code that kind of does that: allocate 16 gigs and then see what I do with it. So the fact that we started turning the entire platform into a snapshot-based system helped a lot with that, because what you can do, out of a CI/CD pipeline, is let things initialize, snapshot it, put it to sleep, and then make it available to requests in that pre-initialized state.
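The pipeline Felipe describes — pay the slow startup cost once at build time, then serve requests from the frozen, pre-initialized state — can be sketched roughly like this. This is an illustrative toy, not Unikraft's actual implementation: a real platform snapshots VM memory, while here a `pickle` of application state stands in for the snapshot, and `heavy_init` and `serve` are hypothetical names.

```python
import pickle
import time

def heavy_init():
    """Stand-in for an app's slow startup (JIT warmup, cache filling, etc.)."""
    time.sleep(0.2)                    # pretend initialization takes a while
    return {"routes": ["/", "/api"], "cache": {"warm": True}}

# "CI/CD time": pay the startup cost once, then freeze the result.
snapshot = pickle.dumps(heavy_init())

def serve(path):
    # "Request time": restore the pre-initialized state instead of re-running init.
    state = pickle.loads(snapshot)
    return path in state["routes"]
```

The key property is that `serve` never calls `heavy_init` again; every request starts from the already-warm state.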
So that was a bit of a challenge. The Kubernetes challenge we talked about, how do you marry these sort of second semantics of Kubernetes with the platform's millisecond semantics? Kubernetes, we really want the API, we really want the transparency, but we want to keep the millisecond semantics and value adds. That was also a bit of a challenge to find this middle road I was talking about earlier.
And then there's a lot of important features that are related to snapshotting because if you can now do snapshots, you can not only wake up and go to sleep and wake up from the same state, but you can do things like if you're happy with the state, you can turn it into a template and launch a bunch of instances from that pre-initialized state. A lot of clients have run times that they like and then they launch from that.
We have a couple of clients who wanted to have a fork feature, just like literally with processes, but VM forking. So you could fork a database, and you could do it transparently without having to do application-level forking of anything. The VM will fork, it'll have children, and after the children are done running, you can reap them, et cetera.
And a client is even applying it to headless browsers, because they'll just launch a bunch of them in parallel and whichever one finishes first, they'll then reap the other ones, right? So there's a lot of advanced snapshot-based functionality that comes into play, including continuous snapshots. You can have an instance and you can say checkpoint, checkpoint, checkpoint, rewind, please. So it's a way to save states.
So a lot of these primitives are very, very useful to many, many verticals when you have them.
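To make the primitives concrete, here is a minimal sketch of the checkpoint/rewind/fork surface Felipe describes. Everything here is illustrative — `Instance`, its methods, and the dict-based "memory" are hypothetical stand-ins; a real system snapshots and copies VM memory pages, not Python dicts.

```python
import copy

class Instance:
    """Toy VM exposing checkpoint, rewind, and fork primitives (illustrative)."""
    def __init__(self, state=None):
        self.state = state or {}
        self._checkpoints = []

    def checkpoint(self):
        # A real platform snapshots memory pages; here we just deep-copy the dict.
        self._checkpoints.append(copy.deepcopy(self.state))

    def rewind(self):
        # Return to the most recent checkpoint.
        self.state = self._checkpoints.pop()

    def fork(self):
        # Like process fork, but for a whole VM: the child starts from the
        # parent's current state and runs independently.
        return Instance(copy.deepcopy(self.state))
```

Forking a database this way means children see the parent's data at fork time, without the application having to implement any forking logic itself.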
Marc: It's really interesting because, you know, play back the story here. Like I'm kind of connecting it all and it's starting to make sense. Like you initially started back when containers came out, and the value prop that you were going for, that you saw, that made a ton of sense at the time, was the security model, right?
Like containers didn't have a strong isolation and security. And so you built unikernels around that concept and you were able to deliver that. You didn't drop that, right? Like what you're doing still has that isolation, but the value prop today has changed to scalability and burstability, which is like not at all the value prop you started on.
I'm curious, did you see that back then or is it just like the evolution, like things like Kata containers and Firecracker kind of made more isolation accessible maybe? And then you realized that this is the problem that nobody's thinking of?
Felipe: Not fully.
I think the inkling back then was fundamentally when you put security next to performance, it seems that they are at odds and you cannot have one, you have to take one or the other, right? And we were trying to say, look, if you're smart about how you build these things, you can have the security and you can get the performance.
And then eventually that slowly turned into, okay, if I look at the cloud, is it as efficient as it could be? But no, we didn't know back then that we'd get into the rabbit hole of redesigning controllers and all these things, right? So that came later.
And as for value add, yeah, security is still there and it's fundamental, but it's a checkbox, right? You need it, but it doesn't give you value add because you have to have it. Right? It's like saying, hey, look at my amazing new car. It has four wheels, right? Who cares?
Marc: Yeah, the world has changed now. It's like no CVEs, strong isolation. There's not a SaaS provider in the world that's not running some customer generated code or agentic workflows in a production environment. And if you're running that in your Kubernetes cluster, just running ContainerD as the CRI next to the workload, forget about scale, like that's a problem. You should not have that. Security is a requirement across the board now.
Felipe: Yeah, 100%. And maybe that wasn't so evident back then, but I think now, these days, talk to a cloud engineer, hopefully they sort of understand that. Although there's a lot of people who just say, oh, I'm just using containers, and you ask them, what service are you using? In fact it's AWS ECS or whatever service that has a virtual machine under the hood, transparently. So a lot of that is also going on there.
Marc: Cool. If I have dynamic workloads that I want to be able to run, or like I'm going to go zero to one on something brand new today, I have no infrastructure in place, I haven't even started writing code yet. Would you advise and encourage the Kubernetes control plane and that kubelet extension that you had? Or is there a different system architecture that you find like easier, where it's literally just running containers and you don't have kubelet and Kubernetes and all this technology to deal with?
Felipe: It's a good question. It depends. So first I would ask, do you know something about Kubernetes? And if they say, yeah, I'm familiar with it, or I know how to do the basics, then maybe that's the path of least friction in terms of getting started.
I mean you see this in a lot of YC startups, fresh out of the batch. If you do a bit of a survey, a lot of them are using Kubernetes out of the box because you can get started quickly and you don't need to think about infra. So yeah, adding a node there would be the easiest thing.
Although our platform is not 100% married to Kubernetes, it's an additional integration, so you could run it somewhere else. And especially if you're looking at really large scale, the Kubernetes control plane has limitations, depending on who you ask, somewhere in the range of 100,000 to 150,000 pods in the cluster, depending on whether it's EKS or GKE or whatever.
So now if we're trying to plug our platform, trying to run several hundreds of thousands of things in a single box, that's going to break the Kubernetes control plane. So we're playing with how do you work around some of those intrinsic limitations of the Kubernetes control plane too? But yeah, to your question, yes.
I would tell, especially an early startup in the AI world, it used to be the case: I'm an early startup, I don't need to worry about scale, it'll come later. But if you're in the AI space, you could be a startup today just out of YC and you could be hitting scale two months later. Right?
So to all of those companies, I would say, you know, maybe your value add is not infra, but you need to think about scale in infra from day zero because I hope it for you, you might hit scale very fast.
Marc: Yeah. That's cool. Earlier Benjie asked what the challenges were when you were building it. I'm actually curious what you're able to share about what you're thinking about next as you're continuing to build it. Are you just really focused on, you know, making performance faster and faster and scalability? Or are there some new product directions and new product features that you want to talk about?
Felipe: Yeah, for sure. So a lot of it has to do with efficiency, scalability, density, which is the term we use for how many of these things you can put in the box. Because as all of this gets bigger, are you still able to cold start? So imagine you have 1 million instances in the box. Can you still start any of them in 10 milliseconds? What happens if you get a burst where you need to wake up 100 of them at once within a very short time window? Is that feasible? Right?
So all of these performance challenges such that the box feels the same whether you have a single instance or you have 100,000 of them, that's something that we're always improving. And also as we started scaling the density of it, obviously the controller by design was built such that it could scale to large numbers, but it's running on KVM and it's running on the Linux kernel.
So you start encountering a lot of bottlenecks, because the user space and the kernel of Linux weren't meant to have maybe 100,000 processes running. Right? And some of the things that we were doing is having a TAP device for communication for each of these. And you start hitting limits on how many TAP devices you can create. Or even funny things like, if you have Tailscale on the box, Tailscale will scan through all the TAP devices and sort of crash. Because, you know, we potentially have 10,000 TAP devices.
So there's a lot of things that, when you push, you just, there's a lot of discoveries and things that you need to get around to eventually get to these very large numbers.
Benjie: So speaking of which, you're snapshotting this stuff. So let's say I'm running a Redis instance. Okay? So there's like, it's not writing to disk, it just has stuff in memory. It's got my queue in memory. And let's say it's running, it's using 4 gigabytes of RAM. There's like physical limitations to how quickly you can snapshot.
And then also if you need to load that snapshot, how can you get 4 gigabytes into memory in 10 milliseconds? Isn't that physically impossible?
Felipe: Yeah. So on the going-to-sleep side, in fact, that's less time critical, because if it's going to sleep, nobody's waiting asking, has it gone to sleep yet? Has it gone to sleep? So if that takes you a bit longer, that is okay. So writing down to disk, you can do that.
I mean, you may sort of then retort, yeah, but what happens if as you're going to sleep and you haven't finished the operation, a new request comes in and you need to wake it up again? And we have a mechanism that makes sure that, okay, forget that going to sleep, let's bring it back up and sort of, it's back up. There's mechanisms like that.
But then going from a standby state back to up, which is what you're saying, you know, if I have four gigs, there's a lot of mechanisms that are basically copy-on-write, where we make sure that only the memory that's needed for the VM to actually wake up and start answering things is loaded very quickly, and then the rest is being pulled in on demand, so to speak.
Benjie: Okay, so basically there's like a Redis service that's running in my Unikraft kernel. And so you guys are like, okay, let's make sure that that's going. But like, how do you know? Is that service specific or is that just a generic?
Felipe: That's generic. It has to do with resident memory and a lot of low level tricks so that, you know, the controller and the Firecracker fork that looks into this stuff does not know what the application actually is. It does it transparently. It's from the outside.
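The demand-paging idea behind this — restore only the memory pages the VM actually touches on wake-up, and pull the rest in lazily — can be sketched as below. This is a toy model under stated assumptions: `LazySnapshot`, its dict-of-pages "disk," and the explicit `read` call are all hypothetical; real systems do this transparently via page faults in the hypervisor.

```python
class LazySnapshot:
    """Restore a snapshot page-by-page on first touch (illustrative sketch)."""
    PAGE = 4096

    def __init__(self, pages_on_disk):
        self._disk = pages_on_disk     # page number -> bytes: the on-disk snapshot
        self._resident = {}            # pages actually faulted back into memory

    def read(self, page):
        if page not in self._resident:     # "page fault": pull this page on demand
            self._resident[page] = self._disk[page]
        return self._resident[page]

    def resident_pages(self):
        return len(self._resident)

# A 4 GiB image would have ~1M pages; waking up touches only a handful of them.
snap = LazySnapshot({n: b"\x00" * LazySnapshot.PAGE for n in range(1024)})
snap.read(0)    # the wake-up path touches page 0 first
```

The point is that the instance can start answering requests after loading only its hot pages, so "restore 4 gigs in 10 milliseconds" never has to literally happen.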
Benjie: Yeah, we do some cool snapshotting stuff at Shipyard, but nothing like that. So that's really cool. And that's all part of the open source Unikraft.org?
Felipe: No, all this snapshot stuff is part of the platform, which is the commercial part. Yeah.
Benjie: Aha. Ah, ah. Very smart. I like it. And then, like you said, you just have an interrupt, basically, if it's writing all this stuff to disk so that you can turn it back on very quickly.
Felipe: Yeah. You can abort that. Yes.
Benjie: So let's talk about, again, I want to talk about how you-- And I think I have an idea of how this works. But let's say that you're bin packing the heck out of some bare metal box and you've got, let's just say it's a thousand, because that to me is crazy, but you can do a million. But let's say it's a thousand. Okay. Let's just say each one of those things running has 2 gigabytes of memory in it.
Felipe: Yep.
Benjie: Or they're using two gigabytes of memory and you do get a hundred requests to turn this on and this machine, okay, I'm doing stupid math here, let's pretend the machine has 100 gigabytes RAM. Whatever. You obviously can't turn all of those unikernels on in the same box.
So do you guys auto spin up another node or how does that work when you get past the-- Like if all of the stuff that needs to be on is past the physical limits of the actual machine that it's running on?
Felipe: Yeah. Then we magically generate DIMMs and we automagically plug them into the box. Right? No. Yeah. So obviously we cannot go beyond on the hardware limits. There's two things you can do.
The platform can queue requests if it has thresholds where it would go over some memory limit. So you can queue that up, but that's going to add delay. So you're trading off the ability not to have to expand horizontally versus delay. But then, yes, what the platform would do is trigger the creation of, let's say, multiple EC2 instances with the AMI of our platform, such that essentially you would expand your cluster horizontally in real time, yes.
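The queue-then-scale-out tradeoff can be sketched as a small admission controller. This is purely illustrative — `Scheduler`, its gigabyte accounting, and the `scale_out_after` threshold are hypothetical names, not the platform's real policy.

```python
from collections import deque

class Scheduler:
    """Queue wake-ups past a memory threshold; signal scale-out when the queue
    grows too long (illustrative sketch, not the platform's actual logic)."""
    def __init__(self, capacity_gb, scale_out_after=10):
        self.capacity_gb = capacity_gb
        self.used_gb = 0
        self.queue = deque()
        self.scale_out_after = scale_out_after

    def wake(self, vm_gb):
        if self.used_gb + vm_gb <= self.capacity_gb:
            self.used_gb += vm_gb
            return "started"
        self.queue.append(vm_gb)           # over the threshold: accept delay...
        if len(self.queue) > self.scale_out_after:
            return "scale-out"             # ...or launch another node (e.g. an EC2
                                           # instance from the platform's AMI)
        return "queued"
```

Queuing trades latency for staying on one box; the scale-out signal is the point where that latency cost outweighs the cost of a new node.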
Benjie: So if you're running on Azure, it's going to take seven minutes. If you're running AWS, it'll take like 40 seconds, is the point.
Felipe: To be honest, I'm not sure about Azure.
Benjie: Trust me, trust me. That's me being generous to Azure.
Felipe: Haha. Okay, fair.
Benjie: Okay. So the Unikraft control plane also has access to my infrastructure to turn on new nodes if need be. This is a kind of dumb question, but what happens if you're just out of resources? It's just all kind of like queues?
Felipe: Yeah, it would just be queues and then some KPI would suffer, whether it's latency or whatever it might be. Obviously you don't want to run it all the way up to the maximum amount of memory because then even the control and other things would suffer. So there's protections against that.
Benjie: Right. There can't be a queue if there's no memory to put stuff in the queue.
Felipe: Yeah, but the controller and those components are fairly memory friendly and CPU friendly. So even when you have a lot of those VMs, the metadata for the VMs is on the order of, maybe for 1 million, you're consuming a couple hundred megs of memory for the controller. Not a significant portion of what the server is using. Right? And of course, everything else is scaled to zero. Right.
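As a back-of-envelope check on that claim: "a couple hundred megs for 1 million VMs" works out to roughly 200 bytes of metadata per instance. The per-VM figure below is an assumption inferred from the totals in the conversation, not a published number.

```python
instances = 1_000_000
metadata_per_vm = 200                        # bytes per VM: an assumed figure that
                                             # fits "a couple hundred megs" total
total_mb = instances * metadata_per_vm / (1024 * 1024)
# total_mb comes out just under 200 MB, negligible next to the workloads themselves
```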
Benjie: So a tenth of what my Gmail tab on my browser is using.
Felipe: Oh, yeah, yeah. So, gosh. Funny you mentioned that. I do enjoy myself going through the tabs from time to time and sort of be amazed at how much memory is consumed by a browser.
Benjie: Yeah, no, that's super cool. You guys using eBPFs for anything?
Felipe: We're using some eBPF, yes, at the proxy level, for sniffing out the traffic that's coming in in real time and for communicating that information up to the controller and asking the controller, "hey, I'm seeing this traffic. Do you know what it is? Are you supposed to bring something up or not?"
Benjie: Right, right. So that's like part of the kubelet triggering mechanism.
Felipe: It's part of the platform. And you can imagine in this case the platform being one server box with the software installed. Yes, it's part of that.
Benjie: Right, okay. So on the networking side. Yeah, that's a good question. I think it's a good question. Networking can be tough when you're starting to put a lot of stuff on one box with limited resources.
Were there a lot of challenges around that? Like, I just imagine, you know, a thousand services or a million services have one IP kind of, I guess. I mean, they're virtual machines inside there. But talk to me a little bit about how you solve all the networking, subnets, all that stuff.
Felipe: Yeah, so, of course, if you own the box and you have an IP address range, we can assign that to the server, and you can have multiple IP ranges. But by default, what each of the instances will have is a DNS name, and then we'll use SNI and HTTPS to sort of demultiplex onto all of those, so that if you only have one IP address, you can use that single IP address, which is usually the node's default.
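The SNI demultiplexing Felipe describes — one public IP, many instances, with the TLS Server Name Indication field choosing the backend — can be sketched as a simple routing table. The hostnames and VM names below are made up for illustration; a real proxy reads the SNI field from the TLS ClientHello rather than taking it as a function argument.

```python
# Hypothetical mapping: each instance gets a DNS name, all sharing one IP.
instances = {
    "app-a.example.com": "vm-1",
    "app-b.example.com": "vm-2",
}

def route(sni_hostname):
    """Pick the backend VM from the TLS SNI hostname (illustrative sketch)."""
    vm = instances.get(sni_hostname)
    if vm is None:
        raise LookupError("no instance for " + sni_hostname)
    return vm   # a real platform could also wake a scaled-to-zero instance here
```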
Benjie: Okay, another quick question. Noisy neighbor problems. Does that ever become a thing?
Felipe: So, I mean, noisy neighbor problems is an active instance problem. So generally speaking, because they're VMs, you can set CPU limits as to how much each of this is supposed to use. So up to a limit. Obviously, if you're starting to get close to 100% of all the CPUs on the system, things will start to degrade.
Anything that's scaled to zero is literally consuming 0% CPU and 0% memory. So none of that is going to contribute to the noisy neighbor effect.
Benjie: But if I have two active nginx ingresses that are just getting DDOSed or something, that can affect each other on the same box, I'm guessing.
Felipe: Well, you can put CPU limits on each of those two VMs so that they don't overly impact each other or anything else on the box. And also obviously the core components like the controller are also sort of performance protected as well. Right?
Benjie: Right, right, right. So really the bottleneck there would probably be the actual requests coming through the controller rather than the virtual machines doing things?
Felipe: Yeah, exactly. Just like any DDoS, if you're sending 10 gigs of traffic to a 10 gig card, eventually something's going to give. But this is why the internet has Cloudflare and a lot of these services that prevent DDoS, so that that doesn't happen.
Benjie: Yeah, sorry, pretty much the correct answer to any noisy neighbor DDoS is Cloudflare.
So if folks want to get involved and help out, how does the open source side of this project work? Do you guys have monthly meetings? CNCF? Where do we find you guys?
Felipe: The easiest way is to go to Unikraft.org, so that's the open source side, as opposed to Unikraft.com, the commercial side. On Unikraft.org you can join a Discord server. You could just pop in, say hello, maybe give a little profile about who you are and what you want to do.
That's one way. Obviously there's a GitHub repo that goes with it and we do GSoC every year if that's of interest.
Benjie: What is GSoC? I mean, I think I know, but just for everybody else.
Felipe: GSoC is Google Summer of Code, where essentially Google pays you some stipend to participate in open source projects.
So Google will sort of present you with a ton of different projects that might be of interest. You choose some, you apply to some, and if you get accepted, then you work for a period of months on that project, you make open source contributions to it, and it looks pretty good on your resume. And if the project has some commercial entities working around it, you may even get employment out of it.
Benjie: Is there any really cool customer deployments that you're at liberty to discuss that are just going to make my jaw drop again? Is there anything cool that you could tell us and if the answer is no, that's totally fine as well.
Felipe: Yeah, so there's names I can give you, but I cannot give you scale. But for instance, Prisma for sure, running millions and millions and tens of millions of databases in as few servers as possible is the name of the game.
And so they built an amazing, amazing Postgres offering based on this technology that allows them to sort of basically have unit economics that nobody else has.
Benjie: Right, right. So that's a cool one. So Prisma is using you guys and doing some really cool stuff here.
If I wanted to just play with this, but I don't want to pay for the control plane, what would you say is the right way to get started?
Felipe: So if you go to our hosted platform, there's a free tier. So if you go to Unikraft.com and you click on the sign-up button or whatever, that's totally free, and that'll allow you to specify a Dockerfile, drop some things in the cloud and so forth.
On the Kubernetes path, the best thing is that you drop us a line and we can happily set something up for free as a trial. We don't tend to charge for any POCs or anything like that. So we take your control plane, we install something on there and then you can play with it.
Benjie: Amazing. So by the way, we kind of skipped over this, but you are a commercial entity. Unikraft.com is a commercial entity. When did you turn that into a commercial entity? How did the company side of this thing start?
Felipe: Yeah, so we were part of this NEC research lab in Heidelberg that I mentioned at the beginning of the conversation and we, as part of the research unit of it, we were always flying over to Tokyo to talk to the business units and say "hey, you guys have big data centers that could be more efficient, you should be using this stuff." Right?
And sort of they were telling us, yes, but our business units don't have the know-how to adopt something as technical as this, et cetera, et cetera. And they were starting to encourage us first to take it open source, which we did. And eventually they were saying, hey, we have this sort of spin out program where maybe you can find external investors and you can spin this out. Right?
And that's essentially what we did. We started looking for some investors. I was reading some articles, this was back in 2021, about investors tracking GitHub stars and growth, et cetera. And as I was doing that, I got an email from an investor saying "hey, I've been tracking your GitHub stars and I noticed it's growing. Would you mind jumping on a call?"
And that's how the beginning of the road started in the VC world and with our back then pre-seed lead investor.
Benjie: Right. And so you've been a commercial entity since like 2021? 2022ish?
Felipe: 2022, yeah.
Benjie: Okay, cool. Anything upcoming Felipe, you want to tell us about? Where do we find you? Anything interesting that you guys are doing?
Felipe: Yeah, so we'll be at KubeCon in Amsterdam in March. So if you happen to go there please drop by. We'll have a booth there. We should be easy to find. And if you're not planning on going there, you should change those plans and come over because it's going to be quite fun.
Benjie: Do you have swag planned? Do you have good swag?
Felipe: Yeah, we have some swag planned. We've been thinking of having, maybe I shouldn't preview this but we're--
Benjie: No, no, no, don't, don't. No spoilers, no spoilers. You have to figure out what the swag is. You have to go to the Unikraft booth at KubeCon in March in Amsterdam. I will say this Felipe, I don't know if you know this, but Replicated, Marc's company, is kind of known for their KubeCon swag. So some stiff competition on this podcast.
Felipe: I will measure success or failure by Marc's very high bar.
Marc: We'll compare. Like, I'll come over to Amsterdam. I'm convinced. I'll go to Amsterdam and we'll look.
Felipe: Yes, please come evaluate us. Please.
Benjie: Well, I think that I learned a lot. I'm very excited about where you guys are going. And, I mean, we just touched on it, but the AI component to this stuff in regards to sandboxes and just figuring out how to scale this stuff in somewhat of an efficient way, you guys have a pretty bright future.
I'm really excited to see what you do with that. Thanks so much for coming on and really appreciate the time.
Felipe: Same here. It was a pleasure and thanks for spending all this time with me.
