
Ep. #44, Why Unikernels Are Cool with Felipe Huici of Unikraft
In episode 44 of Generationship, Rachel speaks with Felipe Huici, CEO and co-founder of Unikraft, about the powerful world of unikernels. Felipe breaks down how these lightweight, specialized VMs can achieve millisecond cold starts, enabling services to truly "scale to zero" without users noticing a delay. Learn how this efficiency is not just about saving money, but also about providing a secure and scalable foundation for the next wave of AI-generated applications.
Felipe Huici is the CEO and co-founder of Unikraft, a startup focused on making cloud infrastructure more efficient through lightweight, open-source virtualization technologies. He was formerly a chief researcher at NEC Labs Europe and is one of the founders and maintainers of the Linux Foundation's Unikraft Open Source project. His work centers on high-performance software systems, specialization, and security.
Transcript
Rachel Chalmers: Today I am thrilled to have on the call Dr. Felipe Huici, the CEO and co-founder of Unikraft, a startup dedicated to lightweight and open source virtualization technologies and significantly lowering cloud infrastructure bills.
Felipe was formerly a chief researcher at NEC Labs Europe in Germany, where his main research and interests lay in the areas of high performance software systems and in particular specialization, virtualization, and security.
He's one of the founders and maintainers of the Linux Foundation Unikraft Open Source project.
Felipe, it's wonderful to have you on the show. Thank you for taking the time.
Felipe Huici: It's a pleasure to be here. Thanks so much for the opportunity.
Rachel: So let's cut to the chase. What are unikernels and why are they cool?
Felipe: So a unikernel is a specialized virtual machine. That's the short of it.
Basically you can imagine an operating system and distribution, all targeted to a single application.
Take a web server and imagine which bits of the kernel I don't need to run a web server, which bits of the user space I don't need to run a web server.
Chuck everything out, leave only the bits that you need in, and then compile it all together into a single memory address space thing, wrap it in a virtual machine and deploy it, and that's a unikernel.
Rachel: What's the size differentiation between a normal Linux kernel and one of these little specialized guys?
Felipe: Well, I think you can think that in a unikernel, if you do it right, 90% of the resources go to the application and only 10%, or maybe less, go to the unikernel itself.
So if the application, say, consumes-- If you're talking Nginx, something relatively performant that only consumes, I don't know, 5 megs, 10 megs of memory, then the entire image, the entire virtual machine will consume that much, 10 or 11 megs.
Rachel: So it's literally just a tiny sliver of the whole operating system, all that you need to run this one app.
Felipe: Yeah, that's right.
So the primitive you need underneath is the hypervisor for strong isolation. The primitive you need at the top is the application. Everything in the middle is not really relevant and so it should be as thin as possible. That's what the unikernel is trying to achieve.
Rachel: Tell us about millisecond cold starts and what they mean for service providers.
Felipe: Yeah, of course.
So if you're a cloud engineer, you've probably heard about the problem of cold starts, especially if you're running user-facing services where people expect a response to come right away.
So internet response times are in milliseconds and if the actual servers take hundreds of milliseconds, seconds to start, your users will notice, will complain, will maybe shift over to your competition.
Rachel: We are impatient and entitled people.
Felipe: That's right. The human body can actually discern millisecond timescales unfortunately, so that gets projected onto the cloud.
So that's the beginning of the problem. But it's not just cold starts.
Another problem is, you've probably heard the saying "scale to zero," which is: if something is not being used on the cloud, we could potentially put it to sleep.
Makes sense, a lot more efficient. The problem is that if waking it up takes seconds, then users again notice, they can tell, "this thing was asleep obviously and you're cheating me. I don't like this service."
If you could sort of automatically have something wake up in milliseconds, then end users don't notice and then you on the back end can put a ton of services to sleep and hopefully have your infra be much less expensive.
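To make the "wake up in milliseconds" idea concrete, here is a minimal scale-to-zero activator sketch in Go. Everything in it is illustrative rather than Unikraft's actual API: startInstance, the backend address, and the single-backend setup are assumptions. The point is that the first request to a sleeping service triggers a cold boot and then gets proxied, and with millisecond boots the user never notices.

```go
// Minimal "scale to zero" activator sketch (illustrative only).
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
)

type activator struct {
	mu      sync.Mutex
	backend *url.URL // nil while the instance is asleep
}

// startInstance is hypothetical: in practice it would ask the hypervisor or
// platform to cold-boot (or restore a snapshot of) the unikernel instance.
func startInstance() (*url.URL, error) {
	return url.Parse("http://127.0.0.1:8081") // assumed backend address
}

func (a *activator) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	a.mu.Lock()
	if a.backend == nil {
		b, err := startInstance() // millisecond-scale cold start
		if err != nil {
			a.mu.Unlock()
			http.Error(w, "backend unavailable", http.StatusBadGateway)
			return
		}
		a.backend = b
	}
	backend := a.backend
	a.mu.Unlock()

	// Forward the request; the caller never sees that the service was asleep.
	httputil.NewSingleHostReverseProxy(backend).ServeHTTP(w, r)
}

func main() {
	http.ListenAndServe(":8080", &activator{})
}
```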
Rachel: Again, what can you potentially save? How much money can you save by using these technologies?
Felipe: So in some of the deployments we're doing, you can think on the order of millions of instances being run on just a few servers. Think more a rack than a server farm.
So it can be important depending on the types of servers you're using, especially if your service is the sort of--
Well, there's a lot of services like this where you have a Pareto distribution: 80 or 90% of the services are running but not active, and only 5 or 10% of the users are active at any one point in time.
So you could sort of bin pack a lot of those onto just a few servers because only a few active ones are there.
What's tricky is that if you cannot scale up from zero in milliseconds, you cannot react to which things are going to wake up, and it's really hard to predict which users are going to wake up when.
Rachel: Yeah. So you're sort of the Ikea of the cloud. You can flat pack everything and make it super efficient, but unlike Ikea, it's effortless to rebuild things.
Felipe: You know, we've been working on messaging for months, but I think you nailed it. I think we're going to go for that.
Rachel: So we're an AI show. What does this all mean for LLM users?
Felipe: Yeah.
So we're working with a bunch of AI companies, because all of those wonderful LLMs, they produce code and code needs to be deployed. Now you could say, "oh, websites and code have been deployed for ages. What's new?" The scale is new.
So these LLM agents are producing millions of instances in a day and they all need to run.
What's also different is that this code cannot be trusted. It just has been produced by an LLM.
Maybe it works well, maybe it doesn't. Maybe it has safety problems, exploits, whatever. So they need strong isolation.
And then the other thing is, while you may deploy 1 million such services, again this big Pareto distribution, it's very unlikely that a large majority of these deployments are active at any one point in time.
So all three of these properties map pretty well to the platform we've built.
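As a rough illustration of those three properties (scale, untrusted code, mostly-idle instances), here is a hedged Go sketch. The LaunchSpec and Launch below are hypothetical placeholders, not a real platform API; the pattern is simply one small, hypervisor-isolated instance per piece of untrusted, LLM-generated code.

```go
// Hypothetical deployment pattern: one isolated instance per untrusted workload.
package main

import "fmt"

// LaunchSpec and Launch are hypothetical placeholders for whatever platform
// API actually boots a unikernel or microVM; they are not a real Unikraft API.
type LaunchSpec struct {
	Image    string // unikernel image built around the generated code
	MemoryMB int    // small per-instance footprint, so many can be packed onto one server
	Isolated bool   // hypervisor-backed isolation for untrusted code
}

func Launch(spec LaunchSpec) (string, error) {
	// Platform-specific boot or snapshot-restore would happen here.
	return "vm-0001", nil
}

func main() {
	// One isolated instance per untrusted, LLM-generated service; idle ones
	// would be put to sleep and woken on demand, as discussed above.
	id, err := Launch(LaunchSpec{Image: "generated-app:v1", MemoryMB: 16, Isolated: true})
	if err != nil {
		panic(err)
	}
	fmt.Println("running untrusted workload in", id)
}
```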
Rachel: So it's just coders on steroids. You've got like this enormous explosion in the amount of code that's getting deployed and this lets you run it in a secure and cost effective way.
Felipe: I think in traditional services in the past you would build a new product or service, it would ramp up slowly. You have a one year time plan, maybe you would be at scale in a year or two. With AI, you could be at scale in weeks or even less, depending on what you deploy.
Rachel: Yeah. Could unikernels make foundation models greener and more sustainable?
Felipe: Yeah. So we toyed around in the research arena, about six, seven years ago, with taking models and attaching them to a unikernel, and then they can start quickly, do inference, and then go back to sleep.
Of course LLMs, not so much, but maybe as we get into the arena of more specific models or models that are a little bit more lightweight, you could imagine a platform where you keep the popular models in memory and then you spin up these instances to do inference, based on them.
The models can be read-only so they can be shared, and then the instances are isolated and separate.
So I definitely think it's possible to have a cloud infra platform that optimizes for at least inference. Obviously the training will always require GPUs, I think.
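One concrete mechanism behind "the models can be read-only so they can be shared" is memory-mapping the weights read-only, so many isolated inference instances can share the same physical pages. The sketch below assumes a hypothetical model file path and just shows the mmap pattern, not Unikraft's platform.

```go
// Sketch: share read-only model weights across instances via a read-only mapping.
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mapModelReadOnly maps the weights file read-only; because the mapping is
// shared and backed by the page cache, processes or VMs mapping the same file
// end up sharing one physical copy in memory.
func mapModelReadOnly(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return nil, err
	}

	return syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
}

func main() {
	// The path is illustrative; in practice this would be a popular model
	// kept resident while inference instances come and go.
	weights, err := mapModelReadOnly("/models/small-model.bin")
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(weights)
	fmt.Printf("mapped %d bytes of read-only model weights\n", len(weights))
}
```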
Rachel: Do you use AI tools in your own workflow and if so which ones and how?
Felipe: Yeah, I'm a shameless ChatGPT user. As a CEO I don't get to program so much.
The other thing I do is I scout AI tools all the time. I watch a ton of vibe coding videos. I tried that out.
Not that it's useful in my day to day work, but it might be useful to my team.
So I always get in touch with my team. I try to find a technical champion. These are the younger kids in my team who like vibe coding and so forth.
And once a month we try to do a session where we tell people this is the state of AI and it might not have applied last month but this month because things are progressing so quickly it might be useful to you.
As an infra and sort of systems company, the people who are higher up the stack are much more open to the idea of using AI, like Cursor and Copilot.
As you go down the stack to the lower level bits, there's more resistance, especially around the fear of an AI automatically introducing a difficult-to-find bug that then takes them a week or two of lost work to track down.
So there's definitely an inverse correlation between how far down the stack you go and the usage and applicability of AI.
There's also a bit of an internal theory that because AIs are trained on data, there just isn't that much open source, low-level systems and high-performance code out there for them to feed from.
But that's just the theory so we're constantly retesting that.
Rachel: Is any of the product generated at the moment or is it all handwritten do you think?
Felipe: No, this is all handwritten. Obviously in the infra space, we've got to be a little bit careful.
I'm trying to push the team to leverage LLMs for unit testing and so forth. But again, the same algorithm applies, when you go down the stack, people are like, "okay," but if the LLM writes--
The test is super important because then it goes into production and production is infra production and it's critical.
So the chain goes all the way out to the unit test and "can I trust the unit test that was written by the LLM?"
But of course, it's about also training staff to tell them there's a human in the loop. It's never going to be: hit the button, unit test passes, we're good to go.
Rachel: Kids stay in school, go into low level infra programming. This seems like a job with a future.
I'm actually really interested in that age distribution of reactions to AI. Like, your more senior engineers are getting comfortable with cursor and your more junior engineers--
I mean, there's probably an element of wanting to prove themselves and have pride in the code that they're creating.
But doesn't that sort of flip the usual model of new tech adoption where it's the kids dragging stuff in and the senior engineers going, "oh, that'll never work?"
Felipe: I think there's two axes. One is definitely age.
There's definitely some level of correlation with the younger ones who grew up with this stuff, so to speak, saying, "why wouldn't I use this tool? It's just like any of the other tools that I learned."
Whereas if you're a little bit on the older generation, you definitely differentiate AI tools from the others that you'd already trust.
But it also has to do a little bit with personality. We have about 20 staff now.
And if you look across, there's a personality range of people who love trying new stuff and people who are more traditional. Right? So that's also a factor.
Rachel: Yeah. On the big five personality traits, the openness thing I think correlates really strongly to that.
Felipe: Yeah, exactly.
Rachel: All of which leads to my next question: Do you worry that LLMs will replace engineers?
Felipe: No, I'm not worried they'll replace engineers. They'll replace certain kinds of engineers.
If you have the sort of engineering job that's super repetitive, you may be in trouble. But I don't think overall that it'll replace engineers. But also, if you're an engineer who doesn't use LLM tools at all and rejects them forever, you're going to get superseded by an engineer who knows how to use them.
But I think really good engineers, at least in the short term, will never be replaced.
Because the sort of engineer that can think about a high level problem, come up with a reasonable solution and architecture that takes into account the entire ecosystem of components and so forth, I think that's always needed. That engineer with an LLM tool can sort of implement and execute that way faster.
But if you remove that high level engineer from the process, you might get into a lot of trouble depending on what the application you're targeting is.
Rachel: And the language models almost by design aren't ever going to get to that systems thinking level.
Are there other machine learning approaches that do more closely approximate what a really good senior engineer does in thinking about the whole system and each of its dependencies?
Felipe: Yeah, so what I said is a generalization. And by definition generalizations have falsehoods built into them.
So there are a lot of people who-- What you can do is you can train, you can have a big test bed, you can do a lot of synthetic tests of whatever platform you're running and then train your LLM on it and generate data that way.
So it is possible, but it's a little bit more difficult than anything up the stack where you have reams and reams of open source code to train everything with and the LLMs have already absorbed that. Right?
Rachel: Yeah. I'm thinking more about the kind of modeling that really good infra and other engineers do in their head where they're holding the whole system in short term memory at the same time.
What kinds of machine learning could mirror that particular kind of manipulation?
Felipe: Yeah, it's a good question. I think you could definitely do it.
First, you've got to gather a ton of performance metrics because the stuff we do is not only functional, performance plays a big, big part of it. So when I say we do unit tests, you can imagine half are functional tests and half are performance tests, and both are equally important.
So we cannot have performance regressions. And in fact we always try for the performance to improve.
So whatever LLM we use would have to have a lot of data about the performance of what we're running. And performance here means a lot of different metrics.
We talked about cold times earlier. But we also care a lot about memory consumption. We care about IO, whether it's storage or memory.
We do a lot of snapshotting on our platform. And so it's really important that the way snapshots are saved from memory to disk and loaded from disk back into memory is really, really efficient.
Also how much storage we need for the snapshots.
So if we were to build such an LLM, we would have to feed it a ton of information and metrics to take into consideration.
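A hedged sketch of what treating performance as a first-class test might look like in Go. The boot command, the 10 ms budget, and the package name are assumptions for illustration; the point is that a cold-boot regression fails the suite just like a functional bug.

```go
// Illustrative test where a performance regression is treated like a failure.
package bootperf

import (
	"os/exec"
	"testing"
	"time"
)

// coldBootBudget is an assumed service-level target, not a real Unikraft figure.
const coldBootBudget = 10 * time.Millisecond

func TestColdBootWithinBudget(t *testing.T) {
	start := time.Now()

	// Hypothetical helper that boots the unikernel image and exits once the
	// guest reports it is ready to serve traffic.
	cmd := exec.Command("./boot-and-wait-ready", "app.unikernel")
	if err := cmd.Run(); err != nil {
		t.Fatalf("functional failure: boot did not complete: %v", err)
	}

	elapsed := time.Since(start)
	if elapsed > coldBootBudget {
		t.Fatalf("performance regression: cold boot took %v (budget %v)", elapsed, coldBootBudget)
	}
}
```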
Rachel: Yeah. That makes me think of another metaphor, which is that you're building a Formula One car and every single component has to be entirely optimized for speed.
Completely different from a streetcar.
Felipe: Yeah. And I remember some years ago when the goal was to have some sort of gimmick where we could say we have, you know, under a millisecond cold boot time.
We were shaving like literally every line of code that we didn't need to hit that microsecond mark.
Rachel: Yeah, it's super cool.
More seriously, what advice do you have for folks who are studying CS at the moment or thinking of coming into the industry?
If you were at the beginning of your career, how would you optimize it now that AI is here?
Felipe: I think a couple of things.
I think it's an exciting time. I don't think people should see CS as threatened or "I shouldn't go into this because of AI."
All CS curricula already have, or going forward will have, very strong AI components, which is kind of fun, I think.
And then going forward, as I said, all that systems engineering low level stuff is always, always going to be needed.
I mean, it's still very hard to hire system engineers. It's not like AI came along and they're like, "oh, fantastic, now I don't need 10 people, I just need one. Plus the AI tool."
I don't think that has happened, at least not yet. Who knows where it'll go?
So I think that that's still going to be pretty much needed going forward, always with an AI component to speed things up actually.
So I don't know. I think if you're going to CS, you should be kind of excited about it.
I also grew up in a place where-- when I was doing my undergrad, Google was barely starting to kick in.
And when you were stuck on a bug, you were stuck on a bug. I mean, if it took you a week of trying out reasoning, maybe bumping into a fellow student and chatting it out, that's what it took.
You had to solve it on your own. There were almost no references. You could maybe buy a book, but that wasn't going to help you debug.
And then we moved on to the stage of Stack Overflow. And there's a lot of resources on the web these days where you can ask, "Here's my code. Debug it, please." So that's really cool.
Rachel: Even I can debug with Stack Overflow.
Felipe: I'm exaggerating a little bit because some bugs are really tricky. There's conditions.
Rachel: But anything that anyone's seen before is shallow now.
Felipe: Exactly. So you're not wasting your time on some low level trivial thing and you can concentrate more on actually building, which I think is kind of nice.
Rachel: Yeah. And LLMs sort of do that at a general level of like sucking down all of the wisdom and institutional knowledge.
But I like your point that there are always going to be these edge cases where specialized knowledge isn't in the corpora that the LLMs ingested, and so moving to those edges where there just isn't a lot of received wisdom is going to create a durable career.
Felipe: Yeah. Or a durable startup that works on that. Right?
Rachel: Yeah, yeah, that sounds good.
What are some of your favorite sources for learning about AI?
Felipe: Yes, of course, I'm going to sort of preach to the choir. There's Humans in the Loop and all these sorts of Heavybit podcasts and newsletters that I'm signed up to. So I do that.
I'm also subscribed to console.dev, so whenever there's anything AI related but general, I read through all of it. I check that out.
And then I listen to some tech podcasts, Lenny's podcast, and some other bits. So I think, maybe those are good resources.
But in general, if you want to learn about AI tools, it's almost like you wake up, open your eyes, and you learn about AI tools whether you want it or not because even mainstream news is talking about it all the time.
Rachel: We are all drinking from the fire hose right now.
I wanted to ask, why ChatGPT and not Claude? I'm a Claude girl myself.
Felipe: I don't know, I don't think I have a good reason. Would you say I'm a fool and I should switch over to Claude? I can become a Claude fanboy.
Rachel: I would never say that, Felipe.
And it's hard for me to quantify exactly why I think Claude is better, but it just feels a little more subtle to me.
Sorry, that's a very subjective measure. But after I started trying Claude, I just never went back to ChatGPT.
Going back to your Google analogy, it was a little like I was an AltaVista devotee until the first time I tried Google.
Felipe: AltaVista. That's nice. That brings back memories.
Rachel: That ages me, yeah.
Felipe: All right, I'll give Claude a try. I mean I have some staff members that swear by Claude, so maybe I'll give it a try.
Rachel: All right. If everything goes exactly how you'd like it to for the next five years, I'm going to make you ruler of the solar system. You get to decide how things turn out. What does the future look like?
Felipe: At least in my domain, so cloud platforms, I'd like to see cloud platforms that are orders of magnitude more efficient than they are today.
Also to match the scale of AI which is going to sort of require it.
But I think we sort of lost our way a little bit. Back when CPU cycles and chips were expensive, we were very careful about the code we wrote. It had to be efficient. And to some extent there's one domain that has kept that sort of faith, which is the embedded world, because for obvious reasons they're resource constrained and their code is really good.
In fact, for unikernels we often steal their libraries because they're really well written.
But as CPU power increased and we got multi-core and whatever, we decided, "oh whatever, it's just way easier to buy more servers and CPU than to bother with better code, or train people, or hire low-level systems programmers."
I think that needs to change and is changing, because there's no logical argument for it. It's not because I have a powerful server that I should do something silly with it, because that powerful server could serve many more requests if my code was efficient.
I think we're getting to that sort of turning point. And maybe, instead of needing a nuclear plant to power my server farm because my code isn't as efficient as it should be, we can get into a better sort of setup.
Rachel: Oh as if anything bad would ever happen with Three Mile Island.
Felipe: Yeah, knock on wood. I don't want anything bad to happen to a nuclear plant.
Rachel: It occurs to me-- Has anyone ever started out as a front end dev and then become like a low level systems engineer?
It's not a trajectory I've ever heard of. Like, everybody I know who does low level stuff has always done low level stuff and was interested in it from the start.
I don't think I've ever known anybody who did like front end programming who got back down to the hardware level.
Felipe: Yeah, it's a good question. Obviously the other path is much more common.
And in fact a lot of the Golang programmers we have in our staff are actually low level C programmers converted into Go.
But I've interviewed enough people to know that it does sometimes happen.
I met some people who started with JavaScript and Python, and they're like, "yeah, but it's so abstract, I don't know what's going on."
Then they go to Golang and they're like, "oh, what's a syscall?"
And then from there they're like, "oh, what's an operating system?"
But I think it has more to do with personality and the fact that those people happen to land at the top of the stack more than anything else.
Rachel: Yeah. Curiosity and an affinity for the hardware.
Felipe: Yes, exactly. And I also think as we've gotten-- now we're entering about 15, 20 years of the cloud and hyperscalers, we've gotten very used to the cloud being dashboards.
And so the idea of there being hardware on a server with disks and having to install something has become a very scary notion, which is kind of sad because ultimately a computer scientist who does really complex programming should have no problem installing something. I think we need to be able to get back to those basics a little bit.
Rachel: Kids these days, they don't know the smell of a server room and the hum and the little blue blinky lights.
Felipe: Right. And the smell of something burning, and you don't know where it is in your data center.
Rachel: Electrical fire. Exciting!
All right, last question, best question.
If you had your own starship for interstellar voyages that lasts longer than a human generation, a generation ship, if you will, what would you name it?
Felipe: I think I would name it Terra, in honor of our nice Earth.
Whenever we leave it, I think we should look back and thank it for hosting us for so long.
Rachel: It's such a good planet. I keep all my stuff here. I love Saturn. Saturn is a fantastic planet, but Earth is my favorite. It just edges the others out.
Felipe: Yeah. You have a sort of a pad out there as a secondary home in Saturn?
Rachel: Not yet, but I have plans. Yeah, for sure.
Felipe: Awesome.
Rachel: Great views.
Felipe: Well, take me with you if you ever go. But you know, I'll keep my primary residence here, grounded on Earth.
Rachel: Right, right? There's so much oxygen here. We love oxygen.
Felipe: Yeah, it helps.
Rachel: Felipe, it's been a joy having you on the show. Thank you so much.
Go out there, make that code super efficient, slim down those instances, and keep fighting the good fight.
Felipe: Thanks a lot for having me. It's a promise.