Ep. #30, Cilium and eBPF with Thomas Graf of Isovalent
Marc Campbell: Hi there, as you just heard, we're here with Thomas Graf, CTO and co-founder at Isovalent to talk about Cilium and eBPF and probably lots of other cool stuff. Welcome, Thomas.
Thomas Graf: Thanks. Thanks a lot for having me. It's great to be here.
Marc: Great. So we always like to get started by learning a little bit about the path you took to get to your current role, how you got to working in this technology. Will you give us a quick look at your background?
Thomas: Yeah. So I have a very low level background. I started out as a kernel engineer and spent 10 years at Red Hat doing low level networking kernel development, working on IPv6, IP forwarding and routing, quality of service, all the nerdy, low level networking. Then I got more and more involved in security, but still in the kernel space.
I was involved in iptables, the audit subsystem, LSM, and the initial namespacing now powering containers, and did a couple of years at Cisco during the OpenStack days. Then I got heavily involved in Open vSwitch, which drove the software-defined networking age from an open source perspective, and then ultimately I got into eBPF, which brought me to the Cilium and eBPF development that we're doing today. So I have this open source heavy, kernel development heavy background.
Marc: Awesome. And I can't of course forget Benjie here as my co-host. Hey, Benjie.
Benjie De Groot: Hey, Marc. Yeah, waiting for you to say hi until after the first question, but that's okay. Hey. Hey, guys. Thanks. I'm really excited to have Thomas on. Everyone knows I'm a little bit of an eBPF fanboy, and I might be the only one that's a fanboy of that, but I am very excited to hear about this stuff.
Thomas, thanks for coming on. You know, I would love to hear just a little bit more about, just since we're here, how did you get excited about open source? What was your first sniff that got you really going on it?
Thomas: Yeah, I don't quite remember what year it was, but I got a set of floppy disks with Linux on them from a friend, and that friend told me, "Hey, this is awesome. This is something that you can toy around with and you get this operating system, and by the way it's free and there is this massive open source community around it and it's evolving very, very quickly." I'm not quite sure what it was, whether it was Slackware or something else. That got me hooked onto Linux.
I was not doing a lot of software development yet, I essentially just came out of gaming and using computers for that sort of thing, got hooked onto Linux and then started doing software development. Initially I think very similar to how others have been doing this, writing small little programs. For example, together with friends we wrote an IRC server or we wrote a small audio streaming client and service, and things like that.
Then by accident I actually got into kernel development. That's quite a funny story. For one of my university classes I was getting involved in learning IPv6, and in order to learn and really understand it I wrote this small, little chat application which was using IPv6 destination options. That's a quite deep, very specific extension of IPv6 that allows you to essentially insert arbitrary headers at the network level into IPv6 packets, and I was using that to write a chat application where you could send messages back and forth, with each message injected into those IPv6 destination options.
For some reason, that application did not work and I read the Stevens book over and over again, which was documenting these system call APIs. I could not identify the mistake, and at some point I was really sure that my code was right and the Linux kernel was doing something incorrectly here, so I started digging and digging and digging and eventually ended up in the kernel source code and actually found the problem. I found that a calculation around setting the length of those destination headers was off, which is why my app did not work.
I ended up submitting a kernel patch, the patch was completely wrong and people gave me feedback and I corrected it and eventually the patch got in. But that's how I got involved in kernel development and it opened up my eyes in terms of, "Oh, actually it's possible to do kernel development and I can work on something that then gets installed on millions of devices and everybody can use it." That, I think, was very attractive and got me completely hooked, and I also truly believed that open source would improve the ecosystem and would provide something that we can all build on top of.
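To make the length calculation mentioned above concrete, here is a minimal Python sketch of how a message could be packed into an IPv6 Destination Options extension header. It is not the original chat application or the kernel code in question. The function name `build_dest_opts` and the option type `CHAT_OPTION_TYPE` are assumptions made up for this illustration; the padding and length rules follow the general layout of IPv6 extension headers.

```python
import struct

PAD1 = 0x00  # Pad1 option: a single zero byte with no length field
PADN = 0x01  # PadN option: type, length, then that many zero bytes
CHAT_OPTION_TYPE = 0x1E  # assumption: option type picked only for this sketch


def build_dest_opts(next_header: int, message: bytes) -> bytes:
    """Build a Destination Options header that carries `message` in one option."""
    if len(message) > 255:
        raise ValueError("a single option can carry at most 255 bytes")
    # One TLV option: type (1 byte), data length (1 byte), then the data.
    options = struct.pack("!BB", CHAT_OPTION_TYPE, len(message)) + message
    # The whole header (2 fixed bytes plus options) must be a multiple of 8 bytes.
    pad = -(2 + len(options)) % 8
    if pad == 1:
        options += bytes([PAD1])
    elif pad > 1:
        options += struct.pack("!BB", PADN, pad - 2) + bytes(pad - 2)
    total = 2 + len(options)
    # Hdr Ext Len counts 8-byte units beyond the first 8 bytes of the header.
    hdr_ext_len = total // 8 - 1
    return struct.pack("!BB", next_header, hdr_ext_len) + options


header = build_dest_opts(next_header=59, message=b"hello")  # 59 = No Next Header
print(len(header), header[1])
```

The length field is easy to get wrong because it excludes the first 8 bytes and is counted in 8-byte units, which is the kind of off-by-some calculation the story describes.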
Marc: That's awesome. There's a lot to unpack there. I love digging in, working on code and then saying, "My code is right, the kernel is wrong." And we've all thought that at times, but you actually validated it and got a patch merged in. Most of us haven't gotten that far. Most of us end up finding out, "No, I'm wrong."
Benjie: Yeah, I've thought that every time and I'm always wrong. I think there was one Yahoo Sports API I tried to do something with maybe like 15 years ago that actually was broken, but pretty much everything else I've ever done was always my fault. So that's an intimidating story, frankly, that you got something in the kernel. That's really cool.
Thomas: It took me weeks to actually convince myself that my code is really right. Obviously I was in exactly the same position, I go, "Oh, it must be my fault, it must be my fault, it must be my fault." But eventually I think that's also what I learned doing more than 10 years of kernel development, kernel code is really high quality but it also has bugs like every other software. It's just the consequences are super high, which is why we try to do everything we can to keep the quality high. At the same time, it's also just humans working on it.
Benjie: So wait, have you met Linus?
Thomas: Absolutely, yes. I'm not sure whether he would remember me, I definitely remember him. He was very approachable early on, he was at conferences and Kernel Summit and so on. He's a normal human being like everybody else. I think he's very, very opinionated, of course. Right? I think that's also needed. I think he's also grown quite a bit as a leader overall, but he was awesome to deal with. I think from an outcome perspective and the type of community he has created, it's fantastic. I would definitely do quite a lot of things differently in terms of how to treat everybody, but I think that's a topic or a subject that's been well--.
Benjie: Yeah, I didn't mean to dive into that too much. But speaking of fanboy, I've been a big Linus fanboy. I had an idea, by the way, that I'm throwing out to the community, I think Linus should get a Nobel Peace Prize for Linux. I'm sure that's going to upset a bunch of people, but I honestly think that Linux is the most impactful humanitarian thing of all time, or one of them. And so it's just kind of cool to talk to someone that was working on this back in the day, remembering stuff way off topic, but it's just really cool that story and I can't believe I'm talking to a kernel contributor.
I've used your code thousands and millions of times. At my company we use you, we abuse you for all kinds of stuff that's really cool. All right, well, we can move on to a little bit more modern day stuff, but I feel like we could pick your brain about all the contributing. Marc, do you have anything else? Because this is a pretty interesting conversation, I don't want to-
Marc: No, I think that background is awesome. Let's fast forward all the way to today now, Thomas. You're the CTO and co-founder of a company called Isovalent. Can you tell us what does Isovalent do?
Thomas: Isovalent is the company created by the people who have created the Cilium project, and I'm sure we'll talk about the Cilium project quite a bit. It's essentially a company that brings eBPF into the hands of enterprises, so eBPF was the next logical evolutionary step for us networking and security people on the security level and we were sure that this is the defining technology in how we would define and provide infrastructure for enterprises and customers, and essentially everybody using or investing into software defined infrastructure. As part of that we created the Cilium project and based on the success of that we have founded a company around it, and it's growing very, very successfully now.
Benjie: Before we dive into Cilium I feel like we've got a kernel contributor on, I would love for you to explain, a little high level but feel free to get a little deep level, what eBPFs are? Imagine I have no idea what that is, I'm not the only fanboy apparently of eBPFs. But will you explain high level and then feel free to get a little detailed there? What is eBPF? What is going on here?
Thomas: Yeah. I think as you mentioned, one aspect of open source and Linux is it's driving all the devices. It's millions and millions of devices from Android to set top boxes to mainframes to laptops to servers to cloud is built on Linux and so on. Which means that whatever is part of the Linux kernel is available everywhere, also the Linux kernel or operating system has a lot of control.
It can see everything, it can control everything, it's being used to provide networking and security. If you have a piece of functionality in the Linux kernel, it's literally available everywhere. This is why it's super interesting to do kernel development.
The one big downside of the Linux kernel or operating system development in general is that everybody wants a very mature operating system. Nobody wants their operating system to crash, which is why kernel development has traditionally been very slow, very difficult, very challenging to find consensus and also why typically nobody is consuming the latest, bleeding edge kernel versions.
The times of compiling your own kernel and rebooting the machines are mostly over, right? This actually created a challenge and I've lived through that challenge as part of my work at Red Hat for years and years and years where Red Hat sold an enterprise distribution and at the same time tried and wanted to create innovative products so whenever we did kernel development work we wanted to get that into the hands of users as quickly as possible.
But there was a time when, let's say if you think back 20 years, there was a time when we were required to update our web browsers frequently to access certain websites because they may need HTML 4.1 or some other more modern version of one of the web protocols and you had to upgrade your web browser continuously.
eBPF allows application developers to run programs safely, securely and efficiently as part of the Linux kernel to drive innovation and add capabilities to the Linux kernel, but without compromising the security model of the Linux kernel and without risking a kernel crash. This is what now essentially enables a wave very similar to the one we have seen with programmability around the web browser: we see that same wave around programmability of the operating system.
Benjie: That's a great description, probably one of the best descriptions of eBPF that I've ever heard. So it allows me to write code while keeping the mature Linux kernel that I'm running. I don't have to, like you mentioned, compile or update to the bleeding edge version of the kernel, so I have a stable operating system, but eBPF provides a way for me to extend that with new functionality while keeping the maturity.
So I'd love to talk about that isolation, some of the code that I'm using to extend it may not be the most secure or the most mature code which is why it's not in the Linux kernel. How does eBPF create that sandbox and that isolation so that I'm not risking the integrity of the entire system when I'm extending it?
Thomas: Yeah. So eBPF is a byte code language, very similar to how Java byte code would look, so it is a generic byte code language, which makes it portable. Actually, given that Microsoft ported eBPF to Windows, it's not even Linux specific anymore. You essentially have eBPF programs that you can run on x86, on ARM, on AMD, but also on Windows kernels as well. So you have this generic byte code language that just describes what the program does.
You can then load this program and it will go through two steps. Step number one is the verification, and this verification will look at the program and will analyze what the program does. It will look at all the different branching options that the program has and it will ensure that every single possibility that this program can take is safe to run. That's a fairly complex operation it performs and it is limited in what it accepts, so you cannot load programs of arbitrary complexity into a kernel.
There is an upper limit on the amount of complexity, the number of branches and conditions and so on. It's also not allowed to, for example, have so-called unbounded loops. It is possible to loop in an eBPF program, so you can essentially do the same operation over and over again until some condition is met but it must be guaranteed that this condition is eventually met, that's the bounded part.
You can, for example, loop based on a value that is written somewhere in memory but this value needs to be guaranteed to be within a certain range. It cannot be arbitrary. The last piece is it's not possible to just call into arbitrary other parts of the kernel, which a kernel module could do but that's risky because if that call is invalid for some reason, it will crash the program.
Instead you are essentially leveraging an SDK or an API, very similar to how a Java or a Golang program would do this. You are using a library and this library, this API, is stable. It's stable across kernel versions, which again not only guarantees safety, it also gives us portability, so we can actually run eBPF programs across different kernel versions as well.
The last piece is that there is a just in time compiler, a JIT compiler, which will take the generic byte code that can run on any architecture and translate it into x86 or ARM or whatever instruction set your CPU actually runs. This means that even though you have this portable, generic byte code language, the actual runtime is as efficient as if you had recompiled and rebooted your kernel.
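The verification step described above can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the real kernel verifier: the instruction set, the helper names in `ALLOWED_HELPERS`, and the `MAX_STEPS` budget are all hypothetical. It walks every branch a program can take, only accepts calls into a fixed helper list, and rejects anything it cannot prove terminates within the budget.

```python
from typing import Dict, List, Optional, Tuple

# Toy instruction set (hypothetical, not real eBPF byte code):
#   ("mov", reg, imm)          reg = imm
#   ("load_input", reg)        reg = value unknown at load time
#   ("add", reg, imm)          reg += imm
#   ("jlt", reg, imm, target)  jump to target if reg < imm
#   ("call", helper)           call a helper from the stable API
#   ("exit",)                  end of program
Insn = Tuple
Regs = Dict[str, Optional[int]]  # None means "not known at load time"

ALLOWED_HELPERS = {"map_lookup", "map_update"}  # hypothetical helper names
MAX_STEPS = 1000  # hypothetical complexity budget


class VerifierError(Exception):
    pass


def verify(prog: List[Insn]) -> int:
    """Walk every path through prog and return the number of steps explored."""
    steps = 0
    pending: List[Tuple[int, Regs]] = [(0, {})]
    while pending:
        pc, regs = pending.pop()
        while True:
            steps += 1
            if steps > MAX_STEPS:
                raise VerifierError("too complex, or a loop is not provably bounded")
            if not 0 <= pc < len(prog):
                raise VerifierError("execution can run past the program")
            insn = prog[pc]
            op = insn[0]
            if op == "exit":
                break
            if op == "call":
                if insn[1] not in ALLOWED_HELPERS:
                    raise VerifierError(f"unknown helper {insn[1]!r}")
                pc += 1
                continue
            if op not in {"mov", "load_input", "add", "jlt"}:
                raise VerifierError(f"unknown instruction {op!r}")
            reg = insn[1]
            if op == "mov":
                regs = {**regs, reg: insn[2]}
            elif op == "load_input":
                regs = {**regs, reg: None}
            elif reg not in regs:
                raise VerifierError(f"{reg} is read before it is written")
            elif op == "add":
                value = regs[reg]
                regs = {**regs, reg: None if value is None else value + insn[2]}
            else:  # "jlt"
                value = regs[reg]
                if value is None:
                    pending.append((insn[3], regs))  # explore the taken branch later
                elif value < insn[2]:
                    pc = insn[3]
                    continue
            pc += 1
    return steps


# A loop whose counter is known at load time is accepted.
bounded = [("mov", "r0", 0), ("add", "r0", 1), ("jlt", "r0", 10, 1), ("exit",)]
print("bounded loop accepted after", verify(bounded), "steps")

# A loop driven by an unknown value cannot be proven to end, so it is rejected.
unbounded = [("load_input", "r0"), ("add", "r0", 1), ("jlt", "r0", 10, 1), ("exit",)]
try:
    verify(unbounded)
except VerifierError as err:
    print("rejected:", err)
```

The toy is deliberately stricter and simpler than a real verifier, which tracks value ranges rather than only exact constants, but it shows why a loop bound has to be provable at load time.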
Benjie: So ultimately you're basically getting kernel level execution time with a few limits and safety guardrails for you, and so you could do some crazy stuff. Is that a very, very simple way of saying it?
Thomas: Yeah. I think sometimes the limitations are overestimated. Actually it's possible to do fairly complex tasks. For example, we have built an HTTP2 parser, and parsing HTTP2 is not simple. It's a fairly complex protocol. But it would also at the same time be wrong to say I can bring my big legacy application with millions and millions of instructions and I can just run that in the kernel. eBPF is not a general purpose runtime that can run any code. You need to write eBPF programs specifically and within the constraints of eBPF, but you can do fairly complex tasks.
That said, what's unique about eBPF, and where it differs from something like, for example, WebAssembly, is that eBPF has been specifically built to run as part of the operating system. Its purpose is not to host or run arbitrary applications. You can do that in user space.
eBPF is built to extend the kernel so you bring use cases and you provide functionality that makes sense to run as part of the operating system. Not an arbitrary program.
Benjie: Can you just give me an idea of maybe the speed increase by running this in kernel versus the ole', "Oh, it's 10,000 times slower to run something from memory versus hard disk and 10,000 more in memory to L3 or whatever." Is there a number of how much faster this could be?
Thomas: Yeah, absolutely. The closest comparison for the real power of eBPF is that it's very similar to building edge workloads or building edge clusters, right? You want to run whatever service you provide, whatever logic you need to provide, as close as possible to where the data is. So that could be as close to the hardware as possible. If you look at the use of eBPF to provide load balancing, it can be 10X, 20X faster than even a pure, in kernel, software based solution like IPVS.
I remember when back in the days Facebook came out and said, "Hey, world. Look, we have replaced our IPVS load balancing with eBPF load balancer and it's 10X faster." And the world was like, "Holy... What's going on here?" That rarely ever happens, to get a 10X improvement. We have other examples where we are using eBPF to run as close to the application as possible.
For example, we're using eBPF to do HTTP2 parsing and we're doing that at the socket level, or even when the application is talking to the SSL library and we look at that data portion. Very, very close where the actual data is. There's lots and lots of use cases there. Instead of getting the data to some centralized piece somewhere in the middle of the kernel, we can run the programs closest to where the event or where the source of the information is.
That can be close to the hardware or it can be close to the application, and that can have massive gains because we don't have to inject something or we don't have to reroute data, we don't have to reroute network packets. Even payload of a file access, we can run the logic, whether it's visibility or filtering or authorization, we can run that where the data actually resides.
This is why eBPF then drives use cases around networking, security, both network security and runtime security, audit, monitoring, application monitoring, performance troubleshooting and so on. Every time we want to be close to the application, eBPF becomes interesting.
Thomas: Absolutely. I think it's actually not bad because there's a lot of truth to it, right? Not everybody will want to see it that way, but in particular from a performance gain and from the amount of difference it can make, it's absolutely this. Then there's another element which combines that raw power, I think what you just said describes the raw potential from a power perspective, but then there's the other element which is equally important which is eBPF is often the glue.
The kernel can do a lot of things, it has a lot of capabilities and they've grown over years and years and years and years, many, many, many layers and we want to essentially pick what functionality we need. But we may not actually want to take the regular path through the layers and layers and layers of kernel functionality. We essentially want to glue pieces together and establish shortcuts that have not been established before.
Benjie: That's great. That's a great description of eBPF and what I'd like to do now is talk about Cilium. Cilium is the open source project, wildly successful and popular open source project in the CNCF that your company, Isovalent, has created. Can you tell us what does Cilium do?
Thomas: So Cilium brings all of that powerful eBPF technology into the hands of the cloud native user base. eBPF is super low level, it's targeting kernel developers and is primarily being used by companies like Facebook and Google and Netflix who have their own kernel teams. But it's not a technology you want to consume in its raw form.
You essentially need kernel developers to use it properly so you are reliant on having projects leverage eBPF and Cilium is bringing eBPF to Kubernetes, to the cloud native world to solve networking, networking security, runtime security, observability and, most recently, service mesh functionality as well. We're trying to bring and solve as many cloud native use cases where eBPF is a great fit and essentially hide the eBPF pieces as much as possible but bring all the power to the users.
Benjie: So that's a lot of different topics. Let's break them apart, I guess. Let's start with one by one. You want to hide the eBPF complexity and functionality so at a high level, how do you do that? Are you creating best practices policies or are you creating a DSL so I don't have to write code that compiles into byte code? Or how does Cilium actually perform that?
Thomas: Exactly. What we are doing is taking Kubernetes objects such as network policy, Kubernetes services, Gateway API or Kubernetes ingress, which describe the user's intent, and then using eBPF to implement that most efficiently. Because of the programmability of eBPF, it actually allows us to build a lot of great tooling. So it's not just a very efficient implementation of what is needed, whether that is a network policy, Kubernetes networking, Kubernetes ingress, or OpenTelemetry visibility. It also allows us to build troubleshooting tooling and day two operations observability dashboards in a way that is appealing and helpful to users as well, while hiding the implementation details of eBPF.
I think to compare this, it's very similar to what Kubernetes is doing with namespacing technology. Most Kubernetes users do not know how namespaces work as part of the Linux kernel, even though they are the foundation that allows for the isolation and resource control of containers. Most Kubernetes users don't really understand those lower level details, but they are what made it possible to even build Kubernetes as a platform, obviously together with the container runtimes.
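As a rough illustration of what "describing intent" means for a network policy, the Python sketch below models a heavily simplified ingress policy and the allow or deny decision that follows from it. The `IngressPolicy` class and its fields are hypothetical stand-ins invented for this example; they are not the Kubernetes NetworkPolicy schema and say nothing about how Cilium compiles policy into eBPF. The one rule borrowed from Kubernetes is that a pod selected by no policy accepts all traffic.

```python
from dataclasses import dataclass
from typing import Dict, List

Labels = Dict[str, str]


@dataclass
class IngressPolicy:
    """Hypothetical, simplified stand-in for a Kubernetes network policy."""
    applies_to: Labels        # labels of the pods this policy selects
    allow_from: List[Labels]  # label selectors of peers allowed to connect
    port: int                 # destination port the rule covers


def matches(selector: Labels, labels: Labels) -> bool:
    """A selector matches when every key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def is_allowed(policies: List[IngressPolicy], src: Labels, dst: Labels, port: int) -> bool:
    """Decide whether src may connect to dst on port under the given policies."""
    selecting = [p for p in policies if matches(p.applies_to, dst)]
    if not selecting:
        return True  # no policy selects the destination, so it is not isolated
    return any(
        p.port == port and any(matches(peer, src) for peer in p.allow_from)
        for p in selecting
    )


policies = [
    IngressPolicy(applies_to={"app": "backend"}, allow_from=[{"app": "frontend"}], port=8080)
]
print(is_allowed(policies, {"app": "frontend"}, {"app": "backend"}, 8080))
print(is_allowed(policies, {"app": "batch"}, {"app": "backend"}, 8080))
```

The user only writes the declarative part (who may talk to whom); how the decision gets enforced per packet is left entirely to the implementation.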
Benjie: Right. So it's literally just as easy as I get the benefits of Cilium, I don't need to be an application engineer, I can get the power of Cilium just by installing Cilium into my cluster?
Thomas: Exactly that. I think given the popularity of Cilium, in many cases you are already using Cilium and you may not even know it. For example, if you're using GKE or Anthos or EKS Anywhere or DigitalOcean, they're already using Cilium under the hood and you will be using it. In a lot of situations you can actually just tap into it, for example, and benefit from the observability data that Cilium provides.
You have a layer that's called Hubble which is our observability layer and it provides visibility into a lot of different layers, for example, all the network communication that is ongoing like who is talking to whom, please measure the HTTP latencies, show me why network policy is denying some traffic or show me the amount of traffic volume between cloud provider regions or between availability zones.
All of that can be looked into. Cilium exports Prometheus metrics, flow logs via Fluentd, and OpenTelemetry traces, and users can benefit from that and build dashboards or feed data into a SIEM and so on. So in a lot of cases, Cilium is already there and can be tapped into to extract data that's useful for platform teams, for application teams and so on. I think the other aspect is the security angle, which is enriching and empowering security teams with the required visibility.
That then goes more into the runtime side as well where we benefit from eBPF's ability to not only understand what's going on the network level where Cilium can see everything on that network packet that is being transmitted or received, but it can also see all the system calls that are being made. It can see every time a process is changing namespaces or every time a process is increasing or escalating capabilities, or every time a process is executing another process and so on. We can combine all of that information, all of that observability and provide a lot of insight for security teams as well.
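The "who is talking to whom" and "why is traffic being denied" questions above come down to aggregating flow records. The Python sketch below shows the idea on a handful of made-up records. The record layout (`source`, `destination`, `verdict`) is an assumption for illustration and should not be read as Hubble's actual export format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]

# Hypothetical flow records; the field names are assumptions for this sketch.
flows = [
    {"source": "frontend", "destination": "backend", "verdict": "FORWARDED"},
    {"source": "frontend", "destination": "backend", "verdict": "FORWARDED"},
    {"source": "batch", "destination": "backend", "verdict": "DROPPED"},
    {"source": "backend", "destination": "database", "verdict": "FORWARDED"},
]


def summarize(records: Iterable[Dict[str, str]]) -> Tuple[Counter, Counter]:
    """Count all flows and dropped flows per (source, destination) pair."""
    talkers: Counter = Counter()
    drops: Counter = Counter()
    for record in records:
        pair: Pair = (record["source"], record["destination"])
        talkers[pair] += 1
        if record["verdict"] == "DROPPED":
            drops[pair] += 1
    return talkers, drops


talkers, drops = summarize(flows)
for (src, dst), count in talkers.most_common():
    print(f"{src} -> {dst}: {count} flows, {drops[(src, dst)]} dropped")
```

A dashboard or SIEM pipeline would do the same kind of grouping at much larger scale, typically with extra dimensions such as namespace, region, or availability zone.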
Benjie: So Cilium is really providing that kernel integration. It's not trying to be the single pane of glass, it's not trying to be the dashboard or the monitoring, the observability tool. It's really trying to collect the data, allow me to pass that into whatever tool in the scenario that you were just describing, that I'm using and I want to actually have alerting and monitoring and instrumentation in everything on there?
Thomas: I think this sweet spot is what makes Cilium so popular. It's that we're not only incredibly fast and scalable at doing things, I think we power some of the biggest Kubernetes clusters, but at the same time we also provide a lot of visibility that used to be incredibly expensive to get. That combination allows us to operate at speed, at scale while still having a lot of visibility, which is required to operate a complex system such as Kubernetes.
Benjie: How long has Cilium been around? How long has this project been going?
Thomas: We started Cilium six years ago and we didn't have this open sourcing moment, we actually started in public. The first line of code ever written was already public, so it's literally been six years since the first line of code was written. We were not part of the CNCF in the beginning. We actually decided that we wanted very pure feedback from the user base, so we did not want any free marketing from that perspective.
We wanted to have Cilium succeed on its own in a pure open source setting first. Then it became clear that, yes, Cilium is what users want, with more and more Kubernetes distributions simply including or even preinstalling Cilium, and with the user base growing and growing and growing. That's when we decided now is the time to actually donate it and move it into the CNCF.
Benjie: You talked about creating the code, the first line of code was in a public repo. That's not as common as you actually think. A lot of open source projects were created as private projects and then once it was ready for a view and cleaned up and good, then the maintainers will flip the switch and make it open source.
I love that creating it just as public and it's not good at first, it's not functional at first, you don't have documentation. But you have to start somewhere and really showing that journey, showing that path to other engineers, other software developers to show that this is how software is written. That's fascinating. Would you do that again if you could?
Thomas: Absolutely. I think it's the best way to make sure that you align with your user base and that you learn early and quickly. But it is more effort, right? It is a lot more complicated. Essentially you need to keep documentation up to date from day one. You need to convince users to provide feedback. You need to listen to them. If you don't listen and react, they will move off, right? But this means that you are aligning early and you are sure that the open source project that you are creating is meeting a demand and you can then monitor that demand and act accordingly.
I see a variety of open source projects that have this Open Source Day, and then it's a bit hit and miss.
It can be a great hit, but what if it's a miss? You get this one big splash and then if it's a miss the project is almost dead. It's really hard to then keep it up because from an expectation perspective, nobody is expecting the project to now really, really succeed.
If you start small and grow year over year, you get the feedback early and you know whether you are on track or you are not on track so I would always... I would recommend to everybody if you have the time, if you are in a situation where you feel like you have a strong team to execute on the mission you have, go public, go open early.
Benefit from the massive feedback that you get from the open source world, I think this is the most undervalued aspect of open source, it's this exposure. Linux would never exist without that early exposure, I think if Linus hadn't published a very, very early version of Linux early on, Linux would never have gotten where it is today.
Benjie: Yeah, exactly. It's okay to be embarrassed in the early days. Thinking about Cilium, eBPF today is still relatively new. There's more and more adoption of it, but I've got to imagine six years ago eBPF was not a term that was talked about on podcasts, it wasn't a common term. You have this brand new open source project, it's telling folks, "It's okay.
This immature, unbattle-tested project that has a few hundred stars on GitHub. Install it, it's going to interact and run kernel level code and intercept Linux kernel calls and things like this." That had to have been a high bar in order to get people to start adopting this project initially and build that trust with the end customers.
Thomas: Yes, and I think that's the right way to do it. You will find early users and they will be aligned, and it's on the open source project to find them: okay, these are users that are leading the way and they are doing what others will be doing at a bigger scale several years later. The first version of Cilium we published was incredibly extreme. It was IPv6 only and it was essentially designed from scratch for massive scalability.
We looked at what we would do to run 10,000,000 pods, how we would design to account for that scalability. We looked at what's now called service mesh in terms of layer seven visibility, layer seven enforcement, all of that. So we defined pretty extreme goals and then actually toned it back from there, and now we're seeing that a lot of these initial design goals that we had set are actually becoming the standard.
It's incredibly rewarding to see that what we now see as a common usage pattern around IPv6 clusters, multi clusters, bigger scalability, identity based security, all the core principles we built into the first version of Cilium are now becoming the norm around the cloud native era as well.
Benjie: And that's really cool. We have a lot of listeners that are looking to start contributing, but I bet there's a few that are like, "You know what? I'm going to write open source Doom on eBPF. How do I get started?" Okay, bad example. But how did you jump start the community? Because it's one thing to say, "Okay, do it in public." But if a tree falls and no one's listening, what's happening?
Obviously you had some experience with contributing to the Linux kernel and you were at Red Hat and all that stuff. But what were some of these early successes? How did you get people, when you were doing it in the open, to actually take you seriously and to start giving that early feedback that was obviously so valuable?
Thomas: One of the initial really big milestones for us was DockerCon in 2016, which was the first time where we had a really big conference talk and we essentially described and demo'd Cilium for the first time, and everybody was just learning about containers. At the same time, the solutions that existed from a networking perspective were still just the ones from before, from network virtualization, that were created for OpenStack and had just been repurposed for the age of containers.
Then we went and actually showcased something that is very different and targeting a use case that most people were not aware of yet, like, "You will be running at larger scale and you will think about not only network protocols, but you will think about application protocols, you need identity, you need authentication and stuff, segmentation and so on."
We used, I think, others that were further along as examples, so I think the common examples would be Google and Facebook and Netflix and Twitter that have been using more modern patterns already, and we were essentially signing off some of them as well in the belief that a wider set of enterprises will eventually get there as well.
That's obviously a pattern that we see very frequently where enterprises are, with a lag time of five, six, seven years, doing things maybe not quite to the same extreme levels but moving towards a similar direction as the hyperscalers do as well. So we were not without signal, but the set of users that would eventually end up using Cilium did not exist yet.
I think that's always the challenge if you want to innovate around open source: you need to be ahead of the curve and at the same time you need to have user feedback, you need that feedback channel. I think open source conferences can really help to not only find users but also get into a conversation with them. I think DockerCon, Open Source Summit and then KubeCon later on were incredibly helpful.
I remember going to KubeCon Berlin and we had a small booth there from a Cilium project perspective and we got overrun. Everybody was interested in it, "I need visibility. I need better networking." It was an exciting time and we got a ton of amazing feedback in terms of what we could be doing better, what is meeting demand, what is not meeting demand and so on.
Benjie: All right, this is just such a great example for the community to follow this project. Let's switch it back to Cilium itself. I have a question, and that is because I've done a little bit of eBPF'ing as you know, as the number one fanboy. So let's talk about overhead and let's talk about how you get installed.
I'm assuming a stateful set, but just talk to me, if I'm a Kubernetes operator today and I'm like, "I want to start taking advantage of this stuff. GKE does it, I want to do it. How do I install it?" It's pretty simple, I take it. But also I want to understand the overhead, because I feel like overhead is the one little bit of an elephant in the room a bit. How much does running these different things cost me on my raw compute side and memory and all those other things?
Thomas: Yes, so installation is very simple. Cilium comes as a daemon set. It essentially runs an agent, written in Golang, on all your Kubernetes worker nodes, and this agent will connect to the Kubernetes APIs and receive the policies, the services, all the intent it has to implement. This Golang-based user space agent then interacts with the Linux kernel, the operating system, to install the eBPF programs that will perform the actual required operations: for example, perform the networking, perform the load balancing, implement the kube-proxy functionality, extract visibility, set up multi-cluster, enforce runtime security policies and so on.
There's also an operator, or a leader that gets elected, which is the minimal control plane. In general, everything that Cilium does is backed by custom resource definitions in Kubernetes, so Cilium does not come with a massive control plane of its own. It does have one component which is centralized, using Kubernetes for its leader election, and that's simply a deployment that can run in a highly available version.
Cilium can also be run on non-Kubernetes worker nodes, so you can actually run Cilium, for example, automated with Puppet or Ansible or Terraform or some other automation framework, and install it on a virtual machine or on a bare metal machine and make that machine part of the mesh as well. eBPF is not Kubernetes or container specific in any way, so you can actually bring the powers of eBPF to outside of your Kubernetes cluster as well.
It's actually what we see a lot of users do. They want the connectivity that Cilium provides, but they want that for more than just the containerized application. They have the Kubernetes workloads running in the cloud, for example, maybe with an OpenShift cluster running somewhere, and then they have a couple of bare metal boxes with a database and then a fleet of EC2 VMs running some applications, and they will connect all of that together and run security controls across all of that.
eBPF is an amazing technology to get that single platform going where we can do all of that in a universal way. So on Kubernetes, it's as simple as a daemon set. Outside of Kubernetes, it typically takes some sort of automation framework to install the Cilium agent, which will then orchestrate the eBPF layer. In terms of overhead, the overhead on the networking side is incredibly minimal, and I can tell a story which is kind of funny.
We did a lot of benchmarking, and you can find all of the networking benchmarks we have created on cilium.io. One of them was surprising to us, and we actually measured it over and over and over again because we believed it was wrong: we measured that a pod talking to another pod on two different nodes was faster than the nodes talking to each other directly.
It's like, "Oh, this can't be true. Between the nodes there's less complexity involved. Why should containers be faster talking to each other than the nodes themselves?" It took us several days to realize that pod-to-pod communication was quicker because with eBPF we were essentially bypassing the iptables or netfilter layer on the host.
So the container networking that we are doing as part of Cilium takes less time than even an empty iptables ruleset on the host itself, which was surprising, and I think it shows some of that gluing power that we talked about earlier. It's not just about the raw potential of how fast eBPF can run. It's also about essentially removing parts of the system that you don't need.
So if you don't need iptables, let's bypass it. If you don't need, for example, some sort of additional visibility layer, let's bypass it. If we don't need the network injection and we just need to copy data, let's do that. I think it's allowing us to rethink how to do things and actually cut overhead out. Where we do measure overhead, it's typically around observability, and the amount of overhead there radically depends on how much observability you want and in what form.
It's obviously a lot more efficient to collect a metric and increment a counter or create a histogram than it is to essentially create an event or a flow log for every single packet that is being forwarded. Cilium can do both and depending on how you configure it and what level of aggregation and filtering you can configure, the overhead can be from point something percent to 20%.
Often it is not necessarily the kernel-level data extraction that we do. It actually comes down to boring JSON encoding and other things popping up in the performance profile, because that's very expensive. So we get all of the observability, but then we have to do JSON encoding and write it to disk, which is very, very expensive.
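To make the contrast between per-packet flow logs and aggregated metrics concrete, here is a minimal Python sketch. It is not Cilium or Hubble code: the packet records, field names and traffic volume are all hypothetical and chosen only for illustration. It shows why the JSON-encoding path scales with traffic while the counter path scales with the number of distinct flows.

```python
import json
from collections import Counter

# Hypothetical packet records; the field names are illustrative only
# and do not reflect Hubble's actual flow schema.
packets = [
    {"src": "frontend", "dst": "backend", "dport": 8080, "bytes": 512}
    for _ in range(10_000)
]


def per_packet_flow_log(packets):
    """Emit one JSON line per packet; the work grows with traffic volume."""
    return [json.dumps(p) for p in packets]


def aggregated_metrics(packets):
    """Increment one counter per (src, dst, dport); the output stays small."""
    packet_count = Counter()
    byte_count = Counter()
    for p in packets:
        key = (p["src"], p["dst"], p["dport"])
        packet_count[key] += 1
        byte_count[key] += p["bytes"]
    return packet_count, byte_count


log_lines = per_packet_flow_log(packets)
packet_count, byte_count = aggregated_metrics(packets)

print("JSON events to encode and write:", len(log_lines))
print("counters to export:", len(packet_count))
```

In this toy case every packet belongs to the same flow, so the log path produces one JSON document per packet while the metrics path produces a single counter entry.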
So it's not an easy answer, but I think the short answer is that depending on what you need you can heavily optimize it and get it down to the lowest level. We're actually running Cilium in some stock exchanges with lowest of latency needs without any issues, so I think definitely from a potential perspective it's fully capable to run in highly low latency, high performance environments.
Benjie: Yeah, every time I dig into the eBPF stuff, it makes julienne fries, it does everything for me. It's really cool. Speaking of which, you mentioned that you guys were adding a service mesh to the Cilium ecosystem. Can you talk a little bit about that? Because I feel like that dovetails really well with what you were just describing with the lack of latency, whatever the right term is.
Thomas: I think overall the Cilium feature set and the design goals actually align really well with the service mesh. It's being close to the application, it's understanding the application, it's understanding application protocols. So not just keeping or staying at TCP, but actually understanding HTTP, and instead of understanding a TCP latency, understanding the HTTP request response latency.
Elevating the level where we argue and where we provide visibility, so there is a strong alignment around that. Historically we have not called ourselves a service mesh because we weren't able to do all of the service mesh functionality. Things that we have not done in the past were things like layer seven traffic management, retries and circuit breaking, as well as mTLS.
But we have provided, for example, HTTP visibility, network security, multi-cluster, load balancing, canary rollouts and so on. So because we did not have a full feature set, we actually did an integration with Istio. So the first service mesh offering we had as a Cilium project was a native Istio integration where you can run Cilium and Istio together.
Istio is one of the more successful, more popular service mesh projects that are out there, and we've done a variety of things such as accelerating Istio, removing the unencrypted payload aspect of a sidecar, and a variety of optimizations where we can help an Istio or a sidecar implementation. That was great and a lot of users were successfully running that.
Then, over time, more and more users approached us and said, "It would be awesome if you could help us get to a more efficient service mesh, and there are a couple of problems that we would like to solve. First of all, if possible, we would like to avoid running a vast number of sidecar proxies." Most service meshes today implement service mesh functionality with a sidecar model where they essentially run a proxy for each pod, and this proxy proxies all the traffic and provides all of the functionality in that sidecar proxy.
So if you run 1,000 pods you will need 1,000 sidecar proxies. The ask was, "Can you do something about this?" So we started working on it, and we started out by implementing a variety of functionality purely in eBPF. As I mentioned, we have implemented an eBPF-based HTTP/2 parser that is able to provide tracing data completely transparently, without running any proxy at all, and it can give application teams a golden signal dashboard and show latency numbers with no impact, minimal impact.
I think the overhead is less than 2% on the overall latency, really, really low overhead but with the same functionality. But that's only the visibility portion. It cannot do any traffic management, it cannot do any load balancing. For those use cases we still go to a proxy, and this is where we go to the Envoy integration that we have had for years as well.
As I mentioned, we actually went to layer seven almost since day one, and we've always integrated with the Envoy proxy, which is another CNCF project that provides things like HTTP, gRPC and now also things like Kafka parsing, and we have been using Envoy in a per-node configuration to get those layer seven services very successfully.
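The difference between the sidecar model and the per-node proxy model is simple arithmetic, sketched below in Python. The figure of 1,000 pods comes from the conversation; the node count of 20 is a hypothetical value picked only to illustrate the comparison.

```python
def sidecar_proxy_count(pods: int) -> int:
    """Sidecar model: one proxy runs alongside every pod."""
    return pods


def per_node_proxy_count(nodes: int) -> int:
    """Per-node model: one shared proxy runs on every worker node."""
    return nodes


pods = 1000   # figure used in the conversation
nodes = 20    # hypothetical cluster size, for illustration only

print("sidecar proxies:", sidecar_proxy_count(pods))
print("per-node proxies:", per_node_proxy_count(nodes))
```

Under these assumptions the proxy count follows the number of nodes rather than the number of pods, which is the saving users were asking for.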
So we have brought that integration in as well and call it Cilium service mesh. Cilium service mesh is a combination of eBPF-based service mesh functionality, Envoy proxy based functionality that is not a sidecar, and a new model of doing mTLS that does not do the authentication as part of the payload but actually does that separately.
It gets fairly technical, but that has a massive benefit: we can run mTLS and make it compatible with any network protocol, so we can support any network protocol while still benefiting from the mutual authentication part of TLS. So essentially users are driving us to solve some of the service mesh pain points that exist today with a clear ask: "Please don't reinvent the control plane.
Please support Gateway API, SMI, Istio, Kubernetes Ingress, Kubernetes Services, but please optimize the data plane and get us to a more efficient version of the service mesh data plane." I think there have been some tweets recently about, "Hey, I'm using 20% of my compute just for my sidecar proxies." That's definitely something that users want to avoid, and that's the problem we are looking to tackle while essentially providing that universal data plane that is compatible with a variety of different service mesh control planes.
Benjie: Yeah, speaking from maybe three years ago, maybe four years ago, I don't even know. A while ago when I first turned on Istio for the first time, I think it was V1, maybe not even V1, and it was using 200% of my compute, so that was quickly a problem, and so it's good that it's down to 20% honestly. You said that you could do mTLS on any protocol, did you say that? Did I hear you say that?
Thomas: Yeah, so the thing is we are all using TLS every day. When we go to our online banking app we're using TLS to secure that connection, the browser does it for us, and TLS is fantastic for the internet, where TCP and QUIC are the primary protocols we use. Unfortunately, enterprises use a vast variety of different protocols as well, some of which we have never heard of, and as they bring more legacy, more traditional enterprise applications to their cloud native world, they essentially need service mesh connectivity and the value that a service-mesh-like connectivity plane provides, but with compatibility for all these other network protocols as well.
The way we do that is by separating the connection and the network protocol that actually carries the data from the authentication itself. So instead of making the authentication and the transport of the data one connection, which essentially limits it almost exclusively to TCP, we separate the two. So when one service wants to talk to another service and it's using a legacy protocol, Cilium will hold up the data, authenticate the services with each other using mTLS, and once that authentication has succeeded it will allow the data to flow, and it can then use IPsec or WireGuard to encrypt and authenticate the data independently of that.
This has one massive benefit aside from supporting any network protocol: it also separates the authentication away from the data path. HTTP is incredibly hard to parse, and so proxies have been hit by vulnerabilities quite repeatedly, and by doing TLS or mTLS as part of the proxy, we need to share the keys, the certificate, the secret with that proxy. So if the proxy gets compromised because of an HTTP vulnerability, the key gets leaked, the key gets lost. By essentially moving the authentication part out of the proxy, out of that vulnerable data path, we're actually improving the security posture as well.
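The idea of holding traffic until a separate authentication step succeeds can be sketched as a small Python model. This is an illustration of the pattern only, not Cilium's implementation: the `AuthGate` class, its method names and the service names are all hypothetical, and the real handshake and encryption (mTLS, IPsec or WireGuard) are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class AuthGate:
    """Hypothetical gate that keeps authentication separate from data transport."""
    authenticated: set = field(default_factory=set)  # (src, dst) pairs allowed
    held: list = field(default_factory=list)         # packets waiting for auth

    def on_packet(self, src, dst, protocol, payload):
        """Forward if the pair is authenticated, otherwise hold the packet."""
        if (src, dst) in self.authenticated:
            return "forward"
        self.held.append((src, dst, protocol, payload))
        return "hold"

    def handshake_succeeded(self, src, dst):
        """Record a successful mutual authentication and release held packets."""
        self.authenticated.add((src, dst))
        released = [p for p in self.held if (p[0], p[1]) == (src, dst)]
        self.held = [p for p in self.held if (p[0], p[1]) != (src, dst)]
        return released


gate = AuthGate()
first = gate.on_packet("billing", "ledger", "SCTP", b"legacy-frame")
released = gate.handshake_succeeded("billing", "ledger")
second = gate.on_packet("billing", "ledger", "SCTP", b"next-frame")
print(first, len(released), second)
```

Note that the gate never inspects `protocol` or `payload`: the decision depends only on whether the two identities have authenticated, which is what makes the approach independent of the network protocol carrying the data.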
Marc: So I want to talk about the extensibility of Cilium specifically. eBPF has all this functionality, and you're talking about this really cool stuff like mTLS on UDP transmission. If I have Cilium installed, what can I do with it? Obviously you talked about network security with Cilium, you talked about observability and monitoring with Hubble, which are the two main projects here. But what else can I do? How extensible is it? If I have another thing that I'm like, "I just want to do this." Do I have to go write eBPF code to start and just say, "Cilium solves this problem, I'm going to go solve my other problem"? Or can I write policies that actually execute code in Cilium?
Thomas: For the vast majority of use cases you will never have to touch eBPF. We do have really powerful users that want to essentially extend the eBPF data path capabilities but most of these changes actually flow back into the open source project. For example, we have telco customers that are heavily relying on Cilium now, they're actually extending the data path capabilities by working with us to extend the eBPF portion of it and then finding ways to expose control over that in a Kubernetes or cloud native way.
So think telco 5G with CRDs, where you can bring application developers and give them a user experience and a developer experience that's aligned to cloud native and Kubernetes, with the low-level latency and throughput and network protocol control that's needed for a telco network. So there are powerful users at that level where they want and need control at the eBPF level.
But for the vast majority of users, most of the functionality is already there and it's mainly about using the functionality, writing policies in YAML or JSON, using a Kubernetes object, or implementing existing standards and extending them. For example, implementing more and more of the Kubernetes Ingress resource, or we're now working on implementing the Gateway API resource and so on. So it is possible to use eBPF if you want to operate on that level. That's not needed for the vast, vast majority of users.
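As a rough illustration of "writing policies in YAML or JSON, using a Kubernetes object", the Python sketch below builds a layer seven policy as a plain dictionary and prints it as JSON. The `cilium.io/v2` `CiliumNetworkPolicy` kind is a real Cilium resource, but the policy name, labels, port and path used here are hypothetical, and the exact field layout should be checked against the Cilium documentation for the version you run.

```python
import json

# Hypothetical policy: allow pods labelled app=frontend to send
# HTTP GET requests under /api/ to pods labelled app=backend on port 8080.
policy = {
    "apiVersion": "cilium.io/v2",
    "kind": "CiliumNetworkPolicy",
    "metadata": {"name": "allow-frontend-get"},
    "spec": {
        "endpointSelector": {"matchLabels": {"app": "backend"}},
        "ingress": [
            {
                "fromEndpoints": [{"matchLabels": {"app": "frontend"}}],
                "toPorts": [
                    {
                        "ports": [{"port": "8080", "protocol": "TCP"}],
                        "rules": {
                            "http": [{"method": "GET", "path": "/api/.*"}]
                        },
                    }
                ],
            }
        ],
    },
}

# JSON is accepted wherever a Kubernetes manifest is expected.
manifest = json.dumps(policy, indent=2)
print(manifest)
```

The point of the sketch is that the user expresses intent declaratively, at the HTTP level, and never writes eBPF code.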
Marc: Got it, that makes sense. One of the things that we haven't actually talked about with eBPF: we talked about the performance benefits you get from running it in the kernel, and we talked about some of the limitations, what you can and can't do with bytecode. But we haven't really talked about probes, and that's actually one of the huge values of Cilium, I'm guessing: providing insight through eBPF, where I can know when this kernel function is being called because the code is actually running in the kernel.
Thomas: Yeah, so I think observability and troubleshooting ability is what we have underestimated the most in terms of needs when we created the Cilium project. I think initially we thought of the Cilium project and we focused heavily on the security aspect, we built better and more secure controls.
Then we quickly learned, and that was a benefit of exposing and opening up early, that as Kubernetes is complex, as the scale increases with containers, as multi-cloud becomes a reality, this end-to-end visibility, understanding all aspects from app to app and understanding where it might fail, what is working, what is not working, is super crucial.
So this day two operational visibility, whether it's understanding when DNS is failing, correlating an HTTP latency with CPU load, understanding the HTTP error rate, or correlating a network policy drop with an application failure, all of that is super crucial. What's amazing about eBPF is that because of its performance profile, because it has such a low overhead, it actually gave us the opportunity to look at a lot more things at different levels, because it's at such a low cost that we can do so.
Some of these things were possible before, but they came at such a massive overhead that it was not possible to actually run any of it in production. We all know the days when we had to reproduce issues, customer issues, in a repro environment and then measure and get insights.
One major use case that Netflix used eBPF for was actually performance troubleshooting in production systems, where the application was maybe not performing as well as it could and observability was needed in the production-level system because the actual issue would only reproduce at scale, at massive use.
We applied the same principles for Kubernetes workloads as well, whether it is load balancing at high speed or whether it is high scale, or simply even for smaller deployments, just the deep visibility and understanding where does it fail? What is working? What is not working? Is it DNS? Is it a proxy? Is it the app? Is the latency because of the network? Is the latency because of the app and so on? A lot of these fundamental questions can be helped with eBPF and Cilium.
Benjie: I don't know if we've actually talked about this, but I had planned to send this episode to a lot of people that have asked me a lot of questions on eBPF so I'm just going to make sure I get everything in there. But I think one of the coolest things about this is what we're not talking about, I don't need to integrate anything into my application to get all of these benefits you're talking about.
When you say that you install a daemon set, just to be clear here, and correct me if I'm wrong here of course, but I am literally just installing something onto a node or onto a daemon set or whatever and I just get all this magic stuff that basically just goes around in a... Well, it's safe. I hope it's safe, I don't know. That's always been my anxiety about this stuff, I'm like, "Okay, it's a little too good to be true always."
But it goes around all of these things, it goes to the kernel level, uses probes so you're not polling, you don't have these endless for loops, all these other things, and it just gives me all this observability, it gives me all this ability to man in the middle my own application basically. Is that right, Thomas?
Thomas: Absolutely, yeah. I'm coming from the infrastructure world, so to me that's requirement number one so I always take that for granted. But it's 100% true and it's a massive value. There is no code instrumentation. There is no code injection. There is no redirection. We're looking at what an application is doing. We're looking at what is the system doing.
We're looking at this not only at the system level but also in a distributed way so we can compare the data that we observe on different nodes and we can even correlate a request as it leaves the node and is being received on another node and correlate all that together. It's incredibly powerful. It's simply something, as an infrastructure engineer, that's also something that was required and without that transparency it wasn't an infrastructure solution. But it's a massive value, absolutely.
Benjie: Does Cilium have something where I can just play back the kernel space or something for the last 10 minutes and put it on some-
Thomas: We have, we have something that we call Timescape, which is getting into our commercial product. That's a time series database based on ClickHouse that can record all of these observability events, on the network level, on the runtime level, on the application protocol level, whether it's a network policy event or an HTTP request or a system call or a privilege escalation or a process execution.
It can record all of these events and collect them from all the different nodes and then correlate the information together and build models. For example, we can build these amazing service maps or we can automatically generate network policy. We can identify certain patterns. To give an example, we can understand when a container is usually not ever launching another process, so it's a single-process container that always only executes this single binary, probably statically linked.
Then we can understand, "Oh, on this one occasion it actually did start a subprocess," which is then very likely a security-relevant incident, because maybe the container or the pod was compromised and a subprocess was started. So a lot can be done with this data that we record, and we can also replay it, so we can actually replay some of the network traffic we've seen.
We can even replay some of the runtime behavior that we have seen, we can replay the file access and so on. But that's getting into some of the commercial product that we have. But absolutely.
I think that's one of the key benefits of eBPF, that because of the low cost we get the granularity of data that we can then use to store this auditable stream of events that can be fed into a time series database.
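The single-process container example above can be sketched as a baseline comparison over recorded process-execution events. This is a minimal Python illustration, not the commercial product's logic: the event format (container name, binary path) and the sample data are hypothetical.

```python
from collections import defaultdict


def find_unexpected_executions(baseline_events, new_events):
    """Return (container, binary) pairs never seen for that container before.

    Each event is a hypothetical (container, binary) tuple taken from a
    recorded stream of process-execution events.
    """
    baseline = defaultdict(set)
    for container, binary in baseline_events:
        baseline[container].add(binary)

    alerts = []
    for container, binary in new_events:
        # Only flag containers we have a baseline for.
        if container in baseline and binary not in baseline[container]:
            alerts.append((container, binary))
    return alerts


# Hypothetical history: the container has only ever run one binary.
history = [("payments", "/app/server")] * 500
recent = [("payments", "/app/server"), ("payments", "/bin/sh")]

alerts = find_unexpected_executions(history, recent)
print(alerts)
```

With this sample data the only flagged event is the shell started inside a container that had previously executed a single binary, which is the kind of deviation described as security relevant.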
Benjie: Don't forget how crazy it is that there's no application tooling whatsoever. I feel like the eBPF people don't realize how revolutionary that is. At Shipyard we use Sentry to bunch net loads, I love Sentry. But imagine if I didn't have to do any tooling at all. So cool. But let's talk about the commercial product for a while.
We have people, it's always good to understand how open source projects support themselves, how you monetize and because we want these things and you guys have obviously been amazing stewards of this project. We want you to succeed, we want this project to live on. So tell us a little bit more about the commercial offering. I think you just told us some pretty cool features but give us a little more, give us a little more. Sell me a little bit, just a little bit.
Thomas: Yeah, so I think first of all Cilium is open source and we completely believe in open source as the best way of doing innovation. At the same time it's obviously important for our company, and for all the customers that bet on the technology, that we as Isovalent are successful as well, because the technology needs to be maintained. Even though the list of contributors coming in from the outside is growing more and more, Isovalent is still fundamental to Cilium continuing to grow successfully.
From that perspective, we sell an enterprise distribution of Cilium which essentially provides open source Cilium in a hardened version with an extended end-of-life, proper security backporting, and professional services. Then on top of that we have a list of features which are enterprise only. I have mentioned one of them, which is Timescape, our time machine database.
We also have a set of compliance monitoring aspects. Typically when enterprises want to achieve compliance, some of the visibility aspects there are enterprise only. The HTTP/2 parsing, the low overhead one, this highly optimized eBPF parser, is enterprise only. So we essentially provide enough to be successful on your own in the open source version and then have optimized versions and additional enterprise-specific functionality that is found in our enterprise product.
Benjie: If I'm an enterprise, what's the coolest feature that I get if I do enterprise?
Thomas: I think that depends a bit. I think for many it is the amount of security visibility that you get. So the richness and the automation of all of it, which means that you install the enterprise version and, without doing anything additional, you can see what TLS connections are ongoing, what ciphers they are using, whether any of my apps is using an insecure cipher. I see the HTTP level, I can query my SIEM, for example, and look into cross-site scripting attacks in the HTTP URI.
I see the entire network layer, I see all the DNS resolutions, I can quickly probe which pods are actually talking to outside of the cluster, which pods are exposed to outside of the cluster. I can then correlate that with, for example, runtime information and identify which processes are listening on what port. In the last two weeks, have some applications started listening on a new port?
All of these questions that a lot of security teams have, they get immediate answers to that, and without actually installing a lot of different components. They install the product and essentially get all of that observability data neatly covered in a time series database, and even better, that works across all the cloud providers.
It works whether it's in cloud or whether it's on prem, even outside of Kubernetes itself. So I think that's the real aha moment that our customers have: "Oh, I get all of that observability at super low cost, and it works whether it's multi-cloud or hybrid cloud." That's essentially what security teams want and need, because a lot of them don't fully understand Kubernetes in all details yet.
Marc: Yeah. I know we're starting to run a little bit long and out of time, but there is one more question that I'd love to hear your thoughts on and that's a couple questions in one. What's next? What is the team working on right now in the open source world for Cilium, for Hubble? Or if I'm just super interested in providing feedback or contributing or running it, what type of feedback or how do I get started and involved in the ecosystem?
Thomas: I think we have an amazing set of users that are continuously driving Cilium forward in additional networking use cases, whether it is telcos, high frequency trading, multi-cloud aspects, hybrid cloud aspects. But then there's also the complete service mesh space, and my prediction is that what we now call service mesh and what we called CNI or cloud native networking will all become one layer, and users and customers will essentially demand from the cloud native space that, "I want this connectivity plane." This should be as efficient as the network used to be. It needs to be completely transparent, but it needs to be close to the application, it needs to understand the application protocols.
It's no longer enough just to understand TCP and to look at TCP retransmissions as a way of achieving resilience. We need to understand HTTP retries, we need to have layer seven awareness when we do load balancing, but without introducing a vast amount of overhead and while being compatible with the enterprise legacy world as well.
And, most importantly, while being able to connect and integrate the vast majority of workloads that are not yet containerized, that are not running on Kubernetes itself yet, but where there's a huge demand to bring cloud native principles and concepts to that world as well without having to migrate them. I think that's what we will focus on the most as this space converges together, we will invest heavily into working with customers to make them successful and allowing the wave that is yet to come to be successful as quickly as possible.
Benjie: This was great. I have a whole list of things that I wanted to talk to you about that we didn't even hit, but we're up on time. I'm going to do a little shoutout though. The Go eBPF library, I'm pretty sure you guys wrote that, right?
Thomas: We wrote it together with Cloudflare, so that was a joint effort, but yes. We were heavily involved in this.
Benjie: So that's another project for the audience to go check out, if you want to get low level on eBPF or contribute to that possibly. Thomas, I was not being facetious earlier, I am literally going to actually send this episode out and say, "Fast forward the part where I talk, but listen to everything that Thomas has to say." This is so informative. I have one last question before we wrap. What is the name of the bee?
Thomas: It's called eBee. We voted on it, I think during the last eBPF Summit. We organize an eBPF Summit every year in summer, and this year it will be late September. As one of the exercises we named the mascot and we called it eBee.
Benjie: Okay, and that's great. We'll leave it at September, there will be an eBPF conference. We'll put a link to that in the description. Could not thank you enough for coming on. Thanks, Marc. Thanks, everybody. And as much as this all seems like black magic, I've played with it, we use it, it's great. It all works. I'm waiting for... I don't know what's going on, why this all works so amazingly but it does. It's really cool, check it out and obviously check out Cilium.
Thomas: Thanks a lot for having me. This was great.