Ep. #15, Operator Framework with Evan Cordell of Red Hat
In episode 15 of The Kubelist Podcast, Marc speaks with Evan Cordell of Red Hat. They discuss the inception and roadmap of the CNCF incubating project Operator Framework, as well as insights on deploying and managing Operators in your cluster.
Evan Cordell is a Principal Software Engineer at Red Hat. Evan was involved in the creation of the Operator pattern. He was previously an engineer at CoreOS and a software developer at LocalMed.
Transcript
Marc Campbell: Hi and thanks for tuning into another episode of The Kubelist Podcast.
Today, I'm excited to have Evan Cordell on with me.
Evan is a principal software engineer at Red Hat, working on the Operator Framework project and we get to spend the episode talking about Kubernetes Operators.
Welcome, Evan.
Evan Cordell: Hi. Thanks. Nice to be here.
Marc: Awesome.
Before we get started on the technical side, I'd love to hear a little bit about your background.
How did you get into Kubernetes and specifically end up at Red Hat?
Evan: Sure.
Yeah, it's a bit of a long story, but I used to work at a health-related startup, and we started doing some Kubernetes things there, through which I found Quay. I was looking to move to New York, and Quay.io, the container registry, is a team based out of New York, so I started working there when we moved up here.
Quay had already been purchased by CoreOS at that point, so I joined CoreOS to work on Quay.
And then over time, some of the priorities within CoreOS changed here and there and so eventually the original Quay team was sort of divided up to tackle different projects.
And one of the things that we started working on was the Operator Framework related things.
We didn't call it that at the time, but there was a heavy focus on operators at CoreOS, supporting its distribution of Kubernetes, the auto-updating Tectonic.
Marc: Yeah, that's great. I know CoreOS did a ton of early work.
In fact, CoreOS coined the term operator, and I'd love to dive into that, but specifically we're here to talk about the Operator Framework.
Before we do that: Operator Framework is currently an incubating project at the CNCF, but it's a little bit different from other projects, which are one thing that you can run.
It's a little bit more of an umbrella project if I understand it correctly.
How would you describe the project today?
Evan: I think umbrella is a good way to describe it.
There are two main components to it.
There's the Operator SDK, which you use for writing operators, and then Operator Lifecycle Manager for installing, updating, and resolving dependencies of operators.
Most of the work falls under one of those two categories, but there are lots of subprojects for sure.
Marc: Cool. So Operator SDK helps you write an operator, and OLM, the Operator Lifecycle Manager, helps you run and operate them.
Just in case anybody is unaware of the term operator or there's a little ambiguity around what the term operator means, can you define that for us?
Evan: Yeah, happy to.
There are, I think, dozens of slightly different definitions.
The original one from CoreOS is a controller that you run on Kubernetes that is in some way application specific.
The canonical example would be an etcd operator: a controller that you install in Kubernetes that gives you new APIs to work with. You create an etcd cluster object in the Kube API, and you get some pods running with your etcd cluster.
And as far as Operator Framework is concerned, it's probably easier to think of it in terms of just a controller framework.
We don't really know what the operators we're installing do, and we don't really care, so we have tooling to help you build operators that might manage operands like etcd or provide some other service to the cluster, as long as it's a controller, potentially with some CRDs or maybe an extension API server.
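To make the definition concrete, here is a minimal sketch of the reconcile step at the heart of any controller, written against a hypothetical `EtcdCluster` resource. It is plain Go with no Kubernetes client so that it runs on its own. The type names and the "add"/"remove" actions are assumptions for illustration, not the API of any real etcd operator.

```go
package main

import "fmt"

// EtcdClusterSpec is a hypothetical custom resource spec: the user only
// declares how many etcd members they want.
type EtcdClusterSpec struct {
	Size int
}

// ClusterState is a hypothetical, simplified view of what is actually
// running: the names of the etcd member pods.
type ClusterState struct {
	Members []string
}

// Action is one step the controller decides to take.
type Action struct {
	Kind   string // "add" or "remove"
	Member string
}

// reconcile compares the desired state (spec) with the observed state and
// returns the actions needed to converge. It never acts directly; a real
// controller would apply these through the Kubernetes API.
func reconcile(name string, spec EtcdClusterSpec, observed ClusterState) []Action {
	var actions []Action
	current := len(observed.Members)
	// Too few members: add new ones, named by index.
	for i := current; i < spec.Size; i++ {
		actions = append(actions, Action{Kind: "add", Member: fmt.Sprintf("%s-%d", name, i)})
	}
	// Too many members: remove the last-listed ones first.
	for i := current - 1; i >= spec.Size; i-- {
		actions = append(actions, Action{Kind: "remove", Member: observed.Members[i]})
	}
	return actions
}

func main() {
	observed := ClusterState{Members: []string{"demo-0"}}
	for _, a := range reconcile("demo", EtcdClusterSpec{Size: 3}, observed) {
		fmt.Println(a.Kind, a.Member)
	}
}
```

A real controller would run this whenever the custom resource or its pods change. Because it always compares desired and observed state, it returns no actions once the cluster has converged.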
Marc: Yeah.
And if I'm not mistaken, in SIG App Delivery, the special interest group, there's currently an operator working group that's trying to concretely define what an operator is.
Are you involved in that at all?
Or are you paying attention to the work that's happening there?
Evan: I was personally involved early on.
They had some issues with their calendar.
I don't know if that's been resolved, but basically I never knew when the meetings were going on.
I think that that's been fixed since then and I haven't dipped back in but we are definitely looking to get more involved with that group.
Marc: Cool. Yeah.
I think operators have been around for a couple of years now and a lot of folks are adopting them, but there's lots of, like I said, ambiguity, I guess, around the term operator.
And it's like, is it a marketing term? Or is it actually a technical definition of what it means?
And I think you had a pretty interesting description of it there a second ago.
Evan: Yeah. To some degree it's absolutely a marketing term, but to a different degree, it's absolutely a technical one.
I think the problem is that on the technical side, there are lots of different ways you could define it, with different properties that you might care about.
And that's kind of why, for Operator Framework, we're not super strict about what we think an operator is.
It's a controller in Kubernetes at the bare minimum.
Marc: Let's dive into the Operator SDK side of Operator Framework for a second.
This is a tool that helps me bootstrap and create my own operator.
Is it like something that I just use one time to set up a project and then it gets out of the way?
Or what does it help me with as a developer when I want to write an operator?
Evan: Yeah, so in a lot of ways it's similar to Kubebuilder.
It's based upon Kubebuilder and controller-runtime.
It gives you a similar experience: you run it once to scaffold, but then if you have changes that you want to make that you can express through its interface, you can run it again, generate new things, and make changes over time.
That's kind of where it is today.
There's another project, and like I said, there are lots of subprojects in Operator Framework, but another one for the SDK is operator-lib, which is a collection of common libraries for common tasks that you might want to do in your operator.
For example, managing the version of an operand, or updating a piece of software that the operator is managing.
That's something that we're focusing more on now, so there's not a lot there yet, but that's kind of how you would use the SDK over time after bootstrapping.
Marc: And is the reason for this that it's just really hard to create an operator, with lots of scaffolding and bootstrapping work?
Or where are the challenges if I just wanted to say, I don't need operator-lib.
I have Visual Studio Code and I can write Go code.
I just want to go make my own operator.
What are the challenges there?
Evan: There is definitely a lot of code generation that you need to do for writing an operator or any controller for Kubernetes, and the SDK and Kubebuilder both help with those tasks.
You can absolutely do it yourself and there's nothing wrong with that, but it can be overwhelming to maintain without some help.
Marc: Yeah. I think a lot of folks have written operators.
I've written a few operators too, and there are also day-two challenges of writing an operator, where you want to move from v1beta1 to v1beta2 or something like this.
And there's some kind of crazy webhook stuff that happens inside Kubernetes in order to handle that transition.
Does Operator SDK help me kind of bootstrap that when I want to transition?
Evan: It does. It has tools to bootstrap conversion webhooks and anything related to that.
That's also a good intersection between SDK and OLM, because on the runtime side, OLM will do a bunch of checks to make sure that the updates you're applying to a CRD are safe with respect to the data that's currently in the cluster.
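The version transition Marc describes can be illustrated with two hypothetical API versions of the same resource. The sketch below shows the up and down conversions a conversion webhook performs, plus a simple check for stored objects that would lose data on a round trip. The types, field names, and the check itself are assumptions for illustration. This is not what the SDK scaffolds, and it is not the check OLM actually performs.

```go
package main

import "fmt"

// EtcdClusterV1beta1 is a hypothetical old API version with one "size" field.
type EtcdClusterV1beta1 struct {
	Name string
	Size int
}

// EtcdClusterV1beta2 is a hypothetical new version: "size" is renamed to
// "replicas" and an optional "version" field is added.
type EtcdClusterV1beta2 struct {
	Name     string
	Replicas int
	Version  string
}

// upConvert serves clients asking for v1beta2 when the object was written
// as v1beta1. Version stays empty, meaning "use the operator's default".
func upConvert(in EtcdClusterV1beta1) EtcdClusterV1beta2 {
	return EtcdClusterV1beta2{Name: in.Name, Replicas: in.Size}
}

// downConvert serves old clients. v1beta1 has nowhere to keep Version.
func downConvert(in EtcdClusterV1beta2) EtcdClusterV1beta1 {
	return EtcdClusterV1beta1{Name: in.Name, Size: in.Replicas}
}

// lossyObjects lists stored v1beta2 objects that would lose data if a
// client read them as v1beta1 and wrote them back.
func lossyObjects(stored []EtcdClusterV1beta2) []string {
	var lossy []string
	for _, obj := range stored {
		if upConvert(downConvert(obj)) != obj {
			lossy = append(lossy, obj.Name)
		}
	}
	return lossy
}

func main() {
	stored := []EtcdClusterV1beta2{
		{Name: "a", Replicas: 3},
		{Name: "b", Replicas: 5, Version: "3.5.0"},
	}
	fmt.Println(lossyObjects(stored))
}
```

Object `b` is flagged because v1beta1 has no field for `Version`. Real conversions commonly avoid this kind of loss by preserving the extra field somewhere the old version can carry it, such as an annotation.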
Marc: And then let's transition to the other side.
If I want to run OLM, can you help define concretely what OLM is?
Is it a standard that an operator should conform to?
Evan: It's more or less a packaging and distribution and dependency resolution system.
Yes, there is a particular format that you need to put your operator manifest into in order to teach OLM about it.
Some of those restrictions we're looking at removing, but it allows you to define a package for your operator and then apply a lot of the concepts that you know from other package managers, like apt or yum or pip, to operators themselves. You can define properties of operators, you can depend on other operators, and OLM will resolve those dependencies at runtime.
We also bring in update channels that you might be familiar with from OpenShift, Tectonic or even Google Chrome.
All of that stuff is what OLM does.
Since you asked for specifics, OLM is a couple of operators that you run on cluster and it provides APIs and you interact with those APIs to do installation and updates of operators.
Marc: That's great. We went really, really deep into the weeds there of what an operator is and how they move.
And let's kind of take it back a little bit, I think, and talk about the origins of the Operator Framework project.
CoreOS, before joining Red Hat, coined the term operator.
I remember a really good blog post by the team there describing this concept originally. What was the inspiration for actually creating the Operator Framework project, not just the concept, at CoreOS?
Evan: The goal with Operator Framework originally at CoreOS was to build an ecosystem on top of Tectonic, the Kubernetes distribution, where we could take partner products and give you an experience very similar to an AWS console or a GCP console, with these off-the-shelf services running in your cluster--
But you could run them on prem or in the cloud or wherever you wanted rather than being tied to AWS for your database service, for example.
The goal was to give you essentially a programmable infrastructure component to extend with just the pieces that you care about.
And then the other side is that having that software component there gives the people who know how to operate the software the ability to encode their knowledge into the software itself, and that's distributed to the customer on prem.
They get all of that as well.
Marc: You describe something that's interesting and at Replicated I spend a lot of time thinking about this problem too.
And that's when you have two different parties: the person who's writing the code and delivering it, and the third party who's going to run and manage and operate that software.
And you described that as the goal for Tectonic and operators and everything.
When you started off, was it specifically for that?
Or did you also think about first-party use, where I'm just going to run my own software, but operators still have value there?
Evan: Yeah, that was absolutely part of it too.
Like I said, when we started working on this project, we were all from the Quay team and so we all had experience running Quay and the idea is--
Quay started out as a Python application, a container registry, but as you run it, as you're an SRE for it, you discover the ways that you want to monitor it and the ways you want to maintain it over time.
And wouldn't it be nice if I could take all that stuff, all that knowledge that I know and teach the AWS console about it?
Now I have my Quay dashboard instead of an EC2 dashboard, for example.
That's kind of the user side of it, I guess.
Marc: Great. So if I do that, if I ship it as an operator and I have this console, let's dive into more of the details.
What are the other benefits that I get out of shipping my code as an operator versus shipping it as, I don't know, say a Helm chart?
Evan: Right. Certainly you can ship an operator in a Helm chart.
But the benefit of writing an operator in the first place is having the act of reconciliation against the current state of the cluster, and encoding decisions around that.
Knowing that when you start to hit a certain limit along one axis, you need to make some change in the cluster or request more resources from the infrastructure, or something like that.
That's the goal there. Like I said, it's encoding of SRE operational tasks as much as possible.
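One way to picture "encoding SRE operational tasks" is a runbook rule turned into a function that the reconcile loop evaluates on every pass. This is a minimal sketch. The `Observation` type, the 80% disk threshold, and the "double the volume" response are assumed values chosen for illustration, not recommendations.

```go
package main

import "fmt"

// Observation is a hypothetical snapshot of one resource the operator watches.
type Observation struct {
	DiskUsedBytes     int64
	DiskCapacityBytes int64
}

// desiredCapacity encodes a runbook rule: once usage reaches 80% of
// capacity (an assumed threshold), ask for twice the capacity. Integer
// arithmetic avoids floating-point rounding at the boundary.
func desiredCapacity(o Observation) int64 {
	if o.DiskCapacityBytes > 0 && o.DiskUsedBytes*10 >= o.DiskCapacityBytes*8 {
		return o.DiskCapacityBytes * 2
	}
	return o.DiskCapacityBytes
}

func main() {
	const gi = int64(1) << 30
	obs := Observation{DiskUsedBytes: 85 * gi, DiskCapacityBytes: 100 * gi}
	if want := desiredCapacity(obs); want != obs.DiskCapacityBytes {
		fmt.Printf("expand volume to %d GiB\n", want/gi)
	}
}
```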
Marc: Great. Yeah, that makes sense.
Operators aren't necessarily just a packaging format; they're a runtime, and they leverage the fact that the Kubernetes API server is constantly running 24/7.
Evan: Exactly. And Kubernetes gives a strong interface to do this.
You can write an operator that targets any infrastructure.
The nice thing about using Kubernetes as the target is that it has all this great infrastructure and a common language to talk about all these problems.
Marc: Yeah. I think we've talked about that on a few other episodes here too.
And I think there's all kinds of value in Kubernetes, but this common API is absolutely massively beneficial.
And it turns out that that's the thing that we've been missing in this industry for so long and Kubernetes is solving that.
Evan: Yeah, absolutely.
Marc: You mentioned Kubebuilder as another operator bootstrapping and scaffolding tool.
What other tools exist to help me manage and deploy operators in my cluster today?
Evan: That's a good question. I'm not sure.
There's similar ecosystems, I guess, that you get.
There's an ecosystem around Helm, there's an ecosystem around Crossplane, there's an ecosystem around Flux, and Carvel is another one; they're all kind of about getting workloads into a cluster in one way or another.
And a lot of them have a focus on operational tasks as well. I don't know if we want to dig into what the differences between all those projects are.
I'm not sure I'm an expert enough on all of them to do that justice.
Marc: That's totally fair.
There are so many things happening in this ecosystem right now.
I'd love to hear more about where you would be the expert: how Operator Framework differs.
Where would I want to choose Operator Framework versus something else in the ecosystem?
Or where does Operator Framework shine?
Evan: Yeah. I think Operator Framework shines particularly in our update and dependency resolution story.
I think that's where we're different in a lot of ways from other ecosystems.
There are certainly other ecosystems that have support for dependencies, where you can say this Helm chart depends on another Helm chart, but it's more like vendoring another component into yours so that you install both at the same time.
What we do is we have a packaging system that's much more similar to a traditional package manager, where there's a SAT solver.
You can depend on arbitrary properties of other packages in the ecosystem and potentially we're not there yet, but potentially other packages from other ecosystems that we could understand.
There might be a world where you install an operator from one ecosystem and depend on an operator or just an application from another ecosystem.
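Depending on properties rather than on specific packages can be sketched with a deliberately simple, greedy resolver. OLM's real resolver is a proper constraint solver that has to weigh many more constraints than this. The `Bundle` type and the property strings below are hypothetical and only illustrate the shape of the problem.

```go
package main

import (
	"errors"
	"fmt"
)

// Bundle is a hypothetical, simplified operator package entry.
type Bundle struct {
	Name     string
	Provides []string // properties, e.g. "api:EtcdCluster"
	Requires []string // properties some other selected bundle must provide
}

func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// resolve selects bundles so that every required property of a selected
// bundle is provided by some selected bundle. It is greedy: for each
// missing property it takes the first catalog entry that provides it.
func resolve(catalog []Bundle, want string) ([]string, error) {
	byName := map[string]Bundle{}
	for _, b := range catalog {
		byName[b.Name] = b
	}
	root, ok := byName[want]
	if !ok {
		return nil, fmt.Errorf("bundle %q not in catalog", want)
	}
	selected := map[string]bool{}
	provided := map[string]bool{}
	var order []string
	var add func(b Bundle) error
	add = func(b Bundle) error {
		if selected[b.Name] {
			return nil
		}
		selected[b.Name] = true
		order = append(order, b.Name)
		for _, p := range b.Provides {
			provided[p] = true
		}
		for _, req := range b.Requires {
			if provided[req] {
				continue
			}
			var found *Bundle
			for i := range catalog {
				if contains(catalog[i].Provides, req) {
					found = &catalog[i]
					break
				}
			}
			if found == nil {
				return errors.New("unsatisfiable requirement: " + req)
			}
			if err := add(*found); err != nil {
				return err
			}
		}
		return nil
	}
	if err := add(root); err != nil {
		return nil, err
	}
	return order, nil
}

func main() {
	catalog := []Bundle{
		{Name: "app-operator", Requires: []string{"api:EtcdCluster"}},
		{Name: "etcd-operator", Provides: []string{"api:EtcdCluster"}},
	}
	order, err := resolve(catalog, "app-operator")
	fmt.Println(order, err)
}
```

Because `app-operator` asks for a property rather than a package name, any bundle providing `api:EtcdCluster` could satisfy it. That is the flexibility the cross-ecosystem idea depends on.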
Marc: Yeah, that's actually really cool.
Especially when you get into third party software delivery where it could be some application that depends on databases or stateful components that can be fulfilled through various operators and--
As a cluster manager or cluster operator I can choose which of those and how I want to actually bring that dependency in.
Evan: Yeah, absolutely.
That's part of the original vision that I think is still a little ways off.
There's some challenges around even specifying that type of dependency.
I need to be able to write something against a duck-typed API.
I know there are some solutions in some places that do that, but actually asking an author to write down here's the specific fields of a particular API that I'm going to use, that's a challenging social problem.
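The duck-typing problem can be illustrated with a small check. The dependent operator's author declares the field paths it will read, and a tool verifies that an object from a candidate provider actually has them. The dotted-path format and the function names here are hypothetical. Writing down that list of fields is the social problem described above.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPath reports whether obj has a value at a dotted path such as
// "status.endpoint", walking nested maps the way unstructured Kubernetes
// objects are commonly represented in Go.
func hasPath(obj map[string]any, path string) bool {
	var cur any = obj
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return false
		}
		next, found := m[key]
		if !found {
			return false
		}
		cur = next
	}
	return true
}

// missingFields returns the declared paths the object does not provide.
func missingFields(obj map[string]any, declared []string) []string {
	var missing []string
	for _, p := range declared {
		if !hasPath(obj, p) {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	obj := map[string]any{
		"spec":   map[string]any{"size": 3},
		"status": map[string]any{"endpoint": "etcd-client:2379"},
	}
	fmt.Println(missingFields(obj, []string{"status.endpoint", "spec.version"}))
}
```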
Marc: Right. One thing we just love to dig into a little bit is the tech stack around a project.
And I know an umbrella project may have different tech stacks, but what is the tech stack that the different projects in Operator Framework are using?
Evan: Almost everything is written in Go.
There is a heavy reliance on upstream projects, so we've used controller-runtime and Kubebuilder.
As I mentioned, we also use client-go for most of the OLM operators.
We started looking at a couple of other projects, one of which is a CNCF project.
We started looking at Rego from OPA as potentially being a data store for some of the things that we do.
And we also started looking at Cuelang, which has been making some inroads in various Kube communities, and it's a great configuration language with a lot of nice properties that could help with a lot of our validation tooling.
Marc: That's great.
If Operator SDK and most of the framework are written in Go, does that mean that if I want to use the tools, I have to write my operator in Go also?
Evan: No, you don't. The packaging side is agnostic to whatever it's written in.
The SDK side is supporting a small, but growing number of ways to write operators.
There's some work going on right now to support scaffolding Java operators.
Then there are also a couple of ways that you can take existing applications and make very simple operators around them. You can take a Helm chart and turn it into an operator so that, for example, an instance of a CR corresponds to an instantiation of the Helm chart in the cluster.
That's a very simple lifecycle, but for some applications that's really useful.
And then another is an Ansible playbook based one where you can load Ansible playbooks into this Operator SDK framework and you get operators to pop out on the other side.
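In the Helm-based model, each custom resource corresponds to one installation of a chart, so the operator makes one small decision per resource. This sketch is hypothetical. The `CR` type and the release-key scheme are assumptions, and the CR's spec, which would be passed to the chart as values, is omitted.

```go
package main

import "fmt"

// CR is a hypothetical, trimmed-down custom resource instance.
type CR struct {
	Namespace string
	Name      string
	Deleting  bool // true once the user has deleted the resource
}

// releaseAction maps one CR to the Helm action a chart-based operator
// would take. installed records which releases already exist, keyed by
// "namespace/name" (an assumed scheme: one release per CR).
func releaseAction(cr CR, installed map[string]bool) string {
	key := cr.Namespace + "/" + cr.Name
	switch {
	case cr.Deleting && installed[key]:
		return "uninstall"
	case cr.Deleting:
		return "none"
	case installed[key]:
		return "upgrade" // re-render the chart with the CR's current values
	default:
		return "install"
	}
}

func main() {
	installed := map[string]bool{"prod/cache": true}
	for _, cr := range []CR{{"prod", "cache", false}, {"dev", "cache", false}, {"prod", "cache", true}} {
		fmt.Println(cr.Namespace+"/"+cr.Name, releaseAction(cr, installed))
	}
}
```

Anything beyond install, upgrade, and uninstall, such as backup or migration, would still need custom logic. The Ansible and Go options cover that.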
Marc: That's cool. Then I assume those operators generally handle the OLM side of it, installation and upgrading, but there's no magic happening that's going to give me day-two operations for whatever the database or the actual software is.
Is that really just a path that allows that piece of software to eventually build a fully functional operator with deeper roots into the lifecycle?
Evan: Yeah, that's the idea.
The idea is, let's say you have some existing piece of software and you're looking to write an operator for it.
You might start just by taking its manifests or its Helm chart and turning that into an operator that knows how to stamp it out.
And if you're already comfortable with Ansible, if that's something you use as part of your normal product lifecycle, you might be comfortable writing an Ansible playbook to do more lifecycle things that are specific to your application.
Things like backup and restore, or a migration of some kind.
And then the idea is that as you become more comfortable with the ecosystem, you can sort of fit in wherever you are and if you are more and more comfortable with Kubernetes, maybe you'll start writing an operator in Go, if you have the need to, there might not be a need to.
Marc: Got it.
A lot of projects that we use, like CNCF projects, are built, and then I execute and run them.
And so the team that's building and creating that project is responsible for an implementation of it.
And there's all kinds of technical challenges there.
But it sounds like you had a lot of challenges with Operator Framework because it's not just an implementation, but you kind of define the concept.
And you had to create the market and the opportunity to say, "Operators are a thing and we have to convince you that this is a good pattern to follow. And now we also have an implementation of it."
When you were doing that, that probably created all kinds of new and unpredictable technical challenges along the way.
You've been doing it for a while now, I'd love to hear any stories or any specific challenges that you can remember.
Evan: Yeah, absolutely.
I guess one specific one that has only recently been resolved is around update safety for operators.
There's a certain set of operators that might need to do some work before they're safe to update or after they've updated before they're safe to be updated again.
And so the original model that we started with was that we'd take the pod health as a proxy for whether the operator is running successfully.
But what was happening is we had operators that had some amount of work to do before they were really ready and they didn't want to report healthy in their pod status because they knew that marked the overall operator as healthy and that would potentially enable another update to happen.
And so just recently we introduced some features that allow operators to reach back out and tell OLM in another way that, hey, the pods are healthy,
but I'm still doing a bunch of work, so I'm not in an upgradable state.
And that's really something that it seems obvious in hindsight, but we only discovered it through people actually building operators and using the upgrade features and attempting to maintain these things over time.
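The distinction between healthy pods and an operator that is safe to upgrade can be expressed as two separate signals. In this minimal sketch, `Upgradeable` and `Reason` are hypothetical fields standing in for whatever the operator reports back to OLM. It does not model OLM's actual API.

```go
package main

import "fmt"

// OperatorStatus combines the signal originally relied on (pod health)
// with a second, operator-reported signal. Both fields are hypothetical.
type OperatorStatus struct {
	PodsReady   bool
	Upgradeable bool   // set to false by the operator while, e.g., a migration runs
	Reason      string // human-readable explanation when not upgradeable
}

// canUpgrade decides whether the next update may proceed. Pod health alone
// is not enough: the operator can veto the update itself.
func canUpgrade(s OperatorStatus) (bool, string) {
	if !s.PodsReady {
		return false, "pods not ready"
	}
	if !s.Upgradeable {
		return false, "operator not upgradeable: " + s.Reason
	}
	return true, ""
}

func main() {
	ok, why := canUpgrade(OperatorStatus{PodsReady: true, Upgradeable: false, Reason: "migrating data"})
	fmt.Println(ok, why)
}
```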
Marc: Is a lot of the need for that driven by the fact that common uses of operators are stateful services, services that need the highest SLAs, as many nines of uptime as possible?
Evan: Yeah, absolutely. Absolutely it is.
A lot of internal services at Red Hat run operators right now.
Marc: Cool. Operator Framework is currently an incubating project.
What's next on the roadmap? What's the team working on right now?
Evan: Yeah. We're looking at, I kind of alluded to this before, but we're looking at kind of separating some of the core competencies that we have.
The safety around operator updates and installation is sort of one aspect of it.
We've also solved all of these other packaging and distribution problems, and we're starting to get requests to pull in content from other ecosystems as well.
One of the things we're looking at, and we've been having some discussions in the working groups and also in some of the upstream communities like cluster add-ons, is making some of this more general. I like to use the analogy of persistent volume claims, where you have multiple provisioners that know about specific types of content.
OLM operators would be one type of content that it understands, but you could write a provisioner for other types of content and pull some of that in.
And so we're looking at what it takes to sort of expand the ecosystem to meet some of the user needs that we're seeing.
And that looks like that's going to include some new APIs that are more generic and potentially more in line with Kube patterns that everyone's familiar with like persistent volumes.
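The persistent volume analogy suggests a plug-in design. Several provisioners each claim the content types they understand, and a generic API dispatches to whichever one matches. The interface, type strings, and provisioners below are hypothetical. They sketch the analogy, not an actual OLM API.

```go
package main

import "fmt"

// Content is a hypothetical request to install something from a catalog.
type Content struct {
	Type string // e.g. "olm.bundle" or "helm.chart" (assumed names)
	Ref  string // where to fetch it from
}

// Provisioner knows how to install some types of content, much as a
// storage provisioner knows how to create one kind of volume.
type Provisioner interface {
	Handles(contentType string) bool
	Provision(c Content) string
}

type bundleProvisioner struct{}

func (bundleProvisioner) Handles(t string) bool      { return t == "olm.bundle" }
func (bundleProvisioner) Provision(c Content) string { return "unpacked bundle " + c.Ref }

type chartProvisioner struct{}

func (chartProvisioner) Handles(t string) bool      { return t == "helm.chart" }
func (chartProvisioner) Provision(c Content) string { return "rendered chart " + c.Ref }

// provision hands the content to the first provisioner that claims it.
func provision(ps []Provisioner, c Content) (string, error) {
	for _, p := range ps {
		if p.Handles(c.Type) {
			return p.Provision(c), nil
		}
	}
	return "", fmt.Errorf("no provisioner for content type %q", c.Type)
}

func main() {
	ps := []Provisioner{bundleProvisioner{}, chartProvisioner{}}
	for _, c := range []Content{{"olm.bundle", "example/etcd-bundle"}, {"kustomize", "example/app"}} {
		out, err := provision(ps, c)
		fmt.Println(out, err)
	}
}
```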
Marc: That's pretty cool.
I want to circle back and talk more about OLM.
If I want to have my operator, if I'm an operator developer, I'm going to ship something and I want it to be compatible with OLM, how do I do that?
Evan: Yeah.
It pretty much all happens through a single API, the ClusterServiceVersion, which embeds the deployment spec for the operator itself.
It also defines some of the RBAC needs for the operator.
You take that manifest and you drop it into an image that we call a bundle image and you push that to a registry somewhere.
And then once that's available in a registry, you make a reference to it in what we call an index image.
Index image is basically just a list of bundles that we can find on cluster.
And then the index image is added to whatever cluster you're using.
That tells OLM where to find all this content and what bundles are available to it for installation. Then you create a subscription; the idea behind that API is that you're subscribing to updates for an operator, either automatically or manually.
And that tells OLM to look at these indexes you've made to find operators that match the constraints you've given and install them and keep them up to date.
And so one side of it keeps checking for updates and installing them as they're found.
The other side takes that API, that cluster service version and instantiates that as a bunch of resources on the cluster that actually make up the operator.
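The index-to-subscription flow can be pictured as a graph walk. Each bundle in the index says which earlier bundle it replaces within a channel, and a subscription follows those edges from the installed version to the head of its channel. The `IndexEntry` type and bundle names are hypothetical. OLM's real update graph supports more options than a single "replaces" edge, so this only illustrates the idea.

```go
package main

import "fmt"

// IndexEntry is a hypothetical, simplified row in an index image.
type IndexEntry struct {
	Bundle   string
	Channel  string
	Replaces string // the bundle this one upgrades from ("" for the first)
}

// upgradePath follows "replaces" edges within one channel, starting from
// the installed bundle, and returns the bundles to apply in order.
func upgradePath(index []IndexEntry, channel, installed string) []string {
	var path []string
	seen := map[string]bool{installed: true}
	cur := installed
	for {
		next := ""
		for _, e := range index {
			if e.Channel == channel && e.Replaces == cur {
				next = e.Bundle
				break
			}
		}
		if next == "" || seen[next] {
			return path // reached the head of the channel (or a cycle)
		}
		seen[next] = true
		path = append(path, next)
		cur = next
	}
}

func main() {
	index := []IndexEntry{
		{"etcd.v0.1.0", "stable", ""},
		{"etcd.v0.2.0", "stable", "etcd.v0.1.0"},
		{"etcd.v0.3.0", "stable", "etcd.v0.2.0"},
		{"etcd.v0.4.0-alpha", "alpha", "etcd.v0.3.0"},
	}
	fmt.Println(upgradePath(index, "stable", "etcd.v0.1.0"))
}
```

With automatic approval, a subscription would apply each step as it appears. With manual approval, each step would wait for someone to approve it.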
Marc: Okay. And are there requirements like, oh, this works in OpenShift and only OpenShift or if I just have a GKE or any cluster out there, will I be able to use OLM?
Evan: Yeah. No, you can use OLM in any Kubernetes cluster.
There are some OpenShift-specific features that happen automatically if you install it on OpenShift, where it comes built in, but they're not things that you'd miss if you weren't running on OpenShift.
They're about integrating with the OpenShift update mechanism.
Everything that works on OpenShift that's not OpenShift-specific works fine on upstream Kube.
Marc: Okay, great. Let's talk about the example of an etcd operator.
You brought that one up earlier.
If I write an etcd operator, that operator will deploy etcd, and then if I upgrade the operator, I don't have to worry about upgrading those instances of etcd, because that's what the operator does.
The operator keeps etcd updated. And OLM basically does the same thing for the operator itself?
So I can now manage the state and the installation and the versions and everything of the operators themselves, which in turn manage etcd, in this case?
Evan: Yeah. That's more or less how it works.
I guess the caveat I'd draw is that you don't necessarily want an update of the operator to automatically update those etcd clusters, because you have applications that are using that etcd API, and if there's a breaking change there, that might be a thing that you want to have control over.
Those lower level rollout decisions are up to the operator author to decide what's best for the software it's managing.
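Because an operator upgrade shouldn't silently upgrade every etcd cluster it manages, the rollout policy falls to the operator author. One possible approach is to let each instance opt in to automatic operand updates. This sketch uses a hypothetical per-instance `UpdatePolicy` field.

```go
package main

import "fmt"

// EtcdInstance is a hypothetical managed etcd cluster.
type EtcdInstance struct {
	Name         string
	Version      string
	UpdatePolicy string // assumed values: "Automatic" or "Manual"
}

// planOperandUpdates returns the instances the operator should move to its
// new default version. Instances set to "Manual" stay pinned until their
// owners change the version themselves.
func planOperandUpdates(instances []EtcdInstance, defaultVersion string) []string {
	var names []string
	for _, in := range instances {
		if in.UpdatePolicy == "Automatic" && in.Version != defaultVersion {
			names = append(names, in.Name)
		}
	}
	return names
}

func main() {
	instances := []EtcdInstance{
		{"orders", "3.4.0", "Automatic"},
		{"payments", "3.4.0", "Manual"},
		{"search", "3.5.0", "Automatic"},
	}
	fmt.Println(planOperandUpdates(instances, "3.5.0"))
}
```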
Marc: Got it.
And so does Operator Framework include some kind of best practices, even if it's docs or a reference implementation or anything like that, around how I can handle that?
And maybe it's like, oh, this etcd operator's deploying multiple instances of etcd, and some of them should be upgraded and some shouldn't.
How do I as a developer, and how does the end user, control that?
Is that in scope of the project?
Evan: It's absolutely in scope of the project.
We have some documents right now on the Operator Framework website, but that's hitting on exactly some of the work that we're looking at doing in the near future for SDK.
It's what are the patterns that you want to be using as an operator author to manage your software?
And what should you be using? And what are the best practices?
And also here's a library to do that for you, if you happen to be using SDK.
Marc: Thanks. That actually helps clear it up.
How do I get started today with Operator Framework?
What's the easiest for me, if I'm a developer and I'm thinking about writing an operator, what would you recommend?
Evan: I definitely recommend starting at the SDK website.
I think it's sdk.operatorframework.io, but if you go to operatorframework.io, you'll get a link to it.
And going through some of the examples there, that also heavily links out to the Kubebuilder docs, which are also excellent.
If you're just getting up and running and want to get a general sense of what operators are doing and how they work, I think those are great resources.
Marc: There's also OperatorHub, which is a registry for Kubernetes operators.
Is that related to Operator Framework? Or how does that fit in?
Evan: It is absolutely related.
It's a little bit tricky because while OLM and SDK are part of CNCF, OperatorHub is not and some of that's sort of historical and political.
When we were looking at donating Operator Framework, Artifact Hub had already begun.
I think it was called CNCF Hub at the time. Development had already started on that, but just wasn't public and we didn't know about it.
As part of a compromise to contribute the project, we omitted OperatorHub from the CNCF donation.
All the operators that you see on OperatorHub are operators that you can install via OLM on any cluster.
Marc: You can install them via OLM, but that doesn't mean they necessarily had to have been written with Operator SDK or anything like that.
If I've crafted an operator the hard way, as long as I can use OLM, I can still push it up to OperatorHub?
Evan: Yep. Yeah. The only requirement is the packaging part.
Marc: Got it. As the project, Operator Framework, continues to mature, what type of feedback is most useful for you right now?
Is it on the developer and the packager side?
The folks who are actually running operators?
A little bit of both? Where are you looking for feedback right now?
Evan: All of the above.
Any feedback from any of the sides because there's several different personas.
There's consumers of the operators, there's authors of the operators.
We also have a persona that's basically a person who maintains an index of operators for their internal cluster.
Anyone who's in any of those roles and has any issue or questions or comments, those are really valuable to us.
And then on the author side, of course, anyone who's developing operators and has ideas of things that could be easier or patterns that they'd like to see more off the shelf solutions for, all of that is stuff we'd love to hear.
Marc: Is there a regular community meeting or something that you would recommend everybody attend?
Evan: Yes. Depending on which side you're looking at, there are OLM working group meetings biweekly, and there are SDK working group meetings biweekly.
I think they're on off weeks, but I'll have to find the--
Marc: We can link to the schedule in the show notes here, too.
Since you're an incubating project right now, the next step is obviously graduation at some point.
Have you guys given any thought to what the target is, either timeline or kind of goals on the technical or adoption side or anything like that around the project before applying for graduation?
Evan: Yes, we definitely have some technical goals that we're trying to hit before that.
Some of them are those newer APIs that I alluded to, some of it is also just modularity and componentization.
Right now, you only kind of get it all by buying into OLM.
And so there's some things that are not as easy as they should be.
It's harder to use this in a GitOps workflow or at least if you're using it in a GitOps workflow, you have to do it in a very particular way.
And so we want to split up the components so that you can fit it into the workflow that you have, rather than fitting your workflow into Operator Framework's workflow.
And then, like I said, we're looking at addressing some really long standing issues with some of our APIs by introducing newer APIs that address them.
Once we do that, we're hoping that we also get a lot of community feedback and involvement; we'd like to see contributors from other companies and things like that before we apply for graduation.
Marc: Okay. Do you have a contributing list right now so you know some of the folks who are contributing?
Evan: Yeah. The majority of the contributors are Red Hat and IBM and there are a couple more here and there, but I think that's the majority of the contributions.
Marc: Great. I'd love to talk for just a little bit about your job, Evan.
You're a principal engineer at Red Hat.
Do you get to work on Operator Framework full time?
Or what do you do on a day to day basis there?
Evan: Yeah, this is very internal Red Hat stuff, but I'm a pillar lead in the add-ons pillar for OpenShift.
And so the add-ons pillar is all about making sure that layered products run successfully and sell successfully on top of OpenShift.
And our strategy around layered products is essentially operators because all the benefits we talked about, why you'd want an operator, that's also a benefit for shipping software to customers.
And so I do spend a good chunk of my focus on Operator Framework as a part of that.
I tend to stay out of day-to-day feature shipping, though I do a lot of code review and a lot of design review and things like that for Operator Framework.
Marc: So do you spend a lot of your time leading the architectural direction of the project, or of operators in general?
How do you split your time there?
Evan: As needed, depending on whatever needs focus at the time.
One of the things we're looking at now for example, is Hypershift.
You might start to see some stuff about Hypershift around GitHub, and that's basically just a way of running lots of OpenShift control planes on a single management cluster.
Marc: Great. Let's actually dive into that a little bit.
It's not something that I've ever really explored in detail.
You said it allows me to run multiple control planes?
Evan: Yeah, it does.
With the Hypershift project, you create one cluster of management nodes and then basically you make as many namespaces as you want.
In each namespace you can install the control plane for OpenShift.
And each of those control planes provisions its own set of guest nodes.
Each cluster gets its own separate set of worker nodes.
You don't share worker nodes so it's different from some of the other multi-tenancy solutions being discussed upstream, but you can pack a bunch of control planes all on one management cluster.
It makes spinning up new clusters really easy and fast.
Marc: That sounds super interesting.
I'm looking at the Hypershift repo right now.
And I think there's a whole world of multi-tenancy in Kubernetes. There's a lot of future, a lot of unknowns, and a lot of great stuff happening here.
How long has this project been around for?
Evan: That is a good question.
I want to say a little under a year, but it's existed in different forms and under different names.
I know that a form of this has been used to provide OpenShift clusters in IBM Cloud for a while.
Marc: Got it. Yeah, it looks super cool.
I think it's a little off-topic from Operator Framework because it's a little more OpenShift-specific, but I think it's relevant and interesting.
And that's the goal of the podcast: to talk about the Kubernetes and CNCF ecosystem.
Evan: Yeah.
And one thing that is very operator-specific, and could have a big impact on Operator Framework in the future, is that we might want to start thinking about operators that know about more than one Kube API server.
There could be a lot of benefits in architecting your internal infrastructure to have a management cluster and guest clusters, much in the same way we have the kube-system namespace and user namespaces, so that some infrastructure owner can install and run operators in the management cluster that provide the service to the guest cluster.
That's an interesting line of, I guess, research that we should be thinking more about.
Marc: Yeah. And I know this gets really technical and into the implementation details, but I actually want to spend a minute chatting about that.
A challenge of operators today is that they're cluster wide.
I can't install an operator in a namespace.
I need to install the CRD, so I need cluster-admin-level permissions to get the operator installed. Do you know of any plans or any future work around being able to limit the scope and install an operator into a narrower scope, like a namespace?
Evan: Yeah, this is actually a huge topic for us in Operator Framework right now.
OLM actually started with the model that all operators would be namespace scoped and only have access to one namespace.
This was designed at a time when a ThirdPartyResource or a CRD did nothing more than register a global name in the cluster, so there was really no risk of any problematic interactions between operators in different namespaces.
That of course has changed.
And in addition to that, there are security issues with running your operator in the same namespace as the workloads if you need to distinguish between levels of access within that namespace, because anyone who can run a pod there can get the operator's permissions.
All that said, we've been working with the existing Operator Framework community to move toward acknowledging the fact that all of these operators are effectively cluster-scoped, and there's really nothing you can do about it because they use CRDs.
I think that the Hypershift approach is a really interesting one.
It doesn't really give you namespaced operators, but what it does give you is the ability to treat a cluster control plane far more ephemerally, like you would a namespace.
And that's also one of the goals of several projects in the multi-tenancy working group.
All of them are interesting, I think, but none of them seem like they're going to be in-tree anytime soon.
For now I think we're setting our eyes on the Hypershift approach, and we would love for there to be some great upstream solutions that solve this problem, but I don't think there are any on the near-term horizon, at least.
Marc: And with the Hypershift approach, we can check out the repo since it's public, but is it all custom, proprietary implementation?
Or is it using technology like Kubernetes API server aggregation or other capabilities built into Kubernetes?
Evan: Yeah. Everything is open.
There's nothing proprietary and it's all a derivative of OCP, which is also all open source.
I don't know the product plans. Will there be an unsupported version of Hypershift, an equivalent to OKD, which is sort of the upstream of OCP?
I don't know. But there's no special sauce. There's nothing proprietary.
Marc: Yeah. That's super cool.
I think OpenShift is often looked at as the enterprise distribution of Kubernetes, but there's definitely some really cool stuff happening there.
And as you just showed, it's driving a lot of the operator technology too, along with the architecture and the roadmap there.
Evan: Yeah.
And like I said, I would love if there was some upstream solution that solves similar problems, but right now it looks like everyone is providing multi-tenancy by forking.
And from what I've seen of the SIGs related to this, it doesn't look like much of that would be accepted as patches anytime soon.
Marc: Great. Let's move the conversation up a level. When we talk about app delivery, operators are amazing.
And we talked about where you think the roadmap for operators is going, but what do you see far out in the future of Kubernetes, something folks are talking about in SIG meetings or community meetings, where you think, yeah, that's actually going to change the way we do app delivery on Kubernetes?
Is there anything that you've seen around that?
Evan: Well, I don't know if this represents anyone else's opinion, but one of the things it seems like we're headed toward overall is being able to treat the entire control plane, and the set of content that we deliver, more like a container.
I haven't seen anyone explicitly talking about this anywhere, but to me it seems like we're on the path toward whatever the cluster-level equivalent of a Dockerfile would be.
These are the APIs I want available, these are the applications I want running, go make it happen for me.
To me, that's at least an interesting line of thinking, given the current state of the world and of Kubernetes architecture.
Marc: Interesting. How does that compare to GitOps?
Because in theory, I could sort of get that today if I set up a cluster and put a GitOps operator inside it, and then I could just say, "Okay, here are some CRDs. I'm going to put them into the Git repo and then they're going to deploy."
How does what you just described differ from that?
Evan: It's fairly similar. I think the thing that is missing is that right now, you're kind of on your own to stand up that cluster.
You need new nodes, you need new control plane pods, you need networking and all that.
I don't know if it makes sense to even consider that as part of Kubernetes proper, but it seems like there's maybe some layering that could happen here so that that's not as much of an issue.
Marc: Oh, that's cool.
And there's definitely some other cool work happening around Cluster API that we could probably leverage to get there.
Evan: Yeah, absolutely.
Marc: Evan, another question I'd love to dive more into is the community involvement.
Red Hat's a big company, and I'd love to hear what it's like working at a company that's focused completely on open source.
What the challenges are, what the fun is and what the opportunity is there.
Evan: Yeah, that's been an interesting process, specifically for Operator Framework, because Operator Framework started out as a closed source, proprietary project at CoreOS.
We went through the process of open sourcing and we've only recently managed to completely decouple the upstream from the downstream.
We're now in a position to accept contributions from anyone at any time.
There used to be some restrictions about when things could merge.
Those are gone now and we're really shifting focus to make sure that the development and the design all happens upstream first so anyone can come with a suggestion or proposal.
It gets discussed as part of an enhancement that's modeled after the KEP process in Kubernetes.
That's the same whether the need is sourced internally or sourced from the community.
Everyone goes through that process. It receives reviews and discussion and gets scheduled for work by the team.
And we basically treat the set of things that the Red Hat team works on as sourced from both.
But of course, anyone who's willing can pick up any work that's in the project and contribute.
Marc: Were you involved in the process from taking it from a proprietary project to open sourcing it and then ultimately applying and putting it in the CNCF?
Evan: I was involved in some of the open sourcing, although that wasn't much more than clicking "make public" on GitHub.
I was less involved in the CNCF application process. That was mostly Erin Boyd and Diane Mueller, who handled a lot of the community things for Red Hat.
Marc: Yeah.
I suspect there are a lot of people who have projects they want to make public, but as developers, they think, oh, I'm going to clean this little bit up before I do.
This part's a little embarrassing.
Do you have any advice to somebody who has a project that hopefully in a couple of weeks or a couple of months, they're going to make public?
Evan: None of my personal projects have ever really taken off.
I'll say that as a caveat: this is not necessarily advice that you want from me, but I tend to just default to everything being open.
If I've been hacking on something, I'll just throw it in a GitHub repo so that I can share it with folks I think might be interested, and I don't really worry about whether the code looks perfect or whether the modules or interfaces are exactly what I think they should be to start with.
I think early feedback is better.
Marc: Yeah. I completely agree.
It's like, look, software's hard and building it out in the open and getting feedback is so much better than waiting for it to be perfect.
Evan: Absolutely. Yeah.
Marc: Cool. Well, I'm really excited about the future of operators.
We use them on a regular basis.
Operators leverage the Kubernetes API server to automate these tasks, and OLM sounds like a great wrapper around them that I think we're going to see more and more of.
It sounds like there's some cool stuff coming.
We'll include some links to the project and everything else in the show notes, and I appreciate you joining.
Evan: Thanks for having me on.
I really appreciate the discussion. It's been a great time.