OCT 28, 2020

45 MIN

Ep. #5, Flux with Michael Bridgen of Weaveworks

Guests: Michael Bridgen

about the episode

In episode 5 of The Kubelist Podcast, Marc Campbell is joined by Michael Bridgen of Weaveworks. They unpack Flux, the GitOps operator for Kubernetes, as well as GitOps adoption and progressive delivery.

about the guests

Michael Bridgen is a principal engineer at Weaveworks, and the co-creator of Flux, the GitOps operator for Kubernetes.

show notes

transcript

Marc Campbell: I am here today with Michael Bridgen, a principal engineer at Weaveworks.

Michael is the co-inventor of Flux, and we're going to talk about both the origins of the project and the future. Welcome, Michael.

Michael Bridgen: Pleasure to be here.

Marc: To get us started, can you explain what Flux is for somebody who might not be familiar?

Michael: Of course.

We call Flux "The GitOps operator for Kubernetes."

That really just defers the explanation.

The idea of Flux is that you can keep your definitions for Kubernetes in Git and it will apply them automatically to a cluster.

If you think of Kubernetes as being a homeostasis maintaining system, it takes definitions and then tries to keep that as the state of the system.

Then Flux extends that to definitions in Git, so you have an equational system where what you merge into Git equals, for some definition of "equals," what is running in the cluster.
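As a rough illustration of that "equation," here is a minimal Python sketch that compares definitions held in Git with what is running and lists what would have to change for the two to be equal. The function names and the dictionary layout are hypothetical and are not taken from Flux; treating extra objects in the cluster as "not equal" is one possible definition of equality, chosen here as an assumption.

```python
def plan_sync(desired: dict, live: dict) -> dict:
    """Compare definitions from Git (desired) with the cluster (live)."""
    create = sorted(name for name in desired if name not in live)
    update = sorted(
        name for name in desired if name in live and desired[name] != live[name]
    )
    # Objects running in the cluster that Git does not declare.
    extra = sorted(name for name in live if name not in desired)
    return {"create": create, "update": update, "extra": extra}


def in_sync(desired: dict, live: dict) -> bool:
    """True when nothing needs creating, updating, or removing."""
    return not any(plan_sync(desired, live).values())


git_state = {"deployment/web": {"image": "web:1.1"}, "service/web": {"port": 80}}
cluster_state = {"deployment/web": {"image": "web:1.0"}}
print(plan_sync(git_state, cluster_state))
# {'create': ['service/web'], 'update': ['deployment/web'], 'extra': []}
```

Running the comparison repeatedly, rather than once per commit, is what keeps the two sides equal over time.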

Marc: You've been working at Weaveworks for quite a while, can you talk about the origins of the project and the timeline of it?

Michael: Yeah, of course. Our first focus at Weaveworks, even before it was named Weaveworks, was an open source container networking technology called Weave.

Originally called Weave, now called Weave Net.

But after we had got somewhere with that we were looking for other problems to solve in the container space, and one thing that was obviously going to be a problem was being able to deploy new versions of things.

Continuous deployment or continuous delivery.

So Alexis, who is the CEO of Weaveworks, thought that perhaps the product direction might be trying to gather continuous delivery.

Our first attempt at that was to make a userland network proxy that would redirect traffic to different versions in some proportion.

So if you're rolling out a new version of something, it could start with zero traffic going to that new version. Then progressively, if things were going well, it would direct more and more traffic until you hit 100%. This is now something that service meshes do.
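The proportional routing described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the proxy Weaveworks built; `route` and its parameters are hypothetical names, and a real proxy would make this decision per connection or per request.

```python
import random


def route(new_weight: int, rng: random.Random) -> str:
    """Send roughly new_weight percent of requests to the new version."""
    if not 0 <= new_weight <= 100:
        raise ValueError("new_weight must be between 0 and 100")
    # rng.random() is in [0, 1), so a weight of 0 never picks "new"
    # and a weight of 100 always does.
    return "new" if rng.random() * 100 < new_weight else "old"


rng = random.Random(42)
sample = [route(25, rng) for _ in range(1000)]
print(sample.count("new"), "of 1000 requests went to the new version")
```

Raising `new_weight` in steps from 0 to 100 while watching for errors is the progressive part of the rollout.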

But at the time we decided that no one would ever devote the resources to run a userland proxy for all their containers, so we actually ended up abandoning that.

The alternative histories write themselves there, so instead we concentrated on just being able to automate the rolling out of new image versions.

At the time we were developing Weave Cloud, so we made something that was useful for ourselves for running Weave Cloud.

To start off with, what it would do was it-- This is before Kubernetes had deployments, and we had replication controllers.

What it would do was run the deployment of the new image, change the replication controller definition, and then if it was successful it would check that into--

Commit that back to Git, where we kept all our config.

Then as time went on we figured out there were other things that we might want to change in Git, and eventually we came to understand that the important part of what Flux was doing was not actually rolling out the new image, it was applying what's in Git.

We turned it around 180 degrees and made it the case that it would apply whatever is in Git, and then sometimes when there was a new image it would change what's in Git.

That really changed our understanding of what Flux was for, and that was about the point that we came up with the term "GitOps," which became a bigger thing and got slightly out of our control.

Marc: That's interesting.

I don't know if everybody's aware of that, but Weaveworks actually did create and coin the term "GitOps" to describe this process, and GitOps means a lot to different people.

It can mean push vs. pull, it can have-- Flux is one implementation of GitOps, but I'd love to--

As one of the original co-inventors of Flux, the original GitOps operator, I'd love to hear your definition of what GitOps is and what the alternatives to GitOps are in general.

Michael: I think it's a big tent now, but originally the formulation had the idea of pulling from Git and applying things.

That was just one line item, if you like, and then there were some other ones about observability and diffing things.

Those got subsumed into the one thing, which is essentially you have a declaration of the desired state of your system, whether it's Kubernetes or something else, and you automatically apply that definition or maintain that definition.

The idea being that then your workflow becomes largely about doing things with Git, so you can then use things like PRs, merging conflict resolution, and all that stuff that engineers are familiar with from developing software.

You can use that to also drive your runtime cluster state, or system state.

So that-- Or, perhaps even more briefly, the thing I said before about having an equation.

Effectively, "What is declared in Git is what is running" is, to me, the essence of GitOps.

Marc: So, the implementation details about whether there's a push vs. pull doesn't directly map into the requirements of GitOps, GitOps is-- The declarative state is declared in a Git repo, and it is then what's deployed into the cluster. That's GitOps, then?

Michael: Yeah, I would say so.

I think Kubernetes specifically makes this a lot easier because it is itself this system where you give it the definitions and then it actively works to keep that as the state of the system, and because that is level triggered rather than edge triggered, it suggests that the correct way to extend that to Git is for it to be level triggered rather than edge triggered.

What I mean by that is it will continually work to make sure those definitions are kept up, rather than just reacting to changes as they come in.

So, to make a contrast, what you find with lots of continuous delivery products from 10 years ago or something is that they're driven by changes in the system of record.

If that's Git, then a commit or a push goes in and then that push triggers something, some action.

One difficulty with that is that if that event gets lost, then your state is now out of sync with your system of record.

Another one is that you can't have compensating actions, so if you change something directly in the system then that might not get corrected ever, or it might only get corrected the next time that something happens.

So GitOps is maybe weighted towards a level triggered or pulling config in and applying it continuously, rather than having a pipeline that's driven by events, which is more of a push thing.

But I say "weighted" rather than it being an absolute requirement.
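The two failure modes Michael mentions, a lost event and a manual change that never gets corrected, can be shown with a small Python sketch. Both functions are hypothetical simplifications: `apply_events` stands in for an event-driven pipeline, and `reconcile` stands in for a loop that repeatedly compares the live state with the system of record.

```python
def apply_events(live: dict, events: list) -> dict:
    """Edge triggered: apply only the change events that actually arrive."""
    state = dict(live)
    for key, value in events:
        state[key] = value
    return state


def reconcile(live: dict, desired: dict) -> dict:
    """Level triggered: make the live state match the desired state."""
    state = {k: v for k, v in live.items() if k in desired}
    state.update(desired)
    return state


desired = {"image": "web:3", "replicas": 2}   # what Git declares
live = {"image": "web:1", "replicas": 5}      # replicas were changed by hand
delivered = [("image", "web:2")]              # the push event for web:3 was lost

print(apply_events(live, delivered))  # {'image': 'web:2', 'replicas': 5}
print(reconcile(live, desired))       # {'image': 'web:3', 'replicas': 2}
```

The event-driven result is wrong in both respects until another event happens to arrive, while the reconciling version converges no matter which events were missed.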

Marc: That makes sense, especially with Kubernetes being the desired state, and developers are pretty familiar with the concepts of Git and commits history and reverts, stuff like this.

In hindsight, that definitely makes it seem like an obvious connection, and that GitOps works.

But I'm sure the actual mechanics of creating and realizing that that's the good thing took a lot of effort.

Michael: It did. In fact, it took a while to get used to the idea of "We are just going to apply whatever is in Git."

Like you say, hindsight says "That seems like a reasonably obvious idea."

But at the time, we were quite scared of switching that on.

Because it's like, "What if it deletes stuff?"

It took us a while to get from the stage where we had thought of it to implementing it, and then to actually switching that on for ourselves in production.

Marc: I'm actually curious about that, because I think that's something that we thought about too when we started using Flux.

A little bit of hesitation and concern, early days before it was a product and before you were recommending it to anybody, you had that concern.

Are there any stories, any battle scars you have from, "That actually did happen, we had to recover from it, and now Flux is a better product because it avoids these cases"?

Michael: None that really stand out. In my mind, it was much less of a big deal than we thought it might be.

That's not to say that nothing happened. I'm just saying that nothing happened that scarred me so deeply that I still remember it to this day.

Marc: You talk too about pre-Flux being the GitOps operator for Kubernetes, you were looking at the Userland proxy and now you have Weaveworks as a separate product.

I want to really focus on Flux here, but it's probably worth a little bit of a diversion to talk about Flagger, if you're up for it.

Flagger does do some of the stuff that you described about those progressive rollouts in modern Kubernetes, and how do you see that integrating into the GitOps operator Flux and working in the whole ecosystem with service meshes right now?

Michael: Yeah, Flagger is really interesting because there is this gap in GitOps, if you like, or in declarative systems maybe more so.

Which is, "How do you describe ongoing processes?"

Or, "To what extent do you want to describe ongoing processes?" The one that Flagger concerns itself with is progressive delivery.

So, that thing I mentioned before about going from 0% of traffic to the new version to 100% of traffic to a new version, you can see that's an ongoing process that needs to be actively managed, and it's not something you can define statically, so much.

Then you have to decide, "OK. What do I define statically? What change triggers this? Do I need to make a change to the declaration that says 'Go to 10%, go to 20%' Or could that be automated?"

And Flagger, if I'm right, gives you a mix of both of those.

So you could say, "Just make as much progress as you want until you get to 100%," providing that tests pass or that whatever conditions you put on your system are still passing.

Or you can tell it, I think, to go to 50%, and then you can checkpoint and say, "Right. We're at 50%."

So, that's the recovery position.
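A minimal Python sketch of that kind of stepped rollout follows. It is not Flagger's API or configuration format; `progress`, `rollout`, and the `ceiling` parameter are hypothetical names, and sending all traffic back to the old version when a check fails is an assumption made for the example.

```python
def progress(weight: int, step: int, checks_pass: bool, ceiling: int = 100) -> int:
    """Return the next traffic percentage for the new version."""
    if not checks_pass:
        return 0  # assumption: a failed check sends all traffic back to the old version
    return min(weight + step, ceiling)


def rollout(step: int, health: list, ceiling: int = 100) -> list:
    """Advance one step per health check until the ceiling or a failure."""
    weight = 0
    history = [weight]
    for healthy in health:
        weight = progress(weight, step, healthy, ceiling)
        history.append(weight)
        if weight == 0 or weight == ceiling:
            break
    return history


print(rollout(25, [True] * 10))              # [0, 25, 50, 75, 100]
print(rollout(25, [True] * 10, ceiling=50))  # [0, 25, 50]
print(rollout(25, [True, True, False]))      # [0, 25, 50, 0]
```

The `ceiling` argument plays the role of the 50% checkpoint: the automation stops there and waits for a new declaration before going further.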

It doesn't do the bit that we initially set out to do. Happily, at least two or three groups of people went and solved the service mesh problem and did a lot better job than we ever were going to.

Because that's a really hard problem, so I'm glad someone else went and did that bit and we're just piggybacking on that, really.

Marc: Yeah, they are a really large problem. Lots of different standards and SMI, it's an interesting space.

Michael: Absolutely.

Marc: So Flux is an open source project and I can use it as an open source tool, completely disconnected and no commercial agreement with Weaveworks today, and have a full GitOps operator and pipeline running to deploy my stuff into Kubernetes.

I'd like to understand a little bit more about the thought process that Weaveworks had as a company.

You were around at that time, thinking about packaging it and making it available separate from the commercial offering, and the implications that has to Weaveworks revenue and business model versus pushing this new concept of GitOps out there.

Michael: The slightly glib answer is that if you write something for Kubernetes, in a way it can't really not be open source, because that's an expectation of the community and the users of Kubernetes.

There are counterexamples, of course, but for us at the time it certainly seemed like it would be more conducive to growth and getting people involved and interested in Flux for it to be open source.

That's true of things like Flagger as well, and Weave Net originally. The expectation is that it's open source, so unless you have a really good reason and you think your business model is going to survive holding stuff back and making it paid for from the get go, then it ends up being open source and probably written in Go.

But at the time we were working on Weave Cloud and we wanted to develop the capacity in Weave Cloud for doing these deployments, which ended up being Deploy.

"Deploy" is just the internal name in the Weave Cloud app for essentially the bit that's powered by Flux and does the rollout of new images.

So, we do have a paid-for route to using Flux that comes with a user interface and some niceties for rolling something out here in your dev environment and then making the same change in your production environment, that sort of stuff.

Marc: I think the idea of defaulting to building stuff out in the open in Kubernetes is definitely true of the ecosystem right now, and then you actually took Flux from just being an open source project and donated it to the CNCF.

Currently, it's a sandbox project.

Can you talk a little bit about the thought process and the motivations behind turning it from just an open source project that anybody could use to actually transferring ownership of the project into the CNCF?

Michael: I think it's a double edged blessing, if you like.

On the one hand, in some ways if you want to be a serious open source project then there has to be this idea that you come with some governance and a succession plan, and all these sorts of things.

Things that are not just about the code that are sitting in the repository, and donating it to a foundation is one way to force that, if you like.

If Weaveworks goes away or the maintainers leave Weaveworks or something, then Flux will continue.

There's some continuity there, so if you were a company that wants to either contribute or use Flux there's that extra level of comfort that you get from that.

The other edge, if you like, on the blessing is that Flux gets a certain amount of endorsement and a certain aura from being in CNCF.

If they consider us good enough to be in the sandbox, that's an endorsement of Flux, which again leads towards maybe more contributors and more companies using Flux.

Marc: Cool. Let's dive in for a minute and talk about some of the technical challenges and the implementation details of Flux.

So today, Kubernetes applications can be packaged in lots of different tools.

Helm, Kustomize, Ksonnet. There's a proliferation of different packaging formats and it's continuing to evolve.

I'm curious to understand a little bit more about the roadmap that Flux took in the early days, did you think about having to support all these different packaging formats?

Or was it just vanilla Kubernetes YAML?

When did you start adding additional packaging formats in?

Michael: Good question, there's definitely lots of technical detail in there.

To start with, a lot of those things didn't exist when we came up with Flux.

Kustomize, for instance, didn't exist at the time.

Lots of other stuff did, and we ourselves, for Weave Cloud, our first use case, if you like-- Or our first installation of Flux, just used plain YAMLs.

So, our target for quite a long time was just plain YAMLs.

At some point Helm became a thing, as they say, and we resisted it for a while.

It was clearly going to be a big thing, but it meant extra work.

Eventually we settled on making a separate operator that would work in sympathy with Flux, which we called "the Helm operator," imaginatively.

The idea was that it filled a gap that existed with Helm, which was that Helm is a very imperative tool.

You say, "Install this, upgrade this."

In some respects that doesn't really sit well with GitOps where you want to make the declaration of a fact.

So we designed it so that you could do exactly that. There's a HelmRelease custom resource that declares the fact of a Helm chart being installed into the cluster.

If it appears as a new one, then the operator says, "I better go and get that Helm chart and install it."

Or if you change the version but it's already installed, it knows it needs to upgrade it. So it fills this gap of turning something which is quite imperative and driven from a command line into something that is declarative and runs automatically.
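The decision the operator makes can be sketched as a small Python function. This is a simplified illustration rather than the Helm operator's actual logic; `helm_action` and the dictionary fields are hypothetical, and comparing only the chart version and values is an assumption.

```python
from typing import Optional


def helm_action(declared: dict, installed: Optional[dict]) -> str:
    """Decide which imperative Helm step a declared release calls for."""
    if installed is None:
        return "install"  # the release appears for the first time
    changed = (
        declared["version"] != installed["version"]
        or declared.get("values") != installed.get("values")
    )
    return "upgrade" if changed else "nothing"


declared = {"chart": "redis", "version": "1.2.0", "values": {"replicas": 2}}
print(helm_action(declared, None))                                  # install
print(helm_action(declared, {**declared, "version": "1.1.0"}))      # upgrade
print(helm_action(declared, dict(declared)))                        # nothing
```

The person editing Git only ever states the fact they want; the choice between "install" and "upgrade" is left to the automation.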

That's how we dealt with Helm, which is one big use case.

With Kustomize, again, because it seemed like more work we resisted that for a while, and we ended up coming up with a design.

We also wanted to support things like Ksonnet, so we came up with this design which was very generic, whereby it would support Kustomize but it would also support other means of essentially writing programs to generate, ultimately, Kubernetes YAMLs. Because it used a configuration file called .flux.yaml, it came to be known as ".flux.yaml."

But we figured out eventually-- Actually, pretty quickly, that it was way too generic and that the power you got from it was not worth the trade off of it being incredibly difficult to troubleshoot because you could just put anything in there.

So, for Flux version 2 we've backed away from that quite a lot. We're going back to-- Is it too early to bring up Flux version 2?

Marc: No, I think it's super interesting to talk about. I'd love to hear more.

Michael: Yeah, we've backed away from that stuff and we're just saying, "OK. How you get YAMLs is up to you, but by the time we see them we will deal with Kustomizations and we will deal with Helm charts or Helm releases and that, as before.

But other things, we're not going to deal with them. That's up to you to arrange."

So if you want Jsonnet, then we'll tell you how to sort that out in your CI, but it happens in your CI.

It's not something Flux would give you.

Marc: I think it's interesting, because there's a couple of reasons you may want those.

You might want Kustomize or a Helm chart because that's just the packaging format of it, but when we first adopted GitOps, one of the challenges that we had originally was thinking about how we're going to have this repo with the declarative state of how we want everything to be running, but we're targeting multiple environments.

So there's this little last-mile, per-environment configuration, and I think Kustomize and Helm and Ksonnet, all these tools definitely enable the GitOps operator to simplify that so you don't have to maintain completely different copies of your declarative state for every little change.

Michael: That's right, and that's a really good motivation for us to support Kustomize and plain YAMLs rather than just plain YAMLs, which in some ways would also be a completely reasonable position to take.

But going that extra bit further to support Kustomize because of those use cases, I think it's worth it. It's a good trade off.
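The "one base plus small per-environment changes" idea can be illustrated with a recursive dictionary merge in Python. This is a deliberately simplified stand-in: Kustomize's real patching rules are richer than this, and the `overlay` function and the field names below are hypothetical.

```python
import copy


def overlay(base: dict, patch: dict) -> dict:
    """Return a copy of base with the per-environment patch merged in."""
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = overlay(result[key], value)  # merge nested sections
        else:
            result[key] = copy.deepcopy(value)  # replace scalars and lists
    return result


base = {"replicas": 1, "container": {"image": "web:1.0", "env": {"LOG_LEVEL": "debug"}}}
production = {"replicas": 3, "container": {"env": {"LOG_LEVEL": "info"}}}

print(overlay(base, production))
# {'replicas': 3, 'container': {'image': 'web:1.0', 'env': {'LOG_LEVEL': 'info'}}}
```

Only the differences live in the per-environment patch, so a change to the shared base reaches every environment without being copied by hand.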

Marc: So, since you invented GitOps and have Flux out there, there's now other GitOps tools.

One that comes to mind is Argo CD. Argo CD is another GitOps operator, and I remember around a year or so ago there was a coordinated effort announced between Weave and Intuit around the Argo and Flux projects coming together and forming a common unified library that's going to drive the two.

I don't know what's coming of that or if you're still working together, but is there anything you can talk about that process to do that, the motivations behind it, and where that stands today?

Michael: Yeah. Last year we talked with the Argo team because there did seem to be a lot of commonality in what we were trying to achieve.

Argo CD was developed after Flux and learned a lot from where Flux had gone slightly astray in some ways, or at least didn't suit the Argo team use cases.

They brought in a lot of ideas of their own, which proved pretty good, so it seemed like a good idea to join forces.

We thought of an experiment to perform, which was to try and factor out this common engine for syncing, which got called the GitOps engine, and Alex at Intuit did--

In fact, the Argo team in general did a huge amount of work to factor that out of Argo CD, which shouldn't be underestimated.

It ended up, I think, actually improving Argo CD as well, which is good.

But then it turned out that had we incorporated that into Flux, the path forward from there wasn't that great for Flux. If we were to take Flux and Argo CD as being things that are addressing different use cases, we would either have to say, "Here is Flux V2, it's essentially Argo CD but single user," or something. Or, we would effectively have to adapt Argo CD to be more like Flux.

At that point it seemed pretty clear, to us anyway, in Weaveworks that actually neither of those things was a great destiny for either Argo CD or Flux.

It would be better to almost double down on the things that made them different so they stood apart, and there's a certain set of principles that we didn't necessarily start with when we developed Flux but ended up adhering to.

Those things hadn't gone away, and there's an argument to be had that Argo CD adopts a different set of principles so you can't really merge those two things together without compromising one or other of those sets of principles.

I guess in the end, it turned out maybe the principles which make us different are worth keeping, and Argo CD can pursue things that they think are the right ideas and we can pursue the ones that we still think are the right ideas.

Marc: Today you mentioned the challenges with integrating them, one of the challenges would be the things that make Flux and Argo CD today different would be hard to keep different, and for somebody who is maybe looking at the two projects, can you help me understand what at a basic level or at a high level, some of those things are that make Argo and Flux different?

Michael: Yeah. Some of them come down to quite architectural or technical things. For instance, as I was describing, Flux V2 sticks very much to plain YAMLs or Kustomizations. Argo CD has a different idea of this, where you have effectively a different control of each input source.

So if you have some Jsonnet, then you can say to Argo CD, "This repo uses Jsonnet."

Or, "This part of the repo uses Jsonnet," and it will run Jsonnet for you before it presents the output of that to be synced.

That's built into how the syncing engine works as well, to some extent.

That ended up being a thing we were backing away from, because we think that it's actually better to have the plain YAMLs available if you can, on your laptop, before they even make it to the automated system.

Because then you can see "What is the effect of this thing going to be?" Well before it happens.

Now actually, it turns out Argo CD handles that in a different way, which is that it gives you a preview of what's going to change and you can push the button to say, "Yes. That's fine, go ahead."

Which is a completely reasonable way of approaching that problem, it's just you can see there are two different principles at play.

One much more end user oriented, and one much more unattended automation oriented.

So that's one difference, and another one is that we wanted to work much more in sympathy with Kubernetes own systems.

For instance, RBAC. With Argo CD there is an API to it and an application that uses that API, a GUI application.

Then because they have that API running in the cluster then they have a separate layer of permissions and so on, and we took it on ourselves to have the principle of not having a separate set of permissions.

Again, it's a completely reasonable position on either side, but it is worth maintaining that distinction and pursuing and saying, "This is our idea of what the right thing to do is so we're going to stick with that."

Marc: That actually makes a lot of sense.

I mean, it depends on what your goals are and how important full transparency and auditability of the YAML is pre-deployment versus post.

They're both valid solutions there.

Michael: Yeah, exactly. I think it comes from-- Argo CD I think came from way more of a position of the developers that we are serving with this.

They don't really care about the internal workings of what's going on in Kubernetes, they care about deploying their app.

So we're going to give them an interface to do that, they could describe their app in reasonably high level terms.

We help them onboard it and then they get a nice interface for doing the things they care about, whereas Flux didn't start from that position.

It was very much, "We're just going to run something that automates a thing that otherwise we'd have to do manually."

So you can understand the two different positions lead to a different set of assumptions.

Marc: That also explains why some of the challenges with abstracting that out into the GitOps engine would be interesting.

But didn't Flux end up abstracting it out into the GitOps toolkit, though?

Which is the same concept, but very purpose built to solve the Flux problem in particular?

Michael: The GitOps toolkit comes out of recognition that we'd backed ourselves into a corner in a couple of places with Flux.

One of them is that the Flux runtime is a monolith, it does-- Although not a really huge one, but it does the image automation bit updating to new versions of images and rolling those out when they're available.

It does the syncing from Git, and therefore the pressure is to accrete things into that monolith when there's new requirements.

We recognized that actually that's not very sustainable, so we split out the Helm operator.

That was its own thing, fine, but that was an exception. The more things you add, the harder it becomes to add new things.

It ends up not being very sustainable, and so for the GitOps toolkit the idea was to break each individual thing down into a single controller that just deals with that one thing.

So there's one that does the syncing of Kustomizations, there's one for dealing with Helm releases as before, and then the update automation is another set of controllers.

A couple of them that do just that thing, and therefore all these things can go at their own speed and develop features as needed.

But without making a big ball of stuff, which is difficult to then modify afterward.

That was one pressure, another one was that Flux predated custom resources.

The way you set Flux running is you just make a deployment with a whole big list of arguments that tell it what to do, and it only operates on one Git repository.

For instance, in some ways that works really well. If you want to do a different Git repository, that's cool. Run another Flux.

But there's also lots of reasons why it's a nice idea-- And this was us, to some extent, following what Argo CD did-- to have custom resources which define the things you want to sync, so then you can have an arbitrary number of them, for instance.

The other really nice outcome there is that you can put stuff back in the status of the custom resource.

So someone defines something and then they find out what actually happened, which is a really difficult thing to do with Flux V1.

You have to go looking through the logs to see what happened, a lot of the time.
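The pattern of writing the outcome back onto the resource can be sketched in Python as follows. It only illustrates the idea; `SyncResource`, `reconcile_with_status`, and the status fields are hypothetical and do not reproduce the status schema of any Flux custom resource.

```python
from dataclasses import dataclass, field


@dataclass
class SyncResource:
    """A stand-in for a custom resource: what was asked for, and what happened."""
    name: str
    spec: dict
    status: dict = field(default_factory=dict)


def reconcile_with_status(resource: SyncResource, apply) -> SyncResource:
    """Run the sync and record the outcome on the resource itself."""
    try:
        revision = apply(resource.spec)
        resource.status = {"ready": True, "revision": revision, "message": "applied"}
    except Exception as err:  # report any failure instead of only logging it
        resource.status = {"ready": False, "revision": None, "message": str(err)}
    return resource


def failing_apply(spec: dict) -> str:
    raise RuntimeError("repository unreachable")


ok = reconcile_with_status(SyncResource("web", {"path": "./deploy"}), lambda spec: "abc123")
bad = reconcile_with_status(SyncResource("db", {"path": "./db"}), failing_apply)
print(ok.status)
print(bad.status)
```

Whoever declared the resource can read its status to learn what happened, without searching through controller logs.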

Marc: That's interesting. Flux has been out for a while and it's one of the downsides of being so early in the Kubernetes ecosystem, is you mentioned earlier deployments didn't exist.

CRDs now you just mentioned didn't exist at the time, so you're held back a little bit as the ecosystem continues to mature.

You built it before these constructs were built into Kubernetes and you said all that was challenges with Flux 1.0, and now you're working on Flux 2.0. I'd love to hear more about that.

What are the biggest changes? What are the--? What is the difference between Flux 1 and Flux 2?

Michael: The big difference is, I guess you can sum it up as Flux V2 aims to do the same things but it's using the more modern, the more up to date-- 4 years, roughly, more up to date tooling and mechanisms.

Custom resources, we're using Kubebuilder to create the controllers.

That means there's a whole standard set up for things like metrics and how they work, local caching and blah blah.

All those low level mechanisms, and the point being that they all operate in roughly the same way with custom resources and they output the same metrics.

They're all quite standardized, and because of this modularity you can to some extent pick or choose them.

In Flux V1 if you don't care about the image automation bit of it, you just want to sync stuff from Git, you have to explicitly go and turn that off for backward compatibility reasons.

We can't switch it off and then make you switch it on, you have to explicitly say "Don't run that bit."

Whereas in Flux V2, for instance, you just don't run that controller.

You only run the bits you want, and if you don't care about Helm you don't run that.

If you do, you do. It's a lot more modular.

One thing that I referred to before that perhaps deserves a bit more examination is that a motivation for Flux V2 was that we felt Flux V1 had just got to a point where it was a big ball of stuff that we couldn't really add to very easily. One motivation was to break that down, and that makes it easier to maintain and add stuff to Flux V2, but it also makes it a lot easier for people to come in and contribute. Architecturally, instead of having to know how the whole thing works and know exactly where to look, you can just choose the area that you care about.

I care about notifications, so there's a notifications controller and I can just go look at that.

But it also makes it easier for other people to integrate with.

For instance, there's this idea of some of the custom resources are about getting the source of something or acting as a source for something.

A Git repository or part of a repository or Helm chart, and if you want to do something with a Git repository which is not just syncing it to the cluster, you can use the source that's available in GitOps toolkit and then your own controller which does whatever that thing was.

So that, for instance, is how the automation works.

It reuses the Git repository custom resource, but then it just does its own thing with it rather than syncing it to the cluster, which is what the other controllers do.
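A small Python sketch can show the shape of that reuse: one source object, consumed by two controllers that do different things with it. The class and function names are hypothetical and are not the GitOps toolkit's API; the sketch only mirrors the separation Michael describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GitSource:
    """A fetched repository revision, shared by several consumers."""
    url: str
    revision: str
    files: tuple


def sync_plan(source: GitSource) -> list:
    """A syncing consumer: apply every YAML file at the fetched revision."""
    return [f"apply {path}@{source.revision}" for path in source.files if path.endswith(".yaml")]


def automation_plan(source: GitSource, new_image: str) -> str:
    """An automation consumer: write a change back to the same repository."""
    return f"commit to {source.url}: set image to {new_image}"


source = GitSource(
    url="https://example.com/org/config.git",  # placeholder URL
    revision="abc123",
    files=("deploy/web.yaml", "README.md"),
)
print(sync_plan(source))                   # ['apply deploy/web.yaml@abc123']
print(automation_plan(source, "web:1.1"))  # commit to https://example.com/org/config.git: set image to web:1.1
```

Neither consumer needs to know how the repository was fetched, which is what lets a third party add a controller of their own on top of the same source.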

Marc: That's great. I think thinking about making it easier for contributors to add stuff in without understanding as the project grows is a great motivation by itself.

What is the status of Flux 2.0 right now? Is it ready to run, or what's the timeline to get it to production ready?

Michael: When we were thinking about existing users migrating, we thought "There's no way we can just say there's a cutover day. Flux V1 ends, Flux V2 starts."

It just doesn't work like that. So, we figured out some milestones.

People use Flux in lots of different ways, but they broadly fall into two categories.

One is that they use it just for syncing, and the other is that they use it for syncing but they also use it for the updates automation.

The latter is obviously more work, so to get stuff available we set ourselves the milestone of parity with the read-only or syncing-only use case of Flux.

We're pretty much there with that, you can use it if you are prepared to figure some things out for yourself.

We need to write some guides. Lots of installations will, we think, be pretty directly migrated from Flux V1 to Flux V2.

Maybe even to the extent that we can automate a bit of it and make some command line tooling for upgrading installations.

The further out milestone is for full parity, so also having the update automation stuff, and that's additive.

So obviously it requires the syncing stuff, but it also requires the development and design of a whole bunch more stuff and we're not there yet.

We are hoping for that somewhere around the end of the year. Our idea with those milestones and migration is that once we reach full parity then we have a window where people get the opportunity to move across.

We'll support them in doing that and maintain Flux V1 at least minimally during that period, and then we sunset Flux V1 once people have had a pretty decent window to move across.

Marc: That's great. This whole ecosystem in Flux 2, it's moving so fast.

You're looking at this timeline and it's three months from now hoping for something close to feature parity. It's not really that far away.

Michael: Yeah, it's not. Time flies.

Marc: So if I'm currently a Flux user, what could I do to help out? Is there certain--?

If I'm only doing that read-only sync, are there use cases you'd like to see, or beta testers?

What's the best way to be involved in the project if somebody is not really, if they're already a Flux user but they're not looking to contribute code at this time?

Michael: You can definitely be an alpha/beta tester.

One thing that we really need, especially with the further out milestone of image update and stuff, is we often don't have much visibility into how people are using Flux because we just hear about the problems.

For instance, if we have to do things in a different way for Flux V2, which we do because it's got a different architecture, we don't necessarily know which things we can break compatibility with and which things are just super important and that everyone uses.

Even if you're not contributing code, then a massive contribution is to have a look at the designs.

There's a bunch of discussions about design of various bits and pieces that happen in the toolkit repo in Flux CD on GitHub.

That's where you might spot, for instance, "This design is going to shut me out because I won't be able to do X or Y that I do now. I need to bring this up so that the Flux team can figure out, 'Do we need to build that in or is it something else we can do?' Or whatever."

It's that kind of stuff that we don't have great visibility on unless people come in and actively tell us, and another thing you can do is just try it out.

If you've got a throwaway environment, and your expectation is that you're going to migrate to Flux V2, then it is definitely at the point where you can come along now and follow the install guide locally or in a throwaway cluster somewhere.

Then that will give you some insight into what's going to change.

If you report back, then we'll get some insight into how people find that migration.

Reporting back can be as simple as putting a comment on an issue or filing an issue, in either the GitOps toolkit or Flux CD.

The discussions part of it, this is GitHub discussions, that's quite a heavyweight thing.

It's mostly reserved for-- Or, not reserved but mostly used for design discussion.

This feature or that feature, but issues is a decent place or just rock up to Slack.

We have a Flux channel in the CNCF Slack that we keep tabs on, and anything that's like "I tried this out and I ran into this. What I expected was this."

That kind of information is really valuable to us.

Marc: I think when writing a 2.0 of a product that has adoption and users, the bar is high, and it sounds like you guys are taking your time to do a good job and making sure that you support the users, support the use cases.

Trying to build the architecture you want but being pragmatic about ensuring that everything still works.

Michael: Yeah, absolutely. It's a two point zero, so we are deliberately breaking backward compatibility. We can't do that lightly.

We have a responsibility to not just leave people out in the cold, so we're taking that very seriously.

Marc: So, Flux right now is a sandbox project and the CNCF ecosystem goes from sandbox to incubation up to a graduated project.

Have you given any thought to when you might consider applying for incubation status of the Flux project?

Michael: We'd like to do that soon, so I would say opinions will differ a bit but my position at least is that we should probably get some of the milestones done first.

Because I think doing it while we're in the position of "We're right in the middle of developing V2," at the very least that's a lot of things to think about at one time.

I would like to have something in the bank, if you like, with regard to Flux V2. The other thing that I would like to see before we go for incubation, which actually we're making good incremental progress towards, is having more diversity in the contributors.

With Flux V1, we had lots and lots of contributions, i.e. PRs, from lots of different people.

There's pretty good diversity there, but the maintainers are, broadly speaking, still for most of its lifetime people from Weaveworks. Flux V2 has started out on a better footing. We have a small number still, but there's interest, and I think part of that is that people can-- As I was saying before, they can take an interest.

It's much more realistic to take an interest in just a narrow bit of it now with Flux V2.

So we're getting interest from different orgs and people in different orgs anyway, who just have the bit that they're interested in working on, and we can accommodate that now.

It was much more difficult before, so we're making incremental progress towards having a more diverse set of maintainers, which is cool.

It's really good to see. That was also an aim of breaking it up and making a fresh start on the architecture.

Marc: With a broken up architecture, if you haven't already, I'm sure you'll start to find just more unique use cases of the different components that you hadn't even thought about.

It creates new possibilities.

Michael: That's right.

There are-- People are often quite cagey and hold their cards close to the chest a lot of the time, but there's indications that people are reusing bits here and there so that they're interested in the source controller because they want to use it for their own purposes, which is cool.

But that's by design, perfect.

Marc: So we talked about if I'm on Flux 1.0, what I should think about when I'm either waiting for or helping with the upgrade to 2.0.

But what about an org who is running Kubernetes and they're not even using GitOps today, they're using traditional CI ops.

They're starting to explore GitOps, do you have any recommendations for safe ways to dip your toe in the water and start using Flux in a very incremental way?

Any best practices for getting started?

Michael: Yes, there are. There's lots of avenues into that, and that's the reason it's a tricky question.

If you are the kind of person that learns by doing, then it can be as simple as creating a cluster.

A minikube or a kind cluster, and then just follow the instructions on the Flux website or the Flux V2 website.

If you are a person that likes reading documentation, there's quite a lot there already for the GitOps toolkit, Flux V2.

If you are thinking more, "How do I introduce this into my organization?"

Then you might be interested to know that Weaveworks ran an event called GitOps Days, which was an online conference that had some pretty cool speakers.

There's lots of talks worth going to look at, but also one outcome of that was exactly what it's called, a GitOps Handbook which is for people that are trying to figure out, "I think I like this idea and I've tried it out, but I'm not sure how to broach the topic with my operations team or with my boss."

So that handbook has practical experience from people who have done exactly those things, sometimes in quite big organizations.

Marc: Michael, thanks a lot for your time today and for all the work you've done at Weaveworks, co-inventing Flux and creating GitOps as a pattern in Kubernetes.

I know we're big fans of GitOps and we see it as a really good way to deploy code.

It's been a platform shift, so thanks a lot for your time talking about it today.

I've learned a lot about Flux and I'm excited for the 2.0 release.

Michael: Yeah, absolutely. My pleasure.
