AUG 4, 2021

42 MIN

Ep. #18, Submariner with Miguel Ángel Ajo and Stephen Kitt of Red Hat

Guests: Miguel Ángel Ajo, Stephen Kitt

about the episode

In episode 18 of The Kubelist Podcast, Marc and Benjie speak with Miguel Ángel Ajo and Stephen Kitt of Red Hat. They discuss the CNCF sandbox project Submariner and how it’s solving multi-cluster networking.

about the guests

Miguel Ángel Ajo and Stephen Kitt are Senior Principal Software Engineers at Red Hat.

show notes

transcript

Marc Campbell: Hi, again, from The Kubelist Podcast.

Like we just said, Benjie and I are here today with Miguel and Stephen from Red Hat.

This is the core maintainer team for the Submariner project.

If you aren't yet familiar, Submariner is one of the newer Sandbox projects and we're going to spend the next little while talking all about it.

Welcome, everyone.

Miguel Ángel Ajo: Hey. Hello. Thank you, Marc and Benjie, for having us.

Marc: Awesome.

So, before we dig in, I'd love to hear a little bit about each of your backgrounds and how you ended up working at Red Hat and in the cloud native space.

Miguel, will you start?

Miguel: Yes.

So, I started at Red Hat eight years ago already, and I started with the OpenStack project on the networking component of OpenStack and a couple of years ago, I moved into this project.

At the start the name was not Submariner, we just didn't know what it was going to be, and then we found Submariner.

Marc: And Stephen, how did you end up working in the cloud native space?

Stephen Kitt: Yeah. Well, so I joined Red Hat six years ago, working on another project called OpenDaylight, and that ended up being canceled.

And so, we moved into-- well, with Miguel and others, looking for multi-cluster connectivity solutions for OpenShift and Kubernetes.

I didn't really choose the project, but I really enjoy it.

Marc: Awesome. Cool.

So, let's dive into the Submariner project a little bit; you mentioned multi-cluster networking, what does the project attempt to do?

Stephen: So, historically, Kubernetes clusters have been a fairly isolated world; each cluster is a box and egress and ingress are tightly controlled.

And the idea of Submariner and really what we wanted to do before we even discovered Submariner, was to provide a more transparent connectivity.

Well, basically allow pods, for example, from one Kubernetes cluster to talk to pods in another Kubernetes cluster without any specific setup like adding a gateway and configuring the client pod to specifically talk to that gateway and adding TLS certificates.

So, that's the general idea; you can take two clusters, connect them, and they end up behaving pretty much as one cluster, at least from an IP perspective.

Marc: So, today, if I have one Kubernetes cluster, I get CoreDNS and software-defined networking through a CNI plugin, and stuff just magically works.

I have a pod and I can talk to a service using in-cluster DNS and that works, and so what you're attempting to do is bring that same level of ease of connectivity between multiple clusters?

Stephen: Pretty much. Yeah.

When you connect two clusters, the pods can magically talk to the pods in the other cluster and services in the other cluster, the only tweak the client pods need to do, is to adjust the domain name that they use for the target service.

Marc: Okay. Is that to identify which cluster it is trying to talk to, then?

Stephen: No. It's just to distinguish cluster-local services from what we call clusterset-global services.
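To make the domain-name tweak concrete, here is a minimal sketch assuming the naming convention of the upstream Multi-Cluster Services API, where exported services resolve under `clusterset.local` instead of `cluster.local`. The helper names and the example service are hypothetical, not part of Submariner.

```python
def cluster_local_name(service: str, namespace: str) -> str:
    """DNS name that resolves only to the service in the local cluster."""
    return f"{service}.{namespace}.svc.cluster.local"


def clusterset_name(service: str, namespace: str) -> str:
    """DNS name for a service exported across the cluster set (MCS API convention)."""
    return f"{service}.{namespace}.svc.clusterset.local"


if __name__ == "__main__":
    # Hypothetical service "db" in namespace "prod".
    print(cluster_local_name("db", "prod"))
    print(clusterset_name("db", "prod"))
```

A client pod would only need to swap the first name for the second to reach the exported service.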

Miguel: Yeah. We implement an API that was defined upstream in the Kubernetes project which is the multi-cluster services API.

The idea of that API is that you can export services to be available on the other participating clusters of what this API defines as a cluster set.

A cluster set is a set of clusters that are in the same administrative domain or a very trusted environment, and it has one assumption: that a namespace in one cluster is the same as the namespace with the same name in another cluster.

And if you export a service in one cluster, that service is going to be equivalent to a service exported in a different cluster of the cluster set, so it is like a horizontal service where it really doesn't matter which cluster you are talking to if you want to distribute that service across clusters.
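As a rough illustration of the export model described here, the sketch below builds a `ServiceExport` object as defined by the alpha Multi-Cluster Services API (`multicluster.x-k8s.io/v1alpha1`) and checks the namespace-sameness assumption. The helper functions and the example names are hypothetical; only the `apiVersion` and `kind` come from the upstream API.

```python
import json


def service_export(name: str, namespace: str) -> dict:
    """Build a ServiceExport manifest; it must share name and namespace with the Service."""
    return {
        "apiVersion": "multicluster.x-k8s.io/v1alpha1",
        "kind": "ServiceExport",
        "metadata": {"name": name, "namespace": namespace},
    }


def is_same_clusterset_service(a: dict, b: dict) -> bool:
    """Under namespace sameness, exports with equal namespace and name
    from different clusters are treated as one service."""
    key = lambda m: (m["metadata"]["namespace"], m["metadata"]["name"])
    return key(a) == key(b)


if __name__ == "__main__":
    east = service_export("db", "prod")  # exported from one cluster
    west = service_export("db", "prod")  # exported from another cluster
    print(json.dumps(east, indent=2))
    print(is_same_clusterset_service(east, west))
```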

Marc: That's cool.

So, going back to something you said, it's leveraging functionality that was in Kubernetes called multi-cluster service API, is that what you said?

Miguel: Yeah.

It's not a functionality that already exists in Kubernetes, but it's an API that has been defined by the multi-cluster SIG to provide this type of service discovery and service connectivity between connected clusters.

So, in the end, Submariner is just one implementation of this API.

We worked with the upstream community to define this API.

For example, projects like Istio, in the context of cluster mesh, are also working on providing this API.

Benjie De Groot: That brings up a great question; so can you give us just a real high-level, why would I use Submariner versus an Istio or a Linkerd?

Miguel: Okay. Yeah. There are differences.

So, in Submariner, we tried to avoid any packet handling at the user workload level.

So, every packet that is handled that has to flow into a different cluster has to be handled at the kernel level.

So, no user application, pod, or container is going to receive that packet, manipulate it, or analyze it; it's just going to be routed and handled at kernel level.

Because one of the goals of the project is to maximize network performance and make it as simple as it possibly can be.

Benjie: Yeah. I've seen in the past there's some extra weight when we're using some of the other service meshes.

So, one of the big goals here is just as lightweight as possible? That's really interesting.

When it comes to the multi-cluster service API, that's super-interesting to me; you said that you've kind of been a part of the upstream defining of that API, can you tell us a little bit about that process?

How did that start? Was that from you guys?

Did you guys read, "Oh. Two people over at Google want to add this?"

How did that mature, how did that get started?

Stephen: It's the latter.

We discovered the MCS SIG, which was started by people from Google and somebody who was at Red Hat at the time as well, and various other people.

And it matched what we were trying to do to a large extent.

And so, it's an API that's still being defined, it's still at the alpha stage.

So, we joined in to make sure it would fit with what we were trying to do, and so not to add everything that we wanted to do into the API itself because that's not the goal, but to make sure that there weren't any constraints imposed by the API that would affect us negatively.

Marc: So, your intent in your goal with Submariner was to implement that API as the API spec existed not to shape and morph and transform the API, initially?

Stephen: As far as possible. Yeah.

Miguel: Our intention was to make sure that with this API, we will be able to satisfy all the use cases that we were foreseeing in customers.

Benjie: So, that's a great point.

So, can you give us maybe a little inspiration prior to you guys even diving in?

Were there any specific customer use cases where you're like, "Oh. We need to solve this problem," and that inspired you guys to dive into this?

Stephen: Yeah.

The initial use cases were really around Kubernetes scalability; we had customers who were running into issues scaling up, kube-proxy in particular, on very, very large clusters.

And so, they were starting to use multiple clusters and these are setups with thousands of nodes per cluster and thousands of clusters, it's fairly epic scale.

And so, they were looking around for solutions to provide ways of connecting all these clusters so that they could host applications across multiple clusters.

So, there was another big use case as well which was geographic distribution.

So, I think a Kubernetes cluster doesn't work too well, or at least didn't work too well at the time, if its nodes were too far apart and if latency was too high.

So, we had customers who had clusters in separate geographic zones, for example, who wanted to be able to use them transparently.

And so, there was some work that was more oriented towards service-mesh type technologies like Istio, there was a fair amount of work on KubeFed as well.

But like Miguel said, we were interested in providing something simpler with less overhead, both for the administrator and at runtime.

Marc: So, I totally get that, there's multiple ways to scale.

Obviously, there's all kinds of stories right now; Google has shipped some from SIG Scalability and so on where you can actually scale one cluster massively.

But you have to make a decision, if you're running a cluster, is that the direction you want, do you want to put one massive cluster or multiple large clusters?

When you start to think about geographic distribution, I think about that in terms of disaster recovery and fault tolerance and making sure like, "Oh. If this region of GCP goes down or if the US East goes down in Amazon, I can keep running."

Do you see a lot of use cases, then, in that mode of geographic distribution and fault tolerance, disaster recovery needing cluster-to-cluster communication or do you see folks want to run those as isolated independent clusters?

Stephen: Right. That's exactly one of the use cases that we have known.

So yeah, disaster recovery, and we do see a need for deeper connectivity between clusters in disaster recovery scenarios to host, typically, databases that are capable of replicating data from one cluster to the other.

So, one use case we have was with CockroachDB, we also have people using PostgreSQL and scenarios like that.

And so, one of the features you end up with is trying to host database systems that have their own idea of what a cluster is that doesn't necessarily quite match up with what Kubernetes' idea of a cluster is.

And so, having IP-level transparent connectivity between clusters really helps in those scenarios.

Marc: That makes sense.

And putting it into the perspective of stateful components, CockroachDB and other databases like that, absolutely makes sense as you think about you might want multiple clusters running in different regions, but you probably want the same data to exist in all of them to make the service functional.

Stephen: Exactly. Yeah. At least for disaster recovery.

There are other use cases where you don't want the data to be the same across geographic zones if there are legal constraints, for example, on where you can host data and so on. That's another set of requirements.

Marc: Right. Right.

And then, I'm hoping this question makes sense, so when I want to do that and I want to say, "Great. I have this database," let's say CockroachDB, and I'm running it in my cluster and the performance of that database is critical to my service and I want to replicate the traffic across the different clusters using Submariner.

Can you talk a little bit about the technology, how you're doing it, and how you're making sure that this is performant and not going to be the slowest link in the system?

Miguel: So, today, the topology that we use to connect the clusters is not a full mesh between the nodes, as some technologies try to do, because that doesn't work very well when you have, for example, an on-premise cluster on a corporate network and you need to connect that to another cluster.

It works very well if you have multiple clusters on the cloud and each of the nodes has access to a public IP address, but in that kind of mixed environment between corporate network and then a public cloud, we need a gateway and we need to set up the network, at least to some degree.

Sometimes it's not necessary but sometimes you do need to at least open ports on your corporate firewall or make sure that you have a public IP for your gateway, or at least a port which is redirected to your gateway.

So, that has the implication that in the end, we need to make one hop from the worker node of the consumer to the gateway, and then from the gateway to the gateway on the destination cluster, and then to the database, if that database is living on a different cluster.

But if you have a replica of that database close to the consumer on the same cluster, now by default, we are directing that connection to the local cluster.

And something that we are working on is in trying to optimize that flow.

Currently, it's very simple as I explained; if you have the database server, in this case if it's a database, the service that you are consuming, if it's available on your cluster, we will send you to the local replica of the service in your cluster.

And otherwise, you will be sent to a random cluster, which is not optimal.

Something that we are working on is on making that connection to the closest cluster based on latency, for example, or letting the administrator define policies on how that is going to happen. But that is something that we have on the roadmap for the future.
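The selection behavior described in this answer can be sketched roughly as follows. This is not Submariner's code: the `Replica` type, the health flag, and the latency field are hypothetical, and the latency-based picker only illustrates the roadmap idea mentioned above.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    cluster: str
    healthy: bool = True                 # assumption: unhealthy replicas are skipped
    latency_ms: Optional[float] = None   # hypothetical measurement for the future policy


def pick_current(local_cluster: str, replicas: List[Replica], rng=random) -> Optional[Replica]:
    """Behavior as described: prefer the local replica, otherwise pick a random cluster."""
    healthy = [r for r in replicas if r.healthy]
    for r in healthy:
        if r.cluster == local_cluster:
            return r
    return rng.choice(healthy) if healthy else None


def pick_by_latency(replicas: List[Replica]) -> Optional[Replica]:
    """Roadmap idea: pick the healthy replica with the lowest measured latency."""
    candidates = [r for r in replicas if r.healthy and r.latency_ms is not None]
    return min(candidates, key=lambda r: r.latency_ms, default=None)


if __name__ == "__main__":
    replicas = [Replica("us-east", latency_ms=5.0), Replica("eu-west", latency_ms=90.0)]
    print(pick_current("us-east", replicas).cluster)  # local replica wins
    print(pick_by_latency(replicas).cluster)
```

An administrator-defined policy, as mentioned for the roadmap, would slot in as another picker with the same shape.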

Marc: That's cool. That makes sense.

Like you mentioned before, geographic limitations on where data can be hosted, it would allow the administrator to say, "Well, I have three clusters in Europe and four in the US and data should only go here, not there."

I'd be able to implement rules like that then, right?

Miguel: Yeah. Exactly. Yeah.

And also, I mean try to use the closest service if it's healthy.

Maybe it's not on your cluster, but maybe it's on a cluster which is on a close region, so the latency is lower.

It wouldn't make sense, for example, if you have a database which is located in the US and it's available and it's healthy, to make that connection to a database which is in China or wherever else; it wouldn't make sense for latency reasons.

Marc: I think I'm making a couple of assumptions about the architecture and how Submariner works, and I'd love to give you an opportunity to spell it out in a little bit more detail.

Say I have a couple of clusters, is this all just components that I run in each of my clusters?

How do I bootstrap them to find each other?

Are there services that run outside of the cluster that are important for this discovery in bootstrapping or how does that work?

Stephen: So, there's a shared broker that is used to store data that's common to all the clusters.

Basically, it's a list of clusters that are part of a cluster set and how to connect to them.

And that can be either an external Kubernetes cluster that's only used for that or it can be one of the connected clusters and that's just CRDs basically, so there's not much specific.

Although, now we do use our operator to set that up, so the operator ends up running there.

And then, to actually join the clusters and communicate the information that's needed for all the clusters to connect to each other, well we have an operator that takes care of the cluster local setup, but it needs a fair amount of information to be given to it.

And so, to simplify things, we have a dedicated tool called subctl that takes care of things.

And so, basically in most cases, setting up Submariner is just one command to set up the broker and then one command per client cluster to join them to the cluster set.

And then, once you have that, the clusters are all connected.

And then, if you want to export a service to make it available so that it can be discovered in other clusters, there's an additional step which is to actually export the service.

So, we don't make all the existing services available across all the clusters automatically, it's up to the administrator to choose which services are exposed.

And then, you were asking about performance earlier and one of the nice things about Submariner, in my mind, is that once the tunnels are set up and the clusters are connected, that's all represented as one tunnel between each cluster pair using one of a variety of technologies.

Currently, we support IPsec or WireGuard or VXLAN, and then we set up iptables rules across the cluster to direct traffic as appropriate to the tunnel.

And so, once all that's set up, network traffic goes through the tunnels and to the target clusters without actual intervention from Submariner; it's all handled in the kernel.

Marc: Got it. That makes sense.

So, thinking before Submariner, how did people solve this?

What other traditional, maybe even non-CNCF, pre-Kubernetes ways and tools were people using to solve this same problem?

Because multi-data center, that's not a new problem that exists in 2021, that's existed before Kubernetes.

What tools are you replacing right now with the cloud native toolset?

Miguel: So, I think that before Submariner, you could get to something similar to this if you had VPNs configured manually between your VPC networks or your networks, and then maybe using a network plugin like Calico, which allows routing.

But it was a pretty involved solution where you have to configure routing, create every single VPN connection between the clusters, monitor those connections, and you will not get the service discovery or the multi-cluster service API available.

You will get the IP connectivity between the services and the pods, but not the discovery part.

So, I think that was a solution that you could have before but it was not complete.

Also, the other alternative that always has been available is exposing that service that you want to share between the clusters on the internet under a load balancer or an API endpoint.

But that means that that specific service becomes available to the whole internet and it doesn't stay as a private service into your Kubernetes network.

Marc: Yeah.

There are advantages to a ClusterIP service that sits internally, and having a ClusterIP-style service that is multi-cluster is kind of cool; you still have the firewalls and the defense in depth of being able to segment the traffic but allow access for specific endpoints.

Miguel: Yeah.

Benjie: So, let's talk about security, that's a good segue, right?

So, how do you guys handle it?

Assuming there's some RBAC in there, but just talk to us about your security model and how you guys look at that.

Miguel: So, the first part where security is important is the establishment of the connections to the broker.

The broker is where the information between the clusters is exchanged, in terms of the available clusters and the available endpoints.

Endpoints are the gateways and where the gateways are listening in each cluster.

So, for connecting into the broker, every cluster has its own credentials, like a service account; this is just a Kubernetes API, and each cluster will have dedicated credentials, so if you need to cut out any specific cluster, you can do it.

And that's one part of the security model.

The next part of the security model is that to establish connectivity between two clusters, you would need an additional secret, not only being able to connect to the broker, but you also need-- for now, we have a pre-shared key that has to be installed in your cluster.

If you don't have that pre-shared key, it doesn't matter that you have connectivity to the broker and that you can see the other endpoints; you cannot connect to the other endpoints.

We use that pre-shared key with IPsec and with WireGuard, and they will use a normal mechanism in those technologies to establish the connection at the data plane level.

For VXLAN, that one is not meant for security, that solution we are providing it for speed reasons.

For example, if you have clusters which are connected via your own private network and you know that that network is secure and you don't care about security but you care about the performance, then you can remove the overhead of the encryption and the data plane connectivity establishment, and you don't use that pre-shared key.
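The two layers of the security model, as Miguel describes them, can be summarized in a small sketch. The function and table names are hypothetical and this is not how Submariner implements the check; it only encodes the rules stated above (broker credentials are always needed, and the pre-shared key is needed for the encrypted tunnel types).

```python
# Whether each tunnel type encrypts traffic, per the description above.
ENCRYPTS = {"ipsec": True, "wireguard": True, "vxlan": False}


def can_establish_tunnel(driver: str, has_broker_credentials: bool, has_psk: bool) -> bool:
    """Return True if a cluster could connect to its peers under the described model."""
    if driver not in ENCRYPTS:
        raise ValueError(f"unknown tunnel type: {driver}")
    if not has_broker_credentials:
        return False  # cannot even discover the other clusters' endpoints
    if ENCRYPTS[driver]:
        return has_psk  # seeing endpoints is not enough without the pre-shared key
    return True  # VXLAN: plain encapsulation, no key involved


if __name__ == "__main__":
    print(can_establish_tunnel("ipsec", True, False))
    print(can_establish_tunnel("vxlan", True, False))
```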

Benjie: So, I can turn off TLS, essentially, if I trust that the cluster's on the same network?

Miguel: Did you mean between applications which are talking to each other, right?

Benjie: Correct. Sorry.

I'm just trying to understand the connections between the data planes, I think you said, those are secured through private keys, I believe you said.

So, is that TLS just backing up a second or what are we looking at?

Miguel: Okay. Yeah.

So, the connection between the clusters and the data which is transferred from one cluster to another cluster between two gateways, it's encrypted if you use IPsec or if you use WireGuard.

If you choose to use VXLAN, which is a much more performant but low-security connection, that one doesn't have any encryption.

Although, I think it's always good to keep using TLS between the endpoints of the application; this is just another extra security level which is going to encrypt your traffic between clusters.

I mean, if you have a threat or somebody who has infiltrated into your cluster, they could monitor the traffic.

I mean, any security measure that you have on a single cluster is good to still keep having at this multi-cluster level even if you have encryption.

It will be encrypted twice but I think it's better to rely on encrypting all the endpoints where that is possible.

Benjie: Yeah. That makes a lot of sense.

Marc: Encryption's good and twice is better than once, right?

Miguel: Yeah. Of course.

Stephen: Yeah.

Back to your question about the shared key that we mentioned, that's in IPsec; we don't actually use certificates on the IPsec tunnels, just a pre-shared key.

So that's one of the things that each cluster has to be given.

But if somebody manages to steal that, then that level of encryption ends up being useless.

I think on the WireGuard side of things, that uses certificates.

Marc: That's cool.

So, by integrating into IPsec or WireGuard, you're able to leverage a mature ecosystem of proven technology around that encryption around the tunneling and stuff like this?

Stephen: Yeah. Definitely.

And they're both supported in kernel on Linux, so we can piggyback on performance improvements there and just use standard tools. So, for WireGuard obviously the tool's WireGuard, and for IPsec we use Libreswan.

Marc: I want to jump ahead for a minute; when we think about how you actually built Submariner and the problem that you set off to solve, did it initially always support multiple different tunneling protocols like WireGuard, IPsec, and VXLAN, or were those added over time?

What was the initial use case and then what drove you to start implementing optionality there?

Miguel: I think that was a decision that was made by Chris Kim when he started the project.

I think that he was probably aware that no encryption solution was going to fit every need and that there are solutions that have advantages over other solutions.

For example, VXLAN is much more performant but I mean, the security is not there, it's just encapsulation.

While IPsec is very secure, but of course you have the overhead of encryption and decryption.

So, I think that is something that Chris Kim included on Submariner from the start.

Marc: Cool. So, let's dig in there for a second, Chris Kim is the original creator of the project?

Miguel: Yeah. Yeah. That is correct.

So, when we started participating on Submariner, it was him.

I mean, he had just published Submariner; it was a Rancher-only project at the start, and at that time, we were looking into solving this problem from Red Hat.

We knew what we wanted to solve, and we didn't want to start from scratch because I mean that's something that happens in open-source many times, in my opinion.

It's sometimes a common error that you think that it's going to be easier to start a project from scratch because you know what you want to do and you know how you want to do it.

But over the development, you will discover that there are a lot of hidden complexities that you didn't account for.

And if somebody has started something, maybe it's not the same exactly that you wanted to do but very similar, they have already found those problems and they have solutions for them, you have a starting codebase you can jump start.

And the Rancher people and Chris were very, very friendly to have contributors join the project.

Benjie: I just want to dive in here because I'm curious, looking from the outside in.

If I'm an engineer at Red Hat and I want to get started and I want to either augment a project with a Rancher project, or obviously an open-source project, what's that look like internally?

How do you guys get permission to go work on this project?

Because that's always fascinating to me; how does Red Hat operate?

How do you get to go work on something as cool as this? How's that work?

Stephen: We have huge amounts of freedom.

Obviously, choosing a project that we're going to work on in a very consistent manner for a long time with investment of a reasonable amount of engineering resources and everything that goes with it, like QE and documentation, requires management involvement and it ends up going fairly high up.

So, once we saw Submariner and decided that it matched our requirements, the question wasn't really the project itself, it was do we want to go down this path and spend time working on this type of solution?

But then, as far as the project itself is concerned, we can work on basically anything that doesn't have a contributor license agreement or a copyright assignment.

And even then, many of them we can get approval for.

We've got huge amounts of freedom and that's one of the great things because when you work on a project like Submariner, or any other project basically, projects are never isolated, they live in an ecosystem and you always run into issues in your dependencies, for example.

And in most cases, if we've figured out how to fix a dependency, we don't have to ask anybody for permission to submit a pull request on another project, we can just go and do it.

Marc: So, that's awesome, I think that's how organizations scale and really cool things get created when you see a problem and you have the freedom to be able to go build.

But Submariner is at a different level now; Chris Kim started the project, it's an open-source project that you discovered, but now it's in the CNCF Sandbox.

What's the process at Red Hat that you had to go through to say, "Look, this is no longer owned and governed by Red Hat but we're going to contribute the copyrights, the trademarks, everything, into the CNCF?"

Stephen: Well, that was more a problem involving Rancher really, because they owned the initial copyrights and the trademark and the domain name.

So, Chris Kim worked a lot to get their agreement submitted to the CNCF.

I don't know what the process was like on our side.

Daniel was involved there, so I don't know if he wants to chip in.

But it was fairly straightforward from our perspective.

We already have a few other projects in the CNCF and people like Josh Berkus, for example, who're very involved.

Miguel: Yeah.

So, when we decided that we wanted to contribute to Submariner and use it to solve this problem that we had for our customers, which was connecting the workloads in different clusters, one of the first things that we did was talking to Chris asking, "Okay. Do you think that long term this is a project that we could donate to a neutral place? For example, the CNCF."

And actually, he said that that was completely on his plan.

I mean, it was something that could fly as a Rancher-only project, and that CNCF was a great home for the project.

Marc: That's great.

I mean, and obviously, new use cases and you guys joining the project and helping push it a little bit, probably, was very welcome.

Especially when it's the same outcome that he had in mind.

Miguel: Yeah. Exactly.

Something that we have seen since the project joined the CNCF Sandbox is a lot more attention.

We had more people trying Submariner, more people providing feedback, people asking questions.

So, it's something that we were seeing a little bit before CNCF, but now it's more evident.

Marc: So, I'd love to dive in to a little bit more and talk about the tech stack that you guys are using to write it.

What language is it written in?

You mentioned CRDs, so is it mostly functioning as an operator?

Can you talk a little bit about how it's built?

Miguel: What we used to build Submariner is first the Go language because we found that it's something that worked very well on the Kubernetes ecosystem.

It's very easy to integrate with the Kubernetes APIs and also it has a very good support for cross-platform or cross-architecture.

But then all the information exchange that we do also uses CRDs over a broker which, in the end, is a Kubernetes API.

I think that that was used for several reasons.

The first one is visibility for the user. It's very easy to go into a Kubernetes API and list the resources and look at them manually and try to figure out what's going on.

And it really saves a lot of time in terms of development.

If we had to develop a dedicated API, I mean we would have spent a lot more time in that part of the design.

The CRD support of the Kubernetes API is fantastic for this.

Something else that we use is the operator framework to deploy Submariner and to maintain Submariner.

Also, if you want to do an upgrade, it's going to take care of upgrading the deployments inside Submariner, the DaemonSets, and the CRDs that Submariner uses internally to communicate or in the broker.

It's still not perfect. I mean, we have support and we have CI that verifies that every single patch that we write is going to upgrade smoothly, but probably we have some gaps to fill.

Marc: Sure. Don't we all, on our projects? Yeah. That's cool.

So, that actually is an interesting question, then.

It's a relatively new project in the CNCF anyway, how many people at Red Hat are working on it either full-time or on a part-time basis?

And then, how many folks in outside organizations do you see contributing on a regular basis?

Miguel: That's a good question.

In Red Hat, probably Stephen can correct me here, we are 10 people?

Stephen: Yeah. Approximately that.

I was just looking at the contributor stats on GitHub there, there's 14 of us in Red Hat.

Marc: Wow.

Miguel: Okay. Yeah. That's a lot.

And then so, from Rancher we still have Chris around.

He's not contributing a lot of code currently, but he keeps joining the community meetings and giving advice. Sometimes we made architectural errors or changes that we didn't realize that we were making, and he explained why things were made that way, and he has been very helpful as much as it's been possible for him.

Because as you know, Rancher was bought by SUSE and there were some changes, but he's still around as much as he can.

And we also have one contributor from Hitachi, Bandara.

He's interested in providing support for external network connectivity into the Submariner network.

So, if you have external or legacy workloads, you can also connect them to Kubernetes workloads in a very easy way, so he's working on that area.

Stephen: Yeah. A couple of people from IBM, they contributed the WireGuard support.

Benjie: Interesting.

So wait, guys, just before we run out of time, I want to talk about roadmap, I want to talk about what 1.0 is going to look like, maybe the timeline on that.

And then also, what's it going to take to graduate from the Sandbox?

I know that's a few questions in one, but let's dig into all those things real quick.

Miguel: Okay.

So, for the roadmap, things that we plan to look into include, for example, network policy support. We tried to drive that discussion into the multi-cluster SIG to try to define how network policies are supposed to work in a multi-cluster environment.

But at that time, that was something that looked very far and they focused on the connectivity and the service discovery, but we strongly believe that now it is time to look into that. We have some proof of concept which was working into Submariner, but we want to make a final implementation.

Because I mean, it's an important security part.

Network policies are already important in a single cluster; if you go to multiple connected clusters, I believe they are even more important.

We are going to be spending some time on a specific use case that we have found, which is overlapping IP address spaces.

For example, when you have clusters which have overlapping IP addresses for the pods or the services, then if you have one IP address for that pod or that service, you can't tell whether you have to go to one cluster or to another.

So, we have an initial solution for that which has been in Submariner for six releases now, I think.

But we are making a second iteration which we have called Globalnet V2, which in the end is like an overlay IP address assignment for the pods or the services that need connectivity to other clusters.

So, with that new IP address space, you can know if an IP address belongs to one cluster or another cluster.
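The idea Miguel describes can be illustrated with a minimal Python sketch. It is not Submariner's implementation: the pool `242.0.0.0/8`, the `/16` size, the cluster names, and both function names are assumptions chosen for illustration. It only shows why giving each cluster its own non-overlapping global range makes an address unambiguous.

```python
import ipaddress

def allocate_global_cidrs(pool, cluster_names, prefix_len=16):
    """Carve one non-overlapping subnet per cluster out of a shared pool.

    Hypothetical helper: the pool and prefix length are illustrative values.
    """
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=prefix_len)
    return dict(zip(cluster_names, subnets))

def owning_cluster(global_cidrs, ip):
    """Return the cluster whose global CIDR contains ip, or None if no CIDR matches."""
    addr = ipaddress.ip_address(ip)
    for name, cidr in global_cidrs.items():
        if addr in cidr:
            return name
    return None

# Both clusters may reuse the same local pod CIDR (for example 10.42.0.0/16),
# so a local address alone cannot identify a cluster. A global address can.
cidrs = allocate_global_cidrs("242.0.0.0/8", ["cluster-a", "cluster-b"])
print(owning_cluster(cidrs, "242.1.0.7"))  # cluster-b
print(owning_cluster(cidrs, "10.42.0.7"))  # None
```
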

So, we are finishing that implementation, Globalnet V2, which will go into 0.10. We also want to work on the scale problem.

We know that Submariner works today for at least three to 10 clusters and this is going to be okay.

But we want to characterize how this is going to work, for example for the broker, which is a central piece which will have to serve a lot of information from all the clusters.

So, we want to see how scale is going to work there, and also in terms of the connections between the clusters.

Because currently we are doing a full mesh between all of them, but maybe a full mesh is not something sustainable, so we want to work on the scale area of Submariner.
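To see why a full mesh may not be sustainable, here is a small Python sketch of the arithmetic. It assumes one connection per pair of clusters, which is a simplification and not a statement about how Submariner counts tunnels; the function name is hypothetical.

```python
def full_mesh_connections(clusters: int) -> int:
    """Connections needed when every cluster connects to every other cluster once."""
    return clusters * (clusters - 1) // 2

# The count grows quadratically with the number of clusters.
for n in (3, 10, 50):
    print(n, full_mesh_connections(n))
# 3 3
# 10 45
# 50 1225
```
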

Marc: That's always an interesting topic.

How do you measure that?

Do you have end-to-end tests that are running on a regular basis and you have a target for a certain scale and you're trying to push that bar up with each release or what are you doing there?

Miguel: So currently, we have nothing for that, so this is something that we need to start working on.

We did some preliminary testing with a simulated environment with a lot of client-based clusters connecting to a broker, but it was all very preliminary.

So, we need to define how we are going to test for that and what our targets are and then, as you said, how we sustain or enhance those numbers over time.

Marc: And I'm sure three clusters that each have 5,000 nodes and tens of thousands of pods running behave very differently than three kind clusters that are running inside Docker.

Miguel: Yeah. Exactly.

So, the numbers that are important for the scale of Submariner are the number of clusters and the number of services which you have exported.

If you don't export them, they really don't matter.

At the start, actually, we were exporting all the services, but that came with the problem that maybe you have tons of services that you didn't really want to export to other clusters, so it was a scale problem.

So, when we participated in the definition of the multi-cluster service API, it's something that was identified like, "Okay. Probably people connecting clusters don't want to expose all the services, they want to expose specific services to the other clusters."

At that point the scale probably improved a lot.
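As a rough illustration of opt-in exporting, the Python sketch below builds `ServiceExport` manifests only for the services chosen for export. The `ServiceExport` kind and the `multicluster.x-k8s.io/v1alpha1` API version come from the Kubernetes Multi-Cluster Services API; the service names, namespaces, and the helper function are hypothetical.

```python
import json

def service_export(name: str, namespace: str) -> dict:
    """Build a ServiceExport manifest for an existing Service with the same name."""
    return {
        "apiVersion": "multicluster.x-k8s.io/v1alpha1",
        "kind": "ServiceExport",
        "metadata": {"name": name, "namespace": namespace},
    }

# Hypothetical services in one cluster, as (name, namespace) pairs.
services = [("frontend", "shop"), ("payments", "shop"), ("debug-tools", "internal")]

# Only the services explicitly chosen for export become visible to other clusters.
exported = {("payments", "shop")}

manifests = [service_export(n, ns) for (n, ns) in services if (n, ns) in exported]
print(json.dumps(manifests, indent=2))
```

In this sketch, one of the three services produces a manifest, so the amount of information shared between clusters follows the number of exported services and not the total number of services.
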

Marc: Great. So, let's wrap up and talk about community for a little bit.

It's a new Sandbox project, how are you managing interacting and getting feedback from the community?

What tools are you using, how do people get in touch and chat about the project or express desires for features on the roadmap, challenges? Things like this.

Stephen: Yeah. So, most of our interaction is on GitHub and on the Kubernetes Slack.

So, we have a dedicated channel on the Kubernetes Slack which is #submariner.

It can be a bit daunting, I guess, for new users because we use it a lot for our own chats around what we're developing, problems we're running into.

But we do have a lot of people joining and asking questions and explaining their use cases, problems that they're running into.

But then, obviously, when there is a documented problem that someone is interested in fixing, we prefer having a GitHub Issue filed, and so that's the other venue.

It's not used as much as it could be.

Miguel: We also have a couple of mailing lists; one for development, and another one for users.

But I guess that mailing lists are not as popular today as they used to be.

We have questions from time to time, though.

Marc: Do you have regular community meetings yet, or is that something you're still working to get started?

Miguel: Yeah. We have them on the website.

If you go down on the left, there is a Google Calendar link and we have multiple meetings during the week.

One for governance, one for users and developers, and some others for automation and CI inside the project.

Marc: It sounds like a really cool project and it's kind of built in a way that I can really appreciate where it's built on top of this mature, battle-tested, proven open-source technology.

So, it really just makes it kind of turnkey and easy to use, while providing a bunch of functionality on top of it.

Miguel: Yeah.

Marc: Miguel and Stephen, I really appreciate you taking the time to share the Submariner project with us today.

Benjie: Thanks, guys.

Stephen: You're very welcome. Thanks for having us.

Miguel: Thank you very much.
