DEC 16, 2020

40 MIN

Ep. #8, Telepresence with Richard Li of Ambassador Labs

Guests: Richard Li

about the episode

In episode 8 of The Kubelist Podcast, Marc Campbell speaks with Richard Li of Ambassador Labs about the development and testing tool Telepresence, as well as the problems developers encounter when adopting Kubernetes.

about the guests

Richard Li is the Co-Founder and CEO of Ambassador Labs (previously Datawire), makers of Telepresence. He was previously an advisor at Duo Security and Product Marketing Director at Red Hat.

transcript

Marc Campbell: Today we're here with Richard Li, who's the founder and CEO of Ambassador Labs, a company formerly known as Datawire. Welcome Richard.

Richard Li: Thanks for having me Marc.

Marc: So we're here to talk about Telepresence which is a CNCF sandbox project.

I'd love just to kind of get started and understanding like, what is Telepresence? What does it do?

Richard: So Telepresence is a unique tool that developers run on their laptop or desktop to create a connection to the Kubernetes cluster.

And this is a tool really for development and testing.

And the problem that we've seen as developers start adopting Kubernetes and microservice architectures is that frequently your application consists of multiple microservices, and over time that application gets too big for you to run efficiently locally because you've got databases and multiple applications.

If you're running Java, suddenly you need all this memory.

And so what happens is you start trying to run this in the cloud, but as soon as you start running in the cloud you can't use your IDE and your favorite tools.

And you have to go through this Docker build, Docker push to a registry, and you're just waiting all the time.

And so Telepresence lets you do real-time development against the application running in the cloud.

Marc: Okay, so the primary goal is to be able to shorten and close that developer feedback loop, so when they're writing code they see it executing really quickly.

Richard: Exactly, yeah, so that so-called inner loop.

So it's not a replacement for CI, right?

But what happens is, CI is a terrible inner loop for developers.

And what we find is that as you start adopting Kubernetes at greater degrees of scale sometimes people start relying on CI for the inner loop and that's just monumentally unproductive.

Marc: Yeah, no, absolutely.

Like, I mean, as a developer, I know that when I click save like if I can't execute and test and run that code within single digit seconds, it becomes frustrating because you get distracted, you switch over to a new tab and you start reading something.

Richard: Right exactly. And then you lose your train of thought.

So trying to stay in the flow of things is so important if you want to be a productive developer.

Marc: So as a developer, I'm running Kubernetes and I have like a relatively complex stack. I'm constantly tuning into the next Apple announcement, hoping that the next laptop has more and more memory so that I can run a local Kubernetes cluster. But help me understand, what are the requirements to use Telepresence?

Can I just start using a GKE cluster as my local dev environment now? Or how does that work?

Richard: Exactly, so all you need is a set of kubectl permissions, basically the ability to port forward and a couple of other things in your Kubernetes cluster.

And then you need root access on your laptop, because we essentially function a little bit like a VPN.

So we take over some of your laptop's networking so that instead of sending the packets out into the internet, we intercept those packets.

And if those packets actually should be going into the Kubernetes cluster, we send them to the Kubernetes cluster.
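As a rough illustration of the decision Richard describes (not Telepresence's actual implementation), the Go sketch below checks whether an intercepted packet's destination falls inside the cluster's address ranges. The CIDR ranges and the names `clusterPrefixes`, `belongsToCluster`, and `nextHop` are assumptions made up for this example; real pod and service ranges differ per cluster.

```go
package main

import (
	"fmt"
	"net/netip"
)

// clusterPrefixes is a hypothetical set of address ranges that belong to the
// cluster. Real pod and service CIDRs vary from cluster to cluster.
var clusterPrefixes = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/16"),  // assumed pod range
	netip.MustParsePrefix("10.96.0.0/12"), // assumed service range
}

// belongsToCluster reports whether dst falls inside any cluster range.
func belongsToCluster(dst netip.Addr) bool {
	for _, p := range clusterPrefixes {
		if p.Contains(dst) {
			return true
		}
	}
	return false
}

// nextHop names where an intercepted packet would be sent:
// "cluster", "internet", or "drop" if the address cannot be parsed.
func nextHop(dst string) string {
	addr, err := netip.ParseAddr(dst)
	if err != nil {
		return "drop"
	}
	if belongsToCluster(addr) {
		return "cluster"
	}
	return "internet"
}

func main() {
	for _, dst := range []string{"10.96.0.10", "93.184.216.34"} {
		fmt.Printf("%s -> %s\n", dst, nextHop(dst))
	}
}
```

A real tool would make this decision at the network layer with elevated privileges, which is why root access on the laptop comes up in the conversation.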

Marc: So does every developer need their own Kubernetes cluster? Or how does that scale in a larger organization?

Richard: That's a great question Marc. So there's a couple of different models.

So one is you can give each developer their own namespace or Kubernetes cluster. The other is something we are actually working on, and we have a commercial version of Telepresence that actually does this.

We're bringing some of this back into the open source right now, which actually lets you create a shared model so that if you have a microservice in the cluster, based on, for example, your HTTP header, you can actually route the requests to different laptops.

So you can say, well, this request comes from Marc. So let's send it to Marc's laptop.

This request is from Richard, send it to Richard's laptop.

So you can actually share the cluster and simultaneously develop the same microservice.

Marc: Oh, that's super cool.

So I can just like check out my local copy minimizing the amount of memory and CPU and everything that I need.

So I can just run just the part of the stack that I'm deving on right now.

Richard: Exactly yeah, so you just run your microservice that you're coding on locally which means you can get your IDE and you get hot reload all those kinds of things.

But then your microservice, if it depends on other stuff in the cluster it will think it's in the cluster so it can talk to all the other microservices in the cluster.

Marc: And to be clear, like, my local microservice that I'm running, that doesn't have to be in a Kubernetes cluster, doesn't have to be in a Docker container? Can it just literally be like a Go binary executing on my Mac?

Richard: It's just a Go binary running on your Mac or it can run in a container. We support both.

Marc: Great, let's kind of talk about the origins of the Telepresence project.

At Ambassador Labs, that's not the primary product that you guys are shipping.

So I'd love to hear like why Telepresence was created.

Like, it sounds like it solves a very specific problem but like what steps did it take to decide, hey, this should be an independent project inside the ecosystem.

Richard: Yeah, so it was a few years ago. So Kubernetes, it was very new then.

And we started building some cloud software on top of Kubernetes and we have a whole bunch of infrastructure engineers, right?

So they love hacking on all this low level system stuff. And they also are fanatical about just fast dev loops and that kind of thing.

We realized quickly that it was just a pain in the neck to make a code change and then wait for that Docker build and push to the Docker registry and all that kind of stuff. And so one day we were just white-boarding and said, "There must be a better way."

And we came up with this sort of whiteboard architecture and we had a super talented engineer who said, "Could I take a few days to try to hack something together?"

And so that's what he did. And he had something together. We're like, oh, this is pretty cool.

Can we spend a few more days actually cleaning it up so that we could actually release it as open source? We weren't necessarily building a business around it; it was just to see what other people thought. And it just started taking off then, so.

Marc: Yeah, that's great. Like early days of Kubernetes, the stack grew to be too big.

Can you help me understand a little bit about like the tech stack that's running Telepresence or do I have to have components installed in my cluster in order to make this work?

Do I have an agent on my laptop or how do I get started with it?

Richard: Yeah, so it's a very simple installation. We've got Homebrew and Linux packages.

We don't run on Windows today, and you just install it locally on your laptop.

And then the client will look for kubectl locally on your laptop.

And then it will then take care of everything else.

When you run the client for the first time it will tell your Kubernetes cluster to download a proxy that gets deployed in the cluster.

So there are some cluster-side components, but they're all deployed via the command line client, so.

Marc: Do you see a lot of people spinning up like a shared dev cluster? Or like in our organization, we have a staging environment and a production environment.

Is it also possible or reasonable to say, "Hey, I'm going to debug this problem in our staging environment, so I want to check that service out here"?

Richard: Yes, it's an increasing use case and we're starting to see actually quite a number of users who used to have this one dev cluster per developer model.

But as you can imagine, as your dev clusters get beefier, as your applications get bigger, it starts to get pretty expensive, especially if you have a hundred engineers and you have a hundred dev clusters, that's actually quite a bit of compute expense.

And so we're starting to see companies that are actually starting to consolidate down to one or two dev clusters that are a little bit bigger but they're not running production workloads.

So you can put a lot of developers on a single cluster.

Marc: Yeah, I mean, look at Replicated.

Like we started off with everybody running Kubernetes locally and then that outgrew what could fit.

So we gave everybody Kubernetes clusters in the cloud, running MicroK8s on a GCP instance, and then switched to K3s to squeeze a little more performance and more reliability out of it.

But the idea of actually leveraging our SRE team and saying like, let's just run this cluster kind of match the production environment a little bit closer.

It's not like a different version of Kubernetes. There's probably tons of benefits in that.

Richard: Yeah. And the other thing is you also get what I like to call real-time integration testing, right?

Because if you've got 20 microservices, right?

If you actually have your own cluster, you actually have to periodically update your cluster to take the latest versions of everyone else's microservice.

So you're sort of, your service is actually being integration tested with the tip of everyone else's work.

And if you have a shared cluster then you just say, hey look whenever a different person has gotten their microservice to releasable state, just push it to the cluster, right.

And so you're always current. And so the amount of work you're spending to keep your cluster up to date is way less than if you had your own cluster.

Marc: Right, does that open up some interesting workflows around collaboration between, front-end back-end teams and stuff like this that is otherwise really tricky to do.

Richard: Yeah, we're building some functionality in now where you can actually preview changes.

So for example, on my laptop, before I push into my shared cluster, I can Slack you a URL that's unique and then you can click on that URL and then see exactly what's running on my laptop, right?

So I think there's a lot of interesting things that you can do.

We're also doing integrations with our Ambassador API gateway, which will let you send a copy of production traffic to your laptop for a minute just so you can test on a copy of production traffic.

So there's a whole bunch of things you can do when you actually have control of layer seven and can start sending requests all over the place.

Marc: One thing we'd love to dive into normally on the podcast is talking about some of the technical challenges with building the project.

But I think Ambassador Labs is probably relatively like you have that expertise that domain expertise around layer seven.

And like you're able to leverage that in order to turn this project into a pretty unique solution to the problem.

Richard: Yeah, I mean so that part actually has been easy.

What's actually been a pain and super hard for us has been the fact that it's actually client software. The server side, because of our API gateway and everything, we understand all that.

We do a lot of stuff with Envoy, but on the client, it turns out, how do you override DNS on Mac?

And by the way, Apple seems to change how DNS works every two versions of macOS, right?

And then Linux is different, and how do you interact with VPNs?

And so that's actually where I'd say we've had to do a lot of learning ourselves because our engineers mostly started writing server side software.

And this is really client-side software. And it's mucking with all these, DNS and internals on operating systems.

So that's been a little bit of an excursion and journey.

Marc: Interesting, that's unexpected. I would have imagined like a lot of the Kubernetes networking would have been challenging.

And like, as we're recording this, macOS Big Sur has just recently come out and the Apple M1 chips have come out. That's probably filling the roadmap with issues to solve for right now.

Richard: Exactly, yeah, it totally is.

So one of the things we're doing now is, we started by writing Telepresence in Python, and that was good because we knew a lot about Python and we actually were so new to Kubernetes that none of us knew Go. We're actually getting ready by the end of this year to release a new version of Telepresence, Telepresence two, which is a full rewrite of Telepresence in Go.

And that brings a whole bunch of different benefits, not the least of which is that the entire Kubernetes ecosystem is in Go, so.

Marc: Yeah, I mean, I'd love to dive into that a little bit, like rewriting a project is always super challenging, right?

You're going to like lose a couple of things and like add a bunch of new things often.

What are some of the benefits? Other than like the Kubernetes ecosystem is in Go, I'm a Go developer, so now I can import libraries and it makes it more compatible.

But from like a project perspective from Telepresence users what should we be looking forward to in the 2.0 version?

Richard: So from a user perspective, we are going to a model where you have a persistent proxy that runs in the cluster, which allows reconnect.

So currently if you drop your connection to the cluster and this is less of an issue these days since most people are working from home but if you're on the go and you lose your connection the cluster would get sort of into an odd state and you'd have to manually clean up the cluster and that kind of thing.

So we're going to a persistent proxy in the cluster and you can reconnect to that proxy.

And that's one of the big advantages.

And just in general, with the rewrite, we're also simplifying the entire user experience because over time the command line for Telepresence has gotten pretty complicated as we've gotten more users and we've just kept on adding flags.

So now a lot of our users actually write wrapper scripts around Telepresence because no one can actually remember the magic incantation of flags. So we're trying to simplify that so normal human beings can actually figure out how to use Telepresence without reading 30 different flags to figure out exactly what you want.

Marc: Yeah, I mean, I think flags make it complicated, but it's still much easier than debugging while your local Kubernetes cluster isn't running.

So like, that's awesome that you're like trying to optimize that right now for the 2.0.

Richard: Yeah, people really, I mean, and part of it is just we've got a really great community and we've got other contributors to the community.

So it's been fun kind of continuing to build Telepresence with the help of other folks.

And when you get people who are like, I can't do anything without Telepresence, we're totally depending on Telepresence, for all of our workflows that's always nice positive affirmation.

Marc: And it's just completely open source? You use it to help build Ambassador gateway, but like there's no commercial offering behind Telepresence right now?

Richard: There is a commercial offering behind Telepresence right now. We call it Service Preview.

It's not something we put a lot of emphasis on.

It was just more that people were asking us, "Can we have specific features that we want, and can you give us support?"

And so we basically said, okay, we'll provide sort of an enterprise support package for Telepresence if you want.

But the focus of our business is really around Ambassador and we just want people to use Telepresence.

And if they don't feel like giving us money, that's not a big deal for us.

Marc: Right, I'd love to kind of talk a little bit about the ecosystem in general Telepresence.

When did Telepresence create the first open source version?

Like you said, it was early in Kubernetes. How early in Kubernetes?

Richard: I want to say it was 2016 when we did a release, so.

Marc: Yeah, that's very early Kubernetes.

There's other tools right now that help really focus on the developer experience, around like namespace-as-a-service, there's projects out there and other dev tools.

Can you help me understand a little bit like the different approaches that some of these tools are taking the ecosystem in general if I have developers that are struggling with Kubernetes like Telepresence takes one approach, but like there's a myriad of various approaches out there.

Richard: Yeah, I think the most common approach I see is this: let me create a dev environment for you and then take care of the mechanics of going from source all the way to something running in Kubernetes.

And so I think Skaffold is probably the most well-known project, from Google.

There used to be Draft, which was based around Helm, although Microsoft has killed Draft at this point.

And then there's a variety of startups like Tilt and Garden that kind of occupy this space.

So I think that's your most common approach.

And it's just, I've got a bunch of source code, I want to run in cluster.

How do I do this with a single command?

Telepresence, we think, it doesn't actually solve any of those problems. It's really a real-time development and debugging tool. So, we see those projects as pretty complementary to us.

I'd say the only product that I am aware of which I think is more directly an alternative to us is this: Thomas Rampelberg at Buoyant wrote a project, I think it was called "ksync," where he used basically real-time file sync.

So you would do real-time file syncing from your laptop into a pod in the cluster.

So you could actually get sort of what felt like real-time development, and that was the approach.

And so I think again, like ksync is not really an alternative to Skaffold, but it's a different way of kind of synchronizing and getting more of that real-time fast inner loop problem solved.

Marc: Yeah, and I'm pretty familiar with Skaffold and Tilt, and I know the team over at Tilt really focuses on like shortening that inner loop. How much Kubernetes expertise do all of my developers need in order to be able to be successful with Telepresence?

Richard: You really don't need any because like Telepresence is really more like a VPN.

So the idea is just type Telepresence, enter and then it's really fire and forget.

Like one of the pretty big cohorts for Telepresence users actually are data scientists because they run all their ML workloads in remote clusters, because you don't have enough compute locally to run any sort of real ML model.

So they're able to actually use, and I don't know a lot about the space, but they create these Jupyter notebooks locally but they're all connected to the remote Kubernetes cluster.

And so they basically have unlimited compute to do all their data experiments.

And so that's actually a super popular use case.

And those people don't know anything about Kubernetes. I mean, they're data scientists.

Marc: They shouldn't have to know anything about Kubernetes to do their job.

Richard: Exactly, yeah.

Marc: And I imagine another great use case too, you know, I'm thinking about how we could adopt it a little bit more, is like a front end engineer who's going to like write some React code and maybe modify a few APIs, but they don't need the database and everything running locally, and we don't want to expose the databases to the public internet or manage VPNs ourselves.

Richard: Exactly, yes. Because especially if you do front end development it's really so iterative.

You like make a little change to your CSS or your React, then you hit reload. And you want to be able to do that really quickly.

Marc: So let's say I wanted to get started with Telepresence today.

Like it's a pretty mature project at this point.

Do you have any recommendations for a good way to get started less on the technology side about how to install it, but like, I have a team of developers who have a way that they're writing code right now.

How can I start introducing this new style to them and show them that it's an efficient, good, and productive way to do their job?

Richard: I'd say that the way we found the most success is you get one developer who's interested in it and they figure out how to integrate Telepresence into their workflow, right?

They get it working in the cluster and they get it working on their laptop.

And then usually engineering teams, they have your lunch and learn or weekly engineering meeting.

And then that person does a demo and then shares whatever configuration is necessary.

And I think that's the best way to start because I think the mechanics of Telepresence, as you pointed out, are pretty straightforward.

And it's really about how do you apply it into your environment in a way that actually works for you, because everyone's environment's just a little bit different.

And it turns out once you start thinking about databases, there are some command line options you need to add around which database ports to actually forward and all this kind of stuff.

And so getting someone who's excited to set up the initial configuration is I think the best way to get started.

Marc: Yeah I mean, that's the way these things work.

Nobody set off to build this complex environment. It was originally easy.

And then a month goes by and there's this additional step to do.

And six months go by and now like a new person joins the organization and they understand, "Wow, this is like this nine-page Google Doc to set up a dev environment. This is insane."

Richard: Right, exactly.

Marc: What about types of organizations or size of organizations that it's good for? If I have a really small team, a couple of developers, or a really large enterprise that has thousands of developers, is there a sweet spot for Telepresence, or how does it work in these various size orgs?

Richard: I think all organizations can benefit from Telepresence.

I think the benefit differs, right?

So if you're just like a two person startup, I think the benefit is just faster development and debugging.

As you get into thousands of developers, they actually, ironically, tend not to care as much about developer productivity, but they care a lot about money. There's a user we know that spends a million dollars a month on cloud compute just for dev.

And with Telepresence, you can slash that by 60, 70% and we're actually doing this deployment for them.

So they expect to save millions of dollars a year just by going to Telepresence and consolidating clusters.
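The arithmetic behind "millions of dollars a year" can be worked through directly. The Go snippet below uses only the figures Richard quotes (a million dollars a month, cut by 60 to 70 percent); the function name `annualSavings` is made up for illustration, and actual savings would depend on the deployment.

```go
package main

import "fmt"

// annualSavings returns the yearly savings in dollars for a given monthly
// spend and a percentage reduction (60 means 60%).
func annualSavings(monthlySpend, reductionPercent int) int {
	return monthlySpend * 12 * reductionPercent / 100
}

func main() {
	const monthly = 1_000_000 // dollars per month, as quoted in the episode
	fmt.Printf("60%% reduction: $%d per year\n", annualSavings(monthly, 60))
	fmt.Printf("70%% reduction: $%d per year\n", annualSavings(monthly, 70))
}
```

At those rates the yearly savings land between $7.2 million and $8.4 million, which is consistent with the "millions of dollars a year" estimate.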

Marc: Wow, a million dollars a month on just a dev cluster. That's definitely significant, worth putting some effort into.

Richard: Yeah exactly.

Marc: So Telepresence is a sandbox project right now in the CNCF.

And like, I think, most of us kind of understand the ecosystem in the process there, it goes from sandbox to incubating to graduated but it's like Telepresence is a pretty mature project with seemingly a pretty good amount of adoption.

Has the team given any thought to next steps or like targets in order to move it into an incubation project?

Richard: Yeah, so we thought about doing incubation this year and we decided to wait until next year, because for us a sign of a real incubation project is if we can get a couple other maintainers that don't work at Ambassador Labs to have commit rights to the repository.

And this is also one of the sort of side benefits of moving to Go, because everyone in the Kubernetes ecosystem generally programs in Go.

And so we're doing a bunch of things to try to make the code base more accessible to third-party contributors, because we'd really like to be able to, from an incubation perspective, say, "Hey, we've got this mature project that not only has all these users and does these things, but also we've got these maintainers that don't just work at Ambassador Labs."

Marc: Yeah, I mean, I think that's super important.

I think one of the CNCF measures as they advance projects is like diversity of organizations contributing to the project.

And so that's great that you guys are making the project more accessible so that you can increase the diversity of those commits.

Richard: Absolutely, yeah. So it's a big thing for us.

We were fortunate someone from Datadog started helping us out.

We've got someone from Red Hat, so continuing to get more contributors to help out is a big goal for us.

Marc: And so the 1.0 release, the track that we're on right now as the stable release, you said it's primarily Python, but 2.0 is going to be switching the core of it over to Go?

Richard: That's right, exactly.

Marc: And can you help me understand the timeline on that?

Like what should somebody expect if they wanted to either contribute to it or say, "Hey, I'm going to wait and like start with Telepresence on the 2.0 branch"?

Richard: Yeah, so we're probably about a month out from releasing 2.0. A bunch of the work that remains is we're working on sort of documentation, especially developer documentation.

And we're trying to get it to a state where it's easy for other people to contribute.

So right now it's in a private repository that we will be transitioning into the open repositories in the next month or so.

So we just want to get it into sort of a reasonable, buildable, installable state, before we release it, so.

Marc: Yeah, I think reducing the churn and making it easy, if that's the primary goal, that totally makes sense.

So the outside contributions you have right now, Datadog, Red Hat and others, you said like those are working in the 1.0 branch right now?

Richard: Exactly, yep.

Marc: Cool, if I wanted to start contributing today as a developer, do you have any recommendations?

Are there like a lot of issues in the 1.0 that are good first issues, little enhancements, or would it be better to hold off and wait until the 2.0 becomes public?

Richard: I'd say that there's certainly issues and enhancements, but where I think really the most value would come is in helping with documentation and use cases, because the thing we've learned about Telepresence is that everyone's application is a little bit different.

And so even a simple how-to, which says here's my application, it has Kafka, Postgres, and 20 Java microservices.

And this is how I set up Telepresence with Skaffold or whatever else you use.

I think that kind of stuff is super valuable because you can't just use Telepresence by itself. It's always like, "You got to use it with your Kubernetes cluster, your application and Skaffold and your CI system." And so we found that that's where people start to really get bogged down, "How do you get all this stuff working together?" And that actually is a pretty big amount of work unless you're like an expert in all these different dev tools.

Marc: The intent though, is like the 1.0 work that's still going on out there in the public repo, like as much as possible and whatever makes sense, will make it into the 2.0 branch?

Richard: Yeah, exactly. We expect a first release of Telepresence two, call it a beta.

It won't be at feature parity with the open source, but it will have the reconnect, better reliability, all these things.

And I think we'll probably get to feature parity sometime early next year, call it late January, February timeframe, so.

Marc: Yeah, I'm actually really excited about the reconnect.

I mean, I like definitely am spending a lot less time on airplanes now with COVID and everything, but that said, there's a lot of people who are just like not working from an office and they have less reliable internet connections.

And like, I'm sure there's a huge demand to make that like a lot more reliable. And it just automatically handling that for everyone.

Richard: Yeah there's a whole bunch of sort of things in that reliability category which we found that there was some architectural limitations to how we approached it with Telepresence one, that we're actually approaching in a better way with Telepresence two.

So the mechanics of how we inject the proxy into the Kubernetes cluster, like we've also learned more about sort of the quirks around Kubernetes networking and how it works, so we're taking advantage of some of that stuff in Telepresence two.

So in general, we expect Telepresence two to be much more reliable and faster than Telepresence one.

Marc: How much effort has been put into like an upgrade path at this point from Telepresence one to Telepresence two for somebody who already has one running?

Richard: We haven't really put any real effort in, I think we'll probably do more of a documentation based approach because I mean I love the fact that Telepresence has no state, right?

So you just install the binary.

And as long as it connects to your cluster you should be able to, the command line options will be different, but it's not like you're migrating a database and you need to figure out how to get all your data to migrate.

And it's not like it's a Python library where you have all this code that ties to that library.

So our hope is that, with some upgrade documentation it should be pretty smooth for most people.

Marc: Got it, what about like more complicated use cases of Telepresence, right?

And I don't know if it's possible, feasible or if you've heard of anybody trying it, but like we talked about a data scientist or somebody checking out part of the code or a front end engineer running the front end and an API locally.

Like, what if I have an SRE who, we're running Postgres in the cluster and they want to like say, "Hey, I want to switch from Postgres 10 to Postgres 11 or 12."

Could you just run that locally? And like execute that entire end-to-end loop?

Richard: I'm pretty sure you could. I haven't heard of someone doing that, but I have certainly heard of the other way around.

So like, I know there's this very large audio company.

They basically have all their databases in the cloud, and the way their developers do local development and connect to their cloud databases is with Telepresence.

So you're basically talking about the reverse situation. So from a protocol perspective, we're agnostic.

So I would think it would work. I just don't know if anyone actually tried it.

Marc: Are there any like requirements for me to be able to use Telepresence?

Do I need to have a service mesh, like sidecars?

Like you mentioned it works through like a header that's injected into like some of the protocol as it passes around.

How does that happen?

Richard: You just need a Kubernetes cluster and you need a Mac or Linux machine.

You do need privileges on both those machines.

It works with a service mesh and the header injection is an optional part of Telepresence.

So without header injection, we just route all the traffic by default from the cluster to a particular service to your laptop, which is fine for dev purposes, right?

It's the header injection becomes important when you're having a shared environment where multiple people are trying to access the same service and you need those requests to go to different destinations.

Marc: Right and that likely solves kind of two very related problems.

One is just like, without the header injection it's just developer reliability, it's easy to onboard a single way to like run the stack but like the cost savings really come in when I can like have that shared cluster and share it across the whole team of developers.

Richard: Exactly, yeah. So again, if you're like a two or three person startup, right, you probably don't care about the header injection or any of that functionality. As you get bigger, that stuff gets pretty useful.

Marc: So you said that today the piece that I have to install locally is for Mac and for Linux. Any plans for Windows?

Richard: We've been thinking about it. And I think part of it depends on if there's commercial interest, because it is a pretty expensive endeavor to support Windows, but we've been thinking about Windows support and starting to look into it because there have been some inquiries.

And if so, I certainly wouldn't expect it until much later in 2021, since zero people are working on it right now, so.

But if there was an organization that had developers that were primarily running Windows shops, you'd be very open to working with them if they wanted to contribute that.

Marc: Yeah, absolutely, yeah.

Richard: And by the way, we do run on WSL 2. It's not really Windows, but WSL 2 is so good that it can run Telepresence, which is kind of amazing.

Marc: Yeah, WSL 2 is pretty cool. If I took that approach, so I'm running the Linux agent, then are there any limitations?

I mean, is it effectively the same as running on Windows?

Richard: No, because WSL 2 has its own networking space. So if you're using, say, Visual Studio to code, that Visual Studio isn't running in WSL 2.

So you basically have to do some fancy stuff on your laptop so that your Visual Studio can talk to your microservice running in WSL 2, which is also where Telepresence is running.

So it can totally be done and people have done it, but it's not super straightforward.

This goes back to how people can contribute, right?

All this stuff can actually be done through magic incantations of Telepresence and WSL 2, but documenting and writing about how you actually got this all working, that's where there's a lot of gaps in the project.

Marc: So, you've been working in the Kubernetes ecosystem at Ambassador Labs since super, super early, as we talked about. Have there been changes to Kubernetes that have been challenging as new versions of Kubernetes have come out?

Are there any specific or big changes, as Kubernetes has matured, that have made it tricky, where you've had to throw a lot of resources and engineering at updating Telepresence to be compatible with it?

Richard: Ironically, the primitives that we depend on with Telepresence in Kubernetes have been pretty stable. And this sort of tells you how old we are: I think the biggest change that impacted us was RBAC.

So we started working with Kubernetes before RBAC was a thing.

And so when RBAC was a thing, we had to go through and figure out, okay, what are all the privileges that we actually do depend on, and figure that out.

But other than RBAC, it's been pretty straightforward for us because port-forward's been around for a long time and we use Kubernetes deployments and pods.

So it's been pretty stable for us.

Marc: But Telepresence is obviously compatible with RBAC being enabled in a cluster now. Does it require cluster-admin-level permissions, or can I actually give it a relatively limited, reduced RBAC role?

Richard: Oh, you can give it a pretty reduced RBAC role. It's more than I can remember, but yeah, there's just a couple of things it needs, and it's not cluster admin for sure.

Yeah, as we try to ship software into more restrictive environments at times, one of the environments that we see come up often is OpenShift, as it has a lot of limitations on it.

Marc: Does Telepresence work in an OpenShift cluster?

Richard: It does. The gentleman from Red Hat who actually contributes to Telepresence uses it with OpenShift.

How he gets it to work, I'm not sure, but I think it generally just does the right thing. So yeah, I think that's the case.

Marc: Cool, yeah, I think that speaks to the Kubernetes architecture a lot: the fact that Telepresence has been compatible with, like, the CNI and CRI abstractions, which allow you to really just build on top of these primitives and then go and build something creative that they never thought about before.

Richard: Yeah, I think the Kubernetes team has done an amazing job in keeping a relatively stable set of APIs going while shipping updates and features at a ridiculous pace.

Marc: Right and so Telepresence today is a sandbox project.

I'd love to just kind of understand a little bit more about the decision to donate it to the CNCF and some of the benefits that you've maybe seen from it or any stories or anything to share around that.

Richard: I mean, honestly it was just an experiment because we thought, well, let's donate to the CNCF to try to get some wider visibility and try to get more maintainers and contributors to the project.

So it wasn't just us. And what we've learned is that people really like the fact that it's neutral and managed by the CNCF.

And we've also found, and we were I think the first or second CNCF sandbox project.

So I think over time, the CNCF has spent much less time helping with sandbox projects and a lot more on the incubation and graduation projects, which makes sense, especially since sandbox has really exploded.

So we haven't seen as much of that broader awareness and marketing benefit that the CNCF gives to other projects, but we've been happy with just having it in the CNCF, and people feel more comfortable with it.

And we think that there are things like rewriting in Go that will really help with getting more contributors.

Marc: Yeah, I mean, I think recently the CNCF changed the process to submit into the sandbox and as part of that they've reduced or changed some of the unwritten benefits you get as a sandbox project.

So, I mean, I think it's great. Like the increased amount of projects in a sandbox is only great for the ecosystem, but like you're right.

It definitely puts a little bit more pressure on you as an early sandbox project to, like, move into the incubation phase in order to get a little bit more visibility and level up on the benefits that you're receiving from it.

Richard: Yeah, exactly.

Marc: So, while we're waiting for the 2.0 to come out, is there anything else that you'd like to share?

Anything else like on the roadmap like specific use cases, anything new that you'd like to see other than just like contributors?

Richard: So, we think that there's a lot of things that Telepresence can do, and we're starting to do some exploration.

In particular, Telepresence is like I said, sort of like a VPN. So on one hand that's great because it's a pretty clear abstraction.

You can do anything you want with it, but at the same time, when you're figuring out the best way to use it in your environment, because it's just sort of a pretty low-level tool, it's not always obvious how it can benefit you.

So we're starting to look at ways to use Telepresence better in both development and testing workflows, because, you know, a lot of times developers will create mock APIs, and with Telepresence, you can actually sort of replace mock APIs to some degree.

So we're starting to look at a bunch of these sort of development test use cases.

And I think that's going to be where we start to take the project: trying to figure out how to make it provide higher-level abstractions to integrate into development and testing workflows, as opposed to just, hey, this is just a VPN, which is relatively stable, so.

Marc: Yeah, that's super cool. Just to make sure I understand.

Like one of the things that we do is every time a developer commits code, part of our CI process will either install it to a cluster to run tests against, or sometimes depending on the project actually create a cluster using Kind or K3S or whatever and actually like run end to end tests.

And then it's challenging sometimes to make sure that on a test failure the developer gets visibility into all the information they need to understand that.

So the future of Telepresence may be able to like assist that developer on that CI workflow problem.

Richard: Exactly, right, that's a great example.

So you can for example make Telepresence easier to run on a CI system so that when you run CI, you might already have a cluster.

Your build is of that particular service you're testing.

So you build and run that service locally in your CI system while it connects remotely to your staging cluster.

And then you get all the log data because it's all running locally.

So it's just a lot easier to kind of troubleshoot what's going on with that service.

So figuring out how to make it work better with CI as an example.

And also, as I said earlier things that you can do with managing layer seven.

So being able to say, okay, as part of CI, run these integration tests, and the way we're going to actually run the integration tests is we're going to create a copy of production traffic and send it over to our CI system to see how it does for, you know, a minute.

So figuring out how you do those kinds of things I think are pretty interesting directions for us to explore.

Marc: Right, yeah. And I mean, as the project's exploded in popularity, do you have full-time engineers that are dedicated to working on Telepresence right now?

Or is it still a supporting project for like Ambassador Labs engineers to be able to contribute to?

Richard: We actually have a couple full-time engineers working on Telepresence now, because we do see, it has a lot of opportunity for growth.

And so we're definitely starting to invest more in it, especially since, without us trying that hard, there's also been a growing cohort of commercial customers who've been kind enough to help let us invest more in the product, so.

Marc: Yeah, I mean I'm sure four years ago when you started it, there weren't a lot of dev teams that were spreading Kubernetes across their entire dev organization.

But today it's like, everybody's doing it.

And as you're doing that, to your point, you're reaching into groups that just, they don't want to learn Kubernetes.

It's just not in their wheelhouse. They shouldn't have to learn it.

And like that creates just more demand for the product.

Richard: Exactly, yup.

Marc: Cool, some of the new stuff, when you're thinking about like the dev test stuff is that stuff that's going to be focused really in the 2.0 branch.

And so like, are we at that phase of like thinking about it in early design?

Or are we, like, well, there's some prototypes and implementations out there right now?

Richard: It's really more in the 2.0 product.

Although we do have some commercial users who are starting to use some I would say more custom solutions built around Telepresence that we've been experimenting with.

So we want to productize that and package it into the project as a whole.

So we have some experience with what works and what doesn't work, but all of this is being built around the Telepresence 2 codebase.

Marc: That's great. Richard, is there anything else that we didn't cover here that you'd like the opportunity to share about Telepresence 1.0, 2.0, use cases, developing on Kubernetes, anything like that?

Richard: No, I think it's great to talk to you, Marc.

And it's also great that you are a developer in the Kubernetes space, so it was a great conversation.

Marc: Cool, well, thanks Richard Li, the CEO and founder of Ambassador Labs, formerly Datawire, like I mean Telepresence looks like a great tool.

I think, like, you'd be crazy to try to do this the hard way when this thing's out there as an open source project for any team right now.

Richard: Great, thanks for having me Marc.
