AUG 7, 2019

Ep. #12, Service Mesh with William Morgan of Buoyant

Guests: William Morgan

about the episode

In episode 12 of EnterpriseReady, Grant talks with William Morgan, CEO and Co-Founder of Buoyant. They discuss how Twitter’s scaling methods in the early days led to Linkerd, as well as adoption strategies and viable business models for open source-based companies.

about the guests

William Morgan is the CEO and Co-founder of Buoyant. He was previously an infrastructure engineer at Twitter, where he ran several teams building on product-facing backend infrastructure. He has worked at Powerset, Microsoft, adap.tv, and MITRE Corp, and has been contributing to open source for over 20 years.

show notes

transcript

Grant Miller: All right, William. Thank you so much for joining us.

William Morgan: Thank you for having me, Grant.

Grant: Cool. Let's just jump right in. Tell us a little bit about your background, and maybe a bit about how you got into enterprise software.

William: I am the CEO of Buoyant, which is a company that I founded with my colleague Oliver Gould a couple of years ago.

We were actually both engineers at Twitter during a very formative time for the infrastructure world at Twitter.

Because when we started there in 2010 Twitter was still a monolithic Ruby on Rails app, which we lovingly called "The monorail."

Grant: I think I remember. Was there a Fail Whale around that time?

William: Yeah. There were a lot of Fail Whales. Then by the time we left, about four or five years later, Twitter's infrastructure was totally different.

It was microservices, and containers, and orchestrators, and all that stuff. Going through that experience firsthand informed pretty much everything that we've done since with Buoyant, and with some of our open source projects.

Grant: I feel like I haven't seen that Fail Whale in a while, so maybe you guys did an OK job there.

William: Yeah, that's the most amazing thing about this. It actually worked. It's very rare to have these wide-ranging infrastructure initiatives and to have them actually succeed at the end.

It was also rare, honestly, for anything to really succeed at Twitter at that point in time. But magically this worked, and it worked really well.

So we haven't seen the Fail Whale a whole lot ever since.

Grant: OK. The backstory here is that you were at Twitter, where you were in this formative engineering role working on the infrastructure, and that was at the time when Docker and things were starting to come of age. Is that right?

William: That's right. Although Twitter didn't know anything about Docker, didn't use Docker.

Grant: OK. Just tell us a little bit about what you worked on at Twitter to make this all work, and then how that ties into what you're doing at Buoyant today.

William: Yeah. The transformation at Twitter that happened from this monolith into this mass of microservices thing was done without really knowing what we were getting into, and without having a lot of tooling that was existing out there to build on top of.

So nowadays you have things like Docker and Kubernetes, and circa 2010 those things weren't there or they weren't prevalent enough for us really to know about them.

But the concepts were the same, so Twitter didn't have Docker but it had containerization in the form of-- We had cgroups, which would do resource limitations.

We had the JVM, which was a packaging mechanism in a lot of ways. Twitter didn't have Kubernetes, but we had Mesos.

Mesos was a grad student project and we had to do a lot of work to turn it into a production ready system, but that was the orchestration layer.

We didn't have the word "Microservices," or maybe that word was out there and we just didn't know about it.

But we called this thing an SOA, and we knew that was a bad word, but we didn't know what else to call it because we were decoupling into services.

Grant: That's "Service-oriented architecture," is that right?

William: Yeah, that's right.

Grant: Were you aware of what Google was doing with Borg at the time? Was that something you'd seen or heard about?

William: Some of the engineers who were core to this initiative had come from Google, so they had these ideas in their heads, but Borg was not a product we could rely on.

It was a thing that people had been exposed to and that we generally knew was a good idea.

Grant: OK. You've seen that there's this pattern, and maybe Google was pretty good at keeping their services up.

I think the SRE handbook talks about how people ping Google to see if their internet connection works.

Maybe Google was building fairly reliable software, and Twitter could use some of those same principles and core concepts to get that same level of reliability.

William: Yeah, that's right. We knew we needed something even by the time I was there.

In 2010 there was a World Cup, the soccer World Cup in the summer of 2010, and I remember being in the lunchroom with the other two employees watching it.

Every time there was a goal, Twitter would fall over because everyone would tweet "Goal" and the site couldn't keep up.

I think that was the last moment in time when we were like, "OK, this Ruby on Rails monolith is the path forward."

Because we'd actually done a lot of work to make this thing performant. In fact, I think it was the highest performing Ruby system in the world or something at the time.

Grant: You'd stretch it to its limits, but you needed to go further.

William: Yeah, that's right. We had refined the garbage collector, we had gotten really deep into the Ruby runtime, but it just wasn't viable going forward.

Grant: Where was the world in terms of cloud at this point? Were you running AWS? Were you running your own systems? How was that all working?

William: AWS was certainly around, but everything at this point at Twitter was on-prem.

Grant: OK, so you had your own data centers?

William: Yeah, that's right. There had been a world prior to 2010 when Twitter was in a colo facility. But by the time I was there, those were our machines.

These were physical machines that we were SSH-ing to and doing things with. They weren't VMs.

Grant: OK. You have logical access to all of these machines and your team is-- When there's an outage, are you SSH-ing into these machines to try to fix things? How are you fixing things at that point?

William: Yeah. Sometimes you were SSH-ing into machines, sometimes you were doing other things.

The only point I was making is that this was not in the cloud, these were physical machines that had names, and we knew those names.

When you spun up a new service or whatever, you'd be assigned "Here's the three machines that you get this thing to run on."

Grant: OK, great. This is the, as we call it, "Pets not cattle."

William: That's right. These were all pets.

Grant: You had a name for every one of your servers, and you knew it well.

William: Yeah, that's right. There was the guy with the spreadsheet and you have to go and beg him for more machines, and you'd figure out what type of whiskey he really liked because that was how you were going to get bumped up in the priority list. It wasn't ideal.

Grant: OK. One of the things that you worked on and you helped build that's helped solve this problem, is that where the service mesh concept came from?

William: Basically what happened from there is as we went through-- Pursued this idea that we should break the Twitter application into separate services, and then those services should all be operated independently and communicate with each other at runtime.

We had to invent a couple of different bits of infrastructure to make that work. One of those things, obviously, was Mesos. Which we didn't invent, but which we productionized.

Another thing was the observability stack, so we invested a ton of time and energy in this very fine-grained layer of instrumentation and metrics collection and aggregation, so that every team that owned a service had this very powerful dashboard that they could look at.

Another big component was this library called Finagle that we used to manage the communication that was happening, the runtime communication that was happening between different services.

Finagle started out being fairly simple, being this idea of "We're writing these services," we were running them in Scala, so it was like "We need this cool Scala library to allow us to do functional programming on top of RPC calls."

Then over time Finagle evolved into this sophisticated library that was doing things like load balancing, or rather request-level load balancing, and routing, and flow control, and a bunch of other really fancy things under the hood.
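
[Editor's note: as a rough illustration of the request-level balancing William describes, here is a minimal Python sketch. It is not Finagle's code. The `Endpoint` and `LeastLoadedBalancer` names, and the choice of a "sample two replicas, keep the less loaded one" strategy, are assumptions made for illustration only.]

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    """One replica of a downstream service (hypothetical type for this sketch)."""
    address: str
    in_flight: int = 0  # requests currently outstanding against this replica


class LeastLoadedBalancer:
    """Chooses a replica per request, not per connection (illustrative only)."""

    def __init__(self, endpoints, rng=None):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self.endpoints = list(endpoints)
        self.rng = rng or random.Random()

    def pick(self):
        # Sample up to two distinct replicas and keep the one with fewer
        # outstanding requests.
        k = min(2, len(self.endpoints))
        candidates = self.rng.sample(self.endpoints, k=k)
        return min(candidates, key=lambda e: e.in_flight)

    def call(self, request, send):
        # `send` is a caller-supplied function that performs the actual RPC.
        endpoint = self.pick()
        endpoint.in_flight += 1
        try:
            return send(endpoint, request)
        finally:
            # Always release the slot, even if the call raised.
            endpoint.in_flight -= 1
```

[The point of the sketch is that the decision is made for every call using live load information, which a connection-level balancer does not see.]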

Grant: So Finagle, it sounds like that was what became the inspiration for what you built at Buoyant.

William: Yeah, that's exactly right. When Oliver and I left Twitter we were looking back at that transformation and we were like, "Wow." We really learned a lot of stuff in going through that.

One of the lessons we really took away was managing that communication, that runtime communication between services.

It was so critical to Twitter being able to operate that microservices architecture effectively, and as we looked around the rest of the world the rest of the world was starting to adopt things like Docker, and Kubernetes, and it was all different from what we had seen at Twitter.

The details were very different, but the pattern was the same. But no one was really thinking about anything like Finagle.

No one was thinking about, "OK. Once I actually have 100 services running in this orchestrated environment, what then?"

Grant: OK, so you saw this opportunity and you knew based on your experience at Twitter where you had gone down the microservices path, the next step would be everyone's going to need a service mesh.

William: We weren't quite at that point. Basically, we were like "OK. We've seen into the future and we know that everyone is going to have this set of problems, and it sure feels like we could save them a lot of trouble if we could just solve that for them."

That was the very original goal, and we knew that if-- Finagle was a Scala library for doing functional programming on top of RPC calls.

It was this obscure intersection of people who were cool enough to be into functional programming, but also fuddy-duddy enough to be on the JVM who would even want to use this thing.

We didn't want to just take Finagle and be like, "OK. We are the Finagle company." Although Finagle was open source, so we could potentially have done that.

So the first thing we did was, "OK. Let's wrap this up into a proxy."

You actually don't care what language it's written in and you don't care about the programming aspect of it, but you get the operational semantics without you having to tie your application code to anything Finagle-y.

That was the genesis of this project Linkerd, which is what Buoyant spends most of its time and energy on these days.

Grant: Did the project come before the company, or did the company come before the project?

William: In our case, the company came before the project. We had the idea and we knew what we wanted to do, but we didn't really start on anything more than a couple prototypes before incorporating and trying to build a company around it.

Grant: OK, so you left Twitter together-ish?

William: Yeah. Actually, we were separated by about a year. I left and spent a bunch of time trying a couple of things, none of which really took off.

Then I tried to get back to my roots. You do the classic exercise, "OK. I want to start a company. Let me think about the last place where I worked, if I left, what would I miss from there?"

That was the exercise that started getting us down the path to thinking about Finagle and the power of what, at the time, we were calling an RPC proxy.

Grant: OK. I actually love this exercise, just in general, as a way to think about what in reality are the enterprise software opportunities. "If I left my company today, what are the things that I would miss more than anything else?"

William: Right.

Grant: That's a great question for people to think about as they're going down this path, so I'm guessing there was a couple other ideas that you had around the same time. Is that right?

William: Yeah, there were. As I was thinking about what were the formative things that made Twitter able to go through this transformation, one of them was Mesos itself.

But there was a company already commercializing Mesos, I was like "I don't want to do that because someone else is already doing it." The other component was that visibility stack that I mentioned, but there were a lot of visibility companies.

I was like, "I just don't know." It didn't feel quite as interesting to me. The third component, the Finagle component, the service communication component was wide open.

I was like, "It just feels like there's got to be something here."

Grant: Very cool. Then did you convince Oliver-- That's your co-founder, right?

William: Yeah.

Grant: That was the thing you guys should work on together, or did he work on Finagle at Twitter?

William: Yeah, that's right. He was deep into the core infrastructure there. I was actually more on the surface layer, I was doing back end engineering but I was staying close to the product.

I was working on things like the photos back end, so you upload a photo and it goes through a bunch of code to do stuff with it.

He was down and in the guts with Finagle when other things were happening. He was a natural person to do this.

Grant: Got it. You were more of a consumer of Finagle, and you realized how important it was to your workflow?

William: That's exactly right.

Grant: OK. So Oliver was more-- He had been much more involved in the project, so the opportunity was to bring him on board to help build the company around this concept.

William: Yeah, that's right. I had known him pretty well at Twitter, in addition to being basically an internal customer of his because he was building Finagle and other things.

We spent a lot of time together because we actually carpooled together. We knew each other pretty well at that point.

Grant: From the beginning you decided to run this as an open source company?

William: I think there was maybe a couple days in the very beginning when we were like, "What should we do? Should this be a commercial thing?"

But we rapidly realized that pretty much everything else in this world was open source, like--

Everything else in the software infrastructure stack was open source. So doing something proprietary felt strange to us.

Also, we're a bunch of open source nerds, so that was what felt very natural to us anyways.

Grant: OK. Instead of trying to build a proprietary closed source solution that you can go sell at banks for millions of dollars because it does something, you decide to take up a broader approach, which is "Let's open source it and start to get adoption that way."

William: Yeah, that's right. Although now that you say it, could we just have made millions of dollars doing the other thing?

Grant: OK. Obviously open source, I think you're absolutely right in terms of the infrastructure software world.

Most things, even the banks are going to adopt, end up being an open source project that they want to then help find the commercial version.

So they can continue to support the delivery of this, and also get the other features they want in.

How did you start to get people to use this and pay attention to you? Because even though it's an open source project, it's not just "If you build it, they will come." You need to get the word out somehow.

William: There were two components to that. The first was we had a big struggle around what we were going to call this thing, because there wasn't really an equivalent out there.

We started out, like engineers, we started out saying "This is a proxy that handles RPC calls, and Linkerd is an RPC proxy." We tried to describe that to people, and people would be like, "I already have a proxy."

We're like, "No. This is different." They're like, "I don't even use RPC. We use HTTP." And we say, "Under our model, HTTP is a subclass of RPC."

Then by the time you had that conversation, they had wandered off.

Grant: That's amazing.

William: We tried a couple of other things until finally we had the brilliant idea to call it a service mesh, which has no meaning.

Or at least at the time it had no meaning, it didn't really communicate a whole lot. It had "Service" in there, and it had "Mesh," which was weird.

But what it was, was a blank space that we could write into. Then we could say, "Let me tell you about this thing called a 'Service mesh,' and this is what it does."

Once we used that terminology it started making more sense to people, and that term caught on, and now a couple of years later there's like 12 service meshes. It's now a real-life thing and our conversations are very different.

Grant: So, that was a term that you helped bring to the market and helped define?

William: More than helped. We birthed it.

Grant: You birthed it?

William: Yes.

Grant: I think about some of the concepts and terms that get defined, it's really what helps to define the industry. Did you try to own the term?

Like create the web page for it, the chaos engineering page or the 12 factor page or something, to define what a service mesh was? Or was that not part of it?

William: Not really. Maybe not as much as we should have. I think mostly we were just happy that there was a noun phrase we could use.

We could tell you, "Linkerd is a service mesh." Maybe if we had made Service Mesh.com or .io or whatever, things would have gone differently.

Or maybe the other projects would not have called themselves "Service meshes," I don't really know.

That wasn't really the focus at the time. The focus for us was "Here's this project, and we need people to use it."

Grant: Sure. Calling it a "Service mesh" just made it easier to explain what it was and not get it confused with all the things that it wasn't.

William: That's right. There have been proxies since the beginning of network engineering. There have always been proxies.

Describing and defining how it was different was an exercise, and it's not just a proxy component, there's the control plane component.

There's all these things wrapped up in this definition. I'd say, out of the two or three things that were really formative for us getting open source traction, the first was that terminology.

Grant: Sure. Then was the next part, "Go to the conferences and present and talk," or was it "Get early customers to adopt it, to then have them present and talk?" How did you approach getting it out in the market?

William: The next part was a whole lot of conferences and a whole lot of elbow grease. It was, "OK. Let's go."

It's an open source project, so that means we can talk about it at open source conferences. It's not considered a vendor pitch.

I think it's a project that people can contribute to, and we tried our best to make it a truly open source project and not something that we threw over the wall.

So there was a whole lot of that, especially in the early days. A whole lot of describing it, and we tied it a lot to the Finagle story.

Because one of the problems you have as a very early stage project especially, is people don't want to rely on you.

Especially something like Linkerd, it sits in this very critical part of your application stack. Every single service call is going through Linkerd.

That's a scary thing to do, and so we relied heavily on the Finagle story because Finagle was very productionized, and was widely used at Twitter and Pinterest and a bunch of other companies.

Nowadays Linkerd has got its own momentum and people use it all over the place, so we don't really have to talk about Finagle very much.

In fact, the newest version of Linkerd is not even built on Finagle, but early on we really relied on that.

Then there's another key component here to the conference thing that took us a little bit of time to learn, which was understanding which conferences we should be going to.

Grant: How did you figure that out?

William: Trial and error. What happened for us early on is-- Again, coming into this as engineers, we were like "OK. What's a requirement for someone to find Linkerd useful?"

"They've got to be running microservices, because this thing mitigates and manages and monitors service to service communication. So, let's go to the microservices conferences and let's talk to microservices people."

We found a lot of them because that term was very popular, but almost everyone we talked to were these architects.

They wanted to talk about their 18-month microservices roadmap, and they wanted to talk about CQRS versus event sourcing, versus whatever. It was all interesting, but I was like "This isn't really going anywhere at all."

Then we stumbled into the Kubernetes community.

In contrast to these microservices conferences, the Kubernetes meetups and conferences we were going to were full of practitioners.

These were boots on the ground engineers who were like, "I've got two weeks to make this thing work, otherwise I'm going to be fired," or whatever it was.

We found these conferences full of practitioners as opposed to architects, and that made a huge difference to us.

Grant: When you say microservices conferences, is there a specific one that you're like "That's more academic," or is that the problem? How did you--?

William: You can go to MeetUp.com and you can find like 20 microservices meetups in any city, and our experience with those--

Again, this is a couple of years old, is that they were full of extremely intelligent people but they were not very practitioner-focused.

They were very architect-focused. Rightly or wrongly, that's what we found. From the technology perspective, Linkerd is not that tightly coupled to Kubernetes.

It's not like there's anything that's really Kubernetes-specific in Linkerd, or at least in that early version of Linkerd.

But from the market perspective, this was the audience of people who actually needed Linkerd and they needed it in a short timeframe, as opposed to the 18 month microservices roadmap.

Grant: Did you find that it was more companies as well? The user community was commercially-focused?

William: Almost everyone who we found who was adopting Kubernetes was doing it as part of their job. There's always people who are playing around with it and having fun, or who hope to use it as part of their job.

But even back in 2017 or so, a whole two years ago, it was largely people who were trying to bring it into their companies and do something real with it.

Grant: Now interestingly, because you came from Twitter, and I'm sure you-- Obviously you mentioned the Mesosphere concept earlier.

Two years ago it wasn't as clear that Kubernetes was going to be the clear winner, did you get involved in the Mesosphere and maybe in the standard Docker Swarm world as well? Or not?

William: Early on, you're right. It was not clear that there was going to be one particular winner.

We had Linkerd, at least that first version. Now things are a little different. It could talk to Kubernetes, it could talk to Mesos. It could talk to Consul, it could talk to ZooKeeper.

We just made it this general purpose Swiss army knife, and got a lot of adoption from people who were not using Kubernetes.

But over the past two years the distribution of people who are deploying it on Kubernetes has really increased dramatically.

Grant: Was going to conferences and talking about it at these more practitioner-oriented events, was that the real key to adoption?

Or were there-- Was it joining the CNCF? What was really the key piece that has been driving adoption for you?

William: I think there were a couple of things. Part of it was that, part of it was just getting it out there and trying to get it in front of people and describe why we were doing it.

That was meetups and conferences and blog posts, and everything. We did submit it to the CNCF in-- I think this was early 2017.

It got accepted, and that was a big boost for the project because this was maybe the 5th project that the CNCF had accepted.

It was like, Kubernetes and Prometheus and some other things, and Linkerd. Pretty soon the Linkerd logo was up next to the Kubernetes logo, and I was like, "This is awesome."

But there were other reasons that the CNCF made sense for us anyways. Everything about the CNCF exemplified the ideas and beliefs behind Linkerd.

Philosophically, we really aligned. They're very focused on, "OK. What is the set of technologies that allow you to build these cloud native applications?"

Everything that Linkerd was being used for, we didn't really know that term beforehand. We were like, "Yes. This is exactly the things that Linkerd can help with." It made sense in a lot of ways.

Grant: This is the Cloud Native Computing Foundation, right? CNCF.

William: That's right.

Grant: It's a part of the Linux Foundation? How do you describe it?

William: Yeah, it's part of the Linux Foundation.

Grant: Talk about that model, like we mentioned earlier before we started the call that you don't actually own Linkerd.

You contributed it to the CNCF, you are the primary contributors to that specific project. Is that right?

William: Yeah, that's right. When you give your code or your project to a foundation, whether it's the ASF, the Apache Software Foundation, or the CNCF, there's things that you give up as part of that.

Part of that is you basically assign that IP to the foundation. You assign the trademark to the foundation, so it's not yours anymore. That sounds a little scary, right?

Some of our early investors were like, "What are you doing? You can't undo that."

But the reality is, if your code is licensed under Apache V2 anyways, that is an extremely liberal license and you're basically giving anyone the ability to do almost anything that they want with that code.

You can take an Apache V2-licensed project and you can build a commercial variant, and you can sell that thing and you can do whatever you want.

Who owns that IP is-- It's not like owning the IP to something that has a proprietary license.

That ownership is less important, really. What was important to us-- You do give up a little bit, but what was important to us was the fact that the CNCF was going to be able to steward and nurture this project.

For us, what was important for us was that Linkerd be a real open source project.

I alluded to this before, but we didn't want it to be something that you throw over the wall and it's like, "Yes. It's nominally open source. You can see the code, but we're not going to let you actually submit any pull requests. We're not going to tell you what the roadmap is and we're not going to answer any questions."

That wasn't the model that we wanted.

Grant: That's super interesting. So you're giving up some control, but that's control that you didn't necessarily want.

You wanted it to be more open and a bigger part of this, is that right?

William: Yeah, that's right. We wanted it to be alongside the projects that people were using in conjunction with Linkerd.

People were using Linkerd and Prometheus and Kubernetes as part of building their internal stacks. It just made sense.

Grant: Yeah. From the open source perspective and from the Kubernetes community perspective, the CNCF becomes somewhat of a kingmaker in terms of which projects it accepts and graduates and acknowledges.

That probably wasn't very obvious two years ago when you joined, it was the 5th project. But the market has shown that's a powerful signal.

William: Yeah. I think in some ways that's unfortunate. I know the CNCF really doesn't want to be the kingmaker. They don't want to set themselves up that way, but market perception sometimes is not something you have complete control over.

Grant: Right. It just happens to be how people perceive it. Not much choice there. Being a part of CNCF, it seems like that's been great for you.

It's helping drive more adoption, which is your primary goal as an open source project.

William: Yeah, it's been hugely helpful for that.

Grant: Have there been any downsides to it, or not?

William: No, not really. I think if we had a different business model, if our business model was "We are going to build a commercial version of Linkerd that's going to be enterprise-only.

It's going to be 'Linkerd Enterprise,' or 'Linkerd Plus,' or something, and that's going to be proprietary," then it would be a problem for us because we'd have to ask them for the trademark, and "Are they going to allow it?"

Maybe they would, I don't know. It would just be weird, but that was never our goal with Linkerd anyway.

We didn't really give up anything that we wanted, and we just got the benefit of having this very sophisticated and helpful and philosophically-aligned organization helping us to nurture our open source project.

Grant: Yeah. If you think about the goal for an early open source company, you're saying it's primarily adoption, that's the first and most important piece.

William: That's right.

Grant: The CNCF, I think, is likely an amazing path towards adoption to anything in the cloud native ecosystem.

William: That's right. It's certainly helpful.

Grant: OK, but obviously you're also a venture funded startup. So there's an expectation for some amount of, "This is going to be a business. This isn't just a side project, or it's not an open source project out of Twitter that you just do because you want to be able to hire better engineers. This is a company."

William: "They told us we could just build open source forever."

Grant: You might be able to. Tell me a little bit about what the early business model is, and how you're looking at that.

William: This is where things get interesting. I think the world does not have this down to a science. I feel like maybe it's a "Grass is greener" perspective, but I feel like if you're building a straight SaaS startup, it almost feels like it's down to a science.

You need to have $1 million in ARR within 18 months. Otherwise, "Get out of here." Then after that you need to double, triple, triple, double or whatever.

It just feels like there are all these benchmarks that people have in place. For open source, it's much more of a Wild West.

There have been very successful open source companies, there have been companies that have seen tremendous adoption but still are trying to figure out the business piece.

There are companies that have not seen a lot of adoption, but that have a great business online. There's not a formula that's in everyone's mind that's like, "OK. Here's how you do it."

That said, there are a couple models that historically I think open source companies have followed that-- The classic one is the Red Hat model of "OK. It's open source, and then we're going to provide services and support on top of that."

The received wisdom is "Red Hat is the only company that could really ever make that work, no one else could really do that for whatever reason." They never go into why.

Then there's the NGINX-style models where we've got the open source thing, but if you want the features that are actually necessary for production, then you got to pay the big bucks for the enterprise.

Then there's a model of "OK. We're going to have a hosted version, the open source thing, you can do whatever you want. But we're going to have a hosted version and that's how we're going to make our money, and you won't have to operate it yourself so it's easier, and you get the economics of the cloud and all that stuff."

Then there's the fourth model, which I have called "The Buoyant model." This is the "William Morgan guaranteed successful open source business model TM."

Grant: Perfect.

William: Yeah. You may have heard of it. Here I think for us, Linkerd is one component of what we want to do as a company. It's just one component.

Right now it's what we're focusing on, we spend almost all of our time and energy on Linkerd and on adoption.

But I don't think it would be right for us to sell Linkerd. I don't think it would be right for us to have Linkerd Enterprise, or even a hosted Linkerd doesn't really make a lot of sense.

Because what Linkerd does-- The way it works is it has to be right next to your application code. For this to work we've got to be right there in the same data center, in the same pod, in the same cluster, in the same everything.

The hosted model, even if we wanted to do that wouldn't really make sense for us. But if you look at the broader picture, what Linkerd solves is a subclass of this broader problem of--

"Companies are moving into this cloud native world. Once they do that, everything's different."

The way that the code is working is different, the way that you structure your services and how they communicate is different, and Linkerd can help with that.

But everything else about the company is different too. The way the engineering organization operates is different, the way that HR operates is different, the way that finance operates is different. Everything gets changed when you move into this world.

So when you look at things that way there's a set of value that Buoyant can provide that's outside of Linkerd. That's probably not very engineering focused in the same way that Linkerd is.

So really I see Linkerd as a mechanism for enabling Buoyant to help companies with the other aspects of what is changing and how they have to change in adopting this cloud native approach to their product.

Grant: But that's not something that you're offering today, right? That's just the longer-term goal.

William: Yeah, that's right. Right now, all that we care about is open source adoption, and everything that we do and sell and build and say and buy is all in service of making Linkerd the best possible open source service mesh that solves these really concrete and immediate problems for engineers.

That's all that we do right now. Later on, there's some other fun stuff we get to do, but right now we've got to do that.

Grant: Those early business models are around support and professional services?

William: That's right. We do support and professional services right now. That's really the only thing that we do as a company besides build the core open source tech.

And I think everyone recognizes "OK. In the long run that's not the business model that you want, with the exception of Red Hat."

But it's actually great for us. It is. It's really good for us, and the money is nice of course. But more than that, what it gets us is it gets us insight into what's happening inside these enterprises.

We're a bunch of open source nerds, we understand how the open source world works and we understand Rust and Go, and how to use your package manager, or whatever.

We don't understand, "If I'm working at this Fortune 500 company and we're trying to do-- We've got all these initiatives and we're going to deploy this new thing and we're trying to make it cloud native."

We don't understand what problems you're having until we have this commercial relationship, which even if it's a support relationship, we're sitting in those meetings and we're learning about your roadmap and we're learning about what's painful for you.

That gives us a lot of empathy, which is very helpful, especially for software engineers.

And it gives us insight into the problems that we can solve for these customers in ways other than open source and professional services, so it's very healthy for us.

It's been a fascinating part of the Buoyant aspects of this.

Grant: The thing that I heard there for a business opportunity is to get a seat at the table, to understand what the most complex problems are that are common across these large enterprises as they make this transformation towards the cloud.

Part of your goal is to use that unique seat at the table and that unique insight to figure out the next offering that maybe is a more commercial offering.

William: Yeah, 100%. I think you said the most complex problem, and it's really that I don't want to solve the most complex problem.

I want to solve the most painful problem that's the easiest thing to solve. I don't want to do the hard thing, I want to do the easy thing.

Grant: But maybe, you're using a cross-section of different organizations to identify what are the things that are consistently coming up as a roadblock to this future state of developer productivity and reliable systems, and things.

William: Yeah, that's right. I tend to be a pretty product-oriented person.

I put my product hat on when I have these conversations, and I'm always trying to figure out how we're going to get input from the right people, and how are we going to understand?

How we're going to empathize with the people who we have to build products for. Any opportunity to do that is something that I reach for as rapidly as I can get it.

Grant: As a vendor myself, the thing that I'm always like is "Pay us so that you can get us to focus on solving your problems all the time."

That's what we love doing, is solving problems. If you pay us X amount per month your problems become the problems that I focus on solving.

William: Right, exactly. It's the good old fashioned "Exchange money for value" relationship.

Grant: It's, "It works. Capitalism." OK, so are you actively trying to sell support and pro serve? Or is that more inbound interest, and then you'll provide a contract?

William: Right now, we're on the cusp. What we've done for the past few years is to be pretty reactive to it. We put a form up on a web page and we're like, "OK. If you need it, you need it."

That's starting to shift a bit, especially as Linkerd adoption is-- We're not just seeing the super cutting edge, super high tech companies that are very familiar with open source technology and very adept at wiring things together.

That was the vanguard of the companies adopting Linkerd.

Now we're seeing the second wave of companies who don't always have the time or in some cases the talent necessary to really be able to take an open source project like Linkerd and jam it together with Kubernetes and Docker, and get a deploy pipeline going and do all that with their own internal staff.

Or they don't have the time to do it, so things like pro serve start becoming more important for us, because we do want those companies to be successful with Linkerd.

They need help, and we should be able to help them do it.

Grant: Realistically, you are probably far more effective and efficient at doing it rather than the huge learning curve that's required to pick up this new set of technologies and workflows.

If you've done it a bunch of times for a bunch of different customers, it's not that hard for you to go turn it on for them.

Whereas it would take them weeks or months to get up that learning curve.

William: Yeah, that's right. One of the things I think I had to learn as a first time founder who had never run a company before is that companies are spending money no matter what.

It's not like you as a consumer, and you're like, "Should I spend $4 for coffee at Starbucks, or should I-- Do I want to spend a dollar on this iPhone app? I don't know."

If you're a company, you're spending money. You're either spending it on employees doing stuff, or you're spending it on purchasing something.

You don't have this mental hurdle of like, "Am I going to spend this or not?"

The question for you as the buyer is, "OK. How am I going to spend that? Am I spending it on people internally or am I spending it on a product? What are the implications of that?"

That's a big mental shift.

Grant: It's a great point. Because it's acknowledging that their budget is there and that they're either going to invest $150,000 a year to hire an engineer to manage this.

Or they're going to spend the same amount or slightly more or slightly less to bring in a team to do the same thing.

William: Right. Then you can just make the case to them. You can say, "Look. You want this thing, you can do it internally. It's going to cost you X, or you can have us do it and it'll cost you one tenth of that," or whatever it is.

Grant: Yeah. Let's shift gears a little bit here. I want to dive back into the service mesh world.

A lot of the folks that listen to this podcast are running enterprise software companies, maybe they're running a SaaS application.

Help me understand why I should be using a service mesh as part of my architecture in either of those cases.

William: The number one qualification is, "Are you adopting this cloud native approach? Are you adopting Kubernetes and Docker and microservices?"

If you're not, then service mesh is not going to help you at all. There's no reason for you to even think about it. But if you are, then you're going to need it.

Because what's going to happen is you're going to end up with 10 or 20 or 50 services running and you are going to be in a very precarious situation, because one small failure in one of those services is going to rapidly escalate to take down the rest of the site.

Or you're going to be in a situation where no one can understand what's going on at any point in time, because each of those services was written by a different team using a different set of conventions.

We've done all this work with Docker and microservices to enable this huge organizational decoupling, and now you can ship code even faster than ever before. But the state of production is now this totally Wild West state.

Those problems can be addressed directly with the service mesh, that's where the service mesh can help.

It can add reliability, it can add visibility and it can add security to these systems in a way that doesn't involve the developers having to do anything.

This is a very platform-focused tool.

Grant: I'm wiring all the low-level components that I have that make up the various services in my application, be that databases or workers or queues or app layers.

All of those are using a service mesh for something?

William: It depends on where you want to start. Typically, people will start with the synchronous set of services that are often stateless, and then they'll extend from there into the database or the storage layer, and other things.

You can get very fancy and have lambdas and things like that, but the bulk of the service mesh usage that I've seen today is around the services that are often talking HTTP, or gRPC if you're really fancy, to each other.

They're serving an API or they're serving a web page or a transaction, or something that has a strict time requirement.

Like, "I have to respond to the user in 200 milliseconds, otherwise they are going to go away or they're going to swipe again, or something bad is going to happen."

Those are the situations where the service mesh can immediately give you visibility and increase the reliability of your services, and give you a bunch of security semantics.

Again, without you having to write any code or get the development team involved.

Grant: Are my services talking through the service mesh, as like a centralized hub and spoke? Or does the service mesh create the peer to peer connections between these different services?

William: There's two components to it. There's what we call the "Control plane," and then there's the data plane. The data plane is a set of very lightweight proxies that get embedded next to every instance of every service.

If I've got service A and service B, and A has 6 instances and B has 40 instances, you've got 46 instances of the Linkerd proxy that are stuck next to the other, just sitting next to each application.

That's fully distributed, and we do a bunch of magic to route the TCP traffic through those things automatically. The application doesn't even know that it's there.

That's the data plane, and that's fully distributed. On the control plane you have a set of components that are sitting off to the side, and they're coordinating the behavior of those proxies.

They're receiving telemetry from those proxies, and you've got maybe three instances of the control plane running or something, if you're really concerned about high availability.

And the control plane's providing you with a uniform API to change the behavior, and a uniform point of visibility into everything that's happening to the traffic that's going through those proxies.
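The split William describes can be sketched as a small model. This is only an illustration of the idea, not Linkerd's actual implementation or API: the `Proxy` and `ControlPlane` classes, the `retries` setting, and the method names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Data plane: one lightweight proxy next to one instance of one service."""
    service: str
    instance: int
    retries: int = 0        # hypothetical behavior pushed down by the control plane
    requests_seen: int = 0  # telemetry reported back to the control plane

    def handle_request(self) -> None:
        # The application does not know the proxy is there; here we only count traffic.
        self.requests_seen += 1


@dataclass
class ControlPlane:
    """Control plane: sits off to the side and coordinates the proxies."""
    proxies: list[Proxy] = field(default_factory=list)

    def inject(self, service: str, instances: int) -> None:
        # One proxy per service instance.
        for i in range(instances):
            self.proxies.append(Proxy(service, i))

    def set_retries(self, service: str, retries: int) -> None:
        # Uniform API: change behavior for every proxy of a service at once.
        for proxy in self.proxies:
            if proxy.service == service:
                proxy.retries = retries

    def request_volume(self) -> dict[str, int]:
        # Uniform point of visibility: total requests seen per service.
        totals: dict[str, int] = {}
        for proxy in self.proxies:
            totals[proxy.service] = totals.get(proxy.service, 0) + proxy.requests_seen
        return totals


control_plane = ControlPlane()
control_plane.inject("A", 6)
control_plane.inject("B", 40)
print(len(control_plane.proxies))  # 46, matching the 6 + 40 example above
```

The point of the model is that the proxies hold no coordination logic of their own: everything that changes behavior or reads telemetry goes through the one object off to the side.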

Grant: Is it going through that encrypted, or unencrypted? How does it get through?

William: It's whatever you want to do. One of the big use cases for Linkerd is doing encryption transparently to the application for all internal calls.

What'll happen is A talks to B, service A talks to service B. That means A talks through its Linkerd proxy to the destination Linkerd proxy, to the destination B instance.

It's actually going through two proxies, so these proxies have to be fast for this to make sense. Linkerd can both initiate and terminate TLS on either side.

Now your application doesn't care about encryption at all, it doesn't know about certificates or any of that stuff, but you get encryption across all of the communication.

That's a really nice way of adding these security primitives without-- Again, without having to deal with the developer teams.

The reason why I keep coming back to that point is a lot of what the service mesh solves is actually these organizational issues where I as the platform owner, I want to add security.

I want things to be secure. But if I'm asking each of my developer teams to implement TLS and to do it in this particular way, they've all got their own roadmaps.

They've got their own product managers who are pushing them to do all this other stuff, so that becomes a very difficult challenge, especially as a company gets bigger.

But if I can just do this at the platform layer, then I have control over it and I have ownership over it, and the developers don't even know that it's there.

Grant: OK, then if I have a piece of SaaS software that I'm hosting and one of the requirements is really around enterprise level security.

Today, I think a lot of people will talk about on their security white paper that data is encrypted in transit. They'll talk about some little bullet point about TLS.

So you're saying, "That's great. That's between the client and your server. But internally, your systems might be talking unencrypted. You should implement a service mesh like Linkerd in order to ensure that all communication in all systems is actually encrypted as it's moving around."

William: Yeah, that's right. In fact, you can drop Linkerd in. Linkerd ships with a CA, a certificate authority.

Linkerd will issue the certs, it will distribute them to the proxies, the proxies will do the encryption. It will initiate it and terminate it, it will validate against the certs.

You get all this stuff for free out of the box without the developers having to do any work.

Grant: So it allows you as an organization to improve your security posture from that perspective.

William: Yeah, that's right. There's this whole model, which I really like, called "Zero trust networking," which is the opposite of what we had at Twitter.

What we had at Twitter, which is what many companies have had and still have, is we had this hardened perimeter. Getting into the data center was very difficult.

But once you were in, anything could talk to anything else.

My sense is that what's happening right now is that people are moving to the cloud, so you're running your code on these systems that you don't control and on a network that you don't have ownership of.

Maybe there's other tenants there that are doing who knows what. Now, you don't really have a perimeter anymore.

That means you've got to push all those security semantics down to the application layer, so a service mesh like Linkerd is a great way of doing that, because we can do the TLS for you.

We can do the certificates, we can provide things like cryptographically verified identity using Kubernetes.

We can tie that to service accounts, there's all sorts of cool stuff we can do for you that fit right into this model of zero trust security.

Grant: OK. You can actually use these certificates for each service to actually do an authenticated request, not just encrypted?

William: Yes, that's right. You can get very fancy with this. You can say, "OK. A is not allowed to talk to B because I haven't allowed it to."

Or maybe you can even get finer grained than that and talk about who is allowed to do what. Can we look at the user identity of who's making the request?

I've got to put a bunch of asterisks in here and say a significant portion of this is our roadmap items for Linkerd, this is not a totally solved problem.

But as Linkerd is used more and more in the enterprise, and as we have these relationships that we're seeing, as we watch people and help people adopt Linkerd for these situations, we're fleshing out this very sophisticated security roadmap.

Because these are things that people need to solve.
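The "A is not allowed to talk to B because I haven't allowed it to" rule is a default-deny check on service identity. A minimal sketch, assuming identities are plain strings and using a hypothetical `ALLOWED_CALLS` allowlist and `is_allowed` helper; this is not Linkerd's policy API, which William notes is still partly on the roadmap.

```python
# Hypothetical allowlist of (caller identity, callee identity) pairs.
# In a real mesh the identities would come from verified certificates.
ALLOWED_CALLS = {
    ("service-a", "service-c"),
    ("service-b", "service-c"),
}


def is_allowed(caller: str, callee: str, allowed=ALLOWED_CALLS) -> bool:
    """Default deny: a call is permitted only if it was explicitly allowed."""
    return (caller, callee) in allowed


print(is_allowed("service-a", "service-c"))  # True
print(is_allowed("service-a", "service-b"))  # False: never explicitly allowed
```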

Grant: Yeah. Role based access control is something you're talking to there, and it's always super important for enterprise adoption.

You need to be able to implement the least-privilege concept and deliver that down to every request, and make sure that's carried out.

William: The notion of doing it at the platform layer is part of what makes it so powerful. Because the same thing works with telemetry, or visibility, let's call it visibility.

We've already got-- Hopefully your services are already instrumented and they have all these metrics, and maybe you've got Prometheus or Datadog, or SignalFx or something set up.

What Linkerd gives you is this uniform layer of visibility over all your services, and we can't tell what's going on inside the services, but we can tell you what the success rate of each service is.

And what's the latency distribution, and what's the request volume, and how are those things changing over time?

We can do it in a way that's uniform across every service, doesn't matter what language the service is written in or what framework it's using, or when it was deployed or any of that stuff. That uniform layer of visibility, even though it's not the complete picture, is hugely powerful to the platform teams. Simply because it can be decoupled from what the developers are doing.
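The three measurements mentioned here (success rate, latency distribution, request volume) can be computed from nothing more than what a proxy observes about each request. A minimal sketch, assuming a hypothetical `Request` record and `service_metrics` helper; it does not reflect how Linkerd itself stores or exports metrics.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    """What a proxy can observe from outside the service (hypothetical record)."""
    service: str
    latency_ms: float
    success: bool


def service_metrics(requests: list[Request], service: str) -> dict[str, float]:
    """Request volume, success rate, and latency percentiles for one service."""
    mine = [r for r in requests if r.service == service]
    if len(mine) < 2:
        raise ValueError("need at least two requests to estimate percentiles")
    latencies = [r.latency_ms for r in mine]
    # quantiles(..., n=100) returns 99 cut points: index 49 is p50, index 98 is p99.
    cuts = quantiles(latencies, n=100, method="inclusive")
    return {
        "request_volume": len(mine),
        "success_rate": sum(r.success for r in mine) / len(mine),
        "p50_ms": cuts[49],
        "p99_ms": cuts[98],
    }


sample = [
    Request("web", 10.0, True),
    Request("web", 20.0, True),
    Request("web", 30.0, False),
    Request("web", 40.0, True),
    Request("db", 5.0, True),
]
metrics = service_metrics(sample, "web")
print(metrics)
```

Nothing in the calculation depends on the language or framework of the service being measured, which is why it can be applied identically to every service.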

Grant: OK, shifting gears a little bit again. You were talking about cloud adoption, and we're talking about open source. I think one of the interesting areas around open source is the role that these large cloud providers play in the open source ecosystem.

Obviously there's a couple different angles on this, but I'd love your perspective because particularly one of the cloud providers has a service mesh that they talk about and they host.

Istio that Google has seems to be a big focus there. Maybe talk about how that impacts your company and your roadmap.

William: When Istio first appeared, it was very frightening. Because we were used to operating in a world where we had no competitors.

We invented the service mesh term, Linkerd was the service mesh. It was getting a ton of visibility and traction. Then all of a sudden, the 800 pound gorilla comes into the room.

It's like "I'm Google and I invented Kubernetes. Now I have invented Istio." We were like, "Oh no."

But over time we've found that it's actually been really helpful for Linkerd, because it served to validate the space in a major way.

It's not just this little project, Linkerd, sitting off to the side being like, "Service mesh is a thing." Now it's Google and IBM and VMware, now HashiCorp.

Everyone has a service mesh project or product. That's been hugely validating for Linkerd. There's a lot more I can say about the design differences between the two, and so on that I won't go into here.

But if you look at the goals of Istio, like why is Google doing Istio? Why is it investing all this time and energy in it?

It's because it's a strategy to get you on to Google Cloud. Google Cloud will run Istio for you, or you just check that checkbox and now you've got Istio.

That's great, because it gives you a bunch of functionality. In fact, the value proposition for Istio is very similar to the value proposition for Linkerd.

It's got visibility and it's got security semantics, and it's got all the other things that it has.

But it works great on GCP and then doesn't really work that well or doesn't really exist in other places.

It's a mechanism, like AWS App Mesh, to get you locked into these cloud platforms. That's fine. That is the game that the cloud providers play.

They want to get you using their thing, so they build functionality that is powerful and specific to their platform. Then you rely on it and you can never leave.

The goals of Linkerd, obviously, are quite different. We are an open source project, we're not a cloud provider.

The design philosophy behind Linkerd and how we expect you to run it is all very different.

We expect you to operate Linkerd yourself, so that means that we care very much about the complexity that it introduces.

How big is it, and how many system resources does it take?

If this is built as a feature of a cloud provider, then you don't care about that stuff because you're going to eat that stuff yourself.

You're not going to expose that to the users. That ends up informing a lot of design decisions.

If you look at the two projects, they're very different because even though the value props are similar, they're very different in nature and in product shape because the goals are very different.

Grant: One other piece I think that's interesting is tying it back to the CNCF, is that Google has obviously contributed Kubernetes to CNCF, but it hasn't contributed Istio.

William: That's right.

Grant: Do you have any thoughts on why that might be?

William: I'm not privy to the internal discussions, but my sense is that Google will not ever contribute Istio to the CNCF.

They say they will, and maybe they will. My sense is that they don't want to.

They want it to be the thing that ties you to GCP. They want it to be a value add that really only works in GCP, and maybe in IBM Cloud because there are IBM folks involved as well.

They're motivated to make it nominally open source, but not to make it really open source. Because again, they want to get you using it on GCP and they want it to be a great experience.

If you were on GCP and you click the checkbox, you get all this cool stuff for free, and that's great.

Grant: Yeah, it's interesting. I think there's a lot of conversation, not just around the cloud providers coming out with competitive alternatives, which is what we're talking about.

But about the nature of cloud providers and their ability to take an open source project-- Like the MongoDB example, where they can take these projects and host their own version and extract a lot of the value from that open source project.

William: Yeah. That's certainly been a hot topic in the open source startup community recently, because of incidents like the MongoDB thing, and because of startups changing the licenses to counteract that behavior.

Grant: Or, at least attempt to counteract.

William: Or attempt, right.

Grant: Yeah. There's this very interesting question around, "How will these two forces, infrastructure as a service and open source, how will these shape the enterprise software ecosystem in the decades to come?"

William: I am so hyper-focused on this one little microcosm that I don't have a great big picture view.

My sense is that the days where you could build a company and say, "OK. The way that we're going to operate is we're going to have an open source project, and then we'll have a hosted version of it. That's how we'll make money," and that's the end of your business plan, I think those days are numbered.

I don't think that's a viable strategy for new companies. I think existing companies for whom that's their strategy, OK, that's well and good.

You'll probably have to do additional things beyond that, but I don't think that's a viable strategy for a new company.

I think you have to do something different because if you just do that, then the moment your project becomes successful, someone like AWS can just offer a hosted version of it.

Maybe it won't be quite as good and they won't have your drive and your talent and the core DNA, but they'll have the source code and they know how to operate services at scale.

That becomes a real threat if that's your business model.

Grant: Yeah, it's an interesting world. There's so many different angles to take on this. I like your perspective of, "We just want to get into these problems and figure out how we can solve them."

At some point, do you think you'll offer something proprietary?

William: Yeah, 100%. I think there are so many problems out there that we can solve in a very concrete and immediate way with proprietary software that companies will want to pay for.

I mean, there's things that just don't make sense as an open source project for you to do. It's like, "OK. Have a huge identity system that maps into existing identity systems that companies have."

You could, but it's not really-- It's weird for an open source project to do that.

It'd be weird for Linkerd to have a bunch of logins and stuff. It's a dashboard, there's no reason for Linkerd to have a users database or something.

There are things that just make sense to be solved with proprietary software, and there's so many problems that these companies are having that I'm very optimistic about, because there's so much that we can help with.

Grant: Sure, yeah.

William: Linkerd is one part of it. It's a big piece of the puzzle, but it's not the whole puzzle.

Grant: I can't just deploy my entire company on Linkerd?

William: Yes, of course.

Grant: Just the whole thing is Linkerd? I don't write any other code, just Linkerd?

William: Yeah, absolutely. That's how Buoyant works.

Grant: Yeah, true. William, thank you so much for today. This has been great. I really appreciate your insights.

William: Grant, this has been a complete pleasure.
