In episode 38 of EnterpriseReady, Grant speaks with Brandon Heller of Forward Networks. They discuss tactics for improving enterprise networks, tips for hiring engineers, and insights on the many roles co-founders play as they launch their companies.
About the Guests
Grant Miller: All right, Brandon, thank you so much for joining.
Brandon Heller: Great to be here.
Grant: So I'd love to just jump right in and kind of get a bit more context on your background and sort of how you got into enterprise software.
Brandon: Yeah, so I love to make things.
That's the simplest version.
And so I started out actually as a mechanical engineer in school and then took this class in digital computers and thought, "Man, that's really interesting how a computer works."
And we got to build a computer from scratch where we did the assembler, we did the program, we did all of the VHDL and that time logic that made the thing run.
And so in college, I got more into computer science, more into computer engineering, and a professor at Washington University in St. Louis I was working with, had a project that was kind of half computer architecture, half networking.
It was, how would we change the way the networking is done by enabling software to define the behavior of a router, a single box?
And I thought, okay, that's pretty interesting.
We're trying to get something working at 10 gigs.
So it's really low-level software.
And on the architecture standpoint, the processors we were using were Intel IXPs.
And this was early 2000.
This was before the multi-core or many-core era.
And so the notion of 128 thread contexts in hardware, 16 physical cores, was completely foreign, right?
You had one or maybe two processors.
And so that got me more into networking 'cause there was a project to get started on it. And I kind of got restless.
St. Louis is, I love it, but it's kind of flat.
And so I looked for where else to go, I ended up at Stanford, and pretty much the first week I was there starting out with a project where my advisor, Nick McKeown, his goal was to change networking, starting with how to make network researchers create again by enabling the kinds of projects that they would test on and show what work they could do as grad students to be demonstrated at much larger scales.
So like, imagine you're trying to demonstrate a data center.
If you've got some little software router, it's not all that interesting to the people in industry who need to believe that this thing is real, that it's going to go somewhere.
And so the OpenFlow project was what we started on.
This was summer of 2007, and it's a simple idea.
It's like picking up a phone in my hand right now. If I want to load an application, I can.
And if I want to load an application onto my laptop, I can.
But if I've got a network and I've added all these devices, how can I do new things on that infrastructure that I already own?
Whether I am an internet service provider or whether I am a garage tinkerer, or whether I am a researcher.
And so a few years later, this concept of software-defined networking emerged and the core concept was kind of the same, but just one level higher that you could control the behavior of an entire set of network devices as one, you could do totally new things with them.
And so that just kind of sustained my interest.
It was a little bit of architecture and thinking about what should the API to the network devices be?
How should we control them? And eventually, it was 2013 and you have to graduate, you have to do something to graduate.
And we knew this technology called software-defined networking, we wanted to see it go out into the world.
And how do you get a technology out into the world?
Well, one way is you train people on what's already out there.
So we created this software-defined networking academy and went out there and went to big companies.
I went to Bangalore, I went to Sweden, I went to exotic Roseville, California, near Sacramento.
And we did all these boot camps in software-defined networking.
So from day zero, people would learn what it's like, what this changes and then if they wanted to, it'd be practical hands-on skills.
And that was great fun, right? 'Cause you get to be the presenter, but also the content creator.
But that doesn't scale, right? Software economics, completely different world.
And the rate at which people at big companies would be able to take those ideas and run with them, just didn't match what we wanted to see, right?
We didn't have time to wait. And so there is an idea that kind of had been percolating, what if there was a safety net for networking?
What if you could actually see what was going on in your network no matter what was there, no matter how big it was?
And so we all were network operators, we being the four founders of Forward Networks at the McKeown Group Lab in Stanford, we're all systems builders.
We were all network operators at some point when we were trying to run our email over this completely new kind of OpenFlow-based network to get it up and running before we put it in the hands of others who could use it.
And so that dogfooding experience of being a tiny network operator, let's say 20 devices max, shows you just how hard it is 'cause we had no idea what was going on inside the network.
It was a black box to us and it was harder 'cause it was a completely new stack of components.
You would think that in the real world where the stacks have been hardened for 20 years, that it would be much easier.
And it turns out it's vastly harder because when you go from 20 to 20,000 or more devices, it's hard to even have access, let alone know what's going on.
Every device speaks a completely different language.
And so we learned pretty quickly, talking to real operators, that the problem is of a completely different magnitude: understanding what is happening in a network, so you can actually get shit done, right?
So you can make changes, so you can figure out what the problem is.
And so one of the founders, Peyman had created this entirely new way of thinking about networks called Header Space Analysis.
And it was an algebra for networks that would scale to an entire network.
Now, small network at that time, let's say, 10-ish devices, but it had all this potential.
And we thought, okay, this thing is bigger than SDN because imagine any network in the world that's been deployed and every network we saw was a mix of components that were 15 years old all the way up to 15 months.
If you could make sense of that with the model that this Header Space Analysis could build, then you could really transform the way that people operate and interact with their network.
You could give them that access to information.
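To make the "algebra for networks" idea a bit more concrete, here is a minimal sketch, assuming packet headers are modeled as fixed-length strings over `0`, `1`, and `x` (wildcard), which is how the published Header Space Analysis work describes sets of headers. The function names (`intersect`, `apply_rule`, `trace`) and the 4-bit headers are hypothetical and chosen only for illustration; this is not Forward Networks' implementation.

```python
from typing import List, Optional, Tuple

WILDCARD = "x"  # stands for "this bit may be 0 or 1"


def intersect(a: str, b: str) -> Optional[str]:
    """Intersect two ternary header patterns; None means the empty set."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    out = []
    for bit_a, bit_b in zip(a, b):
        if bit_a == WILDCARD:
            out.append(bit_b)
        elif bit_b == WILDCARD or bit_a == bit_b:
            out.append(bit_a)
        else:
            return None  # one side requires 0 and the other requires 1
    return "".join(out)


def apply_rule(headers: str, match: str, rewrite: str) -> Optional[str]:
    """Toy forwarding rule: keep headers that match, then overwrite
    every bit where the rewrite pattern is not a wildcard."""
    matched = intersect(headers, match)
    if matched is None:
        return None
    return "".join(m if r == WILDCARD else r for m, r in zip(matched, rewrite))


def trace(headers: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Push a set of headers through a chain of (match, rewrite) rules."""
    current: Optional[str] = headers
    for match, rewrite in rules:
        current = apply_rule(current, match, rewrite)
        if current is None:
            return None  # nothing survives this hop
    return current


# Hypothetical two-hop path: hop 1 only accepts headers starting with 1,
# hop 2 only accepts headers whose third bit is 0 and sets the last bit to 1.
path = [("1xxx", "xxxx"), ("xx0x", "xxx1")]
print(trace("xxxx", path))
print(intersect("10xx", "11xx"))
```

The point of the sketch is that the question "which packets can get from A to B?" becomes set arithmetic over patterns rather than sending test traffic one packet at a time.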
And I think the analogy is the best way to explain things in general, so I'll use an analogy here. If I want to drive somewhere, I used to--
I remember 1998, 1999, I would open up a paper map. You remember that, Grant?
Grant: Of course. Well, paper maps and then eventually, we got to, what was the MapQuest?
That was the next generation.
Brandon: Yeah, MapQuest was a revelation.
It was terrible, but it was still a revelation 'cause it was very different than--
Grant: The directions then printed out.
Brandon: I remember printing them out. Yeah. It was so much better.
Like you would look down at the paper directions occasionally as you were driving and you would hope that there wasn't any construction and that there wasn't any traffic, but it was still vastly better because that information of how do you get from A to B was organized and it wasn't up to you to look in the index, figure the source, destination where you're going, find the different cells with an interstate to follow.
It was just a lot of work to get to grandma's versus no work or a little bit of work, you still had to do the navigation. And then GPS came along and it completely changed the way that we navigate. I get in a car, I say, navigate to Grant of destination and it takes me there.
And another analogy is when we need to go somewhere on the web, like I need to find out information of literally any kind, what do I do?
I go to a search bar, I say, show me this.
And most times it's the first hit that takes me there.
And even if it's not the first hit, it's a very different way of operating.
And imagine that you don't have a search engine, but you need to navigate the internet.
Or you've got a paper map and it's five years old.
That's what it's like for network operators and administrators today.
And that's what we got a taste of, just a little taste of enough to know that this was a real challenge and this is what we're solving.
So we're at 2013, we have to graduate.
We graduate, we start traveling the world, teaching the world about software-defined networking doing these boot camps, but decide to start a company.
Grant: Cool. And so did you have like a technology at that point?
Were you commercializing something or was it more just like, we want to start a company around this platform shift, this movement to SDN and this movement to sort of like this next generation networking?
Brandon: So we knew at the time that this Header Space Analysis technology was pretty interesting.
And we didn't have any particular ownership of it because it was effectively open-sourced, it was academically published.
It was fully disclosed as part of Peyman's thesis.
He told the world, he shared the source code.
And what we thought would be a great start and would get us pretty far, we learned that the real world is a crazy place and the ways that people configure their networks, every network is different.
You go to one bank and they bought a bank that bought a bank.
And so there's no such thing as one network; there's a whole bunch of different ones that magically interoperate with, baling wire and duct tape is the phrase.
And every device speaks a completely different language.
And even within one vendor like Cisco, you may have completely different models that have different configuration syntaxes.
And even within one model, over time the meaning of that configuration can change in ways that would not make sense to any normal software developer.
Cisco ASA 8.2 to 8.4 had a change to the order in which firewalling and network address translation were done.
And so the same configuration before versus after if you missed the release notes, you'd go from having a network that was secure and functional to insecure and not functioning.
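Why would the order of those two steps matter so much? Here is a minimal sketch, assuming a single destination-NAT entry and an access list written against the public address. The addresses, table names, and functions are hypothetical, and the code is a toy model of the ordering problem in general, not a model of any actual Cisco ASA release.

```python
from typing import Optional

# Hypothetical configuration, identical in both orderings.
NAT_TABLE = {"203.0.113.10": "10.0.0.10"}  # public address -> private address
ALLOWED_DESTINATIONS = {"203.0.113.10"}    # ACL written against the public address


def translate(dst: str) -> str:
    """Rewrite the destination if a NAT entry exists, else leave it alone."""
    return NAT_TABLE.get(dst, dst)


def permitted(dst: str) -> bool:
    """Check the destination against the access list."""
    return dst in ALLOWED_DESTINATIONS


def filter_then_nat(dst: str) -> Optional[str]:
    """The ACL sees the address before translation."""
    return translate(dst) if permitted(dst) else None


def nat_then_filter(dst: str) -> Optional[str]:
    """The ACL sees the address after translation."""
    translated = translate(dst)
    return translated if permitted(translated) else None


print(filter_then_nat("203.0.113.10"))  # delivered to the private address
print(nat_then_filter("203.0.113.10"))  # dropped: the ACL never lists 10.0.0.10
```

The configuration is byte-for-byte the same in both calls; only the processing order differs, and traffic that used to flow is now dropped. Flip the example around (an ACL written against private addresses) and the same reordering can permit traffic that used to be blocked.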
And those are crazy things for a minor 8.2 to 8.4 kind of change. We would never do that with APIs.
But those are the kinds of things that happen. I don't mean to single one vendor out.
We have some blog posts on this, network path not found, I think is the phrase for that one if you search for it on Google.
But there are just so many examples out there of crazy things that you see at scale and in the real world that in the lab, you would have never guessed.
You would have never guessed that you'd have enormous number of rules to translate network addresses, like 3,000.
Normally, I think it's one or two or three. Why would you need 3,000?
Or why would you have 400,000 lines of configuration in a firewall?
That seems like a lot. And those are all things that we've seen.
And all those things made that scaling challenge much harder.
And so what we thought was working and was working at 10 devices, really wasn't going to get us to the end goal.
And I remember walking into eBay and they were so excited that we supported their devices and the capabilities we showed we could deliver with a real live demo in front of them.
And then I think the conversation went, "Well, our smallest cluster is something like 2,500 devices. Can you support that?"
And I couldn't honestly say yes at that time--
Grant: Because you hadn't tried it.
Brandon: I couldn't-- I'm not going to lie.
Grant: So like maybe.
Brandon: At that time, we didn't support it.
I said, "Well, I think, and here's why and if we stay on this trajectory, we're going to get there and you should trust me."
And he'd been burned in the past and lots of network operators and engineers and decision makers who run those teams have been burned in the past by software that failed to deliver.
And so we're not just in the enterprise space, we're in the enterprise networking space where people have been especially burned in the past and shelfware has been a real challenge.
And so the bar for us was not just that we supported what they had.
It wasn't just that we scaled to the size of their network.
It wasn't just that it was usable.
It wasn't just that it was secure.
It wasn't just that it worked when they actually opened up the box and installed it.
The bar is pretty high there.
They have to believe it's going to work.
Grant: I want to sort of dive in a little bit more to understand sort of the network operator, like that persona within kind of general corporate IT, like who is this person?
Who are they working with?
What are their responsibilities versus the responsibilities of other corporate IT admins?
Brandon: Right, so our target audience, our best users, the ones who use our product most on a daily basis, are the ones who take tickets.
These are network operators because there's something that's wrong.
And typically, the first thing to do is point a finger at the network side and say, "Is it the network?"
Maybe it is because from an application developer or a DevOps or a security perspective, the network tends to be a black box.
So it's not that they necessarily want to point the finger, it's that that's the easiest thing to do because they don't have visibility into that.
And so often, that question, is it the network?
Arrives on someone's desk and until it's resolved, you've got to figure it out.
You've got to troubleshoot, you've got to prove innocence.
Grant: Someone who is managing like the actual physical machines, they can say like, "Look, my disk space is empty. I've got plenty of memory and CPU available, so it's not my problem. It's running, it must be something in the network."
Is that sort of how it happens?
Brandon: Yeah, often they'll look at their own counters.
They'll look at where they would think to look and then they'll point the finger.
And the challenges are often at the boundaries of layers, right?
It's when the fingers point at each other, but each side doesn't understand the other one well enough to understand that it's actually the interface between them.
It's the expectations one side had or in the world of networking, maybe there's an overlay network, one that runs on top of another, but where traffic goes from a regular underlay to that overlay, there's a change in behavior.
There's a restriction on the size of the packet.
There's some issue that's preventing that traffic.
And what makes it even worse is network is a completely dynamic environment.
So just because it worked before doesn't mean it's going to work now.
And just because it broke now doesn't mean that it just broke.
It could have been broken months ago.
And so you can't just look at what's changed.
You do look at what's changed. Always, that's the first thing.
Did someone make a change?
Send out the Slack: who changed it?
And if no one changed it, it could still be broken and it could be a result of the network because as traffic moves, maybe the configuration wasn't done consistently.
And when you get to large scale, like for internet service providers or media companies or financials or government institutions, these are kind of the customers I interact with the most, compliance becomes a huge issue.
Just knowing if it's configured the same across all devices.
Because at that scale, there's always going to be one that missed the memo, right?
It isn't configured in the same way as the others and not having knowledge that everything's consistently and correctly configured, gets in the way when you're debugging.
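The "one that missed the memo" check can be sketched in a few lines. This is a minimal illustration, assuming each device's setting has already been collected and normalized into a comparable string, which is the hard part in practice given that every device speaks a different language. The device names, the NTP setting, and `find_outliers` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def find_outliers(configs: Dict[str, str]) -> List[str]:
    """Return the devices whose setting differs from the most common value.

    If two values are equally common, Counter.most_common picks the one
    seen first, so a real check would want an explicit expected value.
    """
    if not configs:
        return []
    majority_value, _count = Counter(configs.values()).most_common(1)[0]
    return sorted(name for name, value in configs.items() if value != majority_value)


# Hypothetical fleet: one switch still points at an old NTP server.
ntp_settings = {
    "sw1": "ntp server 10.0.0.1",
    "sw2": "ntp server 10.0.0.1",
    "sw3": "ntp server 10.0.0.9",
    "sw4": "ntp server 10.0.0.1",
}
print(find_outliers(ntp_settings))
```

Comparing against the majority only tells you a device is different, not that the majority is right; it is a starting point for the "establish the truth of what is happening right now" step rather than a compliance verdict.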
You go in with no assumptions.
You have to assume that everything's broken in some complicated way.
And you have to basically establish the truth of what is happening right now, not what was happening a week or a day ago.
What is happening right now, so you can start tracing the traffic to validate assumptions you've got and see if they're actually true.
And that is a manual process. That is what operators do.
And they can't always figure it out, because humans inherently, no matter how good they are, can't make sense of an entire network of that scale with that much complexity in their heads.
So they have to go to their teams, they have to escalate.
They have to go to the engineers who might've set up that network.
And hopefully, it's just one engineer who gets the escalation.
And so there are places where the engineers are actually the most loaded ones because they get all the tickets because there's only so much the operators can do with the knowledge and the skills and the access that they have.
We can completely change that by blurring these lines between those roles and giving everyone access to where traffic is going.
It's like everyone has Google versus no one has Google.
Everyone has access to the same information.
You can all get sports scores instantly.
You can all get information about that topic you need to know, and you have the same speed.
The bar has been lowered. So the whole notion of take an operator, it's their first day ever seeing the network.
How can you give them the knowledge and the insights that the most senior, 10 years in it, in this particular network, for this topology, with these devices has?
And that's what we're trying to do; make information accessible so that every change, every troubleshooting event, even automation that you want to do to actually make the network better if you get the time, becomes possible.
Grant: Great. Okay. So that's super helpful context.
Now, let's rewind just a half second now about, you come out of university, you have this like Header Space Analysis thesis, you're working with your four co-founders that were all like network operators at Stanford in a lab, is that what you said?
Brandon: Yeah, we were grad students in a network systems research lab.
So we weren't measured by the number of papers that we were generating at all.
We were getting systems into the world to enable new ways to use networks, to make them bigger.
And our specialty was kind of the core infrastructure that could enable others to get their work done.
So for example, one project that came out of this group is something called NetFPGA. So an FPGA is reprogrammable hardware.
So if you had a box you can plug into your computer and you can now turn into a switch or a router or kind of load balancer, or even do some kind of, who knows, other kinds of computation there, then you've enabled all kinds of research. And so it's satisfying when there are thousands of citations for systems that have come out of your team.
I had a limited impact on that project.
I used it in one class, but that's exactly the kind of thing or OpenFlow.
It's an open standard that when implemented within a year by a whole bunch of vendors meant that you could buy like a switch or router off the shelf and make it do something completely different.
And then you can test your idea at line rate on real hardware at scale without having to build a router or switch. And I actually maintained the OpenFlow spec for three years.
And it was pretty fascinating trying to take this capability and evolve it to expose the hardware functionality that was there so people could really get the scale and the speeds that they needed and the flexibility interacting with vendors which weren't used to thinking in this way.
They weren't used to thinking how can I enable my customers to do completely different things with my product that I didn't expect?
It's an open ecosystem play. And then other things from that group, it's very much applied networking research.
Grant: Okay, so a lot of interesting technologies have emerged from this sort of graduate group out of Stanford.
And then some of those have been commercialized and I'm sure there's like successful companies and maybe--
Is Nicira one of them? Is that one of the things that came out?
Brandon: Yeah, Nicira is probably the biggest success story, right?
So Martin Casado left Stanford summer of 2007.
And the seat that his butt was keeping warm was pretty much the one I happened to find my way into.
So maybe there's some good karma that I happened upon.
And so that summer, they were in a tiny little room, about a mile away building this company called Nicira trying to figure out how to make networks more flexible.
And they focused on the virtualization space.
So if you use NSX, which is VMware's network virtualization product, that completed the three-legged stool for VMware of compute virtualization and storage virtualization and network virtualization.
When you had all three, you could really unlock the potential of everything.
So when your VM moves around, its tables follow it.
So yeah, that's one example of a kind of technology influenced by work done at Stanford that found its way into the world.
Another company called Big Switch Networks, recently purchased, showed how you could use commodity boxes off the shelf with custom firmware to operate a network that's very similar to the ones that the biggest web giants run, the Googles and Amazons of the world.
So there are a lot of big ideas coming out of there.
I just got lucky to be part of that.
Grant: Great. So you've kind of seen some of this success before, you'd seen some predecessors do it and you just kind of like, "Okay, the four of us can start this company."
When you brought this to eBay, it was Forward Networks and you were bringing this to them as a sort of potential early beta customer design partner. Is that right?
Brandon: Yeah, it's just one example of the kind of company that we talk-- I mean, we don't have a commercial relationship with them at the moment.
That'd be great too, but it's a telling story.
I think the other telling story, as we start to get more into commercial deployments, so we were partnering with a bank, overjoyed to get our software into their lab where the security considerations are very different, right?
The bar is very different in a lab versus a real world, but this is a product where the value is the highest when you're in the real world, when you're on the actual production network.
And people are naturally hesitant to do proof of concepts on a full production network when if you do something wrong, it's their job on the line.
So we have to build a lot of trust.
And so early, we would start out in the lab and show what was possible.
And the best feedback I ever got was about a year into the founding of the company.
And it was from a financial operator and this guy was really, really up to it, right?
He was tip of the spear, looking for ways to bring automation into his network, to level up the people on his team.
And he had deployed it in the lab and worked with a few people.
And he said, "Brandon, my team can't use it. I love it, I love all the functionality, but my team can't use it."
Because we weren't speaking the language of our users.
We weren't speaking in IP addresses.
And we weren't speaking in network functions that they understood.
We were speaking the language of header space.
We were speaking in academic language that didn't make sense to them.
And that was one of the best pieces of feedback we ever got. "I can't use the product."
It has to be to a level of usability.
And so early on, we got all these lessons.
We got the scale lesson, for me, it has to be so big that I trust that it's going to support a larger network so that if my network is special in some scalability way, it's going to work.
I'm not going to waste my time on a POC with some like-- 'Cause no one has time.
Network operators and engineers, they're like firefighters that are so busy putting out fires that they don't have time to learn about sprinklers.
They don't have time to add firebreaks. They don't have time to train their people to get better fire engines.
And so just getting on their calendar is often the biggest challenge because they're so busy running what they have, let alone spending the time to get ahead.
So another lesson that, the biggest one, I imagine you would ask me about pivots at some point.
And that's a question people tend to ask, like what was your biggest pivot?
Did you do a pivot? And a few years in, the third best piece of feedback or statement is the same guy who had said, "I can't use your product."
He said, "So network data needs to stay in the network and I'm not on the security team. So I don't make or change the policy."
Have you ever heard anything like that? Like data has to stay, has to be on-prem, right?
Grant: Seems familiar.
Brandon: Once or twice?
So we had been a cloud-only product at that point.
We thought, it's great, right? Everyone can get the data.
Grant: Yeah, yeah.
Brandon: Why won't they do it? It's the modern way of the future.
Remember, this is 2013. We thought everything's moving to cloud. Everything. Immediately.
Grant: Yeah, yeah. It was a fairly common belief at that point.
Brandon: Maybe some people take a year or two.
And so we got the feedback, network data has to stay in the network.
And at that point you go, "Do I want access to three-letter government institutions that have physically air-gapped networks? Do I want access to financials where outages are reported in the news and they have to report to the SEC and they have to-- It's a seven-figure cost every time anything goes down. Or even just an internet service provider wants to control the upgrade cycle because they're deploying a network and they want to have that control."
There are so many reasons that companies go on-prem or even simpler ones like it's the only way that they can buy the product. They can't buy subscriptions.
There's so many reasons that companies may only be able to in this moment consume software that is offered on-prem that we had a decision point and we had resisted it for two years.
And we went to the board and said, "We really don't want to do this because it means we don't get access to data to make the product better.
Why wouldn't customers want us to deliver a better product?
We want to deliver a better product. If we have access to data, we can test everything in advance.
We eliminate the vast majority of risk every time we want to change.
We want to get new features out, not just every month, we want to get it out every week, every day.
So how do we do that if you don't have the data? What about the support costs?
And what about the challenge of just building a virtual machine or the instructions combined with systems that we test so they can run this thing in their environments?
Every environment looks different.
How in the world are we going to handle this?"
And the realization was, if you wanted to unlock the market and most of our enterprise Fortune 500 market is giant companies that have giant networks, that's where the value is. We had no choice.
Grant: Yeah, that's interesting. And so, I mean, you built your own way of doing that for a while, right?
Brandon: Yeah, so I actually led the effort to build that for a while.
So I was familiar with virtualization and I was running engineering at Forward at that time. And so it was work.
We looked at what was out there at the time. This was about 2015 or so, there were a few companies and it didn't seem like any of them were ready.
And from my perspective, right, we have to trust the operation of our business and the customer experience to every dependency, to every vendor that we partner with.
Grant: Yeah, for sure.
Brandon: And so it was the reverse of the eBay situation. It was, can I trust a company to partner with so that every time my software is deployed, that customer gets the best possible experience, and there are no bugs from that dependency or from that partner, depending on how you see it?
And I didn't see anything at the time that seemed like it was far enough in the market to where I could look at case studies and believe that they had already solved the problem at a bigger scale and more environments to where I could trust that we could do that.
I had no desire to build this, right?
My team had no desire because what is it that we do uniquely in the world?
It's we build software that understands a network, that scales, we build APIs to that to offer to customers, we build a UI on top that's beautiful and easy.
And I'd rather have every person making the product better, making the experience better.
And packaging does not represent delivered value. It does, but it doesn't represent that, right?
It is delivered value. Your thing has to work.
And you have to get the logs and you have to get both visibility and it has to just work in every environment.
So there's an enormous amount of value, but it's non-differentiating value.
Grant: But you invested a lot in this. I mean, you built your own configuration and administration and everything else for delivering a VM in first, right?
Brandon: Yeah, so we built a virtual machine and we knew that we wanted to update that software easily.
And so we shipped a virtual machine, a disk image based on Ubuntu and all of our product really was inside a container.
And then there's a little bit outside of it. We call it Admin D, an Admin Daemon and its purpose was to enable upgrades.
And so our upgrades would be effectively a container and a script and the script would typically just replace the container and start it up and we'd be good.
And I mean, I'm oversimplifying.
Of course, there's a lot of stuff you need to figure out behind the scenes, like, well, what if it doesn't work?
What if the upgrade fails? Well, how do you get access to that system?
How do you integrate authentication with this?
Well, now you got a separate authentication method to get in.
Is it a Linux user account or is it some other kind?
So you got all these questions that you have to figure out and the answers aren't necessarily obvious as to how you'd build it, how closely coupled that container would be to the virtual machine you ship, how customizable that virtual machine should be, I could drone on and on about it, but it's work and no one wanted to do it.
But we did it because it unlocked our revenue.
It enabled our product to be used with the constraints of the people who'd get the most value out of it.
And if you put it like that, it doesn't seem like there's a choice.
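The "container and a script" upgrade described above, together with the "what if the upgrade fails?" question, can be sketched roughly as follows. This is a minimal in-memory illustration, assuming an upgrade means swapping one image reference for another and then running a health check. `Appliance`, `upgrade`, and the image tags are hypothetical stand-ins; no real container runtime is called, and this is not how Forward's Admin Daemon is actually built.

```python
from typing import Callable


class Appliance:
    """Stand-in for the shipped virtual machine running one product container."""

    def __init__(self, image: str) -> None:
        self.image = image  # the container image currently running


def upgrade(
    appliance: Appliance,
    new_image: str,
    health_check: Callable[[str], bool],
) -> bool:
    """Replace the running image and roll back if the health check fails.

    Returns True if the new image is left running, False if rolled back.
    """
    previous = appliance.image
    appliance.image = new_image      # "replace the container and start it up"
    if health_check(appliance.image):
        return True
    appliance.image = previous       # the upgrade failed: restore the old image
    return False


# Hypothetical health check: anything tagged "broken" fails to come up.
def comes_up(image: str) -> bool:
    return "broken" not in image


box = Appliance("product:1.0")
print(upgrade(box, "product:1.1", comes_up), box.image)
print(upgrade(box, "product:2.0-broken", comes_up), box.image)
```

Keeping the previous image around until the new one proves healthy matters most on-prem, where nobody from the vendor can log in to repair a half-finished upgrade.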
Grant: Right. And it's funny because particularly a sentiment that we felt was prominent back around that same time, maybe a couple of years before, was people just didn't want to build versions of the product that could be deployed on-prem.
And we looked at some of the companies that were doing it, like GitHub, I think was a great example and we modeled a lot of the stuff that we did after what GitHub Enterprise had done.
And they solved a lot of the first principles problems.
We looked at that and we said, "Well, the challenge, if you don't do it, is you're limiting so many potential users from being able to actually use your product."
And so there's this amount of empathy that you can have where you're saying, look, you're at some big bank or you're at some three-letter agency or you're at some, just like organization that has a lot of sensitive customer data traversing these pipes.
And in order to get them your solution, which you know is superior to what else is in the market, this is the only way you can deliver to them, right?
So either you have to sort of accept that reality or withhold your product from so many potential users.
Brandon: And we were dragged kicking and screaming into this whole thing.
And again, to use an analogy because I'm a total space geek, and I love keeping up with all the developments recently, and I think the best analogy for shipping on-prem software is a mission to Mars.
Grant: All right, I love this.
Brandon: So anywhere on earth, you can pretty much call and trust that you can get the call through.
Maybe you need a satellite phone in a few places, but the potential for real-time communication exists.
You can get access to the data, you can get access to the people.
As soon as you have a mission to the moon or really to Mars, the communication delays get in the way. And so the other thing is you can't get new resources to the people headed there anytime soon, right? There's a two year window, every two years you have an opportunity to go to Mars.
That's the minimum energy orbit.
Anyway, the colonists going to Mars or whoever's going to plant their flag, they need to leave with everything to stay alive for those two years.
And when they want to make a call home, depending on where they are in the galaxy, or in the solar system, it's potentially 11 or 12 minutes.
The recent Mars Rover landed, and it was, I think it was like 10 or 11 minute lag because at that point they were the other side of the sun from us.
And so imagine having to communicate with that amount of delay and that's what it is.
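As a rough illustration of the delay being described, the one-way signal time is just distance divided by the speed of light. The sketch below uses assumed, round-number Earth-Mars distances (not mission data) to show how the lag ranges from a few minutes to over twenty; the function name is hypothetical.

```python
# Illustrative only: one-way light-time between Earth and Mars.
SPEED_OF_LIGHT_KM_S = 299_792.458   # exact by definition
AU_KM = 149_597_870.7               # one astronomical unit in kilometres


def one_way_delay_minutes(distance_au: float) -> float:
    """Return the one-way signal delay in minutes for a distance in AU."""
    return distance_au * AU_KM / SPEED_OF_LIGHT_KM_S / 60.0


# Assumed round-number distances, chosen only to span the typical range.
for label, au in [
    ("near closest approach", 0.4),
    ("mid-range", 1.4),
    ("near the far side of the Sun", 2.6),
]:
    print(f"{label}: about {one_way_delay_minutes(au):.1f} minutes one way")
```

A delay in the 11 to 12 minute range corresponds to roughly 1.3 to 1.4 AU of separation, and a round trip doubles it, which is why anything resembling interactive support is off the table.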
And I don't just mean for three-letter agency level of true air gap security.
I mean, any environment where you don't have direct access through a network to a production system that may need to run your software.
And if you're running a bank, right? If you're running a media company, what's in that cluster is super valuable.
So you don't want to give access.
You have a very good business reason to lock that access down.
But from a support perspective, your sales engineers or your customer success representatives or whoever is enabling the customer to be successful, maybe it's just the documentation you provide, they need to have everything.
They need to have all the provisions and all the knowledge and all the training to survive on their own without your help.
And so the level of trust you need to have in the rocket and the level of trust you need to have in the instructions on how to fix the rocket when it inevitably breaks, it's really high.
You've got to trust that thing with your life.
And I feel like it's the same way, you've got to build to a level with on-prem software to where, if it is deployed in an environment where people don't have immediate access and you don't have access to data to diagnose, they've been enabled to solve problems independently.
And that's a really different bar.
It's so much easier when you're running a cloud and you just SSH into the box and you see what's going on.
Or NPS what's going on, not hard.
You get real-time metrics, you get the data, you can copy it onto a machine and you can test out your fix.
You can run a debugger on the problem.
What if you never see the data and you know that you will never see the data?
Then every database migration has to be tested to a very different level.
Every feature evolution you do has to be thought of with a higher level of certainty.
So the level of testing rises, and so does how you build your product. It's all that work to package it and make it easy to consume.
Grant: Yeah, exactly. I mean, I'm going to steal this for our team at some point because we think about all the challenges of making on-prem software so much easier, right?
The whole concept for Replicated has always been like, "Hey, on-prem software has always sucked."
One world has tried to say, "Well, let's get rid of on-prem software by making SaaS really great and really secure and make people feel really comfortable with SaaS."
And we're always like, or we can just make on-prem software not suck, right?
We can make it easier to use.
And that's our goal. And we talk about that, that's our thing.
And so I love this idea of sort of seeing it as this mission to Mars, because that's super inspirational and interesting and it gets you, it really does, right?
Like there's so many pieces and things that we've built that I sort of like think about that fall into this category of like, oh yeah, that's how you prepare your customer to do automated analysis.
That's how you prepare your customer to manage their own upgrades.
So it's like, I love this. This is super helpful for me too, so.
Brandon: And that problem we solved wasn't even really the mission to Mars.
It was a mission to the moon. So it's much closer, it's much easier.
People already did it 50 years ago. Problem solved, because it was one box.
Brandon: And as soon as you want your product to actually scale and there are a lot of good reasons, it's not just that we wanted to support larger networks, it's we wanted to support the networks they had with shorter processing delays.
We wanted more users. We wanted faster search results and more of them or the different processes occurring.
So our software has to collect data and it has to process data in a batch form.
And it has to do real-time interactive queries, three completely different kinds of processing.
So if you care about the operation of this thing and you want to see that your data is collected as fast as it can be, and that it's processed as fast as it can, you probably want to isolate some of these completely different processes.
On top of that, in the middle of the night when the power goes out or a machine's network card breaks, which happens in the real world, you want to know that nothing's going to break, the system is going to continue operating, and no one even has to receive a page, or it's more of a warning than a, "Hey, take immediate action 'cause we can't operate the business till you fix something."
So those are three reasons right there: scalability, fault tolerance, and consistency of processing and the isolation around that, that our biggest customers want.
And they asked us, they said, "We want these things because then we can really put your product in the critical path and depend on it daily. And we want to do that because we see the value."
And so the real problem is when you go from one box to N.
Brandon: When you want your product to scale and be fault-tolerant, completely different world.
So in 2015, we had to solve the easy problem, one container, one box, but as the app grew and we had more and more pieces that we wanted to evolve at different rates, we wanted more and more containers.
And so that's kind of how we fell into looking at Kubernetes a few years later as a possible solution to how we might run our product across multiple nodes without adding complexity around its operation.
Grant: And then, I mean, obviously you guys decided that Kubernetes was sort of the right choice.
And I think you made that choice before it was necessarily the only choice, right?
I think now it's kind of clearly the right choice and the only choice.
But at that time, there was probably still DC/OS and Docker Swarm and other technologies that were very popular, is that right?
Brandon: Yeah, so Docker was a big thing back then.
Docker was the big wave, but you could tell that there was something interesting about Kubernetes.
And so with a small team at Forward, we were trying it out. We actually built our own complete prototype, using off-the-shelf stuff, of a version of our product that could scale to multiple nodes. We would build the cluster for you.
So if you have virtual machines, if you have VMware style infrastructure or bare metal servers, you could deploy it in that way.
And then you would deploy our application atop that cluster that we had built for you. And there are other companies that exist to make Kubernetes clusters that you can trust and that you can put your apps on top of.
And one app per cluster is not the typical model, but it's a much more deployable model for a provider of software if that's how your customers consume it as a virtual machine or a few nodes.
So we looked at what was out there, and I'll give the computer science perspective here.
There are languages that are explicit where you say what you want, you declare what you want, and then there are things where there's kind of implicit behavior and declarative systems are beautiful because you're documenting and you're defining the behavior and it's all in one place.
And so an automation system like Ansible is trying to be declarative.
It's trying to give you a way to say, "Here's what I want."
Make it so, declaring what you want.
And Kubernetes is exactly that way. It's really two things, right?
It's a runtime that's been hardened by people at Google who know how to build these things with lots of containers, with many services running on the same set of machines.
So it's a runtime layer, but then it's also an interface to define what your application is.
What are the pieces? How do they connect? How do they connect to the outside world?
And what resources do they need? That's the definition of an application.
And that, just to me, feels like such a better way to implement any application.
You can change the resources with a knob. You can say, "Yeah, I want three of these things at all times."
And then the runtime will make sure that the fault tolerance is there.
And if you build your application in the right way, you don't have to deal with many of those details.
You have to deal with some, right?
Every application has to be aware of its own fault tolerance considerations and distributed systems are really hard.
And we're probably not going to get into that on this episode.
But it's a really hard problem to solve, and Kubernetes does a lot of the work for you.
It just makes it so much easier. And if you've seen one Kubernetes app, honestly, you can pick the next and you can run with it.
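The "I want three of these things at all times" idea can be sketched as a small reconciliation loop: you declare the desired state, and the runtime repeatedly compares it with what is actually running and corrects the difference. This is a toy model under stated assumptions. `Cluster`, `reconcile`, and the `collector` service name are hypothetical and are not the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a runtime; not the Kubernetes API."""
    running: dict = field(default_factory=dict)  # instance name -> service name
    counter: int = 0

    def start(self, service: str) -> str:
        self.counter += 1
        name = f"{service}-{self.counter}"
        self.running[name] = service
        return name

    def stop(self, name: str) -> None:
        self.running.pop(name, None)


def reconcile(desired: dict, cluster: Cluster) -> None:
    """Drive the cluster toward the declared replica count for each service."""
    for service, want in desired.items():
        have = [n for n, s in cluster.running.items() if s == service]
        for _ in range(want - len(have)):   # too few running: start more
            cluster.start(service)
        for name in have[want:]:            # too many running: stop the extras
            cluster.stop(name)


desired = {"collector": 3}                  # the declaration: three at all times
cluster = Cluster()
reconcile(desired, cluster)

cluster.stop(next(iter(cluster.running)))   # simulate one instance failing
reconcile(desired, cluster)                 # the loop restores the declared state
print(sorted(cluster.running))
```

The application owner only edits `desired`; turning the "knob" means changing a number and letting the loop do the rest.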
So, a few years ago, we started converting some of our internal applications. A great place to start in the DevOps world is with Jenkins.
So we want to test our software all the time.
In fact, within 10 minutes, every developer on my team should be able to know that their software works and that it's not going to break a customer experience.
But we have 100,000 tests that we need to run, and we need to figure out which tests to run and make sure that every time we change our model, we're not breaking behavior.
Or if we're changing the behavior, we understand how we've changed it.
And we know that it's actually fixing a bug.
Or if we change it, it's adding a new feature. Things like that, right?
What's called acceptance testing is where you bring the testing to the developers as much as you possibly can.
And if you can have your integration system start all these containers and run all your tests and scale more dynamically and just use more resources for this, it's great.
So Kubernetes provides one way to do it because you can have Jenkins run a fleet of agents and it starts up the agents and every agent has only the resources needed to run that particular test.
And so you get much higher utilization out of the resources you've already got, but you also get a much more manageable system because the developer can say, "I need these resources and this resource template to run this kind of test."
Then you have your giant, big long-running tests that are happily executing alongside the little ones that trigger every time there's a new piece of code.
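To make the utilization point concrete, here is a minimal sketch of per-test resource templates, assuming two hypothetical templates and a simple first-fit placement. The real Kubernetes scheduler and the Jenkins integration are considerably more sophisticated; every name and number below is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Template:
    cpu: int
    mem_gb: int


@dataclass
class Node:
    cpu: int      # free CPU cores
    mem_gb: int   # free memory in GB


# Hypothetical templates a developer might request for a kind of test.
TEMPLATES = {"unit": Template(cpu=1, mem_gb=2), "integration": Template(cpu=4, mem_gb=16)}


def place(jobs, nodes):
    """First-fit: run each job's agent on the first node with enough free capacity."""
    placement = {}
    for job_name, kind in jobs:
        need = TEMPLATES[kind]
        for index, node in enumerate(nodes):
            if node.cpu >= need.cpu and node.mem_gb >= need.mem_gb:
                node.cpu -= need.cpu
                node.mem_gb -= need.mem_gb
                placement[job_name] = index
                break
        else:
            placement[job_name] = None   # no room anywhere: the job waits in the queue
    return placement


nodes = [Node(cpu=8, mem_gb=32), Node(cpu=8, mem_gb=32)]
jobs = [
    ("long-regression", "integration"),
    ("commit-a", "unit"),
    ("commit-b", "unit"),
    ("nightly", "integration"),
]
placement = place(jobs, nodes)
print(placement)
```

Because each agent asks only for what its template declares, the small per-commit tests fit into the capacity left over next to the long-running ones instead of each claiming a whole machine.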
So that's an example of getting started with Kubernetes. We also use Gerrit; we're one of those companies that uses it instead of GitHub.
And so we've ported that application.
And once you've done one, the second one is much easier because you understand the core concepts and what it's like to operationalize.
But it is a bit of a barrier to learn.
So the challenge for us with Kubernetes is if we're going to have a customer that depends on that, it has to be really easy to use.
Grant: Yeah, I remember us--
I think even maybe you and my CTO, Mark, my co-founder, talking about this a while back, which was like, "Hey, we want to move our entire architecture to Kubernetes. And we want our team to really adopt it and to sort of feel like they feel comfortable with it, right?"
This is a new thing they need to learn, new paradigms, the new things. And it's a challenge, right?
And we sort of made a similar decision, which is like, "Hey, we need to start moving a bunch of our services here because we know this is the future. And we need our engineers to feel like they know how to operate it and they can operate some number of applications and they can take those lessons and apply it to the more important stuff down the road."
Grant: Yeah, I love that.
And it's funny, like everybody kind of figures these things out independently, but then there's these patterns that emerge, you're like, "Well, the right way to do it is do a couple of these things, learning experiences."
And you probably look back on those initial Kubernetes manifests that you wrote five years ago and you're like, "Wow! We didn't know what we were doing." Right?
Brandon: Kubernetes has this unusual combination of it's really clean.
It's got kind of an elegance to it. It's a declarative spec for your application.
But then it has an enormous practical benefit because it removes a lot of the operational burden when you want to run on multiple nodes.
It has its own concepts to learn, to make that possible, but it does so much.
So yeah, it's definitely worth looking at.
Grant: I love it. And one of the things we want to talk about a little bit later on, I think is some build versus buy concepts, but I'm really interested because ultimately, I mean, I've known you for five plus years, right?
And we were talking about your VM years ago.
And I was subtly trying to get you to use Replicated for years.
And it was never really the right solution, maybe the company was too small, we didn't have the continuity you were looking for, but eventually, we started working together and it feels like in the last year plus, maybe some of the changes that we made around our products with KOTS and other things have really continued to help your team feel confident that Replicated had built some of the right things.
But I just kind of wonder what your perspective is on that.
Like what shifted, what changed? How did you make that decision?
Brandon: Yeah, so the choice to do a proof of concept and see if our product could be deployed on top of Kubernetes with Replicated managing the creation of the cluster and the update process was always an insurance policy.
Initially, starting as an insurance policy.
Brandon: If it doesn't work on one node, only then do you move to multiple because everything necessarily becomes more complicated.
You're asking for more resources, there's more delay, there's more complexity everywhere.
And you want to, again, avoid complexity where you can, make things simple for your customers to deploy and adopt and expand.
And so for us, it was that insurance policy for when people really needed it.
And I'm really lucky because our engineering team has some amazing people.
So a year ago I transitioned kind of control of engineering over to Yasser who was one of our first engineers.
And he was a scaling engineer and he built out the scaling and it was every year, every single year, year after year, 10X improvement in scale.
Grant: Oh wow.
Brandon: And so we went from 10 devices to then five years later, we were at the 100,000 device level.
And it was to where I could show an actual running version of our software that was so large that it eliminated the doubts.
And instead of the conversations going, "Let me show you why we believe we can scale and the evidence whose trajectory leads to the place we need to be."
It instead was, "Here's a demo, it's running. Next question please."
Brandon: And that's so amazing.
Brandon: And there are other reasons we talked about, right, to move to multiple nodes.
And those particular reasons became more and more interesting. So fault tolerant operation, right?
It's the middle of the night and something breaks and it's not my fault.
It's not the customer's fault. It's just, it broke.
You can't trust hardware 100%. But the application can still operate. And so the additional benefits of going to multiple nodes kind of kicked in, and that's where I think Kubernetes, and the system based on it that wasn't just providing the scale benefit but was providing these others, became more compelling overall. And so today, there is no customer that we have ever come across who actually needs multiple nodes to scale.
This is crazy, right? 100,000 nodes demonstrated on one box.
We actually took all the debugging data we had ever seen.
So this is realistic data from every customer we'd ever seen, put it on one box and it ran on one node and it processed in a reasonable amount of time.
I think it was roughly an hour or two. And so we don't need multiple nodes to scale.
And we're fortunate to have that property because we've invested heavily.
And we have a team that just Monday morning, they get a network that has some new property and then Friday there's software that goes out the door sometimes that solves that.
And sometimes it's not a week, sometimes it's months.
But now those fire alarms go off and it tends to be vastly shorter or we haven't had any actually, in many months.
We've seen the complexity that's out there and so we're kind of getting ahead of real world complexity.
So kind of connecting it back to Kubernetes, I guess.
For us, it started as an insurance policy and then it became a differentiating enterprise grade feature because we can say we have fault tolerance because people care about fault tolerance and that improves the operational experience.
And we can say that you can isolate processing and when we go to our dashboard and look at how long it took to collect data and process that data, it's a straight line.
It doesn't have the variation that you necessarily have if you can't control where stuff's happening.
Grant: Yeah, that makes sense.
Brandon: And with Kubernetes, right, it's a declarative spec.
So you can give the customer the opportunity to change how they want to allocate resources, whether they want two of their virtual machines doing the batch side or whether they want a whole bunch of them that are doing the more real-time side if they want to support more users.
So you give them control over how to optimize their resources.
Grant: I love that. Okay, let's kind of move into a couple other areas, right?
So you mentioned you've given up the sort of day to day operations of the engineering team.
So talk about how your role as CTO has sort of transitioned throughout the growth and history of the company.
Brandon: Yeah, so I'm a co-founder first and on the West Coast, I'm a co-founder comma CTO.
And on the east coast, I'm a CTO comma co-founder.
Titles sometimes matter a little more there.
And if they don't know who you are and they're not up to date on the latest companies then the credibility of a few letters tends to matter more.
Grant: Oh, funny.
Brandon: I'm an employee first and foremost. So I have to do whatever needs to be done.
And in early days, it was build the UI for the product.
Brandon: I'm not a UI developer, but I could build one.
So I had to do that, someone had to do it. That's what a startup's like when you've got a few people.
And then as you grow, you get people who specialize more in those areas and are just a much better choice.
They can do a better job. And we've been fortunate to get amazing UI developers.
We had for a while just two building the entire product.
You get someone who focuses on scaling and someone who focuses on data collection, then you get someone to help out with all the other attributes.
And it's the same way. So my role evolved based on whatever needed to be done at that time, whatever was the biggest problem to enable the company to continue to exist and to grow.
And so a year in, my role shifted. I was the sales engineer.
I would go on premises to the customers and show them how to get the product working and do the integrations.
We would work with API calls and within our first year we had an internet service provider.
They had a lab in Palo Alto and they were able to test this interesting network.
It had some software-defined networking elements, it had some more traditional ones, and they had a full DevOps process, 2014, where they could press a button and verify that the network that they had built with all the changes they'd introduced through their code was yielding exactly the outcome that they wanted.
They could do test-driven development of their entire network configuration generation.
They could press a button and no matter what they had changed, hardware, software or vendor, they could confirm that.
And I thought this was going to be the thing that would take over.
And this requires some integration, right? Someone has to know what the API calls are.
And this was so early that we had to evolve the API calls in the process of a POC.
We don't need to do that anymore, thank God.
Things stabilize over time as you see the same use cases over and over, but then the role continued to evolve.
And we took a really unusual approach, I would say of structuring engineering.
We did not have managers. We just had team leads.
And we had people who were so technically deep in the area that they would work.
Yasser to just give one example, right? He did a PhD in search.
He then went to Microsoft to build Bing.
He then went to Facebook to build Graph Search.
And then he was looking for a company that had less than 10 people and interesting scaling challenges and we got lucky.
It's that simple. And so his opportunity was to grow a team.
And so he started growing a team.
And it's people like this that you need if you're going to have a light touch management style and you're going to communicate a goal and a priority and they're going to run with it and they're going to find the right way to solve the problem.
And for a while, we didn't even know what the culture was.
People would ask us in an interview, I was like, I don't know.
That seems like a fluffy question, right?
Grant, what's the culture at Replicated?
Grant: That's so funny. I love that.
Brandon: Put you on the spot.
Grant: Oh yeah, culture at Replicated is all about curiosity.
That's the first and foremost, caring deeply.
There's a whole bunch of values that we like to talk about, but ultimately, it's about like we want long-term people who care about-- They play the long game.
Every relationship matters, every interaction matters.
And that should ultimately align us with every partner, with every customer, with every prospect, because like we're going to see them down the road.
So I think as long as you don't view things as very short-term, but you view things as long-term and you're interested in that long-term success, that's a foundation for us.
So there's a handful of different things. We try to define it.
We did what I would call a culture audit to sort of figure it out a couple of years ago and said like, well, what are we like already?
And we came up with a couple of things, wrote it down and then now we try to live up to those things.
So we try to make sure that we maintain curiosity and we maintain a level of caring and we do try to default towards action and all the things that are sort of in those core values.
But back to Forward Networks, you couldn't describe your culture.
Brandon: I couldn't, but then I realized, well, where did we come from?
Four people who had worked together for six years in the fires of demo hell, where you can't move the deadlines for paper.
They're not going to move it for you.
And so that creates stress, and a good way to know if someone's a potential co-founder is: have you been in stressful situations for many years and come out stronger?
Grant: I love that.
Brandon: Right? Have you been through the fires and have you built that trust?
Have you had those trust-building, culture-defining moments?
And that was what we had.
And so we communicated openly, honestly, and transparently and we maintain that as the company continued to grow and sought out people who solve technical problems in ways that make their own jobs irrelevant.
And so by that, I mean an engineer who automates themselves out of existence is exactly the kind you want to seek.
Brandon: Because if they think aggressively, they think how can I-- I'll give one example.
We need to test the world's diversity of networking devices individually and together going back 10 to 15 years.
Grant: That's a big one, yeah.
Brandon: That's the one problem we need to solve. So we can't just add people.
We can't afford it. We don't have the money, right?
We're a startup. So you need people.
No one ever has the resources they need.
So with limited resources, can people focus it on ways that scale with people time, with money?
And if you find people who can think at the level to automate the system that needs to be automated, build the infrastructure that removes the need for people or makes them more efficient or makes other developers more efficient or makes their customers or their customer success reps or their sales engineers more--
Or even the sales team more efficient, those are the people you want.
And for us, it tends to be people who are really deep into computer science and they've had to solve really hard problems.
In fact, this is the question you can ask on a culture interview.
Tell me a hard problem you solved or debugged.
And did you learn something from it? Did you come out stronger?
You can often learn a lot from whether someone's really been challenged and whether they seek the challenge, whether they have that curiosity you mentioned.
And so getting back to your question, how has the role evolved?
Brandon: Well, the role always evolves to whatever you need to do.
And so even one role is to do culture interviews and to make sure that that culture expands so that you have the engineering bandwidth and you have the bandwidth actually across the entire company, marketing, sales, product to continue growing to do a product that does more easily in less time.
And so my role now has shifted yet again, and it's what we're doing now: it's communicating more about what it is that we do and why it's interesting and why people should not just hear the words I'm saying, but actually see it for themselves and go to the website.
Grant: And what is the website?
Grant: Great. That kind of leads me to the next good question which is like you started the podcast which I've been listening to and think is really phenomenal.
Like when we were talking a little bit about this before the show, like
Brandon: What's the website?
Grant: Yeah, seeking truth in networking, right?
Brandon: Got it, yeah. Seeking truth in networking.
Grant: Yeah. So what was the inspiration for starting a podcast? Like why podcast? Why now?
Brandon: A bunch of reasons. One was really simple.
The pandemic hit and we're all locked at home, isolated, feeling detached from reality and the world.
And that's a way to reach out to people and talk to them.
Another reason is I want to hear people's stories and I want to tell them, and we've got connections to a lot of people in the networking industry.
And I thought, okay, there are a lot of people whose stories are so interesting that, within the circle of people who identify as being connected to networking, it'd be pretty interesting and it would be a worthwhile use of their time to hear a luminary speak for an hour about their life experience, or the system they built that went from an idea in their head to deployed, and the things they learned interacting with people.
And we've got this kind of unique company perspective.
It's half software, it's half network. And we're actually a software company in the world of networking.
We're not really a networking company in the traditional sense, right?
Because the people we hire are, like we were just talking about, they're software developers.
We need some people with deep networking experience especially on the customer facing side, but computer science concepts pervade the existence of the company.
And so you end up getting access to a lot of people.
And if you want to tell stories in networking more broadly because it's fun, because it's meaningful, why not do it?
And so quickly we realized, well, yeah, we could talk to a few professors and we could talk to our circle of friends.
Or we could raise the bar and we could take a risk.
And the risk was to go out to the biggest guests that, honestly, we thought were all going to say no.
We thought everyone was going to say no because why would they trust us?
We have nothing to show. They don't know what they're signing up for.
And it was the opposite. Everyone says yes, people want to speak.
And I want to tell their story and you get a few people who speak.
And then I can point to the story that they've told and then it becomes a little easier to get others on.
And so our whole kind of mantra evolved from telling stories in an entertaining, insightful, useful way to network operators and engineers where it's the people who've changed the world of networking to, and this is the important bit, changed the world through networking.
So it's about connectivity, it's not about technology.
Grant: There are parts of it that are about technology.
And I find it so interesting because there's so much context that's provided, right?
Like when you interviewed the creators of ethernet, right?
Like as someone who doesn't know that much about networking, right?
It's really interesting to sort of understand the foundations, where did this come from?
And that information feels like it's locked up in only a handful of people and it's hard to find it.
And so it's really cool that you're exposing it.
And I think for folks that even want to come into this space in 10 years, these are going to be really useful episodes for them to listen to and understand like, well, like why?
Like why are things like they are? And I think that history is just super, super helpful.
Brandon: Yeah, and it's really good for me to hear that because there are podcasts I listen to that are topical.
It's what happened in the last week that's going to affect us in the next week. And there's value there.
But I like more of the unique stories that no one else has brought to the world and that are consumable while you're running or while you're cooking dinner or the best review we ever got of our podcast was, "It made my Costco trip interesting."
Grant: Yeah, I love that. I love that.
Brandon: Podcasts can be consumed wherever you live.
It takes kind of idle time that would be work and turns it into an opportunity to learn, be entertained, and that's kind of what's unique about the medium, assuming that the stories are good enough to entertain you through that.
Grant: Yeah, and there's something about this idea that history is philosophy and like you can study history to understand what actually happened, right?
And why people are like they are.
It's better than just sort of saying like, "Well, let me think about why it's like that."
It's like, well, let's just go study the history about what happened, what led us to this point, what's happened in the past, right?
So when you understand how ethernet evolved or how any of these protocols evolved or people have moved in and why things are like they are, it gives you context that allows you to say, okay, that makes more sense.
Let me take this forward in a more modern direction or let me maintain that history. So I just think it's super valuable, but.
Brandon: I totally agree. Like what is ethernet?
Lets you plug in a cable and get to the internet in most cases.
But why is ethernet? It's so much more interesting.
What does that even mean, right? How did it come to be?
Where did it come from? Where is it going? Who are the people involved?
What are the stories? The stories you heard in this episode, right, it's a company getting founded.
It's the same thing. And it was like a lifeboat for me.
It was amazing to be looking at Bob Metcalfe who is one of the most influential people in history for connectivity, right?
He's the person who invented ethernet.
And then he's one of the two people who brought it to the world.
He and Bill Krause, it was a few more than two, right?
And he talks about those people.
And so in the process of doing that conversation that we had, that was the great part about it and Bob's talked to plenty of people.
He can entertain, it's like a masterclass in how to get someone sucked into your story and string them along.
And then it's not even this later, you're like, wow, that time went really fast.
I learned a lot from it and I encourage people to listen.
We actually split the episode in two and we realized there's a broader story here.
There's early days of Silicon Valley.
What was it like when personal computing wasn't so personal?
And then it's his life advice.
And he goes in a completely different direction and talking about the technologies that he's excited by, that he's personally investing his time in and it's not related to connectivity at all.
It's clean energy.
Brandon: So I'd encourage people to listen to that one or if they want to start with the first episode, Marc Andreessen, another example of someone who created a technology, blew it into the world and it's affected all of our lives.
He created the web browser and now he's changing the world of venture capital.
And so people who've done that multiple times are super fascinating to me because those lessons they've learned and how you evolve a technology, how you iterate on it, often translate to the next one.
Grant: Yeah, and at this moment in time, people are pretty excited about Clubhouse and audio in general. I still prefer podcasts 'cause I think it allows you to select the content you want to listen to and share it, like, hey, this was incredible, so that people can get the same context.
Clubhouse is a great conversation, but I still really love podcasts as the primary form of audio.
Brandon: I think each form has its own unique element that can tell a different kind of story or concept, right?
So we can go really deep and we're doing that right now.
And you can hear my excitement and the words and you can hear stories that wouldn't really convey the same if it was just words on a page and a blog.
But a blog is a more consumable way for some people.
I don't always have headphones or not everyone even has headphones.
And then YouTube is a completely different way. All the concepts that are best explained in visual ways you can show there. And then there's even short-form visual content. It's what can I show you in five minutes?
And actually on our website recently, we've added what we call Forward Fixes and on YouTube.
And it's here's a problem that mathematical modeling can solve.
And in five minutes, you'll see how our product can be used to deliver a solution to that problem.
What does it look like? People need to see things, we're visual thinkers.
And that's very different than even Clubhouse.
Clubhouse is access to people, to their voices, it's more topical, and I'm even seeing a hybrid model where the best elements of Clubhouse conversations can turn into curated podcasts.
So there's a lot happening in the world of media to get people access to good information and the best speakers and stories are starting to rise.
Grant: Yeah, and I think it's just-- I love what would be kind of these niche topics, right?
EnterpriseReady and Seeking Truth in Networking are these niche topics, but really, I think this is how you get deep, right?
You want these people who are somewhat T-shaped and the idea that you can go really deep in a podcast and hear from experts and you don't have to be in Silicon Valley, but you can be anywhere in the world to hear the thoughts these folks are thinking about.
It's so democratizing. I love it.
Brandon: So Grant, are there any podcasts you listen to that are pretty far outside the Enterprise Ready wheelhouse?
Grant: Oh, man. Well, I listened to a ton of audio books.
Like that's where-- So a lot of the things that I listen to are things that come across my sort of world there.
So stuff I just finished, there's like the classic, "The Sovereign Individual", which is this crazy book that predicted the rise of cryptocurrencies.
And I was just reading "The Beginning of Infinity" by David Deutsch.
I was finishing up, there's an older book that one of my friends suggested, the one that I was kind of referencing around history, which is called "The Lessons of History" by Will and Ariel Durant.
I mean, and I'm a person-- I'm not like I have to finish everything.
So I'll listen to a bunch of different books at the same time and take a bunch of different inspiration from things, for me, it's all about like how do I consume a lot of different things and kind of take one concept from history and apply it to enterprise software.
Take something I'm learning about enterprise software and apply it to politics.
So for me, being able to truly kind of cross-functionally learn across all these different interests is what I love the most.
I don't think I have any of these super deep nerd habits.
If anything, I'm like a political kind of philosophy nerd.
Like I love... There's this book by this guy, Thomas Sowell, who's this really, really old--
He used to be a communist and he became like pretty conservative.
But he wrote this book years ago called "A Conflict of Visions," which I just found to be one of the most interesting books in comparative political philosophy that I've ever read.
And for me, those are like the nerdy things that on a Saturday morning I like go for a walk and listen to, so.
Brandon: The nerdy stuff that I like to indulge in on a Saturday morning, on a walk around the neighborhood, is stuff that in my daily life I would have no exposure to, but I wish I learned a little more about, so biology.
I don't understand it anywhere near the level-- Like I can ask basic questions, like I remember a year ago, it was like, what do mushrooms breathe? And I didn't know.
Grant: Oh, okay, yeah.
Brandon: And I felt like it was so basic.
Grant: Oxygen, right, yeah?
Brandon: Yeah, so topics like that or space or things I just don't know about, I feel like it's an easy way to dip your toe into a topic. In many cases you hear a book author tell their story about why they're excited about it, and then you choose whether to commit to it and listen to the entire book.
And if you're hesitant to commit to things like I am sometimes, then it's a nice way to be entertained, but also learn something and not feel like you wasted your time.
Grant: Yeah, a lot of times you get most of the concept from just like the one hour interview versus like going to read the 10 hour book or something, so.
Brandon: Sometimes you don't need eight hours of listening to a book to really get the concept and spit it to someone else and get the value out of it.
Sometimes, I feel like an hour is about the right amount.
Grant: Exactly. Okay, so I want to take you back into enterprise software real quick.
There's a couple of things that I was thinking about.
So I know the Forward Networks kind of value proposition is primarily around reliability, right?
Is that sort of what you would say is the value proposition?
Brandon: It's efficiency of people. It's reduction in the duration and the severity of outages.
And it's enabling automation without which you couldn't make business changes or you couldn't deploy applications.
So as one example, Goldman Sachs has deployed our product across 17,000 nodes, over multiple years. It's the full scale of their entire global network.
And when they're deploying the latest application, imagine something like an Apple card, it has to run on servers on every continent and those servers need to connect through 17,000 devices.
So which one is it? So we go from, how does traffic get from A to B at the kind of lowest level, the most direct query, the most in the weeds?
And all the details relevant to that to troubleshoot a problem and understand behavior through to verifying higher-level properties like is my network secure?
Is my network connected in the way that I think it should be?
Is it fault tolerant in the places where that really matters, especially around security postures, zero trust, blast radius, these kinds of things, all the way to enabling automation.
So they have an API call, and that API call needs to scale to their entire network and be fast, to show them all the firewalls to touch so that they can deploy applications.
And many years ago, right, VM infrastructure could do that in an automated way, via API, but for their network, with what they've got deployed, they had to find a way. And so they partner.
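To make the "show me all the firewalls to touch" query concrete, here is a minimal sketch in Python. It is not the Forward Networks API; the topology, device names, and function names are all hypothetical, and it assumes traffic follows the shortest hop-count path, which real forwarding tables do not guarantee.

```python
from collections import deque

# Hypothetical toy topology: device name -> directly connected neighbors.
TOPOLOGY = {
    "app-server": ["tor-switch-1"],
    "tor-switch-1": ["app-server", "fw-east"],
    "fw-east": ["tor-switch-1", "core-router"],
    "core-router": ["fw-east", "fw-west"],
    "fw-west": ["core-router", "tor-switch-2"],
    "tor-switch-2": ["fw-west", "db-server"],
    "db-server": ["tor-switch-2"],
}

# Hypothetical set of devices that act as firewalls.
FIREWALLS = {"fw-east", "fw-west"}


def shortest_path(topology, src, dst):
    """Breadth-first search; returns the device list from src to dst, or None."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in topology.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def firewalls_on_path(topology, firewalls, src, dst):
    """List the firewalls traffic would cross, in order, under the toy model."""
    path = shortest_path(topology, src, dst)
    if path is None:
        return []
    return [device for device in path if device in firewalls]


print(firewalls_on_path(TOPOLOGY, FIREWALLS, "app-server", "db-server"))
```

A production model would have to derive paths from each device's actual forwarding and ACL state rather than from hop counts, which is the hard part the conversation is describing.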
Grant: Okay, and so if I break it down, obviously, you do a lot, but it still seems like the primary value proposition is sort of this concept of like reliability, but like sort of to your point is a function of velocity and being able to change and like move fast, right? Does that make sense?
Brandon: Yeah, there's like a connection between the two, so you can't move fast, you can't get to compliance checks.
You can't get to automation.
You can't get to all the new equipment that's going to make your life better if you're stuck with what you've got, if you can't operate it effectively, if it still takes time every time a ticket comes in to figure out if it is the network, to see what the paths are, to find out where applications live, to simply access information, to find where this VLAN spans.
There are all kinds of questions that people have.
And it's people, it's operators and engineers who are hands on keyboard going to the right places to find the right information.
Brandon: If you can snap your fingers and every query returns a result and some queries return the result you're looking for before you're done typing, if you have kind of that level of interactive speed, then that phase at the beginning of every problem you ever need to solve, getting access to information, which comes before you can actually get to the insight, enables you to get to the insight immediately, right?
If you start with the critical information to get the insight or in some cases now start with the insight, we show you what the insight is.
This is where traffic is getting dropped, everything looks okay and here's why.
If you start with the insight, you can focus on the fix.
You can evaluate potential fixes.
And so the value is kind of getting caught up with reacting to problems today which is largely a manual way without our software becomes much more automated so that teams can transition to a more proactive, more predictable method of network operations and engineering so they can get the time to deploy what they really want so that they can deploy the more exciting stuff so that they can spend the time training their teams to do the automations that take it from, I do things one at a time, to I do it once and it goes to my entire network with whatever system they use.
Grant: Yeah, that makes sense. But the one thing that I sort of--
When I think about the reliability and sort of all the things you're talking about there and speed and being able to sort of move fast, create amazing products and get things out in the market, and you mentioned it a little bit in your first description, but it's not the primary use case, which is security.
And it feels like, a technology like Forward Networks could probably very well have had a, like we are a security company, it's network security.
Here's how we do, and here's how it works.
Is there any context in terms of like why that's not the primary call to action or value proposition?
Brandon: So it's a great question 'cause it's timely.
So you know what happened at the end of last year, right? What's top of mind for every security administrator out there?
Grant: Okay, the SolarWinds hack, yeah.
Brandon: Supply chain breaches. It completely changes the way you think about security when you can trust no one.
Brandon: Right? You deployed software from a vendor, you trusted vendor, tens of thousands of people deployed that software.
It's running on the very core of your environment.
And then an update comes in and now it's got a rootkit and Russia is controlling it and they're accessing whatever they want or they're using it as a vector to get to where they want.
Brandon: So it puts to the forefront the importance of security and what do you do about it.
And if you don't know what's happening in the network, how can you make sense of a network security posture?
If you don't know who can talk to whom in the network, if you don't know if the firewalls are even applied to traffic between two servers, then how can you productively trace a breach?
You're stuck with evidence of what actually happened after the fact.
Brandon: Or more kind of host-level systems to try to have the protections.
So I think it was really hard or possibly even impossible to have a defensible security posture towards zero trust which everyone wants to go to with proper segmentation if you don't know what your network is doing fundamentally.
And it's really hard without software support to know what your network is doing at any point in time, let alone continuously.
And so the original value proposition was because we were network operators.
We saw how challenging network operations was.
Let's start there because no one else can do this.
No one else can build that model that scales.
No one else is thinking about networks in this way.
And security was always in the back of our mind, but it honestly seemed a little bit scary 'cause it's a crowded space.
There's a lot of people chasing those dollars.
How do you stand out from that crowd?
Well, I know how we stand out in the networking world.
It's a completely new set of capabilities, but I didn't know in security and none of us really knew until recently, but you got to listen to the customers.
And when they're asking you, "Well, how can I use this?"
And when the network team says, "Hey, let me bring in the security team because they're going to get a lot of value out of that."
And then the security team is saying, "Yeah, let's learn more. Let's actually get it deployed."
And then you have teams that are saying, "Yeah, we're both going to get value. So let's both contribute to getting this deployed. Let's both find some budget."
That's exactly the point.
You're blurring the lines, you're making network data accessible to everyone in the organization, operator, engineer, security, security operations, security engineering, DevOps, application developers, the information about how applications connect can potentially make every application, every security inquiry better.
So let's make it available. Let's give it the APIs and let's give it the UI support to make those use cases easier.
And so, now our executive team is really where we have to listen to the customers and we do and they're talking more about it and they see the value more and more.
And there are already some players in the space, but what we've got is unique because it's deeper.
Brandon: Right? It understands everything at every level of every virtual or physical box, cloud or on-prem through which a packet ever has to go, we have visibility into.
And so if you've got that, you've got a great starting point of behavioral understanding to help make all those teams more efficient and give them answers.
If the question you're asking is, is my network secure?
That's a really easy to pose question and that's extremely difficult to answer with any level of certainty.
And so there's this concept of verification, which is not that I'm testing a few things that I can think that I should test.
It's not the test cases I've written.
It's everything that's testable. It's a high level property.
And if network teams are thinking in terms of those high level property as a fault tolerance and security and connectivity, then everyone benefits, you can actually get toward compliance.
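The difference between testing a few cases and verifying a property can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hosts, zones, and allow rules are hypothetical, and reachability is modeled as a flat allow list rather than anything derived from real device state.

```python
from itertools import product

# Hypothetical hosts and the zone each one belongs to.
ZONES = {
    "guest-1": "guest",
    "guest-2": "guest",
    "web-1": "dmz",
    "pay-1": "payments",
    "pay-2": "payments",
}

# Hypothetical allow rules as (source, destination); everything else is denied.
# The last rule is a deliberate misconfiguration.
ALLOWED = {
    ("guest-1", "web-1"),
    ("guest-2", "web-1"),
    ("web-1", "pay-1"),
    ("web-1", "pay-2"),
    ("guest-2", "pay-2"),
}


def can_reach(src, dst):
    """Direct reachability only in this toy model (no multi-hop reasoning)."""
    return (src, dst) in ALLOWED


def spot_test():
    """A hand-written test case: checks the one pair someone thought of."""
    return not can_reach("guest-1", "pay-1")


def verify_isolation(src_zone, dst_zone):
    """Check the property for every pair; return all violations found."""
    return [
        (src, dst)
        for src, dst in product(ZONES, ZONES)
        if ZONES[src] == src_zone
        and ZONES[dst] == dst_zone
        and can_reach(src, dst)
    ]


print("spot test passes:", spot_test())
print("violations:", verify_isolation("guest", "payments"))
```

The spot test passes even though the network violates the intended property; only the exhaustive check over every source and destination pair surfaces the misconfigured rule.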
And other industries had events that were so spectacular in their level of failure that it changed the way people thought about testing.
Two quick examples. One of them is the Ariane 5 rocket, back to space, same exact software as the Ariane 4.
First time it goes up, what happens? Kaboom! 37 seconds in.
Because the trajectory was steeper. It was a bigger, faster, bolder rocket and there was an overflow condition.
All the three computers experienced the same one. The guidance didn't know what to do.
It turned sideways and broke apart and then exploded because the flight termination kicked in.
That's a billion dollar project. Now, it's not a billion down the line, but it's a lot of money lost and it set that project back a lot of years 'cause you have to regain that trust.
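The overflow condition mentioned here is widely described as a 64-bit floating-point value being converted to a 16-bit signed integer that could not hold it. A minimal Python sketch of that class of failure follows; the function name and the sample values are hypothetical and chosen only to show an input that fits and one that does not.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16(value: float) -> int:
    """Convert a float to a 16-bit signed integer, failing loudly if it cannot fit."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return n


# Hypothetical reading that stays in range on the older, gentler trajectory.
print(to_int16(20000.0))

# Hypothetical reading from a steeper trajectory: same code, out-of-range input.
try:
    to_int16(40000.0)
except OverflowError as err:
    print("unhandled in flight, this would take the system down:", err)
```

The code is unchanged between the two calls; only the operating conditions differ, which is why reusing software that was tested under the old conditions gave no assurance under the new ones.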
You had a failure that was very public. Or Intel.
I don't know if you remember the '90s, the Pentium chip.
Grant: Oh, of course.
Brandon: So Intel had to reclaim entire PCs because the motherboards had chips that you could multiply two times two and you'd get something other than four.
You'd just go to Excel and you'd get the wrong answer.
And all the financial people were using-- Spreadsheets was a good use case at the time.
We didn't have as much of the web. They were like, "I can't use this PC."
And it was a hardware failure that couldn't be worked around.
There was no work around.
Brandon: So Intel had to reclaim chips from systems that were deployed and running on desks at the cost of, I think it was roughly a billion dollars.
Brandon: And at the time, people were saying, "Is Intel going to make it through?"
I mean, even bringing up the third example, I think it was December, 2012, Amazon had an issue.
Again, not to pick on any of these companies, it's very hard to make products that work in every possible circumstance if you can't test them in advance.
So Amazon had a change where they were updating their network and they had a kind of backup network for management and it was, I think, 10% of the size of the main one.
And they accidentally shifted all traffic to the 10% one.
What happened was there wasn't enough bandwidth to support all the operations.
So all the servers got backed up, they started triggering conditions that had not been tested.
Distributed systems are hard and it was the interaction of all those conditions that hadn't been tested 'cause why would this ever occur?
Why would you ever have no network across everything or everything starts backing up and the headlines were, "Is the cloud dead?"
No, the cloud's not dead. Intel's not dead.
The Ariane 5 rocket's still flying, but it set those projects back quite a bit.
And they would prefer not to have those headlines. It's exact same thing in networking.
If you can verify behavior, if you can ensure that properties hold no matter what your topology looks like, no matter what your devices are, no matter who touched the device last, then you have a much better network security posture.
You have a much better reachability posture. You have a much better fault tolerance posture.
You have a better business that you can trust more.
So communicating a concept like that, right?
It's an abstract concept of verification.
And doing it in kind of attention-grabbing ways is pretty hard because it forces people to think about how you go from testing individual points to reasoning about abstract things.
And in the software space, there's a company called Coverity that commercialized this.
They took static verification, it's a program that understands the program.
And what it does is it finds every loop the program could enter, every place it could crash, every place it could leak data, every way that an input leads to successful output, users happy.
If you can find that before the product is deployed, it's much cheaper, right?
Going back to our mission to Mars analogy, I want to know that there are no bugs in the software that's running the rocket before I ship it there.
On the life support system as well.
So verification techniques from the world of computer science and especially programming languages and that verification community slowly percolate into systems we use and have enormous value because they eliminate entire classes of errors.
And if you run those systems, then you have a higher level of assurance that your thing is going to operate no matter what the circumstances are, including ones you didn't even think about, that have never been encountered in the history of that system's operation and existence.
Grant: Is that sort of related to like this concept of like formal verification, which I don't really know much about, but it's a term I've heard thrown around in terms of like really like the most
Brandon: It's exactly it.
Brandon: If you can test everything and if you can reason about behavior abstractly, then you can find bugs before they manifest in the field.
You can eliminate, find one instance of a potential bug that represents a problem and know that it's not going to happen.
It's really hard to do, right?
Because every line adds some cost to verification and it's hard to get these systems to scale.
Brandon: So it's still in some areas research edge, but that's exactly what we've done.
We've commercialized network verification where the initial reason to deploy is that it's just faster to figure out how your application is connected, so you can figure out if it's a network problem or not.
But then as people use it, right, day zero is an up-to-date map of your network, day one is I'm searching, I'm productively understanding and troubleshooting behavior that I didn't have the insights I needed for before 'cause I didn't even know the traffic was going that way.
And I couldn't understand how all these different tables interacted to yield behavior.
And then day two is, oh my God, I can automate.
I can verify, I can press a button and continuously ensure compliance, security, fault tolerance, connectivity.
And then I can mix it in with DevOps.
So as my network is continually evolving, as my networks are evolving, the network is always meeting the application's needs.
Grant: I love that. And I mean, has this sort of been the vision since day one?
Or has this vision sort of evolved as you've worked with customers?
Brandon: I'd say we're still not even practically deployed for the original vision at the scale that it should be.
That original concept of a safety net requires the most basic of integration, but a lot of network operations and engineering teams don't have a software mindset of yeah, of course, you mix everything into your change pipeline.
You have a change pipeline, right?
It's in many cases, systems that are changed by people or maybe they've got a custom change management system, but it tends not to inter-operate with DevOps.
And so we came at it with a computer science mindset and that mindset is pervading the world of networking increasingly, software techniques are making networking better, but changing people and changing processes takes time even if the products are already there.
And so that's the big challenge, it's showing that it's already there, the product can already do what people need, but getting it adopted in more places as people learn what's possible, as they learn why network verification is something the CEO is going to be really happy to hear because it helps them sleep at night, because it means that the likelihood of an outage that puts them in the news is much lower.
It directly addresses CEO-level concerns all the way down to, I'm in the weeds of making a single change.
And I've been on those change windows.
I've gone in on a Saturday and had to move equipment from one place to another and you spend a week or two preparing for it.
You test everything that you can.
You start and stop every system you can so that you know how to start it up.
When it's the only thing that's changed not the stack of 10 things that's changed.
And the last one we did was last July.
We moved all of the VMware infrastructure for our company from a closet at the office we were moving from to a co-location facility.
So we'd get more trustworthy power cooling, all that.
And even though it's not moving a lot of stuff, it's a single digit number of physical boxes from one rack to another.
It took enormous planning because the entire operation of all our developers depended on that.
And so when we did that, it was wake up at 7:00 a.m. and then get home at 2:00 a.m.
And you don't go home until everything is working, until business can resume.
And inevitably, you go home at 2:00 a.m. and then you wake up the next morning and you missed something.
Brandon: 'Cause it's hard to make sense of the network you've got, let alone the applications on top, and you can test everything in advance, but people can't think of everything they need to test, even if they're the best ones, myself included.
And so you're inevitably fixing the fires that you set, that you didn't realize you were setting because config wasn't saved.
And then the next morning it starts up and it's mostly working, but maybe in a few specific areas, a few people can't access the thing that they need.
So speeding up that process, providing that visibility, we dogfood, we use our own products to improve our own network experience.
And yet we know when it's three or four orders of magnitude larger in scale, it's a different world.
The value's even stronger.
Grant: I love that.
When your customers are looking at Forward Networks, is there a build versus buy or is it more like head to head competitors? What's that landscape look like?
Brandon: Yeah, that's a good question.
So what we do, very few companies in the world can build, let alone even think to build.
And at the places where they would build it, it's not meeting their core competency in the network teams.
So Goldman Sachs, right? They're an advanced company, they do new things with software. They are tip of the spear.
Brandon: But their network operations and engineering teams may not have the tools depth to maintain software that understands the diversity of all the equipment across their 17,000 devices, keeps it up to date, makes it all scale, provides a consumable API and UI and wants to keep doing that indefinitely.
So a company like that builds because they don't have a choice, but then at the first opportunity that they can buy, they buy because they know the pain and they know the value.
And so for us, giant companies that need to do automation that depends on network operations or they want to know if the network is ready before every application deployment.
So that was another example there.
These are huge companies that need to keep things running that have production payment processing clusters and everything they can do to increase the level of assurance on every change is meaningful to their business.
Grant: Well, one of the things you said about that sort of build versus buy scenario is the last comment you said and want to keep doing that indefinitely, right?
Which I think is very much underappreciated.
Like I think about it, I call it like kind of continuity, right?
Because I think, for our business, I always say our customers can build anything, they just can't build everything.
And so it's like this, do you want this to be a core piece of what your team does in perpetuity, right? Forever.
Because if you're going to be-- For us, it's delivering software, for you, it's monitoring and managing network reliability and security.
Do you want to make sure that your team is always building the software to make this possible, right?
Brandon: Exactly, as a core, as a context.
Grant: Right, exactly.
And I think this is probably true in your space as well as at ours, which is probably true throughout most of the technology in an ecosystem that is always evolving and sort of evolves pretty quickly.
It's almost even more important to have a vendor or else your team is just constantly behind in terms of catching up with what the next generation of things are that they need to integrate with and update to and it's like, it's this constant battle that if every organization is repeating the same effort continuously, it's like this huge amount of wasted effort inside of like an economy, right?
So if you can centralize that effort for one vendor, not only it's an efficiency, but you're also getting massive gains because they're seeing lots of different problems and you're seeing it before you get to it.
And so you kind of benefit from the network effect there.
But this idea that you have to want to do it continuously, I think is really, really important.
Brandon: Start with the customer experience and then ask the question which choice, build versus buy is going to deliver a better experience faster?
And then when I look at the line like you were saying, what's the trajectory? Is it going to get better faster?
And often, with subscription software, if there is a player in that space who you can exceed the trust bar with and you believe it's worth building your product on top of that, because it is context, but it is not the core differentiator.
Then you should at least look at what's out there.
And at the minimum, you're going to learn what best practices are and what your experience could be.
Maybe you choose for economic reasons or trust reasons to build it yourself, but you've at least seen what is out there.
And increasingly, we will always be under-resourced.
We will never have the resources to achieve the giant ambitions of enabling networking to have all the driver assist functionality based on a complete, precise, accurate model of the world.
And in the same way that cars are becoming more autonomous and driver assist is making driving safer and more efficient, we want to do the same thing with networks and there's a lot yet to do.
And for us to get there faster, I want talented engineers working on a model of the network, working on ways to show what we can do.
And so when it becomes build versus buy for platform versus core differentiator, it's going to be for that platform we're building on if we can trust it and it delivers a good experience, it can be integrated in a reasonable amount of time then we have to seriously consider it.
Grant: Yeah. There's also one thing I was thinking about within Forward Networks in terms of your customers sort of build versus buy analysis.
You talk about this transition that your customers are making and they're saying, "Oh, we're going to bring in the security team, right?"
It's not just like one team anymore.
And there's almost this level of like inside of enterprise software companies, we refer to it as customer success, right?
Or which is going into the organization and helping other parts of the organization or more teams adopt and start using this tool in sort of consolidating on this as a best practice internally.
And if you build that yourself, right, if you build the solution yourself, so if one of your customers cobbles together some networking stuff and some investigation tools, then they're going to be responsible for onboarding that security team and helping that security team really feel like they are getting a product that solves their problems as well.
Brandon: Exactly, and it's not just do I have a platform to build on?
Do I have a phone that I can pick up and call when I don't know the details?
And honestly, for us, it's equally or more valuable to know that the customer experience will never be blocked on the knowledge we have of that platform.
It's always going to take time to build that up if we're building a system internally. It's not going to be hardened over 100 plus deployments, it's going to be over just the number that we've got.
Grant: Yeah, I love that. Cool, Brandon. This has been a really insightful conversation.
We've touched on a lot of different topics. I really appreciate it.
I've learned more about networking than I knew before.
Learning more about your experience has been super interesting.
Is there anything else you want to say, just kind of in closing?
Brandon: Yeah, go to forwardnetworks.com, check it out.
The best way to really understand the stuff we talked about is to see it for yourself and all you've heard are words.
'Cause I did words but words. And you can see pictures and you can see video and you can do the clicks.
It's a very different experience. So reach out, we'd love to show you a demo.