Unintended Consequences
27 MIN

Ep. #10, Infrastructure Buffers with J. Paul Reed of Netflix

about the episode

In episode 10 of Unintended Consequences, Heidi and Kim continue their conversation with J. Paul Reed of Netflix. This time they explore the hidden complexities of infrastructure systems, including public transportation, domestic utilities, public spaces, and the internet.

J. Paul Reed is Senior Applied Resilience Engineer at Netflix. He is a recognized speaker on DevOps, release engineering, and operations complexity. J. Paul has written articles for O’Reilly, DZone, and Atlassian and is the author of DevOps in Practice.


Heidi Waterhouse: So, we're going to pull it back to technology.

Put on your predictor hat, your futurist hat, and tell me what we need to build out for our next unknown unknown.

J. Paul Reed: Well, I think in general, the way America views a lot of infrastructure is not particularly productive or conducive to the outcomes that we want.

So, you can see the impacts on infrastructure from the pandemic in a lot of different ways.

So, this idea of last mile and good internet. I mean, all of the struggles that lower-income households had with remote learning.

I had some coworkers that on meetings, they would have frame drops and drop out because you've got three kids in class and Zoom calls and you've got the wife on a Zoom call and I'm on a Zoom call.

You can see that, right? You see it in transportation infrastructure, right?

The fact that Muni was decimated in the first part of the pandemic because of the way that their-- BART too, the way that their funding works.

If you don't have people riding it, then the thing just stops running because they get a lot, not most, but a lot of their-- BART gets most of its money from fares.

I think that's right. Muni, it's a little more complicated, but now we're looking at-- Okay, well, when you open up, those things aren't going to come back.

There's a report that Muni is not going to come back to full capacity till 2022, because of the funding problems.

We haven't figured that out as a country on how to do that. Right?

Kim, this will resonate with Kim. I bet. I don't know about you, but I wish PG&E would go away.

They've ignored their electrical infrastructure and their gas infrastructure.

They've let all that go to hell for 50 years because they're an investor-owned utility and it was cheaper.

Heidi: They're the cause of a bunch of the major forest fires we had the past couple of years.

Paul: Exactly, exactly.

So what's interesting to me is when their solution is, well, we haven't really replaced these towers for 70 years and their useful life when they were designed was 50 years.

Instead of, like, fixing that, which is infrastructure, it's a capital improvement--

We're just going to turn the power off on windy days. Like what?

Heidi: I remember. They're terrible.

Paul: They're still doing that. Right?

Heidi: Yeah.

Paul: I think that actually impacts you more over in Oakland, right?

The wind comes through the hills and stuff and they'll turn it off.

That has a bunch of knock-on impacts, right?

Where it's hospitals that have to jump, streetlights, food.

I was pissed, 'cause it was like, I like to cook.

I have stuff in the freezer that I've cooked for leftovers.

That's all melting, right? You saw people running around getting dry ice and all that stuff.

The thing is that we need to start thinking of infrastructure as, in a sense, a buffer. Right? You look at a lot of high-speed rail in Asian countries or even bus infrastructure in Europe, right? Trams and trains in Europe, right? That stuff, investment has served them well.

I think in situations, they've been able to, I mean I'm trying to remember--

Somebody was doing mobile COVID vaccinations with the bus infrastructure, right?

I think it was actually in Colorado. It was where my mom was.

It was-- the buses basically weren't running.

They were able to use the bus capacity to do mobile vaccination stuff.

Heidi: The thing that happened was you could get downtown to San Francisco to get a vaccination, but you had to have driven.

You couldn't take a bus. They wouldn't vaccinate you if you had taken a bus.

Paul: A hundred percent.

So I'll tell you my version of that. When I got vaccinated, I had to drive-- a friend drove me, 'cause I don't have a car-- and we had to go 30 miles away.

It's still Bay Area. It was up--

Actually, Kim, it was up in San Pablo. But it would have been easier-- I could have taken BART.

I could have got myself there to Oakland Coliseum, but you have to have a car.

They will not vaccinate you if you were not in a car. So it's, okay, great, thanks. So they had slots too.

They had slots that were more convenient for me, but I had to do this other thing because that was the only thing available.

So, Heidi, to answer your question.

Biden has been talking about infrastructure spending.

Even Trump was talking about infrastructure spending, but the framing of that is often, all right, we're going to spend a billion dollars on infrastructure, but then it's, okay, well what's the, it's kind of a boondoggle.

Heidi: What are you doing for the other 49 states?

Paul: Well, that, and also it turns into kind of a boondoggle for construction companies. Right?

This viewpoint of, we should invest in infrastructure, and that's a public good, is a framing that I think we've kind of lost because it's just weird.

I don't know why we've lost that.

Heidi: Okay. I think we've lost public good in a lot of directions.

It's distressing, the vaccination conversation: vaccination protects you and not me.

Paul: Yeah. It's funny.

I have a friend who, in college, used to say this, and it took me until after college to really understand how true it really was.

He used to say, it's nice to be nice.

What's interesting to me about that is infrastructure.

It's nice to have infrastructure because it's nice. Right?

It also allows us-- adaptive capacity, actually, is what we call it in resilience engineering-- that buffer to use buses and other things for other purposes in emergencies, like forest fires, like pandemics, floods, earthquakes, whatever it might be.

We've taken this sort of private concept of running every system at 98% efficiency, which, even Google tells you, they don't do that with their servers because it's not efficient, but we do that.

We try to run public infrastructure that way.

That's not working out well for us in my opinion, but we have, it's funny, you said let's get back to the technology and we're talking about infrastructure.

Heidi: I think it's separable.

I think it's useful to say, we have learned in technology that you cannot run your CPU at 98% and get anything done. It doesn't work.

You have to have a buffer to accomplish anything.

If we can bring that back to the public space and say, look, we have to have everybody vaccinated in order for the people who turn out to not produce antibodies for them to be okay.

Paul: I'll point this out, bringing this back to our earlier conversation.

One of the things I do worry about a little bit post pandemic, both for individuals that are chomping at the bit to get back to work.

Companies are like, okay, everybody's back, let's run everybody at 110%, right?

Heidi: To make up for the time that you were less productive?

Paul: Right. There was a Netflix HRBP, HR business partner.

There are HR folks in there. They're great. She said, and she was very clear about this.

I love this. She said, "You are not working from home. You are in a pandemic. You might be getting some work done in a place that is your home, but you are not working from home."

I loved that framing because I think it put the impact on all of us sort of front and center.

Kim Harrison: I feel like I had to explain that to my parents a few times.

They've been retired for many years. They're like, that working from home thing just doesn't work.

I spent half of my workdays pre-pandemic working from home.

This is not the same. Please do not compare the two. So far from the same.

Paul: Yeah. I have a good question for both of you.

I developed these little kind of habits in the working problem.

I loved listening to NPR on my commute home, but I had no commute time.

What I would do is, I would go out to the kitchen and I would put NPR on.

I would wash whatever dishes I had to wash.

That's-- that was my commute at the end of the day, because that was also, okay, you're stepping away from work. Right?

I was working at my desk that I also do my normal, not-work things at. Right?

It was that those lines get blurred all over the place.

Either of you have little, like, pro-tip life hacks for getting through?

Heidi: The commute is half an hour on my couch playing a video game before I go talk to my family.

Paul: I like it.

Heidi: I'm pretty much done with Animal Crossing.

I got the statue of ultimate perfection, or something. I won Stardew Valley.

Many people do not know that you can win Stardew Valley, but I will grind enough to win Stardew Valley.

I got all the fish, I got all the crops. I won this game.

I'm looking for the next thing, but--

Paul: I will admit I played--

Then I moved, so I haven't hooked my PlayStation back up.

But I played Yakuza Zero, which is a kind of a Grand Theft Auto-y but it's very kitschy, but it's a fighting game.

So it's, it very much was, I need to mash some buttons and do some Kung Fu and beat people up.

That was my outlet on that. Kim, did you have a life hack?

Kim: I kind of do something similar.

I give myself like a good 20 minutes before and after to read or, I don't know, Netflix or do something. But living in a studio apartment, probably a week in, I realized I'm in one room.

This is it. So by the time my first meeting starts, it has to feel like work and not a messy bedroom.

I have to make my bed every morning. I have to put on clothes that are less pajama.

I have to make my bed to feel it is now an office space and not a Netflix-viewing bedroom.

Paul: Right. Yeah. I know. I could imagine that's hard with a studio where there's less kind of separation.

I had my desk and the desk that I'm at was three feet from my bed.

It was in my bedroom. Right? I did have the benefit that my TV was in a different room and whatever.

That was hard with just sleeping literally next to your work.

So I can imagine a studio, where those lines get blurred all over the place, would be really hard.

Kim: So I have a question.

I feel like you have brought up the socio element of this many times.

We've talked about this a lot. What inspired Netflix to send some of you to the school?

Who realized we can't just buy systems.

We have to help these people think about it and just think about all of it in a totally different way.

Paul: That's a really good question.

So I did get my master's before I was at Netflix.

I think the question you're asking is sort of, why did Netflix think about getting people with this expertise?

Netflix did not send these folks to learn. Yeah.

Kim: Apologies.

Paul: Yeah, no, no. That's okay. I'm just clarifying so that you understand the answer.

We talk a lot at Netflix about strategic bets and that's a kind of a business concept.

It's the same thing, right? If you've heard that term before, but strategic bets.

This is all a Dave Hahn jam, right?

We have a resilience engineering team, but it's actually not the applied resilience engineering that I do.

That team is really responsible for our A/B test infrastructure and chaos engineering infrastructure.

It's a little different and in fact, that's technically not resilience that's robustness, but I don't think we're going to get them to change their name to the robustness engineering team, which is okay.

That's totally fine.

Heidi: For those of you who don't know, the difference between resilience and robustness is that resilience is people and robustness is systems, and people who are nerds about this care greatly.

Paul: Yes, very much so. Lots of words to be pedantic about.

So, Dave is the manager of the CORE team and Dave-- you'll often hear this referred to as Safety-I, which is sort of linear views, the domino effect, right?

Where it's a chain of events and there's an incident.

Or the Swiss cheese model is another one.

You know, it's a very kind of traditional safety science or thinking about safety science and it's still used all over the place. Right?

Safety-I, and then there's Safety-II.

What you're talking about, Kim, is these crazy little people, they're more in the Safety-II space and this idea that there's socio-technical systems, that you should look less at failure and look more at the success and how the system works and all of that kind of stuff.

This was a strategic bet that Dave made to build out a team that would help bring some of the deeper systems thinking and Safety-II thinking to Netflix's operations, and largely focused--

You want to talk about emergent stuff: largely when I was hired-- it was originally before the pandemic-- it was largely focused on incident reviews and that kind of stuff. During the pandemic, my job changed pretty drastically because I was actually helping the team and helping the organization wrangle some of the emergent complexity coming out of this stuff.

Right? It's interesting.

I've said before, I used to say this when I was a consultant, that part of the role that I play is a little bit of an organization, organizational therapist, right?

So looking for weak signals of risk and then surfacing them and bringing them to leaders' attention and then having deep discussions about what to do about them.

Which sometimes may actually be, yeah, we understand the signal, but it's not actionable or not something we're worried about enough. Right?

And which is fine.

So yeah, it, to answer your question, Kim, it was really a strategic bet that Dave made that he thought it would be in Netflix's best long term interests to start exploring this thinking.

If you look at a lot of the way that Netflix does operations, it has a lot of storied history of being at sort of the forefront and being thought leaders in that space.

I think the way Dave was thinking about it was a natural progression of what is the forefront of the thinking on this.

Then having me come aboard and there were folks by the way, doing similar work before me.

So it's not like I was the first one there. Nora Jones was at Netflix doing this sort of work.

In fact, Heidi, I think Nora was at Netflix when she presented at REdeploy. The first one. Yeah.

So it's not that I was the first one, but to make a concerted investment, that was a bet that Dave made.

I'd like to think it paid off. Maybe I should ask-- I should ask Dave if it paid off.

I think he would say it did.

Heidi: So I know that you're interested in air traffic safety.

What's one outcome of the decreased travel that you think will surprise people?

Paul: So, I am interested in air traffic safety.

I'm also a pilot though, full disclosure. I haven't been up in the air for a number of years now. I'm not current.

The thing that I think will surprise folks is in various pilot circles, there have been people posting videos landing their Cessnas and their tiny little aircraft at SFO and at LAX and at all of these huge airports, and the reason that they can do that is because of the lull in air traffic, especially at the beginning of the pandemic. Basically, you have to have those towers staffed, and they're bored up there.

So you take off from Palo Alto Airport and you go up the bay and then call San Francisco Tower, and because of the pandemic, they were bored.

If you want to land on the runway, cool.

Which-- which runway would you like to land on? We will--

Heidi: That's the one that comes in from Asia. Sure.

Paul: Yeah. Yep. Yep.

So there's a lot of videos of pilots from all over the country landing at airports that they otherwise would never, ever, ever get to land at.

Then you get to write it in your log book.

I landed at SFO. That's something I have never done by the way.

I've flown a lot over SFO, but I've never landed at SFO.

I think that's a thing with decreased travel, an effect of decreased travel,

that's probably surprising to lots of folks.

Heidi: That is super surprising. You're going to love this. We have an upcoming guest who is not a tower controller, but a center controller.

Paul: Oh, nice. Nice. That's awesome.

Heidi: It's super, super nerdy. So that's--

Paul: Which center?

Heidi: ZMP, Minneapolis.

Paul: Minne-Center.

Heidi: Yes.

Kim: What would you want us to ask this person?

Paul: So I would love you to ask what you asked me, like, "Netflix and COVID, say words." Tell that story.

I would love to hear that story at Minneapolis Center.

I also think they're probably going to have a lot of interesting perspective around when you talk about scale and sort of emergent stuff.

I can speculate on some of the stuff that I think is the national airspace system is actually restarted.

It wasn't, I don't think it was as abrupt as post 9/11 where the day of 9/11.

Then right after literally everything was shut down. Right?

It's not going to be that abrupt, but I do think there's all these things around pilots that have to stay current.

Airline pilots have to stay current. So, when you shut these systems down, because the planes aren't flying, you do get some weird emergent effects.

I'm not so worried about the pilots; they're humans. They'll figure it out.

They've got ways to do this, but you look at the airlines, they parked a lot of planes in the desert for a lot of months and--

Heidi: They're not meant to be parked. They're not designed for it.

Paul: Yep. Hundred percent.

So the thing is, if there is a class of failure mode, like rubber that gets brittle in certain seals on the aircraft or things like that, we may see some operational impacts.

I don't mean crashes, but we may see some diversions and things like that.

I mean, this wasn't due to that, but we'll maybe see videos of that.

That United plane in Denver that was littering its trash all over Arvada, on fire.

We might be treated to some more videos like that, which as long as nobody gets hurt is great for me because then I can put them in my talks, my conferences--

Heidi: That was such a great example of a successful failure.

Your plane is on fire and one of the engines is not working. What happened? Everybody got off fine.

Kim: You guys are making me really anxious right now.

Heidi: Oh, you know what? Don't be.

Air travel, even after this, is so much safer than being a cyclist.

Paul: Oh man, cars. Yeah. By the way, Kim, you're not alone.

I'm thinking of Rich Grows right now.

He's a Twitter friend and he always-- he's a good friend, but he's always like, Paul, you make me so nervous.

I can't go to a Paul Reed talk if it's a conference I flew to. I'm like, okay, I'm talking about--

Heidi: I can do a Nick Means talk now. Right?

Paul: Yeah. It's one of those things where it's like, listen, it's going to be fine.

It's going to be fine. If it's not fine, if you live through it, go buy a lotto ticket because, it's your lucky day.

If you don't, I mean, I love you dude, but it was your time. Yeah.

Kim: Oh my God.

Paul: It'll be okay. It will all be okay.

Heidi: Kim, you know how much I fly?

Do you know how many exciting things have happened to me on airplanes?

Kim: This is true. I think exciting is when you get bumped up to like first class. You sleep well.

Heidi: Yeah. The most exciting thing is Delta put me in a Porsche and drove me around on the tarmac because I fly too much.

Paul: Oh, that's cool. I didn't know that.

Kim: That's really cool.

Heidi: It's really cool. It was a really tight connection for an overseas flight.

As I'm coming off, they met me at the jetway and identified me by name and walked me down the stairs of the jetway and loaded me into a car and drove me around to my connection and looked at my passport.

I never set foot in the airport.

Paul: Nice.

Kim: That's really quite special.

Heidi: It was, see, loyalty for life. That was cheap.

That was so cheap compared to what I spend on airline tickets.

Paul: Well, and the impact was so outsized for you as a customer.

You're like-- I will never forget that.

I will tell that story and I will probably always fly Delta now.

Heidi: Exactly. Yeah. My friends.

It's an interesting question: how do you do outsized things that are surprising and delightful?

Paul: The phrase we use at Netflix is moments of joy, right?

Moments of customer joy. How do you get more of those? Right? I think that's a great example of a customer joy.

Heidi: Any other recommendations for people we should talk to or things we should be reading?

Paul: Yeah. So, I have a couple articles in a couple of O’Reilly books that were released in the last six months.

They're in the 97 Things series. I think it's 97.

So one of them is 97 Things Every SRE Should Know.

I think the other one is 97 Things Every Cloud Engineer Should Know.

When I got my copies of those books-- and we'll link, you have show notes, right?

Well, we can get links to those. I know Nathen Harvey and Emily Freeman edited one of them.

Then the other one was edited by Jaime Woo and Emil Stolarsky.

I have to tell you paging through both of those, the article title for one of the articles I wrote was, "What can Safety Science Nerds Teach us about Operations" or whatever.

Then I wrote the article and I love that title. Right?

I think it's the one that Jaime and Emil edited.

I really liked the structure of it because it starts from how do you go from zero to one?

There's a bunch of articles. Then the next section is how do you go from one to 10? Right?

Then the section that my article is in is at the end. It's, how do you go from a thousand to infinity? Right?

Heidi: Hello nerds.

Paul: Exactly.

Heidi: You have actual scale problems who are big enough to actually be using Kubernetes.

Paul: Right. Well, and also it's this, what are these crazy people that are on the bleeding edge saying that one of two things will happen.

They will either self-immolate with whatever technology they're playing with or it'll be like the next fusion thing that we're all using.

So, it's one or the other, but, there's an article they wrote before we know which it is. Right?

My point was you should grab copies of those books because when I sat down, when I got my copies and sat down and started kind of just looking through them, I was just amazed at all the contributions that they got.

The variety of the different names I recognize and some names that obviously lots of friends that I knew, but you know them, you would know them too Heidi, and probably Kim, you would know those names too, because, we see them at conferences. Right.

Then also people that I know of, but don't know personally were in that book, both of those books.

I really liked the way they both were segmented and sectioned out.

There was really something for everyone and a lot of good content in there.

If you're looking for new ideas and even sort of new hot takes on current ideas, those two books are a good place to just kind of page through and, get tidbits of knowledge.

It's funny, Jaime and Emil were like-- and Nathen and Emily were too-- the word limit is a strict 600 words.

Heidi: It's so hard.

Paul: Yeah. So yeah.

The good news is the good news is you will never-- well, the investment is very low, if it is right.

You're not going to waste a ton of time on that.

Heidi: It's very dense though.

Paul: I'd recommend those two things to take a look at.

Especially if you're wanting to re-familiarize yourself with all that stuff.

Heidi: Is there anybody or any job title that you think we should be inviting on?

Paul: Ohh, that's a good one.

You're looking at a lot of scaling stuff right? Have you had network engineers on?

Heidi: Not yet.

Paul: There you go. I would get a network engineer.

I would love to hear about some of how they deal with some of the scaling stuff.

It would be great to have like an Amazon network engineer that has to support the APIs that connect EC2 instances networks and all that kind of stuff, but all of that.

Heidi: Yeah. How did we scale the internet that fast? Maybe somebody from a CDN.

Paul: Yeah. Hundred percent. CDN.

Heidi: Any final thoughts for the amazing audience?

Paul: The next few months are going to be joyous and they're going to be hard.

So just give grace to yourself and others.

We talk a lot about this being a Black Swan event, once in a lifetime. I hope that it is.

As we go through that, just-- the emergence of that-- remember, emerging from the pandemic, it's going to be emergent, literally and figuratively.

Kim: It's going to be wild.

Paul: It really is.

Heidi: You know, Australians don't understand this phrase. Swans are black in Australia.

Paul: Oh. Yeah.

Heidi: Which I think is a super interesting reflection on like expectations. For another time.

Paul: Have an Australian on, there you go. Just about that.

Heidi: Australian network engineer. That'd be amazing.

Paul: There you go.

Heidi: If there are any Australian network engineers out there listening, we'd love to talk to you. All right. Thank you so much.

Paul: Thanks for having me on.