In episode 17 of O11ycast, Charity and Liz are joined by Pete Hodgson, an independent software delivery consultant. They discuss the DORA report, closing the CI/CD gap, and leveling up software delivery.
Liz Fong-Jones: So, Pete. Less than half of the low to middle performers in the DORA report have continuous integration, continuous delivery. How can we close the gap?
Pete Hodgson: I think a lot of it is around education. I think a lot of it is unknown unknowns, like a lot of people particularly in the enterprise don't actually know there's stuff that they could be doing that they're not doing.
Charity Majors: Really?
Pete: Yeah, for sure. It's a combination of that, or "You're saying that word, but it's not that word."
Liz: People redefining the definition to not be what we think it is.
Pete: Yeah. I think that's really interesting, not what is the truth, but the way we talk about things like continuous integration.
CI is an awesome one, because if you talk to Jez or the people who blazed the path for CI, they would say "Absolutely. Definitionally, CI means integrating code into a trunk or master at least daily."
I guarantee most people in the industry would not just be surprised at that definition, but would actively argue with people, "No. That's not CI, CI is my Jenkins infrastructure or my Tekton," or whatever.
Liz: I see. They would say "The idea of long-lived feature branches that diverge regardless of how often you build them is not compatible with continuous integration," that you have to merge with--
Pete: I think it gets really squishy with definitions like "Long-lived feature branch."
Depending on who you talk to, "Long-lived feature branch," people are going to be like "More than a few hours and I start getting nervous."
Other people will be like "More than a couple of weeks and I start getting nervous," and I really think it is that broad a gap.
Sometimes it's people hearing the correct definition and disagreeing with it, but a lot of the times it's people not even realizing that their definition is controversial.
Charity: I feel like when we have these conversations about definitions though, I feel like we're still talking about most of the people who are to the right of the 50%.
If you look at the DORA Report, it's saying that half of engineers out there are suffering under-- They can't even ship once a week.
I wonder if it's less that they don't know about it, and it's more that they don't think that they're good enough or they don't know how to get there, or they think it's for someone else.
Pete: I think there's probably a little bit of both.
I live in this bubble. Although I talk to XYZ customers, I still live in a little bit of a bubble. I live in Silicon Valley-ish.
Charity: We all live in bubbles. It's fine.
Liz: I think we live in bubbles, but I think you're the closest to breaking out of that bubble. Of having those--
Charity: As a consultant.
Pete: I think I'm less surprised maybe than you two are about "How bad" things are, but I think I would much rather say I'm excited by how much opportunity there is to lift the bar.
Or, to raise the level. Like introducing stuff that seems like super table stakes, like not even really proper CI practice, but just really basic stuff can really help a lot of people.
Charity: These things aren't new. They've been around for a long time now, and the gap is enlarging.
Like, the gap between people who are high performers and that 50% who are losing ground is what I look at and I go, "Why? How? Do they not see something in it for them? Do they not see--?"
Because they clearly don't-- I'm generalizing, but it seems to me if they're losing ground they must not prioritize getting out of that state.
Pete: I think maybe a lot of it could be a little bit of a disconnect between the people feeling the pain, and maybe even the people who are aware of what the solution is and the people who can fund it, and all that stuff.
Charity: Feedback loops.
Pete: Yeah, feedback loops is a really interesting way of thinking about it. As an individual contributor or an engineer on the ground, maybe I read a blog post and have found out about all these ideas, but I need to sell it to my manager's manager's manager and--
Charity: And they want to spend all their--
Liz: The manager's manager's manager wants to bring in a or-- or--
Charity: Or they just want to spend their engineering cycles on building product features, not on a bunch of navel gazing internally or something.
Pete: Yeah, maybe. I think some of it is misaligned incentives, and I think a lack of agency or empowerment.
Where if I'm in a more traditional enterprise-y place, I can't go off and stand up a CI.
I've been a few places, and I see this less than I used to, but I've definitely been at places where the trailblazing team were doing this shadow IT thing where they were literally running a TeamCity installation on a desktop or a spare server, underneath--
Charity: Unsanctioned computing.
Pete: Yeah, like shadow IT. I think now it's more likely that the tech lead or the tech lead's manager is using his personal credit card to pay for GCP.
Charity: I feel like part of the feedback loop that I've been focusing on for the past couple years is putting software engineers on call, because once you've connected those dots so that the people who are shipping the code are feeling the pain once something breaks, it's like everything kicks into hyperdrive.
Suddenly things are-- Teams just take off with making improvements and everything gets better, and until you've connected that part of that feedback loop you're just dropping shit.
Because the people who are making the changes, they're not feeling it.
Liz: Now would be a great time for you to introduce yourself.
Pete: Hello, my name is Pete Hodgson. I'm an independent software delivery consultant is how I'd describe it, so I work quite a lot with product engineering teams and helping level them up in terms of delivering good quality software at a sustainable pace.
Charity: Who hires you? What makes a company go, "I need to call Pete to help my team get better?"
Pete: I think it goes back to what I was saying just now, around people who are aware that they have an issue and they are aware that it's worth-- They feel the pain, they believe that they can--
Charity: There's a better way.
Pete: They believe there's a brave new world, or at least a slightly less crappy world, and they think I can help them.
Charity: Is this often like somebody joins you, or comes from maybe a Google background or something or whatever, and they're like "It can be better," and then they convince somebody to pay money to you to come in?
Pete: Yeah. A lot of times, like for me personally, I think it is really varied and it is really random to be honest.
But a few times it's a new CTO comes in and is like, "Show me what your CD pipeline looks like," and the tech leads are like, "What's a CD pipeline?"
Or "Here's our CD pipeline," and the CTO is like "That is not a CD pipeline. That is a shell script that you run on this one special server."
All that stuff, and I think that's definitely a lot of times it's someone who understands that where they are today is not where they need to be, and has a vision for where they want to be, basically.
Liz: How did you wind up going from working on CI/CD pipelines and software delivery to observability? How did you discover observability?
Pete: I think a lot about feedback loops, it's almost my favorite thing.
I have in the back of my mind, I've been working for years and years on these core fundamentals of what I think makes a good software delivery system.
Definitely in there is a tight feedback loop, and you can't have a feedback loop if you don't have some observability.
Maybe "Observability" isn't exactly the right word, but it definitely fits into that bucket of stuff.
Liz: It's this interesting synergy, but also this chicken and egg effect in that it's hard to get observability quickly if you can't deliver software within three months.
You add the line of code to try to get better instrumentation and then it takes three months to deploy, or conversely, if you have really bad observability but really good delivery. Right?
Pete: I think there's a middle ground. Because if you are the organization that deploys to production every quarter, you almost certainly have a cornucopia of pre-production environments, and you care about them actually a lot more than--
People who are really humming at CD and are pumping out changes to prod every day, they don't actually pay as much attention to their staging environment and their pre-production environments.
People who release to production every quarter, they live in their pre-production environments and they live in various different ones. I think going back to that agency, where you can start shifting the needle in terms of things like observability, is introducing observability in those pre-production environments.
You do start seeing the value, because QAs are like "I found this weird bug where when I click on this thing, this thing happens."
Then if an engineer can pop open their tooling, and this is not just observability in the Honeycomb sense, but even just log aggregation.
"Can I get some--? Does my pre-production environment have logs so I can see why stuff is not working?"
That is a thing that isn't in all pre-production environments, so adding all that stuff does bring you value and it's something that an engineer has the agency to actually do when a lot of times they're not allowed to touch prod.
I had to fill out a form in triplicate in order to get some new tool.
Liz: Exactly. We talk a lot about putting engineers on call, but some organizations have more pedantic auditors who don't yet believe in separation of duties by code review rather than separation of duties by separate job roles.
I think that there is that need for that middle ground of "Developers can benefit from observability in their staging environments, even if they don't have production access."
Pete: Most of the time that separation of duty stuff is not actually the auditors.
I've talked to a few people recently who were trying to figure out how to do-- They're moving to CD and they're like, "OK. But how do we do PCI?"
We talked about it a little bit and they said, "We can't do this, this and this because the auditors won't let us."
I said, "Why don't you talk to your auditors and ask them what they actually need? What is the value they are trying to get, and how can you deliver it a different way?"
Charity: Feedback loops.
Pete: Yeah, exactly.
Liz: So many misconceptions. If you don't question your assumptions, or document your assumptions-- Yeah.
Pete: Or if you don't ask, the other thing I'm really passionate about is "What is the value of this activity?"
The value of audit and compliance stuff, it's not actually security. It's passing compliance.
It's nice if you also get the value of better security and all the rest of it, but if you say "OK. What does the auditor need in order to check those boxes?"
And then "How can we deliver that in the most effective, efficient, whatever way?" Most of the time the auditors are not the problem. A lot of times it's fiefdoms and someone who's scared their job is not going to exist anymore.
Charity: A lot of times people use auditors as their fig leaf, or their cover. Their excuse.
Pete: The number of times where I was at a client where they just said "SOX," just randomly.
They'd be just like, "Is there a reason we can't--?" And literally there'd be a grumpy person sitting at the conference room table with his arms crossed saying "SOX."
If you dig into it, they would normally not have much to say about it. But it was literally, they would just invoke random acts of Congress.
Liz: We have to overcome the objections of naysayers when we're trying to introduce the practices of moving faster.
Pete: I think you have to show them the value. You have to say "A, you're not going to lose your job. And B, this is how it's going to make your life better."
People respond. Fundamentally, humans feel pain, and they respond to the idea of not feeling that.
Charity: Nobody who has a job in systems engineering operations today is ever going to lose their job.
The question is how do we bend this cost curve, how do we make it so that you can do more?
Pete: Yeah. I'm not the first person coming out with this idea, but people who are trying to bring ops people into--
Charity: The engineering fold.
Pete: Yeah. They could learn a lot from what folks did with bringing QA and testers into the engineering fold 10-15-20 years ago.
Because it's actually the same thing, and I think also on the other side they could probably learn a lot from how they are bringing product people in.
Charity: I feel like we're in the second wave of the DevOps transformation now, where the first one was like "Ops people must learn to write code," and it was like "Yeah. OK. We got it."
Now I feel like the pendulum is really swinging back the other way, and it's more like "OK, software engineers. Time to learn how to build operable services, it's time to learn to own your stuff."
Pete: I think it's super exciting because I think I saw a lot of that first wave, and I think it is still going on, is product engineers excited to play with a lot of cool new toys. Like, "Docker? I can Dockerize stuff."
Charity: But when you give them the ability to see what they've done, it's taking the blindfold off. I feel like today one of the biggest problems in systems is just that nobody understands them.
They don't understand systems, and they're shipping more stuff that they don't understand on top of the systems that they don't understand.
They're just blindfolded, swinging the bat at the piñata, and that's how we're managing our systems.
We've just internalized the idea that it has to be this way, and it doesn't.
Pete: Totally agree.
Charity: You can build systems that you understand.
At Honeycomb, we ship from cron and developers have muscle memory: after they've deployed a change, they go and look at it through the lens of their instrumentation.
"Is it doing what I expected it to do? Does anything else look weird?" That catches over 90% of all incidents before it ever gets to a point where it could page someone or bother a user.
Pete: I think you see a similar effect in other places, so if you give a product engineer access to analytics and user-facing analytics rather than server-facing analytics, you get the same effect where suddenly they start thinking about the funnel and how many people are converting, and all that stuff.
They come up with their own ideas, which is the most exciting part.
Liz: I think that's one of the interesting things that Christine Yen has been saying, that when you have the data coming out of your service that aligns with what developers expect to see rather than system-level stats that make no sense to product developers, that really closes that feedback loop.
Pete: Yeah, for sure. I don't have an ops background, I messed around with building my own--
Charity: We all dabbled in our childhood.
Pete: Yeah, exactly. I didn't inhale, but I definitely built my Linux distro.
I definitely come from that product engineering background, and I found Nagios and those cool looking graphs. Kind of like--
Charity: The angry fruit salad of doom.
Pete: They look cool, but terrifying. Like if you showed me a flame chart, I'd be like "I don't know. I feel like I'm not capable of working with that stuff."
But then once you start using it, and once it's stuff that you put in there that you're pulling out--
Charity: Using ops teams as the glue that interprets to software engineers what's happening--
They are the translation layer. They're like, "So you ship this code, you speak about your endpoints and lines of code and everything."
Now, that's what it means when it hits "Memory percentage. This is what the three different kinds of memory percentages are."
Liz: It's not even translation, it's like divination.
Charity: It super is. Software engineers should only have to really speak their language, but you should be able to break down and see which endpoints are slow and what did they have in common? Just super basic stuff like that.
Pete: I think when I was getting coffee with Liz, whenever that was a while ago, one of the things I was saying to her that I think is super exciting is explorability and being able to start with something that isn't super intimidating. Like in Nagios, or whatever. Play with it, and--
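The kind of breakdown Charity describes can be sketched in a few lines. This is a minimal illustration, not any particular tool's query engine; the event fields (`endpoint`, `duration_ms`) and the sample values are made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request events; field names and values are illustrative only.
events = [
    {"endpoint": "/cart", "duration_ms": 40},
    {"endpoint": "/cart", "duration_ms": 55},
    {"endpoint": "/export", "duration_ms": 900},
    {"endpoint": "/export", "duration_ms": 1100},
    {"endpoint": "/login", "duration_ms": 30},
]


def breakdown(events, key, field="duration_ms"):
    """Group events by `key` and return (value, count, mean, max) rows,
    slowest group first."""
    groups = defaultdict(list)
    for event in events:
        groups[event[key]].append(event[field])
    rows = [(value, len(d), mean(d), max(d)) for value, d in groups.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# The single overall average says little; the per-endpoint rows show
# that one endpoint accounts for nearly all of the slowness.
overall_mean = mean(e["duration_ms"] for e in events)
by_endpoint = breakdown(events, "endpoint")
```

Here `by_endpoint[0]` is the `/export` row, while `overall_mean` on its own would not point at any endpoint in particular.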
Charity: Go deeper.
Pete: Go deeper and learn stuff and slowly get there.
Charity: That dopamine hit of just, "I didn't know this existed."
Charity: Oh, my God. You literally get hooked on it.
Pete: Yeah, and I did. I went through the Honeycomb example data set that--
Liz: Yeah, Play.
Pete: Yeah. That was super fun for me, that was a super fun 10 minutes.
Charity: It wasn't even your data.
Charity: If it was your code and your data, you'd be like "This thing. I always wondered why this was so slow and weird."
That's the challenge that we've often had, is just anybody can make a demo look good. Every data tool out there looks great, and they're often-- They look very much the same.
It doesn't really sink in until you've used it on your data, and that's when you're like "I could never ask this before. I could never do this before. That's what that is?"
Averages cover over a multitude of sins, and until you can break down by any high-cardinality dimension and string them together--
And then the thing, I'm sorry, I'm doing a little advertising thing here. But the thing that Danielle designed where--
Because humans bring meaning. Any machine can find spikes in your graphs, only humans can say "This is good, or this is bad, or this is meaningful."
So we built this thing called BubbleUp where if there's a spike on the graph, you're like "I think this might be interesting."
You draw a little circle around it and we pre-compute for all the hundreds of dimensions what's different between inside that circle and outside, and then we sort them so you can say "These ten things are different about this spike."
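The inside-versus-outside comparison described here can be approximated with a small sketch. This is not Honeycomb's implementation, only an illustration of the idea under simple assumptions: events are plain dictionaries, the selection is a predicate, and each dimension is scored by the value that is most over-represented inside the selection. All names and sample data are hypothetical.

```python
from collections import Counter


def compare_selection(events, selected, dimensions):
    """Rank dimensions by how over-represented their most distinctive
    value is inside the selection compared with outside it."""
    inside = [e for e in events if selected(e)]
    outside = [e for e in events if not selected(e)]
    results = []
    for dim in dimensions:
        in_counts = Counter(e.get(dim) for e in inside)
        out_counts = Counter(e.get(dim) for e in outside)
        best_value, best_gap = None, 0.0
        for value in set(in_counts) | set(out_counts):
            in_share = in_counts[value] / len(inside) if inside else 0.0
            out_share = out_counts[value] / len(outside) if outside else 0.0
            gap = in_share - out_share  # positive: more common in the selection
            if gap > best_gap:
                best_value, best_gap = value, gap
        results.append((dim, best_value, best_gap))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical events: the slow requests all come from one build.
events = [
    {"duration_ms": 900, "build_id": "v2", "region": "us"},
    {"duration_ms": 950, "build_id": "v2", "region": "eu"},
    {"duration_ms": 40, "build_id": "v1", "region": "us"},
    {"duration_ms": 45, "build_id": "v1", "region": "eu"},
    {"duration_ms": 50, "build_id": "v1", "region": "us"},
    {"duration_ms": 42, "build_id": "v1", "region": "eu"},
]

# "Drawing a circle around the spike" becomes a predicate on duration.
ranked = compare_selection(
    events, lambda e: e["duration_ms"] > 500, ["region", "build_id"]
)
```

In this toy data the `build_id` dimension sorts to the top, because `v2` appears in every selected event and in none of the others, while `region` is split evenly on both sides. The human still decides whether that difference is meaningful.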
Liz: Yeah, it's the opposite of other people's AIOps approaches. Other people take an AIOps approach of "The computer analyzes all the signals and tries to alert you when there's something anomalous," but there's always something anomalous.
Charity: There is always something anomalous.
Pete: And, humans are really good pattern matching machines, especially if you put the visual cortex in there too. You can do so much cool stuff.
Charity: Exactly. Instead of the computer deciding what not to show me, I want it all to be shown to me so my eyes could just pick out what's interesting.
Liz: Yeah, or at least know what questions you want to ask.
Pete: That serendipitous thing of, "I was looking at this thing over here. Hang on a second, why is conversion going down? That shouldn't happen."
Charity: We've been paging ourselves to death. I say all the time, "People should delete their paging alerts and move to SLO based alerting."
But in exchange to get that, you have to agree part of the contract is you have to affirmatively spend more time in projects poking around, just looking at it and just exploring.
Just opening yourself up to the possibility that you might notice something before it becomes a problem.
I feel like when you're shipping something is the right time to just go check in.
I know that something's changed, I expect there to be a change, so I look at it through the lens of my instrumentation and I see.
Half the time the things you notice aren't going to be related to your change, but you're in there everyday and you're looking at it.
If you're only looking at prod when things are broken you don't know what's happening.
Pete: You don't know what is out of place, yeah. It was really, to me-- I interviewed a good number of organizations recently and I was asking them about how they do continuous delivery.
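One common way to express SLO-based alerting is an error-budget burn rate. The sketch below assumes that approach; the function names, the 99.9% target, and the 14.4 threshold are illustrative choices for the example, not something stated in the conversation.

```python
def burn_rate(bad_events, total_events, slo_target):
    """How fast the error budget is being spent over a window.
    1.0 means the budget would last exactly the SLO period."""
    if total_events == 0:
        return 0.0
    error_budget = 1.0 - slo_target  # allowed fraction of bad events
    observed_error_rate = bad_events / total_events
    return observed_error_rate / error_budget


def should_page(bad_events, total_events, slo_target=0.999, threshold=14.4):
    """Page only when the budget is burning much faster than planned.
    The default threshold is an assumed value; tune it to your own SLO window."""
    return burn_rate(bad_events, total_events, slo_target) >= threshold


# 0.5% errors against a 99.9% target burns budget about 5x too fast:
# worth a look, but under this assumed threshold not worth a page.
moderate = should_page(bad_events=5, total_events=1000)
# 2% errors burns budget about 20x too fast, which crosses the threshold.
severe = should_page(bad_events=20, total_events=1000)
```

The point of the design is the one Charity makes: the alert fires on user-visible budget consumption rather than on individual symptoms, and the time saved is spent looking at production on purpose.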
There was this super consistent theme of engineers carrying something, the ones that were what I would characterize as high performing, and I don't have the fancy stats of folks like DORA report people.
But the people who just, to me, smelled they were a high-performing organization.
Charity: They all spend time in prod.
Pete: Every engineer carries the feature to prod and just looks for a while.
Liz: Right. The fewer hand offs you have, the better.
Pete: Most of them don't actually have that much fanciness around alerting and launching and stuff.
Charity: It doesn't take that much.
Charity: You just have to be willing to go look.
Pete: Particularly if, I think the other thing you need is for that thing that you're carrying through to prod to be small enough that you feel not overwhelmed by--
Charity: It has to be yours. You can't be batching a bunch of commits, or other people's commits. You're just going to be overwhelmed by it.
Liz: We talked on an early O11ycast about the idea of smaller batches resulting in better predictability.
Pete: I'm super excited by the weird correlation, or there's an intersection between this move to fine-grained SOA like microservices, and this idea of team ownership.
Because you can't ship a monolith and say, "This isn't my changes that I own." You just can't, so you have to do all this extra stuff.
I think going back to what we were saying earlier about people misunderstanding what some of the concepts mean, I think a lot of startup organizations and enterprises actually have cargo culted a lot of the really pioneering stuff that people like Facebook do.
Facebook do some amazing stuff in the face of this gigantic monolithic deployment. If you're not doing that, you don't need to do that stuff.
Liz: We see the same thing with people saying, "Google does it this way. Therefore, I am too." It's like, "You have to resist that impulse. What is right for your context?"
Pete: Almost certainly it's not what Google does, because you're almost certainly not Google.
Charity: Almost certainly.
Pete: Or anywhere close to order of magnitude of that scale, so why would you want to--?
It's like watching an elephant and being like "I should walk like that elephant. They seem like they're making a lot of progress in the world."
Liz: At the same time though, when you move from a monolith to a microservice SOA world, it creates new challenges that are not solved by the same tools that would solve your monolith problems.
Pete: Yeah, for sure. When I talk to early stage startups, normally the advice I give them is this idea of a modular monolith, like start off with bigger pieces than you think you need but put some bulkheads and some seams in place from the get go so you can break it apart where you think those seams are going to be.
But don't start with microservices, because they're a real pain.
Charity: It's the same advice that I would give somebody that is doing database architecture, "Don't start with a giant sharded database, but start with it in mind that you might someday need to."
Pete: Put those seams in place so that you can--
Charity: So you can do the migration.
Liz: I think that's super interesting in that it ties into how I first encountered your work, Pete, which is your work on the idea of having your observability be separated from your domain logic.
Pete: Yeah, that came from--
Charity: Great blog post, by the way.
Pete: Thank you. That came from seeing a lot of-- This is one of those things where I saw a lot of teams that just instinctively did this, because it seemed like the right thing to do.
But then I came across a lot of teams that this thing hadn't occurred to them, so I was like "I should write that down so people can hear about it."
But the idea of, "Why do you want your domain code, your shopping cart logic or whatever to be sprinkled with calls to an analytics framework, or--"
Charity: Spans, and--
Pete: Yeah, like all that stuff. I think I see this happening a lot when an open source tool or whatever tool provides you with an API, and you assume that's the API that you should expose to everyone inside of your code base.
It's really easy to not do that and to put some stuff around it to sand off the edges, and fit it to your context. You've got to fit it to your context.
So when people, like I would say if someone's using any logging framework or service or whatever they should not be using--
Liz: I think it's logging and metrics frameworks that particularly encourage this behavior of not batching up your data so it can be correlated, and I think that's the danger.
Pete: I would say the biggest thing is that you shouldn't have to think about that as a product engineer.
As a product engineer I should just be able to say "I'm doing this thing, I'll let my observability system know this thing happened."
"I'm doing this other thing, now I know how much the total cost of the shopping cart was. I'll let my observability system know that happened."
I really shouldn't care at all whether that's batched up, whether it's put into some sidecar or written to a local file system or in memory cache, or each one is a UDP packet. Who cares?
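A minimal sketch of what Pete describes might look like the following. The class and method names (`CartInstrumentation`, `InMemoryBackend`, and so on) are hypothetical and chosen for illustration; the in-memory backend stands in for whatever vendor client, sidecar, or file writer a team actually uses.

```python
class InMemoryBackend:
    """Stand-in for a real telemetry client (hypothetical, for illustration).
    It could just as well batch, write to a file, or send UDP packets."""

    def __init__(self):
        self.events = []

    def send(self, name, fields):
        self.events.append((name, fields))


class CartInstrumentation:
    """Domain-specific wrapper: the only observability API the cart code sees."""

    def __init__(self, backend):
        self._backend = backend

    def item_added(self, cart_id, sku):
        self._backend.send("cart.item_added", {"cart_id": cart_id, "sku": sku})

    def total_calculated(self, cart_id, total_cents):
        self._backend.send(
            "cart.total_calculated",
            {"cart_id": cart_id, "total_cents": total_cents},
        )


class ShoppingCart:
    """Domain logic announces what happened and nothing about how it is reported."""

    def __init__(self, cart_id, instrumentation):
        self.cart_id = cart_id
        self._instrumentation = instrumentation
        self._items = []

    def add_item(self, sku, price_cents):
        self._items.append((sku, price_cents))
        self._instrumentation.item_added(self.cart_id, sku)

    def total(self):
        total_cents = sum(price for _, price in self._items)
        self._instrumentation.total_calculated(self.cart_id, total_cents)
        return total_cents


backend = InMemoryBackend()
cart = ShoppingCart("cart-1", CartInstrumentation(backend))
cart.add_item("sku-123", 1500)
cart.add_item("sku-456", 250)
order_total = cart.total()
```

Swapping the backend for a different delivery mechanism changes nothing in `ShoppingCart`, which is the separation being argued for. It also makes the instrumentation easy to assert on in tests.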
Charity: Once you say it, it sounds so obvious.
Liz: And yet we have so many people that we encounter who have been burned by having woven in deeply a vendor implementation into their code, and now they feel they're trapped.
Charity: Let's talk about the state of instrumentation in our industry as it is. It's, shall we say, delayed?
I feel like it's way behind where it should be, and I mostly, as I was saying, I mostly blame the vendors for this.
Because vendors have been out there saying, "You don't need to understand your systems at all. Just give us tens of millions of dollars and install this magical library and you'll never have to think about it again."
I can see why that's very alluring to your average CTO or CSO or whatever, but I'm sorry, at the end of the day somebody does have to understand it.
When you make those promises, you're just going to end up delivering something that is incapable of doing anything specific.
Pete: I think a big part of it comes down to-- I'm a consultant, so it always comes down to people things.
I think in this case it comes down to historically, I'm an operations person, so there's three groups of people.
I am very generalizing, but there are three groups of people. There is the product manager people, there's the operations people, and there's the product engineers.
The operations people want to know what historically in a pre-DevOps, a second wave DevOps world, they want to know what the latency is of requests.
They don't want to have to do too much communication with other teams.
Charity: They don't have access to the source code, either.
Pete: Yeah, it's high friction. So what they should do is go talk to their product engineering friends and say, "It's really important for us to do our jobs so that we can see latency.
How do you think we should do this?" And together they sit together and they sit around the campfire and play guitar, and then they decide how they are going to do observability, or whatever.
In reality, what happens is a vendor or a tool comes along and says, "If you pay us money, you don't have to talk to those people."
Pete: The same thing happens with product managers, where a product manager wants to do an a/b test.
The right way to do it is to work with the product engineers to weave it into the code, and then a vendor comes along and says "If you pay us money, we'll just let you mess with the web page and you don't have to talk to a product engineer."
And the product engineer is like, "Why? You're ruining our lives." And the product managers are like, "Because you're ruining our lives."
It just happens when you've got silos, or when you've got teams that don't work together.
Charity: Or it just happens because we're human beings.
Charity: Talking to people is hard, and we all have a limited number of cycles. So we're like, "If I can just not have to deal with those people."
Liz: It's almost this idea of context, if you share context it makes it easier to work on a thing jointly. But if you don't share that context, then you want to hoard your context and silo it from other people.
Charity: It's always introducing an element of unpredictability whenever you have to depend on someone else, too.
It is your job to manage down the number of unpredictables, to limit your exposure to that.
Pete: De-coupled things are good, and it is actually good if you can figure out a way to say "This system is not reliant on this other system," that's great.
I love how in the last maybe 10 years there's been this shift towards this judo move, "If it's really hard to do this stuff without all these teams talking to you, why don't we move the shape of the teams and bring the devs into the ops thing or the product people or the QA, or whatever."
Then they're not having to talk across teams, because it's someone who's in their team.
Charity: Which is really just trying to realign so that everyone feels like they're on the same side.
Pete: If they literally are, if you literally go to lunch with this person every day, at lunch you're going to say "We were thinking about how we're going to measure CPU latency," it's just really weird human stuff that actually works.
Liz: I think it's important to keep in mind why we're doing it, and not just do this rote act of reorganizing our teams because someone says "DevOps is putting dev people on ops teams," or vice versa.
But instead thinking about "Why are we doing this, to break down the silos? To share information?"
We talked earlier about the state of telemetry. I remember when I talked to you, Pete, it was almost a year ago and we were bemoaning the fact that there were no common standards and thus people really had to, in order to get pluggable telemetry, had to use an outside library or use--
Or in some cases build an instrumentation pipeline. Where do you think things sit now?
Pete: Honestly, I don't know enough about that. I honestly don't track it enough.
Charity: What do you think, Liz?
Liz: I think that we've both seen the evolution of the idea of an observability pipeline, similar to what people at Cribl are doing, similar to what we've seen, for instance, Slack do with their Kafka-based pipeline for telemetry data.
I think that's one answer that's becoming more production practiced and production ready, is the idea of the observability pipeline and queue. The other idea that we're starting to have increased confidence in is OpenTelemetry, now that there's actually much more momentum behind it and now that it's not vaporware.
A year ago it was, "Let's have OpenCensus and OpenTracing stop fighting with each other and work on one project."
Now there's actually an SDK and an API that you can run in production, and I think that my advice six months ago that I even wrote in a blog post was, "You should even consider keeping OpenTelemetry at arm's length, because goodness knows whether that's going to become another standard or whether it's going to fizzle."
But now I feel like I have sufficient confidence that I would actually potentially just put OpenTelemetry directly in, knowing that the SDK is flexible enough that it provides the domain-oriented observability pattern that you were hoping would exist a year ago.
Pete: I think beyond observability in that sense, if you talk to a product manager and say, "How do you find out how this thing is performing?"
Most of the time they are not going to talk about OpenTelemetry, they're going to talk about Segment or one of those tools.
It's weird that they're all going through similar problems that other similar systems have gone through, but they're all being solved by a whole new set of tools.
Liz: I feel like we still have this large gap between BI tools and observability tools for ops and eng teams.
I think that definitely is a gulf that we need to solve, because certainly people ask us today, "Does Honeycomb do BI?"
And we're like, "We use it for BI at Honeycomb, but we're not optimized for that right now."
Charity: Data is data, so you can, but--
Pete: I think I was talking about these three teams of product manager people, product engineering people, and people who are looking at things from ops.
Almost everywhere I've been, in fact everywhere I've been, there's three different parallel chains of information being pushed through.
Liz: It's this thing of, "Why are you paying for this data three times? Now we're storing it twice, but it should really be one."
Charity: Tools create silos. The edges of tools create the edges of reality.
Pete: Then the flip side is in theory, they provide interfaces for those silos.
Charity: But in reality you end up arguing more about the nature of reality than actually just getting on the same page and solving the problem.
I'm really frustrated just in the inside about this idea that you need to have metrics, logs and traces in three different tools.
Liz: Because now we're going from three different data silos to five different data silos.
Charity: It's insane and it's unnecessary, and that I blame the vendors for. Because vendors are just like "I've got a logging tool I need to sell, I've got a tracing tool I need to sell, and I've got a metrics tool so I guess that's observability."
And you're like, "Literally this is worse than before."
Because now you've got a human sitting in the middle just copy-pasting ideas around, trying to visualize it in three different ways. They're all different sides of the same elephant.
Liz: To wrap things back around to where we started, we talked earlier about the gulf between the haves and the have nots of the people who are struggling with basic continuous integration and continuous delivery practices, versus modern practices that are both doing great CI/CD and also doing great o11y. How do we close that gap?
Charity: What are the quantum leaps that we can take so they don't have to just fast forward through ten or fifteen years of all the changes?
Pete: I don't know. I think it comes back to this thing of awareness that this pain you're feeling doesn't actually need to be there.
Let me give you a weird analogy. When I was a younger person, I was in my 20s and I was building a Rails app.
The way I looked at the logs was I SSH-ed onto the three different boxes, and I tailed them. That's how you looked at logs. I didn't know that was weird, and then someone told me that you can send them all to one box and then you can just SSH to one box. That was amazing.
Then someone taught me grep. Then I showed up at some other company and they had Splunk or something like that.
My mind was almost literally blown, like "This is amazing. How far back can you search in the logs?"
I think there's this awareness thing that people need to get to that there's this whole world that they don't even know is there.
I think part of it is teaching people that putting up a big screen in the dev area with a dashboard doesn't actually help at all.
Or maybe it helps a little bit, but almost forcibly sitting with a product engineer and saying, "Let me show you this thing and all the cool stuff you can do."
Charity: There's one shortcut that I have thought of, and I have been thinking about this problem a lot.
The main one that I can think of is "Quit your job, go work somewhere else that is a high-performing team," because I feel this is how we spread knowledge.
This is how we cross-pollinate. You were just talking about how you get brought in by a new exec who comes in like, "It can be better."
I feel like in engineering, you can be very highly paid and there's very little risk, but if you think that you're working in the bottom 50% of teams, go find a better one and learn it.
Because I think there's this misconception that the high performing elite teams are made up of the elite engineers, and that's not true.
They're no better than-- They're all made up of median engineers, but they have been exposed to better ideas and they have higher standards for business.
Once you've seen it, it's just like with your body: once you've worked up to where you can bench press 500 pounds, it's so much easier to get there the second time.
Liz: And it turns out once you've seen it, you can go back to one of the lower performing orgs and make it better.
Charity: You can do it and you can take it with you, but while they're there without this cross-pollination I don't see how anything changes.
Pete: I think I'm trying to imagine hypothetically, someone is working at Nebraska Insurance Company and then they go and spend--
Charity: But there's a GitHub repo of companies that hire remotely, and you can work with some of these-- Almost all of them are the best in the world, best in class.
Pete: But then after they bring it back, I think the thing is-- They do their tour of duty at Stripe or something.
Then they come back to Nebraska Insurance Company and they're like, "Everyone. You will not believe."
But then what do they do? I think you need to do all that stuff, the awareness thing, but I think you also need once they land back in--
Charity: Sure, you need a better reintegration strategy than just telling them that they're wrong. But I can think of a few--
Pete: I think it's like, what I'm saying is I feel a cultural--
Charity: There are a couple ways that I've seen this happen. One is, of course, you hire a manager who's like "This is appalling. This must change."
Another is an engineer, and this is a very bottom-up way. An engineer just comes and installs-- Like, healthcare.gov, and they came and just installed New Relic.
Then you just lead by showing, and you're like-- This happened at Parse when I didn't see much in Scuba, but a couple of my engineers started playing around with it and suddenly I realized that they were debugging things faster than I was.
That was very hard for my ego to take, so I was like "I guess I'm getting on board."
Liz: Yeah, I think that's the thing of it used to be that very high quality o11y tools were unique to Twitter and Google and Facebook, and now they're available to more people such that you can just turn them on.
Charity: With free tiers you can just do it.
Liz: I think what we're getting out of this is that we can lead by doing and showing, and that we need to clone Pete a bunch of times too.
Pete: The obvious answer is bring me into your organization.
Charity: Obviously, just pay Pete lots of money, I get a 10% cut.
Pete: Actually, that does not work. I can tell you from experience, what does not work is dropping some consultants on your team when you don't have the culture.
Charity: The engineers are fun.
We all love creating something, or we wouldn't be an engineer.
We love building, tinkering, creating. We love results, and the ways that engineers impact each other and affect each other are so much about just looking over and going, "That looks better than the way I know. That's really cool."
I feel like as an engineer, if you want to have influence, first you solve it for yourself.
You just do it, and then you learn some communication skills so you can not rub everyone the wrong way. Just showing them a better way.
We all as humans, we crave it and we are drawn to it. You have to really drum this out of us hard to get us to stop responding to that.
Pete: I think it's very true, and I think you saying earlier about engineers, it's not like the engineers at the high performing orgs are better.
Charity: They're no better.
Pete: I've worked with some amazing and super passionate, passionate in a different way sometimes--
Charity: Some of the best engineers I know are in the backwoods of Idaho.
Liz: Because they figured out how to make the hacky thing work, right?
Charity: Yes. It is incredible.
Pete: Or they're just really good. They're just really good.
Charity: They are amazing.
Pete: They are just quietly, they're just like "I don't know. I don't really feel the need to talk about it. I'm just going to do it."
Charity: Some of the best engineers I know are in the middle of nowhere. It's not a question of how good you are, it's a question of being open to new ideas.
Pete: My favorite lean manufacturing story is about the NUMMI plant that's in Fremont, that's now the Tesla plant, which is an interesting twist.
It's probably a long enough story that maybe I shouldn't go into it right now, but basically the TL;DR is they took the worst performing auto plant in America and they turned it into the most high-performing auto plant in America with the same people.
Charity: The same people, yes.
Pete: They changed the culture. They empowered the people to help to fix things from the bottom up, and there was someone on the plant who was like "I should install an observability tool."
Charity: This is what-- Working at Facebook as a manager and hearing the conversations about "The bar," like, "Our bar is so high."
I would just roll my eyes every time, because it's so self critical. This is the performance they do internally for each other, and it has nothing to do with reality.
It is about the ideas, it is about the quality of the ideas and it is about your openness to new ideas, and it is about your communication level.
At Honeycomb, our engineering interview is that we give them a little take home piece of code the night before.
But we stress that is not the interview, the interview is you come in the next day and you talk about it with a couple of your peers.
Because we really strongly believe if you can explain what was going through your mind, what you chose, what you didn't choose, the tradeoffs, you could do the work.
Pete: That is a lot of the work.
Charity: We almost don't even care what it looks like, compared to "Can you communicate about it?"
Because that's the skill set and tool set that we want our engineering team to have when it comes to everything about the way they work.
Building software is a team sport.
Pete: It doesn't really matter if you can do it all on your own.
Charity: Ironically, the "Elite engineers" can be really hard on teams. They can be really damaging, they can be really--
Charity: Because it's just, it's not that it's irrelevant but it's orthogonal. It doesn't matter. It is so much--
Liz: There's no 10x engineers. There's 10x teams.
Pete: There's the person that 10x-es the team--
Charity: Is almost always not the "Best engineer."
Pete: I know we say it's not even orthogonal, and I think there is a--
Charity: A negative?
Pete: Yeah, for sure.
Charity: I wanted to be kind, because I know-- Some very good engineers are good friends of mine and I didn't want them to feel insulted.
Liz: Excellent. Time has certainly flown by, thank you very much for the enjoyable conversation, Pete.
Charity: Thanks for coming, this was really fun.
Pete: Absolutely, my pleasure. It was super fun.