O11ycast
37 MIN

Ep. #88, Metrics Are Good, Actually with Charity Majors

about the episode

On episode 88 of o11ycast, Ken Rimple and Jessica Kerr speak with Charity Majors about the shifting role of observability in modern software development. From AI-assisted instrumentation to reducing developer cognitive load, the episode examines how teams can move from reactive monitoring to continuous learning. It’s a deep dive into observability as a sense-making practice, not just a tooling problem.

Charity Majors is a software engineer, author, and the co-founder and CTO of Honeycomb, a leading observability platform. She is the co-author of Observability Engineering and a long-time advocate for high-cardinality telemetry, systems thinking, and developer-centered production practices.

transcript

Charity Majors: You know, AI is the new cloud. I am old enough to remember when the cloud team was the special team, the experimental team, off on the side doing something weird. And then two, three, four, five years later, those positions had switched. Right? Cloud was now the mainstream.

And it was like, okay, there's this weird laggard team over here dealing with the on prem stuff. So we stopped saying cloud all the time because it was the new normal. And I feel like we are in a compressed version of that timeline.

We're in the position now where-- For a while, everybody was just like rolling their eyes, "oh another AI thing." And that's like, well, do we even have to say AI? Because it's the new normal and it's weird if you don't have an AI angle. So, yeah, what was the question?

Jessica "Jess" Kerr: It was, what's happened since the last time you were here?

Charity: What's happened since the last time I was here? Oh, boy. You know, we've built a lot of shit at Honeycomb. AI has gone from a toy to like real thing. I have been working a lot this year on the second edition of Observability Engineering. I actually can't remember what happened two years ago. So I'll tell you what's happened in the last six months. How about that?

Jess: Great. Yeah. The last six months has easily taken two years.

Ken Rimple: Yeah, that's true.

Charity: That is a great one. I agree. So the second edition of the book actually--

Jess: And this is Observability Engineering, right?

Charity: Yes. I think books are kind of like children. You're not really allowed to say that you're not proud of one if you made it. But I kind of am not really proud of the first edition. It took us three and a half years to write, and at no point were we like, "this is done. This feels great. Let's put a bow on it."

It was just like, oh, my God, we can't. If the editor will take it, just ship it. Just get it out, because I can't do it anymore.

Jess: Put it in boarding school.

Charity: Yes, take it away. I am no longer mothering this book, but that came out in 2022, I guess. And when O'Reilly asked us about doing the second edition, I was like, yeah, I want to do this. Like, I actually feel really excited. I feel like now we know who we're talking to, it's so much clearer.

So the part that I'm responsible for, I got together with my co-authors over the summer and we sketched out a section, part six, Observability Governance. And it was a bunch of chapters, all for observability engineering teams, about how to manage the data, how to do migrations, how to roll things out. I worked on it for a few months, and then, over Thanksgiving--

So my mistake, the first mistake I made was asking the rest of the world for their advice and their stories. I posted a couple times on my blog like, hey, world, do you have thoughts about migrations? Do you have thoughts about buying software? Oh, boy. Yes, the world has thoughts.

And I'm very grateful for all of them. Also very angry about getting them because a few weeks later, over Thanksgiving, I realized, oh, everything that I have written and decided to do for this section of the book is wrong and useless and needs to be thrown away because the text and subtext of what people were--

Do observability teams need help? Yeah, sure, of course, everybody needs help. But is that where the system is breaking down, falling apart? No, it is not. The subtext of the stories people were telling me was that where it's all falling apart is in the technical decision maker layers.

Between CTOs and VPs and directors and distinguished engineers and principal engineers and observability engineering teams, there's no shared idea of what observability is, what problem they're trying to solve, whether it's operational outcomes, reliability, or developing faster.

So a lot of execs are like, ah, that's so 2020. Is that even a problem we still have? Anyway, so I threw out the entire section, and Claude and I stayed up all night hammering out a new outline with chapters that basically address the technical decision maker hierarchy from the top down. It starts as, like, open letters to the CTO. What does a CTO need to know about observability?

You need to know that all of the things that you have on your roadmap for the next three, five years are very likely backed up behind your slow, shitty observability.

And what do VPs need to know? They need to know about how to build the organizational-- And Jess, you're going to love some of these chapters, because the world is so fucking complicated that the only way I could really think of to break this down for people is through the systems thinking lens, right? Talking about the feedback loops, the accelerating feedback loops, the balancing feedback loops, where observability fits into this.

Jess: And this is a completely different kind of feedback loop than the one we're used to, which is, "Is your software working?"

Charity: It is and it isn't. Yes. I won't spoil it. Anyway I'm trying to wrap up all of my chapters by the end of the year. It should be in people's hands by middle of 2026. But I am actually-- This child I will keep.

Jess: So there's more of you in the second edition, like your own convictions.

Charity: Yeah, yeah, I think that's true. Yeah. I'm looking forward to it getting out there, because I think part of me really feels like I have abdicated a pretty significant part of my role over the past couple years by really stepping away from social media.

So I'm also ramping back up. I've started a Substack, because it's not enough to say something once, right? It's not enough to write it in a book. It's not enough to say it once. You really do have to be out there every day, kind of mixing it up, listening to people, trying language, seeing what lands, seeing what people respond to, hearing what is meaningful. And I'm excited to get back to that, too.

Jess: Yay.

Charity: Yeah.

Jess: Oh, well, I bet everybody's gonna be excited for that. And also, you're on this podcast again.

Charity: Awesome.

Ken: So, meanwhile, while you've been writing this book, we've been doing a lot of stuff here and thinking about a lot of things. And one of the things you brought up, and I'm just going to pull out here and try to talk about, is your view on metrics has slightly shifted.

Can we discuss that? Because I have a bunch of stickers. If I get out of the way, there's a whole bunch of stickers here.

Jess: Look at all those stickers.

Ken: I pulled the metrics ones off because a couple of them are slightly newer; the perspective we have on metrics, I think, has grown over time.

Charity: Yeah.

Ken: Let's talk about how you view metrics in 2025, end of 2025.

Charity: So first of all, the term "metrics" is thrown around a lot. And I want to be clear, there are two different ways that people can use this term and hear this term. There is the generic term metric that just means computer numbers, data about stuff. Right? Like, how are the product metrics? Right? And I think of that as small-M metrics.

And then there is the metric data type which is a number stored in time series, usually with some tags appended to it. And those metrics have been like the workhorse of software telemetry for 30 years.
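A minimal sketch of that metric data type, in plain Python with invented field names (not any particular vendor's or library's format): one number per timestamp, plus a handful of low-cardinality tags, and nothing else.

```python
# Illustrative only: what a single point of the "metric" data type carries.
# One number at one timestamp, plus a few low-cardinality tags -- no user IDs,
# no request-level context. Field names are invented for the example.
from dataclasses import dataclass, field
import time

@dataclass
class MetricPoint:
    name: str                                  # e.g. "cpu.utilization"
    value: float                               # the entire payload: one number
    timestamp: float = field(default_factory=time.time)
    tags: dict = field(default_factory=dict)   # host, region, service...

point = MetricPoint("cpu.utilization", 0.87,
                    tags={"host": "web-03", "region": "us-east-1"})
print(point)
```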

Jess: So these are time-series metrics like CPU usage and memory and file system availability.

Charity: Exactly.

Jess: But we also see a lot of "how many requests" and "how many errors."

Charity: Yeah, they're cheap, they're fast. If you're running third party software, it comes with a bunch of them already built in. And the thing is, I came out hard against metrics for many years just talking about how much they suck. They're terrible, they're the worst, they're outdated, they're archaic.

All those things may or may not be true. They're kind of true, but they're never going away. Right? They're here. And the thing is, it's one of those things where: is it the right tool for the job or not? And here I think we can usefully separate the field of observability into two. There is the part that's traditionally associated with SRE and ops, which is the health of the system.

Jess: Define "system" there.

Charity: CPU, memory statistics, is my disk filling up. Infrastructure.

I define infrastructure as the code that you have to run in order to run the code that you want to run.

Jess: Nice.

Charity: It's someone else's code. You don't want to recompile it or add more instrumentation. You want to pretend it's a black box. It just works. You're not touching that. It's someone else's code, but you need to understand its health.

So that's the infrastructure. And metrics are honestly the right tool for the job. Metrics and logs, the whole three pillars model, was built for that world where you can't really control what the log output is, you just have to deal with it. Built-in metrics, aggregated around the health of the system components, the Kubernetes cluster, you know, each kube pod, whatever.

Jess: And you don't want to debug it, you don't want to understand it deeply.

Charity: No. You want it to just fucking work.

Jess: You just need to babysit it. So this is like it's someone else's kid and you just need to make sure they stay alive until their mother gets home. As opposed to your baby.

Charity: Yes. Then there's your child, who you need to know intimately. You probably want to know them intimately, their wants, their needs.

Jess: You made them for a reason.

Charity: You made them for a reason. Right.

Jess: This is your custom software.

Charity: So there's the infrastructure: metrics, logs are the right tools, and then there's your crown jewels, like the code that makes your company money. Right? Where getting into the nitty gritty, understanding not just what's the collective health of it for whatever definition of that, but like what is the experience of every user as they're interacting with your product? What is your product quality as experienced by each user? Aggregates can cover over a multitude of sins. Right?

Jess: And metrics aren't evil there, but metrics are never sufficient. And if you think they're sufficient, that's--

Charity: But they can't answer that, they just can't deal with that, because they can't handle high cardinality. Right? Metrics are an aggregation tool and you want disaggregated data, you want to be able to slice and dice. And if you ship your diff and it's 99.98% healthy, that doesn't mean that your biggest user doesn't have a shopping cart that's timing out every time they try to submit. Right?

They're just kind of two different domains. There are very few use cases for metrics in the sort of product user analytics. There are a couple. Monitoring queues is one of them, but they're pretty rare.

Jess: Yeah, there are some things where aggregates are actually more useful than a massive amount of tiny details.

Charity: Yes. But the workhorse data type of your software is a structured data log that can handle high cardinality.

Jess: A structured log or event.

Charity: A structured log.

Ken: Trace events.

Jess: Or trace span.

Charity: Yeah, yeah. You want structured data because you want to preserve that context and you want to be able to slice and dice and zoom in and zoom out and just like treat it like a business analytics tool, right?
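A rough sketch of what slicing that structured data buys you, using made-up events and field names: each request is one wide event that keeps high-cardinality fields like user.id and cart.id. The aggregate success rate looks fine, but grouping by user shows the one big customer whose cart submit fails every time.

```python
# Sketch with invented data: wide events preserve high-cardinality context
# (user.id, cart.id), so you can disaggregate. The aggregate number hides the
# one customer who fails on every request.
from collections import defaultdict

events = [{"user.id": f"user-{i}", "cart.id": f"cart-{i}",
           "endpoint": "/cart/submit", "duration_ms": 120, "error": False}
          for i in range(4999)]
events.append({"user.id": "biggest-customer", "cart.id": "cart-vip",
               "endpoint": "/cart/submit", "duration_ms": 30000, "error": True})

# The view a success-rate metric gives you: looks healthy.
ok = sum(not e["error"] for e in events) / len(events)
print(f"overall success rate: {ok:.2%}")   # 99.98%

# Slice by the high-cardinality field instead.
errors_by_user = defaultdict(int)
for e in events:
    if e["error"]:
        errors_by_user[e["user.id"]] += 1
print(dict(errors_by_user))                # {'biggest-customer': 1}
```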

Jess: Yes, yes. Much more the direction of BI.

Charity: Yes.

Jess: And it makes sense that we used to consider application metrics, time series for things like requests and failures, as sufficient. Because it used to be that the software was run by ops, who didn't want to understand the internals, who just wanted to make sure it would still work.

So now that we're not sending our kids off to boarding school, meaning ops--

Ken: Boarding school is ops. I like this. Keep going. I want to hear where this ends up.

Jess: Right, right, now that, as developers, we're homeschooling our children. In groups. We have teams.

Charity: Yeah, yeah, yeah.

Jess: Right? We want to know what they're doing and how they're functioning and what hurts them and what they're allergic to.

Charity: Very important. What does make them crash dump in the middle of the day?

Jess: Exactly.

Charity: So--

In writing this letter to the CTO in this part of my book, I realized that the entire DevOps movement has really been trying to create one feedback loop that includes both devs and production.

Jess: Yes.

Charity: And to this day, it has mostly failed. It is 1% of teams, 1% of developers that can do that. And for a long time, I'm like, ah, put your devs on call, do this, do that. I have come to realize it's not actually reasonable to ask most developers to instrument their code and look at it after every-- Because if you're using a three pillars tool--

Like, cognitive load for developers is the scarcest resource in every R&D org. And you're a developer and you're like, okay, I have a niblet of data. I've got a shopping cart ID. I know it's important. I want to put it in my telemetry. All right, is it a metric? A log? A trace? A profile? An exception? An error? Is it 1 or 2 or 3 of those? If it's a metric, is it a counter?

Jess: Oh, those are so hard.

Charity: It's so hard. And then what's the cardinality of the data?

Jess: What's it going to cost for me to put that on the log, the trace, the metric?

Charity: And then you deploy it and then, okay, where in all these dashboards is my data? Do I need to make a new dashboard? Do I need to manually craft a bridge that derives this metric from that? It's actually too much. It's too much. It's too much even for like--

The reason ops teams can use those tools is because when you're managing third party software, that's your whole job. That doesn't change. The telemetry doesn't change, because you can't change it. It's too much to expect of your average developer; it could take all day to do that, and they've got a full time job, right?

So here's where I'm realizing AI comes in. The DORA Report came out in 2025 and it said this over and over again: this is not a tools problem, this is a systems problem. And AI has made it more important than ever to have these fast feedback loops, right? It has shortened the definition of what fast is. And it has also, for the first time, made it technologically possible to see how developers could actually do this. Jess, that demo you made of Honeycomb, right after re:Invent.

Jess: The MCP Canvas one.

Charity: Yeah. Just you're writing code and you don't have to do anything special to instrument. It does it for you. And you don't have to go look for the dashboards because the detail you need comes to you in your developer environment, your IDE, in Slack. Wherever it is you're doing your work, you just do your work.

The telemetry part, that feedback loop, is in the tool. This has never been possible before, right? And I really think that by the time the book comes out, six, seven months from now, I expect there to be half a dozen tools that can do this. But this is so fucking exciting. I gave a keynote at LeadDev Berlin last month, and it was like 25 minutes long, right? And I'm talking about the feedback loops and all this shit.

Afterwards I was swarmed with people who had questions and over 50%-- These are all senior leaders, they're all directors and above, principal engineers. Over 50% of their questions were some variation on, "This sounds great. How do I get my developers to go look at it?"

And that's when it clicked for me. I'm like, oh, all these people want to do a good job. They're all good developers. And if they're not looking at their telemetry, it's probably because it's not reasonable to ask them to look at it. Because it's so fucking hard.

Jess: Because it's far away. And because you have to understand the data.

Charity: You have to do the job of instrumenting and then go find it in the mess. And there's how many different tools? There's your metrics tool and your logging tool and your tracing tool and your profiling tool and your exceptions tool and your error tool. And if you just added some new-- You have to make a new deck and it's hard, right?

You have to go to it. It's not coming to you. We can close this feedback loop.

And the thing is this is so powerful because when you-- Calm down, Charity. It's okay.

I also have been really investing in my stick figure art recently because I really think it helps people.

Jess: Where did that come from?

Charity: Because think about it, there are two kinds of feedback. So the feedback loop that developers live in every day is build, test, merge, build, test, merge. Right? That's the basic cadence. Right? I'm not a developer, but is that right?

Jess: I think yeah, yeah, that's pretty reasonable. Oh wait, you forgot the PR review which makes it build, test, wait, wait, wait, wait, wait, wait, wait, wait, merge.

Charity: Okay. So the feedback loop that hits production is the operational feedback loop. Right? It's the one that gets kicked off by a page.

Someone's pager goes off or some customer complains really loudly. And that's what sends someone to go investigate, understand and fix. Those operational feedback loops are necessary. It's an important stopgap. It helps keep our system stable. But the ideal feedback loop would be one that looks like: build, deploy, learn, build, deploy, learn.

Jess: Fuck around, deploy, find out.

Charity: Fuck around, deploy and find out.

Jess: The trick is to keep the "fuck around" small.

Charity: This is true. The companies that are world class at this, Intercom, for example. They ship, it takes them, what, five minutes from the moment you merge your diff till it comes out fully deployed.

Jess: That's really fast. We don't achieve that.

Charity: I know, it's a Ruby on Rails app too. They ship 300, 400 times a day.

Jess: Nice.

Charity: Every time you ship, you learn something. You don't actually know what your code does until it's in production. That is an organization that is not just a little bit better than a company that's shipping once a day or once a week; it is operating on an entirely different plane.

Jess: Yeah. And if you have that feedback loop and you can learn something after you ship because something is helping you make the observability easy enough, often it's not about your code. Your code is fine. It's about your users, it's about your customers. It's about how they interact with that.

Charity: Yes. What are people doing with this change I just made?

Jess: Exactly.

Charity: Is it what I thought they'd do? Is it different? What is the impact?

Jess: What is my impact on the business and why should I get a raise?

Charity: Yes. And I find this so intrinsically motivating and exciting, when you're an engineer, being able to see the impact you're having on the business.

Ken: It's huge.

Jess: Yeah. And on the people that our software serves.

Ken: Yeah. So agent/AI driven observability-infused development loops basically.

Jess: Observability-infused. I like that.

Ken: Meaning that, basically, when you're working with some very useful, good development tool in the future that does this, built by people who have already figured it out, it can guide you into instrumenting the right way. Then, as it's going through and making its changes, pushing things up, and making sure things deploy, it's turning on the feature flag and watching what's going on. It instrumented what it wanted to know in order to run the experiment and prove it out for you, without you doing a whole lot of work on your own. That's basically it.

Charity: Yeah. It's checking in with you being like, does this look right? Does this look right? It knows the diffs that you've submitted over the past. And so when something starts to go wonky with an endpoint you were recently working on, it pings you in Slack and it's like, "hey, Ken. What about this? It looks like that line of code that you submitted two weeks ago, it might be contributing to this over here."

Ken: Right.

Charity: And maybe, you know, we always start with the errors and outages, but also, you could ask it, like, hey, if this starts to get real traction, could you let me know, and where the users are coming from, and look for patterns in how they heard about me or why they're using this, or if they're using it on mobile, or, you know, just like interacting with--

How long have we been saying this, Jess? Having a constant conversation with your code in production. Right?

Jess: Right. The running code, that's the important part, not the code that you're reading, that's easy enough. The running code that's interacting with users.

Charity: Yeah. And it's just been so hard in the past. It has taken such a high level of familiarity with the tools.

Jess: Yeah, you have to have a passion for it. You have to want to look at this stuff for its own sake because no one's asking you to. It's not your main job.

Charity: And you often have to have had some ops experience yourself so you can translate between like, what does it mean if the CPU spikes and all this stuff. But like most devs don't and nor should they have to.

Ken: And to point back to my little discussion trying to define this again, I was coming at it myopically. I'm mostly an engineer and not a DevOps person. So there's the other missing piece, which is having that 360 view of everything and thinking about it from the developer's perspective, the ops perspective, the code quality, the user's perspective, all together. Having AI help you tie that together sounds exceedingly powerful.

Charity: Yeah, I'm excited. We keep getting off the topic but like we also wanted to talk more about metrics and I think that there's-- So like in some ways it's easy to fall into this, "well, metrics are a bridge to the past and AI is about the future."

Jess: Yeah, but the AI knows a lot about the past and it draws from that. And the most useful libraries these days are the ones that the AI is already familiar with.

Charity: Yeah.

There is a big deep well of shallow metrics data everywhere. AI thrives on context and structure, but it's also like not half bad at just finding a decent guess from a pool of shallow data.

Jess: Okay, so we talked about the last six months, but this episode will be published in 2026 and that's kind of a special year because it's our 10-year anniversary. So tell us what happened 10 years ago.

Charity: Oh my God.

Jess: I know, that's like five generations.

Charity: I cannot even believe. Honeycomb was founded January 1, 2016.

Jess: Wow.

Charity: Christine and I were two bright-eyed, bushy-tailed little engineers who were just like, we miss this tool. You know, I, as a serial dropout, had never really considered starting a company. When I was leaving Facebook, I had a pedigree for the first time ever, so investors were kind of like, hey, want a couple million dollars?

And I'm just like, "On behalf of all women and queers and dropouts everywhere." Haha. But we 100% thought we would fail. I just thought it'd be a relaxing way to spend a year.

Jess: What? Okay, you are delusional. Wow.

Charity: Writing code in the corner, write some golang code. Just like detox, relax. And then we'd fail and then I'd just get back to work somewhere. I mean, what's the worst that could happen?

So yeah, Honeycomb, famously, we started out by writing our own storage engine from scratch, after a career spent telling people, "Never write a database. Never ever, ever write a database. If you think you should write a database, let me be the first to tell you you should not write a database. No one should ever write a database."

So we started by writing a database. But I would actually say that the hardest part of that first year was trying to figure out how to talk about what we were doing, because there wasn't any observability.

Right? There was just monitoring and logging and--

Jess: Right. The word wasn't part of the software industry yet.

Charity: Right. And you know, it was July, I think, of 2016, when I was at work on Market Street, at WeWork, late, and I'm just fucking around on Wikipedia and I'm like, "observability, the ability to understand the internal state of any system just by observing-- hey, Christine, this looks kind of cool."

And I don't know, man. No one should ever start a company. You should not write a database. You should not start a company. This is why ops people never start companies because we're such pessimists. We're just like, well, it's definitely going to fail.

Ken: Things fail.

Charity: Nobody wants to give money to somebody who's just like, I know all the ways this is going to fail and it's probably not going to work out. Well, that's where we start from, right? But man, I feel really grateful and just sort of dazzled, or weirded out, to be here. Like, I was never one of those kids who was like, "I want to start a company," because I've always kind of low-key hated those people.

They think they're too good to work for anyone else. And I'm just like, I despise you. You know. But being able to run a company and work with so many of these people that I adore and meet some people that I adore, and have time to just think and write about the software industry and--

Jess: You have that now. You didn't nine years ago.

Charity: No, it was miserable. Seriously. The first five or six years I was a mess. But then I got my ADHD diagnosis and I got my medication and my sleep cycle and things are really good now. It was a rough road though.

Ken: So, as part of that, in your early years, you were defining a whole language for people. You were working with groups of like-minded people that you were bringing along the way. When did you really start to feel like that understanding of what observability is had gelled enough that you could talk about it without defining it for 40 minutes before showing them anything?

Charity: So I look back on the years from 2017 to 2020 as the years that I played language cop. I was the person who would wade into every Twitter thread and go, "well, actually the way you're using the word 'observability' is wrong. Let me tell you what observability means."

I was so annoying. I can't even believe it. I am so annoyed by that. I hate those people, you know. But I don't know, I think around 2020 or 2021 or so was when the Gartner category got created, and I stopped fighting that fight, because I realized nobody likes that person. It kind of doesn't matter. It's time to focus on outcomes.

And that might have been the wrong decision. I don't know. Now the term has come to mean anything that has anything to do with software and telemetry, which I don't think serves users, because there are a couple of real distinct generations of tooling.

You know, there's the "three pillars" generation built for ops and SRE, and there is the sort of APM 2.0 generation, I'll say, which is built for understanding the quality of your product and its impact on the users. And they're all jammed together.

Like this is the merged together bastard stepchild of three or four different categories. And now there's data observability and LLM observability, all these other categories--

You know, the second largest bill people have after cloud is observability.

Jess: Wow.

Ken: I believe it.

Charity: People are spending between 15% and 25% of their cloud bill for observability, but that ranges from 10% to 50%. There's even this one dude on Reddit, I got sent a link to this guy who's like, "I'm spending more on my Datadog bill than I am on my cloud bill. Please send help."

So there's variance there. And this is driven by skyrocketing complexity. Right? It used to be that all that complexity was bound up in a little application.

Jess: Yeah. And then time-series metrics were enough.

Charity: And then time series metrics were plenty. Yeah. If a developer actually really had to get into the guts, they'd take it offline, take a debugger, they'd step through it.

Ken: When you didn't have a thousand different services in your application on different cloud containers and serverless and everything else, it was possible to step through code.

Jess: Can't debug them all at once.

Charity: Back in the day when most engineers could sketch their architecture on the whiteboard.

Ken: Yeah. Right.

Charity: Now who can?

Ken: That ship has sailed.

Jess: But you don't get to control a word. You bring it in, you brought it in, you made it part of the conversation, and then it goes off on its merry way. And Kent Beck says there's a trade-off with a word that people like. People liked the word observability. It's a great word. And so they start using it to mean whatever they're doing right now.

Charity: Yeah.

Jess: He uses the example of agile, for instance. That was a word everybody liked. So a zillion things have been defined as agile, whereas extreme programming, he also coined that one. And he coined this one knowing that people wouldn't like it. But then it still means what he wanted it to mean because nobody co-opted it.

Charity: Oh, that is great. Oh, Kent, well done. Well said.

Jess: Yeah. Oh, earlier you said something about everything was really hard and then you got your ADHD diagnosis and now you get to think about things.

Charity: I got my ADHD diagnosis and Christine and I swapped jobs.

Jess: That's important.

Charity: Very important. I was the world's worst CEO.

Jess: And Christine is a fantastic CEO.

Charity: She was born, bred, raised, like she is amazing at this. Yeah.

Jess: Yeah. But I was wondering, what does it mean for observability to get its ADHD diagnosis? Does it mean that that cognitive load is like reduced for you to the point where, "look, here's the information you need now you can think about it."

Ken: It's pretty good.

Charity: I love that, Jess.

Jess: I just keep having to bring it back to the kids.

Charity: Yeah.

Ken: It starts getting A's in observability, everything starts getting better.

Jess: Starts doing its homework.

Charity: Starts turning it in on time. Yeah. I will say this though. When I sat down to start writing the second edition of the book, I was like, "yeah, this needs to be done. I'm on board. Good idea."

But I was also kind of like, I've been writing about this topic for nine plus years. Is there anything left for me to learn about it? I mean, yes. AI, etc. And I have learned so much.

I am such a big proponent of writing your way into complex topics. You learn through writing.

I love William Zinsser. He wrote this book called On Writing Well. And as part of it, he talks about writing your way into complex topics like nuclear engineering and stuff. And then he wrote a whole book on learning through writing. He says, "writing is thinking on paper."

Jess: Oh, nice.

Charity: You put your thoughts in front of you. You look at it, you see all the holes. It's like, if you can understand something, you can write about it. And my understanding of observability when I wrote the first book was very technical, right? It's high cardinality, it's fast, you know, it's this and that and all this shit.

And now I think about it so much more through the lens of: observability is the sense-making apparatus of complex systems. Observability isn't a feedback loop. It's what makes the loop feed back.

Jess: It closes the loop. Right? It's a component.

Charity: Exactly. I use the phrase "observability is a feedback loop of feedback loops" sometimes. I think it's kind of clever. But it's actually the sense-making thing that makes a loop a loop.

Jess: Yes, yes. Although I will argue that observability, like the telemetry and being able to query it in Honeycomb, does not itself do the sense making.

Charity: No. You, the person, are sense making.

Jess: Or the agent.

Charity: Well, I would argue that the loops are-- I mean sure, anything connecting cause and effect is technically a loop. But what matters, I think is the person who is synthesizing what happened and what you learn from it so that it changes the next step that you take based on that information.

Jess: Yes. And the agent can help with that. Right? The agent can actually look at data in Honeycomb and make some sense out of it. Now whether what comes out of that is relevant in the organization, I mean it might be incorrect, it might just be unimportant. It's only the person that can take that sense making and broaden it to: "This makes sense in the whole socio-technical system, this makes sense to my team, to my business," because it's only the person that's part of that. It's only the person who's participating in the wider system.

Charity: People create meaning.

Jess: And together, we have to share that meaning with each other to be part of something like a company.

Charity: Socio-technical systems, man. That's not just the software and it's not just the people. Socio-technical systems are systems where you can't actually tell what's going on by looking at just one of them. Right? All of the work is social and technical. It's technical and social. You can't separate them.

Jess: Right. I like to think of observability as teaching your software to talk to you, so it can participate in the conversation, which is like when your baby learns to sign.

Charity: I knew it, I knew it.

Jess: Right? So I think Honeycomb takes software from "wah, something's wrong" to "hey, I'm hungry. Can I have some peas?" But then when you add AI in, it becomes a teenager and it can argue with you.

Charity: Haha. This is so much fun. Let's do this again before two years.

Ken: Please.

Jess: Yeah, because you're getting more involved in social, so you'll be out here arguing more. Right?

Charity: Me and my stick art, we're going to be out there on Substack, shaking it up.

Jess: Great. Charity, what is your Substack?

Charity: Charitydotwtf.substack.com where the "dot" is spelled out.

Jess: Oh, okay. Charitydotwtf.substack.com.

Charity: Right.

Jess: Okay, wonderful. Thank you so much.

Charity: Thanks for having me.