O11ycast
36 MIN

Ep. #83, Observability Isn't Just SRE on Steroids with Dan Ravenstone

about the episode

In episode 83 of o11ycast, the Honeycomb team chats with Dan Ravenstone, the o11yneer. Dan unpacks the crucial, often underappreciated, role of the observability engineer. He discusses how this position champions the user, bridging the gap between technical performance and real-world customer experience. Learn about the challenges of mobile observability, the importance of clear terminology, and how building alliances across an organization drives successful observability practices.

Daniel Ravenstone is a Senior SRE at Windscribe with over 25 years of experience in observability and monitoring. He is a passionate advocate for OpenTelemetry, helping teams integrate best practices in observability to improve system performance and reliability.

transcript

Daniel Ravenstone: To me, an observability engineer is someone who is not just doing DevOps, not just doing SRE-type work.

I think a lot of people think that's what they do, but I feel like it goes beyond that, because my conversations over the years haven't been just with the engineers, not just with the developers, they've been with the entire group of people.

So we're talking about product, product owners, the folks looking at the features, customer success.

At the end of the day as an observability engineer, your key focus is understanding the user experience, and you need to talk to everybody when you do that. You can't just have relationships with one particular team and focus on DevOps because it's more than that.

It's more than just development work and understanding how to instrument an application, because for the most part now we can just set up auto-instrumentation and hope that covers everything.

But if you don't actually understand what that user journey looks like, and if you're missing instrumentation on certain parts of your code, then you're missing that story.

You're missing that kind of experience that they're looking to get. So you have to have conversations with everybody and make sure they're all on board with that.

And so as an observability engineer, you're almost like a champion for the user, really. You're trying to get their experience to the forefront.
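
To ground that in something concrete, here is a minimal sketch using the OpenTelemetry Python SDK (the span name, attributes, and `process_payment` helper are hypothetical, not from the episode): auto-instrumentation picks up the generic HTTP and database plumbing, while one hand-written span records the user-journey step Dan is describing.

```python
# Run under auto-instrumentation, which covers the generic plumbing:
#   pip install opentelemetry-distro && opentelemetry-bootstrap -a install
#   opentelemetry-instrument python checkout.py
from opentelemetry import trace

tracer = trace.get_tracer("checkout")

def submit_order(cart):
    # Auto-instrumentation already traces the HTTP and DB calls underneath;
    # this span names the step the user actually cares about and carries
    # business context that auto-instrumentation cannot know.
    with tracer.start_as_current_span("checkout.submit_order") as span:
        span.set_attribute("cart.item_count", len(cart.items))
        span.set_attribute("checkout.payment_method", cart.payment_method)
        return process_payment(cart)  # hypothetical business logic
```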

Jessica "Jess" Kerr: Like your customers?

Daniel: Yeah, the customers, yeah, definitely. So I mean, like CI/CD plays into that, but it's not the full story.

You know, understanding your tools is not the full story. Understanding the platform is not the full story.

You have to kind of talk to everybody and get a feel for how all these different parts play into what the user experience looks like.

And I think an observability engineer has got to know a lot of that, especially at a senior level, where you've got to actually understand all the different components. Which is hard, because you're more of a jack of all trades at that point and not really a specialist in, say, Golang or Java, or platform stuff like AWS or Google.

Jess: It's almost like observability is a property of complex systems.

Daniel: Oh, yeah.

Austin Parker: You know, one thing that I think is interesting, and you kind of touched on it, is the idea of observability engineers being able to flex outside of the actual engineering org, right?

Because I think product people are such an interesting example of this, right? Like PMs and designers, whose jobs depend on making that connection, on being able to synthesize user feedback, get it to the right place in the org, and deliver things that your users like.

Like, that's the stuff that more or less they get promoted or fired on. And I think an important part of like observability engineering is being able to sit between those people and sort of the performance side of the house and making those connections explicit, right?

Front end I think is a little bit less-- Like, this is something that maybe the front end people kind of grok on a different level because they're so used to being in a loop like that.

Jess: The importance of performance?

Austin: The importance of performance, but more so like having that like really tight loop with sort of the product side of the house, right? Like, I think a front end engineer is going to be talking to-

Ken Rimple: The UX person?

Austin: UX and PMs and all that, but that's not a skill we necessarily associate with the backend. Like, the DBAs usually aren't talking to UX leads, right?

Daniel: No, yeah.

Austin: But I'm wondering, with that in mind, is that almost an underappreciated part of observability engineering, being the person that connects those dots for people, especially leadership?

Daniel: I would think yes. I mean, when I first started out in the whole modern observability scene, it was like 25 years ago now, I guess.

It's been a while. Man, I feel old.

But I got into it because I was asked to set up a NOC, and I started using the open source tools, the old stuff. I think I've talked about this before, but that's where I found a passion, because one of the things I loved was that it's an underdog kind of role.

Like, nobody appreciates the NOC analysts or the monitoring experts or anything like that, because it costs money, and that's the first person people turn to and say, "Well, why don't we have monitoring on that?"

I mean, that was always that kind of like blame thing, which we don't do anymore fortunately.

Jess: NOC here is Network Operations Center?

Daniel: Network Operations Center, yeah, which is more of a telecom background, but I've worked in both, so I got a little intro to it.

But yeah, you know, companies like AT&T would have these massive rooms with a hundred monitors on the wall, and they'd be pointing at things and saying, "Oh, well, we can see where it's going, things are going on here."

But, yeah, that was an underappreciated role for the longest time, and I think observability is still in that sort of place, because leadership forgets there's value in having people who know their way around this, who can say, "Hey, listen, if we do it this certain way, we'll get better feedback on what the user's experiencing, because we've done these things in a certain way now. We're following these best practices and these standards."

And now we can say, "Hey look, in about half an hour or four hours or whatever, because we've got our SLOs all figured out, we know those thresholds where it starts to be bad for a user. We can now alert on that and act on it before it becomes a problem."
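
As a rough sketch of the kind of user-facing threshold alert being described (Prometheus, which comes up later in the episode, is used here purely for illustration; the metric and route names are hypothetical), the rule fires on what users feel rather than on raw CPU:

```yaml
# Sketch: alert on a user-facing threshold, not raw CPU. Fires when p99
# checkout latency has sat above the "bad for users" line for ten minutes.
groups:
  - name: user-experience
    rules:
      - alert: CheckoutLatencyDegrading
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket{route="/checkout"}[5m])) by (le)
          ) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 /checkout latency over 2s, users are starting to feel it"
```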

But it takes time. That's the other thing too. I think people forget how long it takes to do that and to build that cultural mindset into everybody.

Not just, again, not just DevOps and SREs. I think it goes beyond all of that.

And I think one of the greatest successes I've had in my roles is when I started reaching out to those other teams, not just engineering. So customer support, product managers, because they need to know about this, and they're the ones who actually have the pulse on the users more than others, because they're getting that feedback loop.

And so, you know, building these things out helps, because unfortunately you're not going to get everything from a survey you send out quarterly, which I think is what some people rely on.

Ken: So you started off in that network world, supporting, wiring, configuring, running the NOC, things like that.

What was your journey to the point where you started using observability tools like Honeycomb?

Jess: Also, who are you?

Daniel: I am Dan, the o11yneer, Ravenstone. I've been in this for what, 25 years, I guess, yeah? Something like that.

I have been trying to coin the term o11yneer, observability engineer, for a while. I think it's started to take, though.

Jess: O11yneer, like O-1-1-Y?

Daniel: Yeah.

Jess: Nice.

Daniel: Well, it plays into both sides of like my skateboarding background and, you know, my love of observability and engineering.

That was a long road, because I think we've seen a lot of changes over the years when it comes to monitoring. For the longest time, the tools never kept pace with what was out there.

And even the ones that did were sometimes overwhelming, and little ones popped up every once in a while. But, I mean, a lot of people know Nagios, or Nagios, however you want to pronounce it.

I know some companies out there still use it, you know, which is surprising. But then things like Prometheus: when Kubernetes got released, we saw that upswing of tooling that was actually built around an offering and would provide a little more information.

And I think what really triggered my love for observability was Monitorama. Charity Majors did a talk there, and that's when I first heard her mention that word. She even suggested that the following conference might be called Observability Con or something to that effect, I think.

And I thought that was amazing.

Jess: But Monitorama is just a great name.

Daniel: Well, Monitorama is a fantastic name. One of my favorite conferences of all time.

Jess: Yes, fantastic.

Daniel: Yeah, but that's where it started for me. I've been part of this whole journey for years.

I've been watching it grow and mature, and I think we've finally hit a level where we're starting to keep pace with some of the things that are out there.

I think the OpenTelemetry community has been huge in that regard, right? I mean, they're still a bit behind in some areas, but.

Jess: So you cared a lot about monitoring, and then you learned about observability.

Did that appeal to you because it's broader than just the network operations, because it widens its impact on the business?

Daniel: The reason why I got into monitoring in the first place is I got tired of customers calling up tech support saying, "Hey, we can't do this. Your service is down," or something to that effect.

And that was what kind of clicked in for me. Like, if the users see it's down before we do, then we have a problem. And what would be even better is if we can actually see it before it becomes a problem.

And monitoring has always been point in time: this is happening right now. And so we always got a lot of false positives.

We got a lot of alert fatigue over the years, because people would just throw things in and go, "Well, it's high CPU, we'll get an alert on it." But is the CPU being high really the problem? What is the root cause?

And I think what observability did was peel back the layers of the onion and expose the underbelly of what was really going on. The whole point was to be able to say we know what's going on without having to go poke in and add more code or do other things. We have an idea of what's going on at any given point and what the trend looks like.

And that's the difference with observability: it made us less reactive and more proactive.

I think there's still work to be done in that space, but at least, when you start putting these things in place, it gives us a lot more tools and a lot more information to get that understanding.

That's why I think "the three pillars of observability" has always been a misnomer. Those signals support observability, but just because you have logs doesn't mean you have observability.

Just because you have metrics doesn't mean you have observability. Just because you put traces in-

Jess: What does mean you have observability?

Daniel: When you're able to understand what those signals mean in that context, when you have the context that goes around them.

So just because you have logs doesn't mean you actually have observability. But if you know what those logs mean, and they're giving you valuable information and providing some context, then you're starting to get to the point where, okay, I know right now the system's not behaving properly, because we're seeing errors in our logging, we're seeing errors in our traces.
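
A minimal sketch of what that looks like in practice, assuming the OpenTelemetry Python SDK with the `opentelemetry-instrumentation-logging` package and a tracer provider configured elsewhere: each log line gets stamped with the trace and span it happened in, so the log carries its context instead of floating on its own.

```python
import logging

from opentelemetry import trace
from opentelemetry.instrumentation.logging import LoggingInstrumentor

# Injects otelTraceID / otelSpanID into every log record and, with this
# flag, rewrites the default log format so they are printed.
LoggingInstrumentor().instrument(set_logging_format=True)

log = logging.getLogger("payments")
tracer = trace.get_tracer("payments")

with tracer.start_as_current_span("charge_card"):
    # This error now prints with the IDs of the span it occurred in, so it
    # can be joined back to the failing trace instead of read in isolation.
    log.error("card declined")
```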

Austin: A word I've been tossing around is this idea of intent, telemetry intent, right?

And you talked about it earlier, you know, it's not just about throwing auto-instrumentation at the problem and creating some dashboards and alerts and calling it a day, right?

Like, telemetry, like anything else, needs design. It needs intent. I mean, hell, I see Honeycomb customers who are like, "Well, we had tracing, and so we have observability now." And it's like, eh, but do you, really?

Like, we're really quick in this industry to like jump to one thing as a panacea for all of our ills, and it's like, "Oh, all that stuff sucked. We need to put the next thing on," right?

Like, "Oh, we tried log analysis, and no, we can't find anything on logs, screw it. Time to do metrics. We need metrics for everything because metrics are cheap, and we can keep forever and it's great."

And then it's like, "Oh, we can't actually find anything useful because we have to collapse the context too much. We lose all of these details."

"Tracing, we need tracing," so we're going to do tracing. And it's like, "Oh God, it's so expensive, and there's so much data. Now we need to sample very heavily."

And now you just like blindly sample and you've lost like a ton of resolution and then it's like, "Oh God, we can't actually."

"Oh, yes, like this part's good but now we can't find anything. We need profiling, let's get profiling. Let's throw some eBPF on this," you know?

And you can't just keep stacking signals on there blindly and expect to get anything, right, like-

Jess: There's something about at some point it's not the tool, maybe it's you.

Austin: Sure. But, I mean, a lot of it is, and this goes back to the original point: part of the job of being an observability engineer is being the person that provides that intent.

You're a designer, and not just for how engineering team X, Y, Z writes code, or for giving people checklists for production.

It's being able to go to the business and saying, "Hey, what do we care about?"

And figuring out the infinite complexity of those answers because we care about a lot of things.

But even if you narrow it down and say, "What are the three things we care about?"

You know, we can care about three things, and we can really track and understand those three things.

That's a lot of depth, you know, to really drill down and understand: what is it that needs to be monitored, what needs to be measured, how do we collect that data, how do we store that data, for how long, how do we make that data actionable for developers, for PMs, for leadership?

Like, the role I think of the observability engineer is to provide that intent, that design, that direction to the org.

Daniel: I would agree. I like the word intent because it is what you're trying to do.

You're trying to shape and mold and direct folks who don't have a lot of experience with what observability means.

And let's be honest, that's not something you learn when you go to university. My son just finished university, and, I mean, that's the last thing they teach them.

They teach them how to write code and follow best practices and all that.

But they don't ask: how does your code actually do in a production environment? How do you understand what's going on, and what are the tools you use to bubble that up to the surface so you can see what's happening, and whether it's actually working the way you expected it to?

I mean, every time something goes out in the wild, it's going to act differently. And then users, well, there was a quote from one person, I cannot remember who said it, but it was something like, "You can never predict what a user's going to do with your code in the wild."

So they'll push it to boundaries that are unknown. In fact, I'll just use my dad as an example for a second. He's in his eighties, and if he hears this, I'm sorry, Dad, but it is true and you know it.

There was one time he was having trouble with his Zoom, so my mom calls me up and I come up. Just to lay the groundwork, I happen to live in the same apartment building as my parents.

So he has one of my old MacBooks, which still works fine, but somehow, and I don't know how he managed to do it, he made Zoom like the wallpaper of his desktop.

Like, it was in the background, I couldn't click on anything. I couldn't alt + tab to it, I couldn't do anything. I was like, "How did you do this?"

Like, there was no way I could actually get to the Zoom application, and everything's still going.

Like, you know, whatever they were on, it was still happening, but they couldn't control the thing. Like, they couldn't get to the volume, they couldn't get to the mic, couldn't get to the video.

How did you make this happen? And so basically I just restarted the thing and everything went back to normal. But that is just like a case of my father's abilities.

But I swear he could make a career out of this: just send him a brand new product, give it to him for five minutes, and he'll hand it back broken. Then you have to figure out what he just did.

And he's not the only person out there that does that.

Jess: Not the only person with that particular talent, yes.

Daniel: No, no, yeah, and it is a talent, and I applaud him for that.

Jess: So observability helps you figure out what the heck that talented user did.

Daniel: Yeah, ideally it would, because otherwise, I mean, that shouldn't happen. In that use case, that should not have happened.

My only recourse really was to reboot the thing. But if you have proper observability in place, you'd be able to actually understand what happened.

Maybe you still have to reboot, but at least you know how to prevent it from happening the next time, right? Or know how they got there.

Jess: Well, you had the opportunity to learn from the disaster.

Daniel: Exactly.

Jess: Instead of only fixing it and walking away and shrugging.

Daniel: Which is what I ended up doing. But learning from it is the key thing, because we have to learn from these things.

This is part of the thing with running production environments: you're not always going to have all the answers in hand and understand what just happened.

But if you're able to actually learn from it and say, "Oh, okay, this is the little piece of the puzzle we were missing," and it turned out to be a bigger thing than we thought, then we'd better add more weight around that to understand it.

And it happens all the time, but we've got to stop this checkbox, slap-it-in-and-walk-away kind of attitude. It just doesn't work anymore. You've got to spend some time understanding it.

Jess: So when you come into a new organization as an observability engineer, how do you get started on that journey?

Daniel: Talking to people, really. I mean, as an introvert that's hard, but you know you have to.

You have to start talking, building relationships, and you've got to start understanding what their biggest problems are. How are things set up currently? How is instrumentation done? What's the philosophy behind it?

Where does it sit in the priorities as well? In some companies, places I've been, you walk in and they have instrumentation, but it's really basic, rudimentary.

They slapped it into their boilerplate code three years ago and that's been it. They don't really even look at it.

So when you go in and you go, okay, so you have this tooling, you have this instrumentation, what do you do when there's an incident?

How do you troubleshoot? How do you understand when things are going a little bit sideways? And getting involved in incidents is also key.

You have to be part of that. I mean, sometimes it's late nights, but it gives you an idea of how the company, as an organization, culturally does problem solving.

How do they understand what's going on? What tools are they using? Where are their gaps? You have to be involved.

It takes work, but it's important, because you get so much more out of it at the end of the day when you do that. And then you go, "Okay, I get where we're coming from now."

Jess: Do you find yourself able to contribute to the organization quickly when you're having all these conversations, when you're involved in the incidents?

Daniel: Yeah, totally. Like, I was at one place for a while, and there were so many problems with a RabbitMQ cluster, because it wasn't set up properly.

You know, there weren't a lot of folks who understood RabbitMQ at the time. Fortunately, I had a background running RabbitMQ clusters, so when I came in, I was actually the tech lead for most of that time.

And it gave me an idea of the tools we were using to understand what the problems were, and how we were going about doing that.

And then I was able to make notes: okay, once we get this fire under control, these are the things we can start improving on. How do we actually handle incidents?

In some cases, there were things we could actually tune on our instances to get them running a little more smoothly.

How we looked at the problem and what tools we were using. Where could we start to open things up and get a better understanding of how it all behaves?

And then you're getting involved with the people who are actually in there working as well. You're working shoulder to shoulder with them, so you're actually understanding how they think and how they work.

So it gives you that relationship building too, right? You get a feel for their thinking: okay, this person looks at things this way. You understand how they think about it, and you can help model the approach.

Okay, if this is how observability is done here, then if we did this, it would add a lot more to what you're doing.

And you can't force this on people, because they don't like that either.

They got enough on their plate. You don't need to be another person saying, "You need to do it this way or else."

That's not the way. You got to work together. It's collaboration, cooperation.

Jess: So it sounds like your 25 years of industry experience helps you contribute immediately while you're understanding the complex system.

Daniel: Yeah, yeah, totally.

Jess: And from there, how do you get people to become your allies in this quest for observability?

Daniel: There's been a couple ways it's helped me in the past.

One is getting the right tools in their hands so they can see immediate results. Like, if I make a suggestion, you know, "if we did our logging this way," they get that kind of immediate feedback.

Way back in the day I was at another company, and they brought me on to address a monitoring problem. They had none. And they were moving from bare metal into AWS.

So they needed tooling, and one of the things they had a big gap in was logging.

So I put in a logging infrastructure, built everything out for them, then went right to the developers and said, "Here, start feeding your logs through this. This is how you do it."

And they did and they were able to get immediate feedback as to what their application was doing through their logging.

And this was just before OpenTelemetry became a thing, because it was OpenCensus and OpenTracing then, and they were still kind of figuring things out.

Prometheus had just started its rise too, so there was still a lot of work to be done there, but at least we had logging.

And they actually got immediate feedback once that happened. Then at the last role I was at, I was talking to the customer support team.

It was fantastic, because I was able to sit down with them and say, "What are your biggest problems?"

And then saying, okay, they're looking for this particular information, and giving them the tools within Honeycomb, actually. We had just migrated to Honeycomb.

I got them into Honeycomb as well and said, "Hey, this is what you need to look at for those issues that tend to come up often. Now you can get that information right away."

So empowering them in how to use the tools and then making yourself available to have those one-on-one conversations.

And they appreciate that because you're not just throwing things at them hoping they'll understand it.

You're kind of working with them and sort of setting that tone and getting them involved as well.

And you've got to pick the people who are into it. I mean, some folks are going to have different priorities, and I get that.

You've got to find your own champions when you're talking about observability culture. Find them within the other teams and work with them, because their passion will become a little infectious, with the tooling and how to see things. They'll start to see the value in what they're doing and go, "Oh, I can do this. I can see this. Or we could do that if we had this, and I know who to talk to to get it done."

Austin: You know, one thing I've always seen along those lines is that it's really impactful when you come into an organization and you learn their pain points, right?

Like, there's always momentum from "the way we've always done things," right?

And coming in with those fresh eyes and being able to find, oh, these are the people who want to make a change, and how do I help empower them, right? Kind of using that cushion of being the FNG, as it were.

Like, what are some ways you found that it, you can-

Jess: Austin, what's the FNG?

Austin: Flipping new guy.

Jess: Oh okay, thanks.

Austin: Like, you have that honeymoon phase where you feel like you can take on the world.

Like, one thing I think is helpful is, how do you kind of help people that are in that boat where they're feeling frustrated, or, you know, they want to help make that change, but they just don't see the avenues for it?

Like, what are ways that they can kind of identify the FNGs and help them work together, right?

Like, is that something that you can do when you come in, and like put a sign up that says like, "Hey, let's go storm the castle!"

Jess: Join my crusade.

Austin: Yeah.

Daniel: Well, I try to. I do come in quite loud. "Look at me!" You know, I'll get involved in things.

I mean, I'll get right off the mark, sort of like, "Hey, look, this is what I'm here for. This is what I do. Come talk to me."

I've found, too, that people won't approach you, so you have to go approach them. So you've got to listen a little to what's going on, and who seems to have something going on and might mention it.

And that's the other challenge: we're all remote for the most part. Even the hybrid models don't completely get everybody involved, because when you go hybrid, it's only a couple of people in the office at a given time, right?

But in the remote models, you pay attention to the engineering calls, whether it's the ones with everybody in there or the org-wide calls.

You kind of pay attention in there. Watch the chat, see who says what, who kind of flags things. I mean, there's got to be a lot of observing.

Jess: Oh my gosh.

Daniel: You need to observe, right? It's like the OODA loop from John Boyd, you know, observe, orient, decide, act, you know?

You got to observe for a while and then figure out where you're going and then you decide on who you're going to talk with first, you know, and build those things out with them.

It's not that easy, because, I mean, not everybody wants to talk to you about observability.

But when you get hired as an observability engineer, that means the company has some interest in doing things properly.

Maybe they don't know what exactly best practices look like. Maybe they're just thinking we're using this particular SaaS provider for monitoring and observability, and we want to cut those costs. How do we do that?

I mean, there are some platforms out there that cost a lot of money that people don't want to use anymore. They want to get off them. And how do they do that?

Jess: And are you able to help them cut costs, while increasing value at the same time?

Daniel: Usually, but it can be difficult, because there are growing pains, right? In some cases you're dealing with vendor lock-in.

And that's always a huge challenge. Because, you know, my first thing is like, you got to go to OpenTelemetry, because that really removes a lot of the overhead on the developer side because now they've got one library to work with.

And if you decide to go from some open source solution, you know, like Jaeger and move to Honeycomb, it's not like you have to go refactor the code again.

You can go, oh, well, we use the OpenTelemetry Collector. Now we just redirect, boom. That's easy. And we can even have both at the same time to compare.

So getting off that vendor lock-in is the biggest challenge there. But then you say, "Well, you're going to save so much money because then you can start doing this, this, and this."

And then you can be choosy about who you work with moving forward.
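
For that "redirect, boom" step, here is a sketch of an OpenTelemetry Collector config (the endpoints and the API key variable are placeholders, not from the episode) that fans the same traces out to an existing Jaeger deployment and to Honeycomb side by side while you compare:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp/jaeger:                      # existing open source backend
    endpoint: jaeger-collector:4317 # placeholder address
    tls:
      insecure: true
  otlp/honeycomb:                   # new vendor, key read from the environment
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger, otlp/honeycomb] # both at once, drop one later
```

The application keeps sending OTLP to the same Collector endpoint; swapping a backend only touches this file, which is the point Dan is making about avoiding refactors.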

Ken: What do you think of mobile and observability? I know we were talking before the podcast started a little bit about this, but what are your thoughts on observability in the mobile space?

I know we're already focusing very much on web observability as well. Curious what your thoughts are.

Daniel: Mobile, I find, is a challenge, and this is just from my conversations with the mobile developers I've talked to.

They always feel like they're the last to come to the party.

You know, like they were not even like invited, like, oh, they heard it through a friend and, you know, maybe they'll show up, but then they're off in the corner and nobody's talking to them.

And, you know, I feel for them because observability is not really part of their mindset at all.

I mean, it's hard enough to get backend engineers involved with it, but they're starting to get there; with SRE and DevOps concepts, they're a lot more familiar with it.

But for mobile, this is brand new ground. They want to have these tools, but their criteria, I find, are different as well.

So one of the challenges I found is that auto-instrumentation isn't as easygoing as it is for, say, Python, where it pretty much works out of the box.

I mean, you can still go in and do custom attributes and things like that. But when you're using Swift, it's not the same.

I don't know if it was Android or iOS, but I remember one of the developers I was working with showed where the app start showed up in the trace, and it was buried way down, one of the spans deep below the root span.

Well, I mean, there were maybe 20 spans, but it was still down near the bottom, which is kind of weird, because the preamble to the app start was not what you would normally see.

So there was this weirdness they would see, and it got them really confused. And again, this is based on my conversations, I'm not saying this is how all mobile app developers are, but just from my recent conversations, they don't have the same backgrounds others do.

So how to go about instrumenting a mobile application, I think, needs a little more tutoring and more boilerplate examples.

And it's more ideas like, here's how you would do it, and in case this doesn't work, here's how to try custom instrumentation, because these are the things that are important within mobile.

I, for one, am not a mobile app developer. I'm not even a developer. So I don't know what would be important from the mobile perspective, what good performance is, what a good user experience is or isn't.

Jess: Right, because it all comes back to, what do you need to know, right?

Daniel: Yeah, exactly.

Jess: Well, and what might you need to know because you can't anticipate everything.

Daniel: Yeah, and I think that's where they're struggling, because they don't even have a roadmap to follow.

Like, you know, things are a lot more detailed for other libraries, but for mobile it still seems a little new. I mean, it's all come out very recently.

And I think a lot of companies, and I don't want to get too deep into this, are more focused on the AI side of things, so it's, yeah, we'll worry about observability for mobile later, let's get an AI feature out instead.

And I just hate to see our mobile folks not get invited to our parties anymore, because they should be invited, as guests of honor even.

Jess: Yeah, because the mobile apps are so close to the user experience.

Daniel: Exactly. I mean, you look at how many people actually use their devices over a laptop, especially the kids these days.

Jess: Yeah, that's true.

Daniel: With their TikTok and stuff, you know?

Ken: We just put a blog post out today actually about sessions.

So now the movement in front end observability is not only the traces, which are really important, but the flow through the system for a user.

So, you know, you want to see what that click flow or interaction flow is and you want to plot that somehow and look at those as a series of traces and events.

And I think that's probably going to play the same role on the mobile side, just as important.
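
As a small sketch of how that grouping can work (the names here are illustrative; `session.id` is the attribute the OpenTelemetry semantic conventions propose for sessions), stamping each interaction span with a shared session ID lets otherwise separate traces be queried as one flow:

```python
import uuid

from opentelemetry import trace

tracer = trace.get_tracer("frontend")
SESSION_ID = str(uuid.uuid4())  # generated once when the app/session starts

def handle_click(action: str):
    with tracer.start_as_current_span(f"ui.click.{action}") as span:
        # The shared session.id strings otherwise-separate traces together,
        # so a user's click or interaction flow can be plotted as a sequence.
        span.set_attribute("session.id", SESSION_ID)
        span.set_attribute("ui.action", action)

for step in ("open_cart", "apply_coupon", "checkout"):
    handle_click(step)
```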

Daniel: And I think it's important, too, that we have clarity around the nomenclature, what words are used here, because I've seen confusion between spans, events, sessions, and traces, and I felt like they were overlapping.

So it's very interesting to see how that plays out. We've got this world of overlaps and stuff.

Austin: It's funny you bring up the confusion between sessions and spans and traces and logs and events and all of that, because this is actually a huge problem in OTel right now, where there are so many things that semantically seem like they're the same thing.

Like, I just wrote a blog for the OpenTelemetry website that's just like, "Hey, here's the difference between logs and events and between those and spans."

And a lot of this stuff is like super nuanced, and you need someone that like kind of lives and breathes it in order to explain it to you I think.

Daniel: That's a huge challenge, and I think it'd be good to start cleaning that up, especially because I get asked a lot of newbie questions.

You know, like, how do I do this? How do I do that? And at least in my experience, you're still explaining, "this is what this piece is, this is what this piece is." Over the years I've been accumulating a glossary of information just to say what these terms mean, to help clarify that.

Because metrics are metrics, but we should have a specific way of explaining the definition of a metric versus a trace versus an event versus a session versus everything else.

Ken: Yep.

Daniel: Always a challenge.

Austin: Always.

Ken: So, Dan, where can we find you and your writings and information? How can people get in touch with you?

Daniel: Well, I do have a blog. It's blog.ninjabot.net. So ninja, B-O-T, dot net: blog.ninjabot.net.

I haven't written something in a while, but I do have something I'm going to be writing soon.

I've been working on a three-part series called "Tales from the NOC," about some interesting real-life experiences I had in my days in the network operations center and after.

The next one I'm working on right now, I think it'll be quite interesting. So I'm quite excited about that. But yeah, there's that.

And then I also have a podcast I'm trying to get out called "Two Hard Problems." It's still early days, alpha really, and it's taking a while, but it's out on YouTube.

Ken: All right, well, Dan, thank you so much for coming on the o11ycast.

Daniel: It has been my absolute pleasure. Thanks for having me.