JUN 30, 2022

39 MIN

Ep. #54, Cloud Native Observability with Alex Boten of Lightstep

Guests: Alex Boten

about the episode

In episode 54 of o11ycast, Liz Fong-Jones and Jessica Kerr speak with Alex Boten of Lightstep. They discuss Alex’s book Cloud Native Observability with OpenTelemetry, OTel documentation and community, vendor lock-in, and the pain of instrumentation.

about the guests

Alex Boten is Senior Staff Software Engineer at Lightstep and author of Cloud Native Observability with OpenTelemetry. Alex is a contributor and maintainer to OpenTelemetry and was previously technical leader of cloud infrastructure engineering at Cisco.

show notes

transcript

Alex Boten: Three years ago when it was announced at KubeCon in 2019 that OpenTelemetry was going to merge OpenCensus and OpenTracing, I think everybody had a great idea that it was going to take five or six months to merge the two projects and we would have 1.0 in November, and everybody would be having drinks and partying at KubeCon in November.

And here we are three years later, and we're now reaching the point where metrics are stabilizing and logs are stabilizing, and I think a lot of people are asking, "It's been three years. What's taking so long?" And I think since the original announcement of OpenTelemetry the community has grown quite significantly.

I think from the onset it was the OpenCensus folks meeting up with the OpenTracing folks and going from there, but since then we've had folks from OpenMetrics join the community, from Prometheus, from Elasticsearch, and it turns out it takes time to build a solution that includes everybody. Building consensus takes a long time. That's what we're seeing really.

Liz Fong-Jones: Yeah. So it sounds kind of like the scope grew almost, like it was initially, "Let's just mash OpenCensus and OpenTracing together." And instead it became this bigger, broader thing.

Alex: Yeah. I think if we had just mashed OpenCensus and OpenTracing, we would've ended up with something that works for a large portion of people that were interested in distributed tracing, and for some people that are interested in metrics. But the majority of folks who are using Open Source and producing metrics today are very interested in Prometheus, and so if we hadn't brought the folks from Prometheus to the organization or to the project I think the end result would've been very different.

It's maybe even likely that we would've ended up with still two competing projects for different signals here, and I think that's what we were trying to avoid with the project in the first place, right? Trying to avoid this fragmentation that was happening between OpenTracing and OpenCensus.

Liz: Cool. Now would be a good time for you to introduce yourself.

Alex: Hello. I am Alex Boten. I am a Senior Staff Engineer at Lightstep and I'm also a contributor and maintainer to OpenTelemetry. I have contributed to OpenTelemetry Python where I was a maintainer for some time and I'm now a current maintainer of the OpenTelemetry Collector project. I've recently finished writing and publishing a book, actually it was published this week. I've written a book called Cloud Native Observability with OpenTelemetry, and that's me in a nutshell.

Liz: That sounds like I get to strike that word that says it's an upcoming book in the observability engineering book that references your book as further reading. So congratulations, we both got to the finish line this week then, I guess.

Alex: Yeah. That's so exciting. Congratulations as well.

Liz: Yeah, congratulations to you. So we're just talking about this idea that developing this big tent is really hard. What were some of the things that surprised you about how difficult it was going to be?

Alex: I guess one of the things I'm always surprised by is how hard it is to come up with something that works for everyone. Most of the time that's not going to be possible so you just have to find something that everyone can agree on and move forward from.

But I think one of the things that I've seen in the OpenTelemetry project is a lot of the time people will put together proposals and debate it for a really long period of time and come to a place where it's almost good enough to be agreed upon by everybody who has been involved in the discussion.

Then someone who maybe hasn't been involved in the discussion, who just noticed that maybe a specification change was about to be merged or something like that and say, "Hey, my opinion is this is a terrible idea and we should do it a completely different way."

And we really try and be inclusive in the community, so you don't want to flat out reject people's ideas, but at the same time it's really hard to get the attention of everybody who needs to be involved in a specific discussion. And to do it consistently is really, really difficult, especially in an Open Source project where some people are paid to participate in the project but a lot of people are doing it in their free time.

So it's been interesting, it seems like it's more common than not to see this kind of thing happen where people are just really feeling strongly about a particular feature that they only just heard of yesterday or something like that. It's been good, it's just been really hard to find ways to reach people to be able to make forward progress, I guess. Finding time on everybody's day.

Jessica Kerr: Are you saying that the people who come in and want to start from scratch on something that's already been through a long process, are these people that you wish you could've gotten involved earlier?

Alex: Yeah. I think in most of the cases it's people that would've had a really good idea and we wish would've gotten involved earlier, because then we wouldn't have maybe gone down a particular direction with a specific design or whatever. I think about some of the early day work in metrics, for example. The metrics specification has gone through several iterations and it hasn't been until the very last iteration where we had a pretty big kick off that involved...

I can't remember how many people were at this kick off. I think it was upwards of 50 or 60 people on the Zoom call. That's when we were able to get enough attention from the different stakeholders to really move forward with the specification there in a way that would encompass everybody's needs.

Jessica: So you want all this input, and it's really hard to get that input at an appropriate stage of the work?

Alex: Right. Because if you get the input so early that nobody else is paying attention in the project, things just stall out a little bit. If you get it too late then it causes a different kind of stalling, right? Because instead of getting something that's pushed through and merged and people can feel like there's progress being made, things stall out because there's yet another thing that we try and bring in to the fold here.

Liz: Right. No one's expecting, like, "We just finished this design up. It feels perfect. We've addressed everyone's feedback. Oops, there's a stakeholder that now wants to get involved." I think that's a challenge that a lot of big companies go through and OpenTelemetry is not a company, in fact as you mentioned, there are people who contribute for work, there are people who contribute in their spare time. These are also many of the struggles that the Kubernetes project went through too, where if you're building this massively growing thing that's taken on a life of its own, it gets away from you.

Alex: Right. I think some people have referred to it as The Tragedy Of The Commons a little bit. But it's not necessarily a bad thing, you're looking at something like Kubernetes and it's wildly successful, probably beyond what everybody had expected. And even earlier today I got a message from someone saying, "Hey, there's this CMS that nobody had really heard of in the community who's now implemented OpenTelemetry." It's a project in Node.js, and it's like, "Cool. We've reached a point where people are just adopting it because it's so dominant in the space," which is just exciting.

Liz: Definitely a big change from when we had to do these workshops, introducing people to the basic APIs and instrumenting toy applications. We still do need to do this, but originally that was all there was, was just, "Here's the scaffolding, let's see what you build with it." As opposed to the more advanced users just running with it.

Alex: Right. Yeah. I do think there's still a decent amount of education to be put out there, which actually led me to writing this book that I published. But you're right, I do think we're seeing more advanced use cases, we're seeing a lot more emphasis on things like RUM which people are really excited about, or client instrumentation, messaging systems, which have always been complex to implement with distributed tracing.

Yeah, it's exciting to see it evolve for sure. I do think we could be better and I think there is a huge push in the documentation side of things on OpenTelemetry.io to make the content a little bit more approachable for everybody. I mean, I think for a long time we relied on documentation that was written by the people that have been so deep into the project that they forgot what it's like to be not so deep in the project.

Liz: Yeah. That loss of the beginner perspective. Fortunately, we're bringing new people into the project all the time so that does compensate for it.

Jessica: Yeah. And as a beginner, you have something valuable, especially whenever you're looking at the documentation and you see something that confuses you, you have a contribution to make because the people who are deep in that project don't know that that's confusing and you do.

Alex: 100%. You will never have that perspective again.

Liz: It's the opposite of the Known Unknowns, Unknown Unknowns, right? When you're new there's a lot of things you don't know yet, but on the other hand there's a lot of things that the people who do know about their project don't know that they know. So that's the Unknown Knowns.

Alex: Right. Once you've entered a project, that's it, you've lost that chance to contribute your very fresh set of eyes to either documentation or readme files or whatever it is that people are struggling with initially.

Jessica: Yeah. I look like I contribute to a dozen Open Source projects on GitHub. But it's all one line in the readme that they missed.

Alex: That's a totally valid contribution. 100%. I wish more people would continue to make those very small improvements, right?

Jessica: Yeah. It's small but it scales to a lot of people that it helps.

Alex: Yeah, absolutely. It's almost unfortunate that a lot of the time it's intangible because the next person that comes along and sees the clearer documentation, they benefit from it but it's hard to know that, "Oh, it's because this person added that one, small line in the documentation that everybody's having an easier time." I don't know if anybody's capturing any kind of metrics to see how much better onboarding for a particular project is after every kind of change along the way.

Liz: It's one of those things where it doesn't necessarily have to be in the repo itself. For instance, I've been really amazed by the quality of Lightstep's OTel documentation, I regularly refer people to it, it's just really, really good.

Jessica: Yeah. And as a person, just your blog can show up on Google and help someone.

Alex: Yeah.

And that's something that's also been exciting, is just seeing how many more people are writing on their own blog posts or on Dev.to or whatever platform they use, about scenarios that they're using to either troubleshoot applications using OpenTelemetry or implement even simple apps that don't have any examples in other places, using OTel. I think it's pretty cool to see how much it's grown. It's gone from this very fringe thing to a much more omnipresent kind of project. It's just about everywhere you look.

Liz: It's got the critical mass, right? You know that you're going to have a robust community, and I think that it also helps to have these standards alignment around the W3C trace context as the standard way that we propagate information between processes and OTel being the easiest implementation of that.

Alex: Yeah. And the alignment of the W3C solidifying things like the trace context specification and when the tracing signal reached stability. It was almost perfect, right? I think W3C reached stability at some point, I can't remember if it was 2020 or 2019.
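For readers who have not looked at the trace context header being discussed, here is a minimal Python sketch of reading and writing a version-00 `traceparent` value (`version-traceid-parentid-flags`, all lowercase hex). The names `SpanContext`, `parse_traceparent` and `format_traceparent` are hypothetical and chosen for this illustration; they are not the OpenTelemetry API, and the sketch ignores `tracestate` and future header versions.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex version, 32 hex trace id, 16 hex parent id, 2 hex flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):  # hypothetical name for this sketch
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the parsed context, or None if the header is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero identifiers are treated as invalid.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest bit of the flags byte is the "sampled" flag.
    return SpanContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: SpanContext) -> str:
    """Serialize a context back into a version-00 traceparent value."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.parent_id}-{flags}"
```

A service that receives the header parses it, starts its own span under that trace id, and writes a new header with its own span id as the parent for any outgoing calls.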

Liz: Well, this is one of the things that we and our customers run into frequently: if only the Amazon ecosystem would adopt W3C trace context as quickly as possible. But it's hard for them to do so, because of the independence of each one of their teams, so even if the X-ray team would like to see trace context adopted everywhere, they can't just magically make this appear on the roadmap of all of the teams that pass context.

Alex: Yeah. I guess coming back to your earlier question about why is it so hard for a project like this to get to where it is or why is it taking three years? I think some of the things that have been built into OTel around giving it as much flexibility as possible is something that's going to give it longevity.

For example, you mentioned AWS and X-ray headers, but the ability to combine headers or combine formats for propagation across different vendors or different formats that have already been Open Sourced or whatever, and allowing people to use OpenTelemetry where they are today I think is part of the reason why it's doing so well as far as adoption goes.
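The idea of combining propagation formats can be sketched in a few lines of Python. This is a toy, not the OpenTelemetry propagator API: `inject_w3c`, `inject_xray` and `composite` are hypothetical names, and the X-Ray header layout shown (`Root=1-<8 hex>-<24 hex>;Parent=...;Sampled=...`) is an assumption about the format that should be checked against AWS documentation before relying on it.

```python
from typing import Callable, Dict, List

Carrier = Dict[str, str]
Injector = Callable[[str, str, bool, Carrier], None]


def inject_w3c(trace_id: str, span_id: str, sampled: bool, carrier: Carrier) -> None:
    """Write a version-00 W3C traceparent header into the carrier."""
    flags = "01" if sampled else "00"
    carrier["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def inject_xray(trace_id: str, span_id: str, sampled: bool, carrier: Carrier) -> None:
    """Write an X-Ray style header (assumed layout) into the carrier."""
    # The 32 hex characters are split into an 8-character and a 24-character part.
    root = f"1-{trace_id[:8]}-{trace_id[8:]}"
    flag = "1" if sampled else "0"
    carrier["X-Amzn-Trace-Id"] = f"Root={root};Parent={span_id};Sampled={flag}"


def composite(injectors: List[Injector]) -> Injector:
    """Combine several injectors so one call writes every configured format."""
    def inject_all(trace_id: str, span_id: str, sampled: bool, carrier: Carrier) -> None:
        for inject in injectors:
            inject(trace_id, span_id, sampled, carrier)
    return inject_all


# Usage: one outgoing request carries both header formats.
headers: Carrier = {}
propagate = composite([inject_w3c, inject_xray])
propagate("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True, headers)
```

Because each format is just another entry in the list, a team can keep emitting the header its existing systems understand while adding the standard one alongside it.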

Actually, that's one of the exciting things about the OTel Collector is, again, with all of its existing receivers and exporters, it really gives people the ability to use it as they are in their current state of observability. Rather than having to migrate to something more complex or something brand new or whatever.

Liz: Yeah. It may have made sense to build your own collector or logging pipeline or tracing pipeline five years ago, or even two years ago, but it doesn't necessarily make sense to build one from scratch today. You still might want to customize it and manage it yourself, you might want to buffer things in Kafka but the OTel Collector can act as both the thing that puts things into the pipeline and takes things out of the pipeline for sure.

Jessica: Yeah. But don't worry, configuring the Collector is going to keep you in a job.

Alex: Right. It's just YAML. How hard can it be?

Liz: Right. What are your actual needs? How many places are you trying to route this data? Are you trying to transform it or are you just using it to kick the tires? When I originally asked the question in the beginning I think what I had in mind almost was the dichotomy between how do you support these simple use cases but also support the full flexibility to do anything in a YAML file? That's the problem with the Collector, is that it's all YAML.

Alex: Yeah. I think YAML was the coolest thing for everybody to use nowadays. But yeah, I guess that's another aspect of OTel that we've heard a decent amount of pushback from, was just the complexity of setting up initially, right? I think because the project focused so much on the flexibility aspect, there hasn't been a ton of time on maybe the user friendliness or the UI aspect of configuring OpenTelemetry, and I think that's something that new users really struggle with.

But it does give that flexibility for adapting to any kind of scenario that you potentially need, but I think that's probably maybe the next frontier of the project once the signals have reached stability and there's implementation in all the languages.

Making it easier for people to use is something that would be great, and there's even been some talk of supporting a configuration through YAML and the different language implementations. I don't know if that's the right way to go about it but something along those lines would be great to see.

Liz: People definitely have been using environment variables to configure things, which works to an extent. But yeah, it's hard to manage at scale and YAML is never going to be a perfect substitute to just writing code if you can just write code, but sometimes you don't want to write code and you want to manage it like an object instead.

Alex: Right. And using environment variables is great in the sense that you can reuse the same variables across languages because the implementations all use the same specification for variables. But it does have its limits, like trying to define arrays or maps inside environment variables is just a nightmare in BASH waiting to happen. Eventually people are going to run into problems.
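To make the point about maps in environment variables concrete, here is a small Python sketch that reads the `OTEL_RESOURCE_ATTRIBUTES` and `OTEL_SERVICE_NAME` variables defined by the OpenTelemetry specification. The function names are hypothetical, and the parsing is deliberately simplified: it does not percent-decode values, which is exactly the kind of edge case that makes this style of configuration awkward.

```python
import os
from typing import Dict, Mapping


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Parse 'key1=value1,key2=value2' into a dict, skipping malformed entries."""
    attributes: Dict[str, str] = {}
    for pair in raw.split(","):
        key, separator, value = pair.partition("=")
        if not separator or not key.strip():
            continue  # no '=' or empty key: ignore the entry
        attributes[key.strip()] = value.strip()
    return attributes


def resource_from_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build resource attributes from the environment."""
    attributes = parse_resource_attributes(environ.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    service_name = environ.get("OTEL_SERVICE_NAME")
    if service_name:
        # The dedicated variable wins over service.name in the attribute list.
        attributes["service.name"] = service_name
    return attributes
```

A flat string map is about as far as this approach stretches comfortably; nested structures or lists of exporters quickly need quoting and escaping rules of their own.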

Liz: Yeah. I think the other direction with regard to making things simpler is Lightstep has launchers, we have our OpenTelemetry distros. This idea of providing Batteries Included implementations in case you know that you want to send things to one specific provider as opposed to, "I want to keep my options open to begin with." All of your instrumentation is fully portable, but the question of what is your default config can be eased by one specific partner.

Alex: Right. And that's actually another thing that's been interesting, is seeing the feedback we've been getting around something like a distro or a launcher. I guess for the newcomers to the project, it really does help them get up and running more quickly.

I think there's some questions that we've seen from folks around whether or not the project is so complex that it requires some kind of a layer like a distribution to wrap it.

I don't necessarily see it as a bad thing, because having the defaults preconfigured for users is not necessarily a bad thing. But I do hear what the concern is there around, "Well, if it's so complicated to configure out of the box, why is it so?"

Jessica: The Collector has some lovely abstractions in it, of a pipeline with the inputs and transformations and then the exporters. Which yeah, is non trivial, to have to define yourself but it does have that conceptual elegance and flexibility for later.

Liz: Once you wrap your brain around the verbs and nouns, right? I think it's the verbs and nouns that people haven't had to think about before.

Jessica: Yeah. You have to get those concepts. Fortunately, you can do a lot without a Collector just for trying it out.
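The pipeline abstraction described here (inputs, transformations, exporters) can be pictured with a toy Python model. The real Collector is configured through YAML and looks nothing like this internally as far as this sketch is concerned; every name below (`static_receiver`, `add_attribute`, `ListExporter`, `run_pipeline`) is hypothetical and exists only to show the shape of the concept.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Receiver = Callable[[], Iterable[Record]]
Processor = Callable[[Record], Record]


def static_receiver(records: List[Record]) -> Receiver:
    """A receiver that hands over a fixed batch of telemetry records."""
    def receive() -> Iterable[Record]:
        return list(records)
    return receive


def add_attribute(key: str, value: object) -> Processor:
    """A processor that returns a copy of each record with one extra attribute."""
    def process(record: Record) -> Record:
        return {**record, key: value}
    return process


class ListExporter:
    """An exporter that keeps records in memory instead of sending them anywhere."""

    def __init__(self) -> None:
        self.exported: List[Record] = []

    def export(self, record: Record) -> None:
        self.exported.append(record)


def run_pipeline(receivers: List[Receiver], processors: List[Processor],
                 exporters: List[ListExporter]) -> None:
    """Pull from every receiver, apply processors in order, fan out to exporters."""
    for receive in receivers:
        for record in receive():
            for process in processors:
                record = process(record)
            for exporter in exporters:
                exporter.export(record)
```

The flexibility comes from the fact that each stage is a list: adding a second destination or another transformation is a configuration change, not a rewrite.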

Alex: Yeah, absolutely. It's funny that you mentioned getting your head around the names or the verbs that are used in the Collector. Naming things is hard. I think that's really what it comes down to, forever and always. There's things called receivers in the Collector. Well, they receive data but they also go out and scrape data from different targets which is really confusing conceptually.

Jessica: Active receivers.

Alex: Right, right.

Jessica: Yeah. The one that gets me is resource. Right? Which from metrics historically has meanings, but coming at these concepts from tracing, resource doesn't mean like host until I have to figure out that that's usually what it means.

Liz: Right, exactly. The notion that the resource includes the service name and that it includes all of these... The thing I wished we'd said is these are the default set of tags that will get applied to everything. That's what it means.

Jessica: Right. Yeah. Effectively a resource is just another key value store where they do get added to every span, but you can only have one of them per process.

Alex: Yeah. If you think of the resource as the source of the telemetry, then it begs the question why wasn't it just called the source of the telemetry instead of a resource. But I don't know, I wasn't involved in the early days of those debates. I'm sure there were lengthy OTEPs around this.
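The description of a resource as a per-process set of key-value pairs that ends up on every span can be sketched like this. `ToyResource`, `ToySpan` and `ToyProvider` are hypothetical classes written for this illustration, not the OpenTelemetry SDK; the only assumption carried over is the idea that one resource is shared by all telemetry a process emits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ToyResource:
    """Describes the source of the telemetry, for example the service and host."""
    attributes: Dict[str, str]


@dataclass
class ToySpan:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


class ToyProvider:
    """Holds the single resource for the process and the spans recorded so far."""

    def __init__(self, resource: ToyResource) -> None:
        self._resource = resource
        self.finished: List[ToySpan] = []

    def record_span(self, name: str, attributes: Optional[Dict[str, str]] = None) -> ToySpan:
        span = ToySpan(name, dict(attributes or {}))
        self.finished.append(span)
        return span

    def export(self) -> List[Dict[str, str]]:
        # Resource attributes are merged into every exported span; span-level
        # attributes win if the same key appears in both.
        return [
            {**self._resource.attributes, **span.attributes, "name": span.name}
            for span in self.finished
        ]
```

Seen this way, "the default set of tags applied to everything" and "the source of the telemetry" are two descriptions of the same object.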

Jessica: I'm sure I could've jumped in at the last minute and commented on that change process. "I don't like this name, this name must be wrong."

Liz: But we can definitely educate people now that these things are stable, right? I think that's the other piece, is the stability means that when you publish a resource, that the resource can be reliably and repeatably used instead of having to constantly update the resource as in the educational resource.

Jessica: See? See? But I can say OpenTelemetry resource and I can scope the word in something that is a proper noun that refers only to this concept and is an established concept with stability.

Liz: Yeah. Which I think brings us back around to the subject of the book. So what's in your book? What can people expect to see when they pick it up and read it?

Alex: Yeah. So the book is broken into four different sections. The first part is really about talking through a little bit of the history of observability, talking about the building blocks of observability, the concepts around metrics and traces and logs and where it came from. Really just helping the readers getting their bearings on what this whole observability thing is all about.

The second portion gets really hands on with the implementation, specifically around the Python implementation but it dives into the nomenclature that OpenTelemetry uses for things like providers and tracers or meters. So the second part really focuses on implementing the signals using OpenTelemetry. There's a little bit even on auto instrumentation and how that works in Java and Python as well.

The third part is really about the collector, what it is, why it's useful, why would people want to configure it. It talks a little bit about its extensibility and the different components like the exporters, receivers, extensions and whatnot. It also dives into the OpenTelemetry protocol, just what it is and why it exists.

Then the last section is about, okay, so you've generated all this data, what do you do with it? And so it explores different Open Source backends that you can use to visualize data and to collect it. It also talks a little bit about different scenarios that you might be able to use telemetry to troubleshoot an application or get to the root cause of a particular problem.

That's the different sections. I tried to write it in a way that's approachable for people that aren't deep into the land of observability because I felt like a lot of the resources out there can be... I don't know if intimidating is the right word, but to people who haven't been in observability for a certain amount of time it could be hard to get your bearings a little bit. So I've tried to approach it with fresh eyes.

Jessica: They assume a lot of knowledge, a lot of terminology and some philosophy.

Alex: Right. Definitely a lot of terminology and I tried to explain the concepts in as clear a language as I could. Hopefully it worked. I don't know. I'm sure the readers will tell me if it worked out or not.

Liz: Yeah. So it sounds really interesting, this juxtaposition of both of our books because the book that Charity Majors, George Miranda and I wrote is much more about the organizational philosophy of observability, and yes, there's a chapter in there about using OTel to create traces. But you really should look at a more in depth resource for how to do it in your specific language.

I'm just showing you what this looks like with Go as an example, but I'm not going to get into the details of how the meter works and how to configure it. Instead, it's just this stub that says, "Hey, when it comes time to implement it, here's some ideas. But there's not room in this book to expand everything about OTel." So I think these two things go very nicely together, the how versus the why.

Alex: I'm really excited to read about it, to be honest. I purchased a copy some time ago, I'm still waiting on it.

Liz: Yeah. So your book has been in the works for a year or so, our book has been in the works for three years. Ahh. Did you get to work on this at work or was this a side project?

Alex: The folks at Lightstep have been super supportive of the project. I think I unfortunately didn't really lean into that support as much as I could've.

I ended up spending a lot of weekends and evenings writing this book. But yeah, it's been super interesting and in the past year or so the challenge for me was really focusing on ensuring that whatever was being written wasn't completely out of date by the time it was being published.

Liz: Yeah. We had the same challenge. In a lot of ways I'm glad that we had two extra years to marinate on the book while the pandemic broke out and while we were adapting to all of that because that also allowed OTel to mature. It also let us collect a wider variety of stories from people who had successfully adopted the observability organizational strategies. Yeah. Once it's in that tree format you can't really change it. You can change the online version but you can't change the print version and that can be hard.

Alex: Right. And a huge portion of the book that I wrote relied on things like the metric specification and that was constantly changing along the way so I think I rewrote the chapter on metrics maybe three different times. It's still probably going to be out of date by the time people really get to reading it, but you try and capture that moment in time at which you write it and you have to move on from there. If there is one thing that I've learned about technical books is that, more often than not, they tend to be just behind where they were when they were originally printed.

Liz: But that's okay if people can go to the link and go to the Git repo because it's not like people are typing verbatim off of the page anyway.

Alex: Right, exactly.

Jessica: With your book I'm excited to read the part about the collector and the part about the OTel standards and the concepts involved because the concepts are stable.

Liz: The concepts are stable, and also I've not been keeping as much on top of the metrics and logging APIs as I should've been because my primary focus had been on bootstrapping the tracing efforts. So yeah, that's going to be a very good route for me. One thing that I was curious about is you mention orienting around the signal types that people already know in terms of tracing, metrics and logs. How did you wind up framing the separation between the datatypes versus what you use them for?

Alex: You mean within the context of the book, or?

Liz: Yeah. Within the context of the book. How do you get people to think about when do you use each signal type, avoiding Collect All of the signals impulse that people often have?

Alex: Yeah. I think I tried to capture it in common use cases for each one, I think that's generally how I tried to frame it as, "Here's what distributed traces are and here's roughly some common use cases for when you would use distributed traces."

I tried to steer clear of being too prescriptive about when people should use a particular datatype because I've been in the industry long enough now to know that there's always nuances of when one particular signal or one particular datatype makes sense and when it doesn't. And so I tried to give people general guidance without being prescriptive, and that's how I tried to approach it.

I think one of the last chapters which talks about using the signals, I tried to describe how you would use correlation between the signals to really solve problems using the telemetry that you're generating. I think if nothing else, I hope that's what people get out of reading the book.

To me that's really the powerful thing about OpenTelemetry, is that aspect of correlation between the signals. It's never really about those three pillars that people talk about non stop. It's really about how do you navigate from one datatype to another as you're exploring your telemetry?

Liz: And in my opinion, that is the thing that is strongest about OpenTelemetry, is that ability to not have to change contexts, not have to change methods of measuring, not having to change your verbs and nouns. Awesome. I'm really glad that both of our books are coming out the same week, that is so exciting.

Alex: Yeah. That'll be exciting. I've never written a book so I have no idea what to expect as far as feedback from folks out there so I hope people read it and reach out on whatever format they use, whether Twitter or whatever to talk about the book.

Liz: So what are some of the next milestones that you're hoping to see for your professional career and for OpenTelemetry?

Alex: Yeah. I guess I'll start with OpenTelemetry because it seems easier. I hope that we can get to this GA implementation for metrics across the different implementations. I know there's some big pushes from the folks in both Go and Python to try and get a release candidate that supports the GA, or a release candidate before GA for the metrics signal.

Then I guess just wrapping up the last of the signals with logging and ensuring that that's in a place where people can count on it. Just to get to that place where we can start making progress towards seeing a wider adoption of OTel in those signals. One of the challenges of OpenTelemetry is making sure that the instrumentation libraries out there are meaningful to people in the sense that there's a bunch of instrumentation libraries for third party libraries that have been created as part of the OTel project.

I think one of the challenges there is there's always going to be more libraries, third party libraries, than there are instrumentation libraries. I would really like to see OTel be adopted by Open Source projects by the experts in those projects using the OTel API rather than-

Liz: Right. Yeah, we've talked for so long about the idea that the contrib repos should not need to exist forever because they should be baked in into each of these libraries.

Jessica: So the PostgreSQL client should have its own OpenTelemetry span creation that you can turn on or off?

Alex: Right. And we've seen this with, I think Spring is one of the frameworks that came to the OTel table. I know that on the Python side, the Celery folks came to the sig to talk about using the API directly. I think if we're successful as a project, those instrumentation libraries will just go away and disappear completely. At least for the languages where it's possible to do so.
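The idea of a library carrying its own instrumentation that can be turned on or off can be sketched as follows. This is a toy under stated assumptions: `NoOpTracer`, `RecordingTracer`, `set_tracer` and `run_query` are hypothetical names, and `run_query` stands in for a database client call rather than any real PostgreSQL driver. The pattern it shows is the library always calling a tracing API that does nothing until the application installs a real implementation.

```python
from contextlib import contextmanager
from typing import Iterator, List


class NoOpTracer:
    """Default tracer: the library can call it freely and nothing is recorded."""

    @contextmanager
    def start_span(self, name: str) -> Iterator[None]:
        yield


class RecordingTracer:
    """A tracer an application might install to actually capture spans."""

    def __init__(self) -> None:
        self.spans: List[str] = []

    @contextmanager
    def start_span(self, name: str) -> Iterator[None]:
        try:
            yield
        finally:
            # Record the span even if the wrapped operation raised.
            self.spans.append(name)


_tracer = NoOpTracer()


def set_tracer(tracer) -> None:
    """Called by the application to switch instrumentation on."""
    global _tracer
    _tracer = tracer


def run_query(sql: str) -> str:
    """Stand-in for a library call that is instrumented by its own authors."""
    with _tracer.start_span("db.query"):
        return f"ran: {sql}"
```

With this shape, the people who know the library best decide what a meaningful span is, and users who never configure tracing pay almost nothing for it.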

Liz: Yeah. I can see a lot of parallels to the TypeScript world where for a long time there has been DefinitelyTyped, where everyone contributes these things that are outside of the main repo that are like, "Here are the types for this library." But eventually you just bake them into the library because they're expected and default and commonly supported. TypeScript is a large ecosystem and there are definitely a lot of projects that don't yet have types, but definitely that kind of momentum has clearly shifted and I hope we get there with OTel as well.

Alex: Yeah. I think tracing is something that not a lot of Open Source projects have adopted, but metrics we know for sure that that's something that's pretty common. Just about every database, for example, implements some kind of metrics that it emits, right? And if that could be translated into something that's OTel compatible, that would be wonderful. If it was rather than having to have even these receivers that scrape different database types, if they could just emit OTLP and you can just ingest this one format, it would be wonderful.

Liz: Yeah. And SQL Commenter too really makes me excited, the idea that we can get context propagation through SQL statements and be able to trace slow queries back to the app query that initialized them. Yeah, there's definitely a lot of exciting work ahead and I think to answer for me the question I asked you, I think the thing that I am hoping will become possible at some point in the future is the idea of the...

So far the vendor and end user pendulum mostly has shifted towards vendors as far as... At Honeycomb we tend to hire a lot of people who used to use Honeycomb as our clients and now work at the company. One day I would like to see that flow the other way too, to have people who are really excited to use Honeycomb or to use OpenTelemetry and to contribute back to OpenTelemetry, but employed by an end user company.

Shelby Spees, who previously was a host of o11ycast and previously was a developer advocate at Honeycomb, is one example of those folks. But I'm hopeful that there are going to be many more folks who migrate over the coming years from vendors to users and are still able to contribute to the project and deliver a lot of value.
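Editor's note: the SQL comment propagation Liz mentions at the start of her answer can be sketched roughly as follows. This is a minimal illustration, not the sqlcommenter library itself. The helper names `format_traceparent` and `add_sql_comment` are hypothetical, and the comment layout (sorted, URL-encoded `key='value'` pairs in a trailing comment) is an assumption based on the published sqlcommenter format. The `traceparent` value follows the W3C Trace Context layout.

```python
from urllib.parse import quote


def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Build a W3C-style traceparent value: version-traceid-spanid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"


def add_sql_comment(sql: str, attrs: dict) -> str:
    """Append key='value' pairs to a SQL statement as a trailing comment.

    Hypothetical helper: keys are sorted and both keys and values are
    URL-encoded so the comment cannot break out of the statement.
    """
    if not attrs:
        return sql
    pairs = ",".join(
        f"{quote(str(key), safe='')}='{quote(str(value), safe='')}'"
        for key, value in sorted(attrs.items())
    )
    # Drop trailing whitespace and a trailing semicolon before appending.
    statement = sql.rstrip().rstrip(";").rstrip()
    return f"{statement} /*{pairs}*/"


if __name__ == "__main__":
    traceparent = format_traceparent(
        0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7
    )
    print(add_sql_comment("SELECT * FROM orders;", {"traceparent": traceparent}))
```

With a comment like this attached, a slow query that shows up in the database's own logs carries the trace ID of the application request that issued it.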

Alex: Right. Yeah, that actually brings up another exciting thing with OTel, it's this notion that people will be able to take this tool and carry it with them wherever they go. When people change organizations, or even teams within the same organizations, not having to relearn the tools or adapt to whatever is currently existing I think is super exciting from a developer standpoint.

Jessica: So one of the benefits of not having vendor lock-in, it's not just about what that vendor might be costing you. It's also about portability of your knowledge.

Alex: Right. And not having vendor lock-in on the side of instrumenting code or emitting and producing telemetry, I think it's kind of a win for everyone involved. It's a win for vendors because we don't have to maintain these really cumbersome libraries that we've written or supported for a really long time. From the user's standpoint, they don't have to relearn the tools whenever they want to try a different vendor and I think that's incredible, right?

That's one of the really exciting things that Kubernetes achieved, for example, where instead of having to relearn your different platforms every time you switch teams or whatever it is, you can just take your Kubernetes knowledge and apply it to AWS Cloud or to GCP or to wherever else people are supporting Kubernetes.

Liz: Faster onboarding is such a huge benefit because a lot of teams spend so long trying to onboard their new folks, on, "This is the way we do things." And standardizing that is just so great.

Jessica: Yeah. Because Kubernetes incorporates a lot of concepts that you don't know as just a developer. You don't know about network interfaces and a lot of other words. But then you learn those with Kubernetes and they go everywhere, instead of having to learn platform specific ones. It's the same, I don't want to read a book about Prometheus, but I do want to read a book about OpenTelemetry concepts that are drawn from dozens of projects worth of experience and coalesced into one really well thought out API that I'm going to be able to use everywhere.

Liz: I do think the interesting thing about Prometheus specifically is that, sure, I might not want to read specifically about the Prometheus implementation details, but I do think that we're entering this interesting world of, "Okay, OpenTelemetry is for generating data, transmitting it."

But the Open Metrics query spec is really interesting for being able to interact with a metrics backend, for being able to fetch data out of it, whether it's Wavefront or it's Prometheus or it's Cortex, and to use a metrics backend in a consistent fashion, regardless of which one you've chosen.

I'm very curious to see what that effort is going to look like with regard to the query side of tracing or the query side of logs one day. I guess the universal factor for the query side of logs is grep and regular expressions.

Alex: Right. The universal tool for logging is always grep. Yeah. It's interesting to see, I'm excited to see what comes out in the next couple of years around unified queries. I think with OTel, for example, because the data is just about always correlated across your signals, making it effortless for users to be able to put a single query for maybe a resource or maybe a particular Trace ID and seeing your data come out across your different signals, I think that's super powerful.

I think there's a lot of exciting things happening in the observability space around that. I think we've seen something come out, I think it was announced last week around Tempo or something from Grafana that was announced around being able to query across different signals. That's really the future of observability, right?

Liz: Yeah. And also I think I'm really excited about OpenSLO, which has a similar idea of how do we take this observability data and produce high quality SLOs from it?

Alex: Yeah. Actually, coming back to OTel with high quality data, I think the one thing that people underestimate is the value of semantic conventions, being able to know what to expect as a user, as a vendor, being able to know what you should be emitting as a brand new user to OTel or to observability, I think it's so exciting having that guidance there. It's codified and people can use libraries to produce and use the semantic conventions, I think it's super exciting.

Jessica: You don't have to debate with your teammates what to call this and whether to use a dot or an underscore.

Alex: Yes. How many different ways can you misspell keys to a key-value store?
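Editor's note: the point about codified attribute names can be illustrated with a small sketch. It assumes a hand-picked subset of attribute keys meant to mirror OpenTelemetry semantic conventions. The exact names should be checked against the current specification, and `check_attribute_keys` is a hypothetical helper, not part of any OpenTelemetry library.

```python
import difflib

# Illustrative subset of attribute keys (assumption: verify against the
# current OpenTelemetry semantic conventions before relying on them).
SERVICE_NAME = "service.name"
HTTP_REQUEST_METHOD = "http.request.method"
HTTP_RESPONSE_STATUS_CODE = "http.response.status_code"

KNOWN_KEYS = sorted({SERVICE_NAME, HTTP_REQUEST_METHOD, HTTP_RESPONSE_STATUS_CODE})


def check_attribute_keys(attributes: dict) -> dict:
    """Map each unrecognized key to its closest known key, or None."""
    problems = {}
    for key in attributes:
        if key in KNOWN_KEYS:
            continue
        # Suggest the most similar known key, if any is reasonably close.
        matches = difflib.get_close_matches(key, KNOWN_KEYS, n=1)
        problems[key] = matches[0] if matches else None
    return problems


if __name__ == "__main__":
    attrs = {
        SERVICE_NAME: "checkout",      # shared constant, no room for typos
        "http.request_method": "GET",  # underscore instead of dot
    }
    print(check_attribute_keys(attrs))
```

Using shared constants instead of hand-typed strings is what removes the dot-versus-underscore debate Jessica describes. A check like this one only catches the keys that slipped through.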

Liz: Awesome. Well, thank you very much for joining us, Alex. It was a pleasure having you on the show.

Alex: Yeah. Thank you so much for having me, it's been great.
