MAR 19, 2024

47 MIN

Ep. #68, Observa-What? with Michele Mancioppi of Dash0

Guests: Michele Mancioppi

about the episode

In episode 68 of o11ycast, Jess and Martin speak with Michele Mancioppi of Dash0. This talk examines what it takes to make observability more accessible to non-experts in the space. Additional topics explored include the differences between monitoring and observability and Michele's shift from staff engineering to product management.

about the guests

Michele Mancioppi is Head of Product at Dash0. He has expert-level experience in distributed tracing, Kubernetes, containers, and OpenTelemetry, and was previously a product manager at Lumigo and Canonical.

show notes

transcript

Michele Mancioppi: I was quoting a colleague of mine at the time, that I had to explain to them like, "Yeah, I'm here for observability." "Observa-What?" I think it represents faithfully the reaction of a significant demographic out there.

Martin Thwaites: And do you think that could be to do with bubble? I think a lot of this does come down to bubble at the time, so when you say observa-what, do you think that might have been the people who follow Canonical?

Jessica Kerr: What's Canonical?

Martin: The people behind Ubuntu.

Jessica: Thanks.

Michele: It's a mix. I mean, definitely there is a strong component of culture and geography in the literacy about observability and monitoring. In 2021, when I wrote that article, there were stark differences between the US and Europe in terms of search patterns, in terms of observability versus monitoring. There is also an age component to this: people that started in our profession more than a decade ago were exposed to monitoring, right?

And despite observability being a different thing, it's such a nuanced proposition for people that are not deep into the matter that it may be hard for people to switch over. For people that start fresh, that is observability. It's the word we use nowadays, which is both good and bad. On the one hand it's a very cool term, and it does provide some very interesting insights.

For example, I love the fact that observability is a property, not an activity like monitoring. That's great. It has been co-opted by previous generations of vendors in the industry, and has lost a lot of its nobility in meaning along the way, which is unavoidable. In the market, it works.

Martin: Yeah. It is one of those things that I like the idea that it's an activity. I think that's a-

Jessica: Wait, wait. We just said it was a property, not an activity.

Martin: The monitoring is an activity, whereas observability is a property. I like that idea, that there's a distinction between the two, that they're not the same thing, we're not doing both of them. I think that's quite interesting as a differentiation between the two because we need that, we need to know what the difference is because otherwise we're just doing the same thing, we've just given it a new name, which feels pointless.

Michele: So there is a very strong correlation. In most cases you achieve observability by performing monitoring, but observability is not the only reason to monitor your systems. For example, governance or use cases that are beyond DevOps, or even ensuring a good quality of service. They can be perfectly well achieved with monitoring. And on the other hand, monitoring is not the only way to achieve observability.

Although, in software systems it tends to, right? I like to think that if as an industry we eventually reached the nirvana of treating observability as a first class functional requirement in our applications, then the nature of monitoring would change dramatically.

Jessica: Really?

Michele: Whether we're going to get there is open to debate.

Jessica: How would the nature of monitoring change?

Michele: The activity around monitoring is, in no small part, the collection of telemetry and signals, and you put the data together. It's a mix. What we do today is a mix between observing the system as a black box, for example, you go there and scrape metrics. Or the application side is emitting telemetry towards you, it's sending you logs, it's making metrics available, it's pushing spans, it's collecting and providing data, making it available somewhere nearby where you can read them.

I like to think that a more mature industry in terms of software development would result in higher level telemetry that is being sent with a much higher ratio of insight versus noise.

Jessica: Okay. Describe higher level telemetry?

Michele: So today, when you send spans, when you emit metrics, when you emit logs, it's effectively an ascended level of observability. You are putting data out there. The data themselves, they don't mean anything. It's just bits, right?

So with logs, they describe events that happen, most of the time in natural language. It takes you, the human on the other side, to read the log and understand what it meant.

There are some codified aspects to the data we send out there. Logs have severities. Metrics have shapes and different instruments with expectations about how the data they convey behaves. For example, a gauge can go up and down, a counter does not. There is, and I welcome that because it's a great improvement for a student of the art, an increasing corpus, an increasing amount of semantic conventions that help us not only in making telemetry data easier to consume for humans, but also for machines.

Now, this being said, with great improvements over the past two decades, we still provide very fine grained information. It's almost data. The application doesn't have an opinion whether the number of 500 responses it is serving is good or bad. It does not tell you, "Look, I'm not doing fine because I'm answering so many requests with a 500."

It just says, "This is how many 500 responses I have," with no judgment attached. If we treated observability as a first class requirement, we would get closer and closer to what good appliances have been all along, where the power-on button turns red, potentially with a small selection of relatively well understood error codes that immediately lead you to a runbook of what will fix it.

We're not there yet, so we are making strides in generating telemetry from our applications and infrastructure, delegating, still, most of the interpretation, aggregation, evaluation, reaction to this telemetry to backend systems. Then afterwards see what's what. There's nothing necessarily wrong with this model. There is some nasty caulking in there, but it is, to some extent, wasteful.

There are people in the industry that are vocal about how much network traffic can be devoted to observability. The reason why, for example, distributed tracing was born with sampling built in is because of the scale of Google. Without that, you would not trace things very fast. Tracing one out of 10,000 or 100,000 was enough to give a statistically accurate view of the system and the behavior.

But not every company is Google, so there are different trade offs in terms of how much telemetry you want to collect, how much you want to pay for it, for example, in terms of egress traffic or storage, and how much utility you get out of it.
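To make the "one out of 10,000" idea concrete, here is a minimal sketch of deterministic head sampling. It is an illustration rather than how any particular tracer does it: the function name `keep_trace`, the use of SHA-256, and the string trace IDs are all assumptions. The point is that the decision is derived from the trace ID, so every service that sees the same ID makes the same keep-or-drop choice.

```python
import hashlib


def keep_trace(trace_id: str, one_in: int = 10_000) -> bool:
    """Keep roughly 1 trace out of `one_in`, decided only by the trace ID.

    Hypothetical helper for illustration; real tracers have their own samplers.
    """
    if one_in < 1:
        raise ValueError("one_in must be at least 1")
    # Hash the ID so that sequential or patterned IDs still spread evenly.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % one_in
    return bucket == 0


if __name__ == "__main__":
    total = 100_000
    kept = sum(keep_trace(f"trace-{i}", one_in=1_000) for i in range(total))
    print(f"kept {kept} of {total} traces")
```

The trade-off described above is visible in the single `one_in` parameter: a larger value means less egress and storage, and fewer traces available when you go looking for a rare failure.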

Jessica: So by higher level telemetry, are you looking for something lower volume but higher usefulness?

Michele: No, the nature is a bit different. There are some interesting experiments here, which I don't think they're ready for mass consumption or they might never become. They try to describe the application talking about itself in terms of high level concepts like, "I need to throttle." Our applications don't know they need to throttle, throttling is something that they don't really look for it more than a counter that says, "Enough requests, cut it off."

Think of it a bit like trying to distribute the interpretation of monitoring across a system, rather than centralizing everything to the one component that consumes the function and then decides whether it's good or bad. It's a bit hard to visualize because we are very far from it, in terms of the industry and it may never happen. But I would like to see applications that come with, effectively, a much higher grade of built in understanding of how they're doing.

Martin: So I think what we're kind of saying here is applications that can use their own telemetry to make decisions in realtime, as opposed to humans making judgment calls afterwards. I've seen some of these things where one example was a GraphQL server that would use telemetry data to auto-optimize the paths that it took for resolvers, for instance.

I think there's some interesting concepts there about how developers can use that telemetry inside of their applications at runtime in order to understand that, and using that same telemetry data would then allow people to do two things at once. Also get the human analysis piece, so get the telemetry out for human analysis, the thing that humans can do where we can notice patterns, we can notice anomalous patterns, the unknown patterns.

Whereas an application can do the known patterns, if this happens, then do this, which I think is a really interesting concept. It's one of the things, one of the reasons why I wanted to bring you on, because it's how we bring telemetry to developers and make it more meaningful for developers. Which I think is a really nice segue for you to introduce yourself and tell us who you are and what you do.

Michele: Before I introduce myself, it's worth noting that in parts of computer science and industry that are more consolidated, more well understood and, to some extent, have higher average quality of how things are done, telemetry is used already today to modify dramatically the output.

For example, you can feed profiling data to the Go compiler, and it will optimize the code it generates to be more efficient for the kind of logic paths that are present in the profiling data. I think it's a function, very much, of the maturity of the technology more than it is of a sort of philosophical point of view of how things should be done.

Martin: Cool. So talk to us a bit about you then. Who are you? What do you do? Why is it you're doing these sort of things? I'm assuming that introducing yourself and your company should give us a little bit of context.

Michele: So my name is Michele Mancioppi. I have been a product manager in observability for the best part of a decade for now. Before that, I was a staff engineer and I love observability because it helps me help people do software that works better. I am personally offended by software that sucks, and I think observability, providing good observability tools is the best way that I can help alleviate that.

So now I'm with Dash0. We are a bunch of people in love with observability. Most of us were at Instana before. In between Instana and Dash0 I have had a few detours: Canonical, where I was doing observability mostly based on Juju, their orchestration. Then Lumigo, serverless observability.

Before joining Instana I was a staff engineer at SAP and I fell in love with observability when one day my product owner came to me with a startup, showed me a LinkedIn page of somebody in Brazil that said that they thought they logged into someone else's account on the SAP Cloud Platform, which was operated here.

I spent the best part of a week crawling through access logs to figure out what they did, and access logs were not enough so I had to walk across the floor and talk with the one person in the authorization database, where he found out immediately that in reality this person had two accounts and one was a trial, and he forgot of course about it or was confused.

So he misinterpreted that. Then I vowed that I would never do something like that again in my life, and went down a dark path of figuring out what is APM and where we need it. Then I realized that I loved doing APM more than using APM, so I joined the industry on the provider side and here I am.

Martin: Seems like a bit of a journey. I think it's very familiar for a lot of people as well, because that idea of there's that inflection point where you understand that that is so important to you doing your job, that you want to do more of it and then it becomes addictive because it's like, "But I want to do more of these things.

I want that euphoria of being able to see that something happened." I think that's really core to developers, that we like that, the graph or the investigation. Personally, debugging is my favorite thing to do. I mean, it's not while I'm doing it, obviously because then you get frustrated and pull your hair out. But then that euphoria you feel once you've done it is really amazing.

Michele: The dopamine hit.

Martin: Exactly. And then taking that into production and seeing those weird and obscure problems that only happen when you've got 10,000 requests happening at once. But yeah, like I said, a very familiar story for a lot of people.

Michele: I also see there is an ethical imperative. So if you're providing software, with the importance that software has nowadays, people will get frustrated if the software is not doing very well. It's giving them a bad experience, and observability is the best way to reduce that, to understand from the point of view of the end user how well your software is performing, and it gives a chance to make it good to the people using your labor.

Jessica: As a user of software, I love that because, yes, I get so frustrated and it affects the quality of life of my whole family as I yell and scream and they're like, "Jess, are you okay?" And I'm like, "It's just computers!"

Michele: Imagine you're there, about to finally manage to land the figures with the tickets and it crashes on you. Oh my!

Martin: I love that as an observability provider slogan, it's like, "We provide observability so your family doesn't need to suffer."

Michele: No, we provide observability so that your family will suffer less, because that's the reality. Bad software is here to stay, you just tell people that want to better themselves to make a better software. I'm going to mention I'm a bit cynical.

Martin: I mean, there's a scale and I think all developers at some point are on that scale of cynicism. Does it take cynicism to be a good developer? No, but cynicism makes you a better developer.

Michele: I think that cynicism, and the constant nagging at the back of your head that this will fail, that it is just a matter of how often, actually actively helps you in reducing the amount of failures.

Martin: I wonder if you can measure how long somebody has been in a developer career by how cynical they are. It's like, "I've seen some things."

Michele: I think you can probably gauge the amount of trauma they underwent based on their humor. I don't think it's a matter of how long they have been, because there are some companies that succeed in packing a lot of trauma in a short timeframe.

Jessica: That's true.

Martin: So you've been doing this for the better part of a decade. How do you think that we're doing in terms of making things better for developers?

Michele: Oh, that's a very nuanced question. There is so much to unpack. So I'll start by saying that there has never been a better time to be a geek, or an observability geek, than today. We went from observability being, in large part, a commercial play for anything that was not relatively simple systems, like metrics from systemd or syslog.

We find APM technology has become ubiquitous, almost universally accessible thanks to OpenTelemetry. Now, this being said, it's not universally available as a product. We have a long way to go to package the very powerful observability tools that are available in open source in a way that feels like a product more than an awesome Lego Technic kit. It's only a phenomenon of the past few months that I've seen people obviously new to the field struggling with the difference between delta and cumulative metrics.

This was not an experience that people had before because it was well hidden behind the sausage that is commercial observability. That being said, if you are interested in observability, it has never been a better time to jump in because there's so much available. A lot of technology that was only commercially available a decade ago, now you can look how it's built. It's pretty great.

It has a dark side, of course. For all the faults of large observability vendors, also of the past, it was in their best interest to support you and make your journey easier, and the very well-intentioned OSS community doesn't have the resources to take everybody by the hand and make sure that they are well onboarded and get to create value out of the technology. That's also how it works, right? Open source is much more self-motivated.

Martin: I think what you are touching on there is this idea that... I was going over this car analogy in my head of the idea that people just drive cars, that's the APM stuff. They don't care about the engine, they care if they need to put some fuel in it and pay for it, and that kind of stuff. But now what we've done is said, "Here's the engine. You need to build it all yourself."

And people are like, "Uh, okay. Yeah, I didn't need to know all that. Now I need to know what a..." I don't know, is a carburetor a thing? I've heard that. I don't know what one is myself, but I think it's a thing in a car. But that idea that some people don't want just to drive their car, they kind of want to tweak it, make it faster, that kind of stuff. We've gone from one end of abstraction to no abstraction whatsoever.

Like you said, those cumulative and delta metrics that people now need to understand how they emit them in their applications, they need to understand what temporal aggregations are and all this kind of stuff inside of their applications. Realistically, most people don't want that. I think metrics is one of those things where it does get incredibly complicated, which is why I love tracing because it's not anywhere as near as complicated.

Michele: Tracing is pretty difficult. Go and try to explain to somebody where they lose the trace context because they're doing an await in Node.js and all of a sudden they lose the active span, and then you see their heads explode.

Martin: Oh, I've seen Jess do that for the last couple of weeks.

Michele: The last thing is that at least you can pretty well visualize the pursuit of tracing if you have a notion of the operational map that was planned and you kind of get there. There are gotchas, but at least it's something that you can visually make. It's either a tree or a graph, but you can see it.

Go and explain the metrics, time windows and aggregations over that and, yes, it can make charts. For example, when you go on PromLabs, there are very nice courses about it. But it takes much more math to appreciate metrics and why they work the way they do, than it does tracing or logs.

Martin: I think tracing is harder when you try to visualize. Metrics is harder when you try to generate.

Michele: Oh, really? Is that a problem in metrics?

Jessica: I want to graph these metrics that I'm getting from the Kubelet Stats Receiver, right? So there's OTel collector standard stuff and it's emitting these metrics, and in order to graph them correctly I need to know whether they're delta or cumulative, and I need to know the timeframe that they're emitted on, and what the units are and stuff like that.

And I cannot find that anywhere, I don't know how to make a correct graph of the metrics because I'm not in control of both ends. I'm not both the one emitting and the one displaying. Like you said, if one provider is doing both of those then you don't have to worry about it. But I need to worry about it.
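The delta versus cumulative distinction can be shown with a small sketch. The two helper functions below are hypothetical and are not part of the OpenTelemetry collector or any SDK. They assume a monotonic counter that starts at zero, and they treat any drop in the cumulative value as a counter reset (for example, a process restart).

```python
from typing import List


def cumulative_to_delta(cumulative: List[float]) -> List[float]:
    """Turn running totals into per-interval increases.

    Assumption: the counter started at zero before the first sample.
    """
    deltas: List[float] = []
    previous = 0.0
    for value in cumulative:
        if value < previous:
            # The total went down, so the counter was reset; the new
            # value is everything counted since the reset.
            deltas.append(value)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas


def delta_to_cumulative(deltas: List[float]) -> List[float]:
    """Turn per-interval increases back into running totals."""
    totals: List[float] = []
    running = 0.0
    for delta in deltas:
        running += delta
        totals.append(running)
    return totals


if __name__ == "__main__":
    samples = [5, 12, 12, 20]
    print(cumulative_to_delta(samples))
```

Charting the same series under the wrong assumption gives a plausible-looking but wrong graph: cumulative values plotted as if they were deltas look like ever-growing traffic, and deltas plotted as if they were cumulative look like a counter that keeps resetting.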

Michele: And that also already assumes that you are in control or are very well acquainted or fluent in the query language.

Jessica: That's true.

Michele: Because metrics query languages tend to expose the full complexity of the very complex data source behind them in ways that tracing doesn't, because most of the trace query languages, with maybe the exception of TraceQL, are pretty limited in the amount of querying and correlation they can do on the spans.

It's still very much a matter of looking at single spans, how many spans these are, but it's very seldom that you have a tool that allows you to say, "Give me all the traces where this happens and within five minutes something else happens downstream of a consequence of that."

So that, the whole nature, the whole graph nature with parent-child relations, and the implied causality of the child span happening because of the parent span is still mostly hard in the industry. So when we get there, probably through the tracing querying, it'll get as hard as metrics. It's just not there yet because of limitations of the technology.
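The kind of query described here, "this happens, and within five minutes something else happens downstream as a consequence", can be sketched over plain in-memory spans. This is an illustration only: the `Span` shape, the function names, and the example span names are assumptions, and it is not how TraceQL or any vendor implements structural queries.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for a root span
    name: str
    start_s: float  # start time in seconds


def is_descendant(span: Span, ancestor_id: str, by_id: Dict[str, Span]) -> bool:
    """Walk parent links upward to see whether `ancestor_id` is above `span`."""
    seen: Set[str] = set()
    current = span.parent_id
    while current is not None and current not in seen:
        if current == ancestor_id:
            return True
        seen.add(current)  # guard against malformed, cyclic parent links
        parent = by_id.get(current)
        current = parent.parent_id if parent else None
    return False


def traces_with_followup(
    spans: Iterable[Span], cause: str, effect: str, within_s: float = 300.0
) -> Set[str]:
    """Return trace IDs where an `effect` span is a descendant of a `cause`
    span and starts no more than `within_s` seconds after it."""
    by_trace: Dict[str, Dict[str, Span]] = {}
    for span in spans:
        by_trace.setdefault(span.trace_id, {})[span.span_id] = span

    matches: Set[str] = set()
    for trace_id, by_id in by_trace.items():
        causes = [s for s in by_id.values() if s.name == cause]
        effects = [s for s in by_id.values() if s.name == effect]
        for c in causes:
            for e in effects:
                in_window = 0 <= e.start_s - c.start_s <= within_s
                if in_window and is_descendant(e, c.span_id, by_id):
                    matches.add(trace_id)
    return matches
```

Even this toy version has to load whole traces and walk parent links for every candidate pair, which hints at why structural queries are much more expensive for a backend than filtering individual spans.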

Martin: Yeah. There are some complexities that come with the trace querying stuff and, in my experience, I've been in quite a few different tools. But from a Honeycomb perspective, for instance, we treat each span individually and you can query all of those. The idea of being able to do the relationships up and down the trees and adjacent spans, and stuff like that, that's when things start to get really complicated.

Jessica: And we haven't announced anything about that yet.

Michele: Interestingly enough, the way that technology has developed cumulatively in the past 10 years has made the importance of causality in a trace much bigger than it used to be. When we used to build our systems, mostly synchronously where the client would wait until the server was done, then you could more or less imply the causality just by timestamps.

But now our systems love to do stuff asynchronously, putting messages in queues, consuming batches of messages, losing the trace context all at once from all the messages, and it's very hard to find a system that even can trace data, describing asynchronous processes to give you the guarantee that the message that you already received resulted in that unit of work being done within a certain amount of time from the message being received in a completely different component.

That is very relevant for systems that have queue economization, probably use queue SLOs of those that they do nowadays, expectations for the customers because saying to the customer that, "Queue UI, I accept your request" and then nothing happens, it's not the best experience.

Martin: I always love the idea that people think that just because I'm using an HTTP API, I've built a distributed monolith that things go through, that everything could be just correlated by timestamps as to what happened in which order. Then you say to them, "When was the last time that you synced the time between all of your servers?" And they go, "Say what now?" And you're like, "So the time will skew between all of these services, when did you make sure that they were all in place?" And they're like, "Um, you can do that?"

Michele: You know what we should be doing? It's like in a scout camp, where people gather around the bonfire and tell each other scary stories, cautionary tales. We should do the same at KubeCon, reading Lamport articles on clock skew to newcomers. Let's put some fear of time into them.
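For readers who have not met the idea behind those cautionary tales, here is a minimal sketch of a Lamport logical clock, the classic way to order events across machines without trusting wall-clock timestamps. The class and method names are illustrative assumptions; the rule itself (increment on every local event, and on receive take the maximum of the local and received values plus one) is the standard one.

```python
class LamportClock:
    """A logical clock: it orders events by causality, not by wall time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Record a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is a local event; the returned value travels with the message."""
        return self.tick()

    def receive(self, remote_time: int) -> int:
        """Receiving jumps ahead of whatever the sender had seen."""
        self.time = max(self.time, remote_time) + 1
        return self.time


if __name__ == "__main__":
    producer = LamportClock()
    consumer = LamportClock()

    sent_at = producer.send()
    received_at = consumer.receive(sent_at)

    # Whatever the two machines' wall clocks say, the receive is
    # always stamped later than the send that caused it.
    print(sent_at, received_at)
```

With skewed wall clocks, a consumer can appear to process a message before it was sent; with the logical clock, the receive is stamped after the send by construction.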

Martin: Yeah. I love the idea, we've been toying with the idea a few times of creating the idea of Incident War Stories, but I love the idea of an observability camp at KubeCon where there's a little imaginary fire where we all tell our incident stories to all of these new people.

Michele: We're living in a world where the level of abstraction has been increasing, right? Now we can deploy workloads that work more or less like on our machine, thanks to containers and infrastructure on silicon that is on the other side of the world, and it mostly works. That has made doing simple things simple, and it has increased exponentially the amount of very complex failures that can occur.

When those occur it's worse than ever before because one thing is the rack exploded, it's a very tangible thing, you know what to do, you get another one and plug it in. The other is all of a sudden I cannot really debug my lambda, I don't know what's going on in there, I cannot profile it, I don't know where the execution time is going, I cannot touch the compiler in there.

I can try to fire some things and god knows what happens to my customers, right? So there is a price to pay, easy things are getting easier, and dealing with failure is getting harder. This is a curse for new people.

Jessica: And that's why we need more observability.

Michele: You need observability, but it's also something that will not be fixed only with observability. Sometimes I lay awake at night like a man in observability does and I look at the ceiling and think just how scary it could be to enter the industry as a freshman today. When I started, the world was new and Java had no collections, VMs were this exciting, exotic new topic, we could have multiple operating systems at your disposal.

Nowadays, the amount of layers you need to peel from your application to understand why something is not working is daunting. I was lucky enough to be able to digest that complexity little by little as it was deployed in front of my face. But the newcomer today? He's getting full frontal complexity, thrown into the deep end of the pool, and I'm not surprised that it's so hard to make observability as a first class requirement for their applications because there is so much else that also is, and we advocated shift left to an extent that could be maybe described as Pile-It-On-To-The-Left.

Jessica: Nice.

Martin: I like that. Pile-It-On-To-The-Left, I love it. It's going to catch on. I love what you're saying there around this idea that there is way too much complexity, but the thing I don't agree with there is that actually now where we've got so much complexity, observability being a first class citizen is something that it should be a first class citizen now because that's how you unwrap that complexity.

We talked recently about boot camps and how boot camps go through a whole process and then right at the end they teach you about observability. That's just the wrong way to do it, in my opinion. The first thing that you should be teaching people is how do you observe them, and then everything else that they do is then understanding that complexity.

Jessica: But until we make it easy, until somehow emitting really useful telemetry and immediately getting use out of that is straightforward, then it's not a shift, it's a Pile-It-On.

Michele: Also, we should not assume that we can really convey the right way of performing observability when the recipient, the person listening to us, does not have an accurate model of how things fail. So I am pretty skeptical.

Jessica: Which we'll never have a complete model of how things fail.

Michele: But there are so many types of failures, there are so many things that can go wrong. Go and explain to a newbie that although they instrumented their application perfectly, the telemetry is not there because the process crashed before the exporter could flush.

Jessica: Or the telemetry, you're trying to emit it from the browser, and good luck.

Michele: Yeah. It's even better when we start talking about user monitoring as distributed tracing, you realize that you effectively had no means to ensure that the span has actually ended so it becomes not really a trace. It's great fun. But it's the kind of great fun that requires a nuanced understanding of the technology, so I am a bit skeptical that unless as an industry we decide that we should be more deliberate in the way we welcome complexity and we start getting a bit rigorous in using the simplest thing that works, that we will actually be able to level up and live up to the title we star ourselves with, software engineers.

I mean, let's be honest, unless you're working at a top company, if people built bridges the way we build software, nobody would cross a pond without fear of death. So I think we have some systemic issue about complexity and that observability is one of the areas where it shows.

Jessica: Right. Because what we need to teach the newcomers is not how to ignore the complexity, which many of the layers are there so that you don't have to know where it runs and we teach people to ignore it. We need to teach them to cope with it.

Michele: Maybe it's even a little further than that. In Italian we have a saying that says, "Learn the art and put it on the side." This means in context that in order to be able to make the correct engineering decisions about, for example, what database you should be using for your use case, you must have a very nuanced understanding of the use case and a very good mental model for different databases or what they are good for and what they are not, so that you can get the best one.

But then again, this is something that you can expect only with people with a significant amount of experience. I put to you in the past where they used the word seniority, it is almost devoid of any meaning nowadays. But let's say in terms of maturity in their craft.

Jessica: Or amount of trauma.

Michele: You know what? Trauma is the way that we process bad things that happen to us. I have met people that I cannot understand. They don't get traumatized with software development. They learn, they are thankful for the experience and they move on. I envy them and I do not understand them.

Martin: So I need to go on me LinkedIn profile and change from Senior Engineer to Traumatized Engineer since that's the new title we're going for.

Jessica: Yeah. Well traumatized.

Michele: Nah, the whole thing about seniority is it's a very inflationary phenomenon in the industry, where you get senior in two years. I don't bloody know. After two years in the profession, I should not have been left alone with a compiler.

Jessica: Yeah. So more than cope with complexity, we need to work skillfully within it. I really like what you said about learn the art and then put it on the side, because your database example.

If at some point you have a problem that requires you to get to know several different databases or several problems, even better, if you have that in your past, even though you're not a DBA, that's not the focus of your job, it colors what you're able to bring perspective to throughout your future. This is one way that we get breadth of knowledge and that that's really useful, and yet it's something that quantity of different years of experience matters a lot.

Michele: And it's not just about how much experience you had, but with whom you have had it with because if you spend tens of years in a sweatshop where people looked at everything they had was a hammer so every problem was a nail, you probably were not exposed to the kind of deep, senior expert stuff, level kind of thinking of looking at the problem long and hard, and then find the right compromise for that technology to use.

So it's as much how long you have been doing the job, but also who you had to learn from or with, and whether the environment was supportive of this really expensive process of acquiring a lot of very specialized knowledge.

Jessica: I like that you said that the expert is sitting back and thinking really hard and finding, not the solution to the problem, but the right compromise for the situation because there are not solutions to this.

Michele: But there is not always one solution. There are several. Some we come to regret.

Martin: Or somebody who comes after you learns to regret them.

Jessica: Right. What new problems does your solution engender? Yeah.

Michele: Something that may be a solution today, you wake up tomorrow and it is your worst nightmare come true, through your own labor. It's a very humbling experience.

Martin: I love those moments where you go back and look at the code and there's some profanities that are being thrown at the code, going, "Who did this?" And then all of a sudden everything goes quiet in the corner because you realized actually the other person was actually you.

Michele: I developed a habit of having git blame active in my editor, not because I like to disparage my coworkers but because I know I was the idiot.

Jessica: There's something wonderful about knowing that something is somehow my doing, because then I can do something about it.

Michele: At the very least they can try not to do it that way the next time and take responsibility, also the accountability, and learn.

Jessica: Yeah. I'd much rather learn something than be mad at someone else.

Michele: Yeah. It's also a wonderful feeling when you can express your complex feelings about bad software with somebody who learned it from you. It's also something that lays mostly outside your control, so try to better yourself and not others. I tend to have complex feelings about software.

Jessica: As we should. I have to ask, what are you doing at Dash0?

Michele: I am a product manager by day, engineer by night.

Jessica: And what product are you managing?

Michele: We don't have a product today. We are evaluating what we should be doing. As we have been discussing observability, it's in a very interesting state.

OpenTelemetry has to a very large extent reduced the barrier of entry for, for example, new observability tools because now having telemetry is pretty much commoditized.

I remember at Instana, we had to create our own set of tracers, interpreted in all languages and code in a consistent fashion. It was very expensive. So now we are looking at the state of the art, seeing what the promises are, and then we'll decide what to do about it. Meanwhile, we actually made a welcome present as in, "Hi, it's us. We're back."

And we are playing with OpenTelemetry and some of us have been building OpenTelemetry professionally for a bit, and for all the wonderful capabilities that are available to the OpenTelemetry collector, an agent with the complexity and amount of use cases it can solve that I've sort of seen before also in commercial use cases.

It's a bit unwieldy to configure and it's not very welcoming to newcomers to visualize the different pipelines, how they fit with each other, and the experience of converging to a correct configuration is a bit too trial and error for my taste. Especially when you need to have the collector running on Kubernetes because otherwise some processors don't start because they are missing some specific environment variables for the Kube API authorization.

So we made this small tool that allows you to upload your configurations and validate the syntactical correctness and to some level the semantics in the browser. If you want a bit more assurance that your specific distribution of OpenTelemetry or OpenTelemetry collector will work with your configurations, we also built a backend validation where the configuration is sent to a Lambda function running on AWS that has inside an actual collector of that distribution, that version, and tells you whether the configuration starts.
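As a rough illustration of the kind of semantic check described here, the sketch below looks for pipeline entries that reference a component never defined in the configuration. It is not OTelBin's actual implementation: the function name `find_undefined_components` and the sample configuration are hypothetical, and the configuration is written as an already-parsed Python dictionary rather than loaded from YAML.

```python
# Hypothetical sketch, not OTelBin's code: report every component that a
# pipeline references but that is not defined in the matching top-level
# section of a collector configuration.

def find_undefined_components(config):
    """Return a sorted list of (pipeline, section, component_id) problems."""
    problems = []
    connectors = set(config.get("connectors") or {})
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    for pipeline_name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            defined = set(config.get(section) or {})
            # Connectors may stand in as a receiver or as an exporter.
            if section != "processors":
                defined |= connectors
            for component_id in (pipeline or {}).get(section) or []:
                if component_id not in defined:
                    problems.append((pipeline_name, section, component_id))
    return sorted(problems)


# Assumed example: the configuration as it might look after YAML parsing.
example = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch", "k8sattributes"],  # never defined
                "exporters": ["debug"],
            }
        }
    },
}

print(find_undefined_components(example))
# prints [('traces', 'processors', 'k8sattributes')]
```

A static check like this only catches dangling references. Whether a given collector distribution actually starts with the configuration, for example whether a processor finds the environment variables it needs on Kubernetes, still requires running a real collector, which is the role of the backend validation.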

It's just a first step. To jump back, we were talking about what it would take to make observability easier for newcomers. I think we, in the industry, take for granted the amazing strides that the programming languages people could make, just because of the quality of the integrated development environments. The stuff that we have today with VSCode or IntelliJ, I mean, when I started it was already mind-blowing to see JBuilder.

JBuilder was cool, but it doesn't hold a candle to what you can do today. Then came Eclipse. But in the world of observability and monitoring and, to some extent, cloud native development, it feels like working with Notepad and that is not helping lower the barrier of entry so we did something about it.

Martin: And where can we find that tool?

Michele: It is on OTelBin.io. You are also very welcome to host your own. It's open source and free, and we will never monetize it. It's just our love letter to the OpenTelemetry community.

Jessica: It definitely expresses your expertise in OpenTelemetry and the collector.

Michele: It is an amazing piece of technology. It is not a piece of technology that was built for novices. It's for experts, by experts, and it shows.

Jessica: Yeah. You said OpenTelemetry has reduced the barrier of entry for tools, but now it's up to us to make those tools reduce the barrier of entry for developers and software creators everywhere.

Michele: I mean, there are several aspects to the impact that OpenTelemetry has had on the industry. On the one hand, there has been a crop of new startups that effectively built a UI for OpenTelemetry because it's much cheaper than having also to build all the telemetry collection. On the other hand, now that DevOps and cloud native have effectively spawned the concept of platform team, more companies have people that are dedicated to doing observability.

Those are like the first people in a while that may have observability as their main requirement, and their mission is in many cases to offer an internal product built on the open source project. So that's a very nice improvement. It scares me a bit in terms of mass market because it's setting expectations that every company that is not a Mom and Pop shop should have a platform team to make wieldable the platform for internal developers.

It's a bit sad, in that it's something that the way we build technology should enable not to have effectively techno priests inside the cult and they're doing that for us. But it beats the alternative of having to debug why somebody said that they logged into someone else's account by just using access logs in the year of our Lord, 2015.

Martin: I have lots of feelings around the devestification of platforms and providers and building abstractions and all that kind of stuff. But that's probably an entirely separate podcast.

Michele: That's the kind of thing that you used to be able to work in up an ivory tower with. Yeah. I had very nuanced opinions about the way, for example, Kubernetes would be the best for the cloud. To some extent it's true, but I don't think we've done enough as an industry to build products, rather than also technologies. There is a significant gap, and a significant gap tends to be how easy it is to be successful with your technology.

Jessica: Is that why you moved from staff engineering to product management?

Michele: To some extent, yes. My move from staff engineer to product manager was empowerment in return. I realized I could have a higher impact by programming via Outlook than I could programming via VSCode. So that allowed me to scale my contribution a little higher, and during the process I acquired a love for the end user and deep empathy with what you're doing and I've never looked back.

Martin: I think that's something that we do as developers, we want to increase our impact and the idea of being a senior or being a manager, I think stems from this underlying desire to increase our impact. I think then moving into maybe product management or developer experience and all that kind of stuff, it's all about this desire to have more impact.

Michele: But there are, as always, a light side and a dark side. On the one hand, there is impact as in I'm doing good for the people using my tech. The empathic, altruistic thing of, "I do software that is helpful, do software that feels good using." The dark side of this is the impact done by doing things either because of the pleasure of doing things in creating new technology because of the interest that one has... Although if you put that technology in the wild it's a bit your responsibility.

Or the season of Perf where you have to increase your impact to create a salary. When you look, for example, at the very complex history of messaging solutions done at Google, one could argue whether impact is a healthy measure for driving productization. A lot of those messaging systems happened because somebody needed to be promoted.

Jessica: Right. Their incentives are so hard.

Michele: Yeah. But incentives are not aligned with the end user. When, in order to get promoted, you need to launch something instead of landing something, instead of solving a problem, you end up with a lot of cathedrals in the desert and somebody else will be left behind to pick up your pieces.

Jessica: Right. So there's impact as measured by your performance review, and then there's impact on real people. Michele, it's been a wonderful conversation and I want to ask if there's anything else that you want to talk about before we wrap up.

Michele: I'll be around at KubeCon with my Dash0 pals.

Jessica: That's KubeCon in Paris this year?

Michele: In March in Paris.

Martin: It's going to be awesome.

Michele: Yeah. Those buy-ins get higher.

Martin: Yes, I'll be there too and we will definitely meet up and chat.

Jessica: Where else can people find you?

Michele: I am regrettably back on Twitter. I am on LinkedIn. I am on Mastodon. But I tend not to put much meaningful content in there, I prefer my software to speak for myself.

Jessica: And they can find your software at OTelBin.io.

Martin: Awesome.

Jessica: All right. We'll put the other social media links in the show notes for everyone.

Michele: Thanks, everyone.

Jessica: See you all at KubeCon, I hope.

Michele: Bye, everyone.
