O11ycast
46 MIN

Ep. #60, Customer-Centric Observability with Todd Gardner and Winston Hearn

about the episode

In episode 60 of o11ycast, Jess and Martin speak with Todd Gardner of TrackJS and Winston Hearn of Honeycomb. This talk explores customer-centric observability, Request Metrics, Core Web Vitals, and insights on optimizing observability across different browsers.

Todd Gardner is the president and co-founder of TrackJS and Request Metrics, as well as the Producer of PubConf.

Winston Hearn is a senior product manager at Honeycomb.

transcript

Todd Gardner: I care about frontend observability because it is the true representation and reflection of what the users see. Everything else we do in monitoring, observability, logging, all of those spaces, is our perspective as operations and developers of how our systems are performing. But ultimately, a lot of the time that doesn't matter.

What matters is what the user sees, and so sometimes our backend observability efforts are really accurate reflections of it and we can get a lot more details to understand. But if the user is having a bad time, it doesn't matter what our other observability tools are saying. The user had a bad time, and so we need to actually look at that as the real source of truth in how the user perceives our systems.

Jessica Kerr: Yeah. Lately when I introduce SLOs, I'm like, "We can measure the customer experience and alert when it's bad." Then I go, "Step one, approximate the customer experience by status and response time." Yeah. I kind of feel the need to apologize sometimes.
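
As an aside, a minimal sketch of that "step one" approximation, judging each request by status code and response time and computing the good-event ratio for an SLO, might look like this in JavaScript (the 500ms threshold and field names are illustrative only, not Honeycomb's implementation):

```javascript
// A minimal sketch of the "step one" approximation: judge each request by
// status code and response time, then compute the ratio of good events.
// The 500ms threshold and field names are illustrative only.
function isGoodEvent(event) {
  return event.statusCode < 500 && event.durationMs <= 500;
}

function sliCompliance(events) {
  const good = events.filter(isGoodEvent).length;
  return events.length ? good / events.length : 1;
}

// e.g. alert or burn error budget when compliance drops below the SLO target
console.log(sliCompliance([
  { statusCode: 200, durationMs: 120 },
  { statusCode: 503, durationMs: 30 },
])); // 0.5
```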

Martin Thwaites: I've never heard you apologize, ever. One of the things I've been talking about a lot is customer-centric observability because, like you say, logs, metrics, they're all decent things. But we need to think about observability in the context of a user, and thinking about the user is way more important.

Todd: Yeah.

The user is not sitting in our clouds. They're not in our data center. They're in a different place, they're using a weird Android device on some mobile network that you've never heard of, and there are a lot of real technical challenges that are totally invisible to us but that the user has to go through.

So what sort of networks did they hop through? What sort of environment are they using to interact with our application? Those all can cause their own set of problems.

There was a case a couple of years ago where we detected this with TrackJS. There was a browser extension called Honey, a coupon thing that would look up coupon codes or whatever, and they sent out a bad patch that incorrectly patched the Promise API in web browsers and caused errors on hundreds of thousands of sites. They just broke for these bizarre reasons.

It wasn't any of those sites' fault, it wasn't any of those people's doing that this happened, but Honey was such a widely used thing. And if they didn't have the visibility to know that that change happened... their customers still had a bad time, their customers were still pissed that something didn't work.
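
For a sense of how a change like that breaks sites that never installed the extension, here is a hypothetical sketch, not Honey's actual code, of a bad Promise monkey-patch:

```javascript
// Hypothetical sketch -- not Honey's actual code -- of how an extension that
// monkey-patches the global Promise can break unrelated site code.
const OriginalPromise = window.Promise;

window.Promise = function PatchedPromise(executor) {
  // Bug: the executor is ignored, so anything built on this never settles.
  return new OriginalPromise(function () {});
};
// The static helpers (Promise.all, Promise.resolve, ...) were never copied
// over, so existing site code hits TypeErrors it has no way to anticipate.

// Ordinary site code that worked before the extension updated:
try {
  Promise.all([1, 2].map((n) => window.fetch(`/item/${n}`)))
    .then((responses) => console.log("loaded", responses.length));
} catch (err) {
  // "Promise.all is not a function" -- an error the site owner never wrote.
  console.error(err);
}
```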

Jessica: Right. So to solve this problem today, we have two guests. Todd, do you want to introduce yourself?

Todd: Yeah. My name is Todd Gardner. I have been working in frontend monitoring, observability and metrics for 10 years. I built a service called TrackJS, JavaScript error monitoring, which has widespread use and lately I've been working on a new service called Request Metrics which is trying to unite a lot of different concepts of frontend monitoring together to give a more holistic picture of what our end users experience.

Jessica: Okay. I really want to ask about that, but first I want Winston to say hello.

Winston Hearn: Hello.

Jessica: That was a start, that was a start.

Winston: Should I say more? Okay.

Jessica: One or two more words.

Winston: Hello, I'm Winston. I'm a product manager here at Honeycomb. I work on the API and partnerships team, but I also focus on developer experience which has come to expand into frontend observability.

Jessica: And you have a frontend developer background?

Winston: Yeah. Part of the reason I got handed this whole purview is because I spent nine years as a frontend engineer, working up from a Ruby on Rails dev shop, learning jQuery, to writing custom vanilla JS frameworks later in my career. Then I switched into product management, and it's funny, because the switch from engineering to product management came about because the more senior I got in engineering, the more I liked architecting but wanted an even bigger picture view.

I liked the customer focus, I liked thinking about all of the systems. Then I got introduced to Honeycomb and observability, and it was like, "Ooh, this is the engineering side of that big picture thinking, trying to take in the scope but also being able to zoom into the nuances," the things like Todd just mentioned. A browser extension shipped an update and now hundreds of thousands of websites don't work. So I think that's the big picture thinking that brought me to this realm.

Martin: I think that's something that we miss as backend engineers, because we control so many things. We control the environments where our services run. On the client side, you really don't.

Jessica: Yeah. We think the cloud is far away, but at least we know which kind of computer it's running on in the cloud.

Martin: I just got the Galaxy Fold.

Jessica: That's a phone?

Martin: It's a fold phone, which is a completely new form factor for a lot of people. I took it to the Honeycomb teams and said, "Here's what the dashboard looks like on a fold phone." I got a no, in unequivocal terms, about whether we're going to support that as a form factor. But it's this idea that there are millions of different device combinations, from orientations to resolutions to hardware.

Jessica: To browser plugins.

Martin: Yeah, and that's just mobile. Then you get onto the desktops.

Todd: Even browser runtimes, there are different codebases behind them. They've all implemented the spec of the web, but there are subtle differences between how Safari and Firefox and Blink act, and those are just the big ones. There are several dozen other implementations that get smaller amounts of use.

Martin: So tell us a little bit about how you see Request Metrics solving that problem then. What is it that's going to give us that real user, frontend view? What's the thing that's missing? What is the gap that you see?

Todd: There's lots of different tools that care about the frontend, but they're all very aligned to a particular hat that somebody would wear or a particular department in a large company. Right?

Jessica: Okay. So like Amplify feeds the marketing funnel kind of thing?

Todd: Yeah. Or Google Analytics tells you about where your data came from, or a product analytics tool like Heap might tell you about custom events. Then an error monitoring tool like TrackJS will tell you when your JavaScript blows up, and a performance tool like SpeedCurve might tell you how fast something loads.

Then there would be other tools that you would use to monitor whether or not your APIs are performing quickly. Then there are security tools like ReportURI or something like that that will tell you when bad JavaScript makes it into your page. But these are all different things; they all operate in different ways, with different ways that they price it and different agent requirements that they put on the page.

Jessica: And more JavaScript loading in your browser.

Todd: Yeah, they're all JavaScript running in your browser, and that makes sense when you have a large organization that has all of these roles specialized. Everybody wants the tool for them. But what kind of gets overlooked is for smaller organizations that can't necessarily justify putting six agents on their website for all these different purposes, and maybe for a small cross functional team, it's some of the same people wearing these hats. They don't necessarily want to learn multiple different tools. And so there's some of that, but also that the data in one of these concepts really informs the data of the other one.

Jessica: Oh, so if you're looking at how far did a person make it into your website, how fast the page loaded matters?

Todd: Exactly.

Something that we learned with TrackJS and an earlier version of Request Metrics is that there are lots of problems that happen on the web. Nobody gets zero errors on a website; everybody has lots and lots of errors.

Martin: Nobody gets zero errors ever, that just doesn't happen.

Todd: Yeah. But understanding which errors actually matter, which errors are actually important, is a really hard problem to solve. It's the same with performance; the web slows down in weird ways for all kinds of reasons.

But not every slowdown is important, and how fast is fast enough is a really hard question to answer. But if we pair that data up with web and product analytics data, we can say, "Hey, this error happened. And for the users that had this error, their conversion rate is 50% lower."

Or their session tends to exit within 10 seconds of this thing happening. Or there's a correlation between slow load times and bounce rates, or something like that. By pairing all of this data together we can say, "Here are the problems that we think really matter, that are actually slowing down your users, keeping them from accomplishing their goals." And so you can focus on those and not the noise that we all tend to deal with.
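
As an illustration of that pairing, not Request Metrics' actual implementation, comparing conversion rates for sessions that saw a given error against sessions that didn't could be sketched like this:

```javascript
// Illustrative sketch -- not Request Metrics' actual implementation -- of
// comparing conversion rates for sessions with and without a given error.
function conversionRateByError(sessions, errorName) {
  const buckets = {
    withError: { total: 0, converted: 0 },
    without: { total: 0, converted: 0 },
  };
  for (const session of sessions) {
    const hadError = session.events.some(
      (e) => e.type === "error" && e.name === errorName
    );
    const bucket = hadError ? buckets.withError : buckets.without;
    bucket.total += 1;
    if (session.events.some((e) => e.type === "conversion")) bucket.converted += 1;
  }
  const rate = (b) => (b.total ? b.converted / b.total : 0);
  // e.g. { withError: 0.12, without: 0.24 } -> this error halves conversion
  return { withError: rate(buckets.withError), without: rate(buckets.without) };
}
```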

Martin: And that's really important because computers are really good at doing that comparative analysis stuff. We're good at recognizing patterns when we see them visually, but crunching through loads of data and saying, "Here's 20 users that failed," and then working out what they did before, that's not a good use of our time. My brain capacity is limited, so I'd really like for a computer to do that for me.

Todd: Yeah, computers are really good at looking at that load of data. The trouble that we've had in collectively solving these problems before is that they were all sitting in different tools with different data models, used by different people. That's what we're trying to do with Request Metrics: bring these different kinds of data together, the product people together with the developers, together with the security people, and use that data to further the product as a whole.

Winston: Yeah. That separation... I gave a talk last year, that Martin was at, where I led it off with: one of the appealing things to me about frontend observability is the context that comes with it, all of the signals being a part of it. And so the goofy meme I made said, "We have a bug in production." And the next panel was, "We know how to replicate it, right?" Blank stare. "We know how to replicate it, right?" Like, "Oh no." So often some alert system is saying, "Problem." And that's it, that's literally all you get: "Problem."

And then you have to hop to another tool and try to figure out if you can correlate the problem with a little bit more signal, and then dive into another tool. It's just this big guesswork, and that's why so many frameworks and so much frontend tooling try to abstract that away, because they don't know how to solve it as a problem. They're trying to say, "Well, it's consistent across all things, so now you just go in and fix the bug in your component and ship it, and it's good." And it's like, "Mmmmm, that's not how the real world of the web works."

Everything is in this dynamic, weird system, and there are humans involved who are clicking things in ways that you never knew or imagined, clicking on devices you've never seen, on outdated browsers, with browser plugins you don't know about. You can't actually ever get into the simplified environment that's going to be like, "Ah, pure." Or whatever; you're always dealing with the real world of humans and how messy they are, and observability helps you pull all that context together and say, "Ah, here are the things that are unique about this. I wonder which of these are the important ones."

Martin: Yeah. I think one of the things I've been talking about a lot recently is that the web's evolved. Back, let's say, 10 years ago we weren't dealing with single page websites, single page applications. We were dealing with a page that loads server side, and maybe there's some interaction that happens on that page with a bit of JavaScript.

That is very different to where we are today with React SPA applications, where in some cases the SPA has way more code than your APIs have. There is way more code in the frontend than there is in the backend. However, the tooling is basically trolling the developers, going, "There's a problem somewhere. See ya! Go fix it." Whereas on the backend we're like, "Okay, there's a problem. It's right on this line of code right here, go fix it." But on the frontend it's just hand-wavy: there's a problem.

I don't think the tools have caught up quite yet with that idea that we need to get really in depth information about what happened before, what happened after, what caused this error to happen, what were the immediate steps before, what were the medium steps before. I think that's a big problem at the moment, the industry just needs to catch up.

Todd: Yeah, totally. You're talking about exactly the problem space that initially brought me into this whole industry with TrackJS. That point 10 years ago was the very early days of single page applications, and I built these large applications in Angular 1.0 and Knockout and Backbone and the early web frameworks, and the thing that we learned really quickly is that JavaScript really sucks a lot of the time.

We'd have all these things that would break in ways that we never would've predicted, and so during the course of these projects we'd figure out how to build error logging tools because they just didn't really exist. TrackJS was born of the idea that, "Hey, an error by itself on the frontend isn't enough information to really understand anything that happened, and we need to capture analytics, telemetry, about what was going on in the minutes leading up to this error to really understand it."

So what were the network requests? What was happening in the console? What did the user do? Those sorts of things. That solved a lot of problems for a lot of people. But the step that we're taking now goes beyond just those user analytics, because not every problem manifests itself as an error. And so understanding this broader picture of experience is, I think, the next evolutionary step in frontend observability.
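
A minimal sketch of that buffering idea, the shape of the approach rather than TrackJS's actual agent, might look like this (the /telemetry endpoint is made up):

```javascript
// A minimal sketch -- not TrackJS's actual agent -- of buffering recent
// telemetry so an error report carries what led up to it.
const RECENT = [];
const MAX_EVENTS = 50;

function record(type, detail) {
  RECENT.push({ type, detail, at: Date.now() });
  if (RECENT.length > MAX_EVENTS) RECENT.shift();
}

// A few of the signals mentioned: console output, clicks, network requests.
const originalLog = console.log;
console.log = (...args) => { record("console", args.join(" ")); originalLog(...args); };

document.addEventListener("click", (e) => record("click", e.target.tagName), true);

const originalFetch = window.fetch.bind(window);
window.fetch = (...args) => { record("fetch", String(args[0])); return originalFetch(...args); };

// When something blows up, ship the error plus everything that led up to it.
window.addEventListener("error", (e) => {
  // "/telemetry" is an illustrative endpoint, not a real collector URL.
  navigator.sendBeacon("/telemetry", JSON.stringify({
    message: e.message,
    recentEvents: RECENT,
  }));
});
```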

Martin: At the end of the day, a failed conversion is an error if you really think about it.

If you're thinking about what is an error in my system, well, if somebody fails to convert, isn't that an error? If somebody's process is too slow, is that an error? Errors aren't just HTTP requests, and that's historically where we've been.

So taking that error definition and broadening it to, "Well, things don't work." As Jess has said previously, trying to measure the user experience and say what's good and what's bad is hard. It's squishy. It's not, "Here's an error code that's between 400 and 599, therefore it's classed as an error." We don't have that really granular level to say what is an error.

Jessica: Yeah. There's degrees of error-ness, there's degrees of how much do we care, and it is in context. Todd, I'd love to hear about how you approached this with Request Metrics.

Todd: So with Request Metrics, the difference is we're just capturing a broader set of data. TrackJS captured an error and then it listened for some analytics, but it only ever sent any sort of data when an exceptional event happened, when something blew up.

Request Metrics has to capture a far larger data volume because we're capturing everything all the time about sessions that don't necessarily break, but we need that baseline knowledge of what does good look like, what does a good performance experience look like, what does a good conversion experience look like? So that when something unusual happens, we have something to compare it against and be like, "Oh, this is a weird anomaly. This error or this group of errors is spiking up and that's correlated with this missing conversion event or this missing step that a lot of the users are taking."

And so the volume of data that ends up needing to be captured is much bigger, which has its own kind of problems with ingestion and querying and all of those same kind of problems that Honeycomb and every other data gathering platform has to deal with.
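
As a rough illustration of that baseline comparison, with made-up thresholds and data shapes rather than whatever Request Metrics actually computes:

```javascript
// Illustrative sketch of the baseline idea: keep a history of "normal"
// counts and flag the current window when it spikes well past that baseline.
function isAnomalous(current, baselineCounts) {
  const mean = baselineCounts.reduce((a, b) => a + b, 0) / baselineCounts.length;
  const variance =
    baselineCounts.reduce((a, b) => a + (b - mean) ** 2, 0) / baselineCounts.length;
  const stdDev = Math.sqrt(variance);
  return current > mean + 3 * stdDev; // simple three-sigma spike check
}

// e.g. hourly counts of a particular error over the last day vs. this hour
console.log(isAnomalous(48, [10, 12, 9, 11, 10, 13, 8])); // true -> worth a look
```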

Jessica: Right. You can get the data, the question is can you do anything with it? And how much is it? And how much does that cost? I got to ask, is it OpenTelemetry compatible?

Todd: Not yet.

Jessica: Ooh, I like that answer. Okay, that's better than I expected.

Todd: All right, so here's our position on it. Right now, what Request Metrics is, is working for us. The people we're really trying to help the most are small to medium-sized businesses, the people who are like, "Look at all these different tools," and are feeling a little overwhelmed by it. They might have an analytics provider and they might have some sort of uptime monitoring thing.

But they don't really have a broad sweep, and so we're trying to help them by bringing all of this data together for that smaller business, because the installation path for Request Metrics is, "Hey, you drop this JavaScript on your page and now you get this broad concept of seeing what's going on." Now, this group also tends to not really know what OpenTelemetry is.

Martin: More importantly, I don't think they care. That's the thing.

Todd: Yeah. Well, their systems just aren't as complicated as the typical OpenTelemetry user's. However, I think that's going to change, and I think more kinds of organizations would get value out of layering Request Metrics alongside other tools.

Jessica: Exactly, because the frontend and the backend, they're related?

Todd: Yeah. And I think Request Metrics sits in this really unique position where we could start a trace at a much higher position than anybody else. It's not about the first API call that happens, it's not about where the trace starts with the API call. It's about what the user's intent was when they came to this page. So did they come to this page from a Twitter referral with the main goal being, "I'm trying to sign up for a webinar."

Or, "I'm trying to buy this product." Or trying to whatever. That's the real, core thing that is happening. Then within that, they might click something which triggers an Ajax event that calls an API that does a whole bunch of downstream things. So by listening on the client side, we have this other layer on top of what is commonly considered the trace of user intent, and what they're thinking about.

Martin: I like that, intent. I like intent as an addition to what we talk about with tracing because tracing being causality, this thing happened because of this thing. User intent is very different to components causing things to happen. I really like that narrative around capturing user intent, so the trace happened because the user wanted to do this.

Todd: Yeah. And it makes it easier to communicate with non-technical stakeholders. When there's an issue and it's going to take some time to fix, it's not about saying, "Oh, hey, such and such API has such and such a technical problem and it's going to take this long to work on it." We can have a conversation that our product people might have a way better understanding of, like, "Hey, when users are trying to do this search, it's failing and we need to fix that." Because we can talk about that initial intent in terms of how a user or a product person would think about the product.

Winston: So I feel like the thorniest question in frontend observability is: what's a session? So if you're saying user intent, someone lands on a product page, you start a trace with the intent of shopping, and then they add to cart, and then they go through a checkout flow. Is that multiple traces in your purview? Or is that a session and a single trace?

I'm curious how you handle that, because it just seems like that's also one of those things where analytics products have a certain way of handling this, and dev tools historically, frontend dev tools, have handled it like page load. You have the concept of a page load and what happens on a page load, so how have y'all navigated that?

Todd: Yeah. That's a great question. Being that we are straddling these worlds, these words mean different things to different people. But in that analytics space-

Jessica: Words like session?

Todd: Yeah, words like session, user session. The strongest kind of meaning around that usually comes from that analytics space. So for us, anyway, the term session is at parity with what a Google Analytics or other web analytics tool would be when it describes a session. So the user lands on your domain for the first time, the session has started.

Then as long as the user continues to interact or go to other pages on your domain, and less than, in our case, 30 minutes has passed, that session continues and everything is kind of grouped together.

If they come back a day later or two hours later, that's a new session. A new session from the same user, but it's a new session. In our tool, anyway, there's a hierarchy. There's a user at the top, we've fingerprinted the user and we know all the sessions that the user owns.

Then within a session, there's N number of page views that they might have.

Then within a page view there are Y number of events, and an event could be a click, a scroll, an error, an Ajax call, an API call, or any number of other things. We can have all kinds of properties and metadata associated with any one of these levels that further let us describe these things and what's unique about them in our environment.
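
A minimal sketch of that hierarchy and the 30-minute rule, with invented storage keys and field names rather than Request Metrics' real schema:

```javascript
// Minimal sketch of the hierarchy Todd describes -- user -> sessions ->
// page views -> events -- with a 30-minute inactivity cutoff. Storage keys
// and field names are invented, not Request Metrics' actual schema.
const SESSION_TIMEOUT_MS = 30 * 60 * 1000;

function getOrCreateSession(now = Date.now()) {
  const lastSeen = Number(localStorage.getItem("session.lastSeen") || 0);
  let sessionId = localStorage.getItem("session.id");
  if (!sessionId || now - lastSeen > SESSION_TIMEOUT_MS) {
    sessionId = crypto.randomUUID(); // new session after 30 idle minutes
    localStorage.setItem("session.id", sessionId);
  }
  localStorage.setItem("session.lastSeen", String(now));
  return sessionId;
}

const pageView = {
  userId: "user-fingerprint-123",   // the fingerprinted user at the top
  sessionId: getOrCreateSession(),  // groups page views within 30 minutes
  url: location.href,
  events: [],                       // clicks, scrolls, errors, Ajax/API calls...
};
pageView.events.push({ type: "click", target: "button#buy", at: Date.now() });
```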

Martin: So, page load is something that triggers me at the moment around frontend observability, because page load really was this idea of "I click a link and I go to a new page and that page then loads." When we talk about SPA applications, and this is the thing because I'm not a frontend developer, I'm a backend developer-

Jessica: Oh, did you say SPA like Single Page App?

Martin: Yes, yes. So in the setting of the single page application world, with React and Vue and all of those tools that build these really complex applications, page load is an overloaded thing. Is it a page load because I've hit the page and I've loaded the app? If I just do a transition between two React pages, is that a page load as well?

We're kind of mushing these paradigms together a little bit by, well, is it a transition? Is it a page transition, rather than a page load? How do we say that somebody landed on the site and that was the first page that they hit, so it was a page load? But actually, no, they transitioned through to this page. I can't get my head around that, personally.

Jessica: This is a domain-driven design kind of problem: there are domains in which these terms have a specific meaning. Like Todd mentioned, Request Metrics views user sessions from the analytics domain, where the canonical solution is Google Analytics. So session there has a specific meaning. You're not going to get the same meaning between different domains, between the backend developer's perspective, the frontend developer's, the user's. The user certainly has a different idea of what the session is. You've just got to know which one you're in.

Martin: But I want consistency, I want everything to use the same name.

Jessica: Yeah, yeah.

Martin: But I want it, I want it.

Todd: But naming things is one of the hardest problems in computer programming. Our names are overused and overleveraged. To your earlier point about what page load means, I haven't heard a consistent way of talking about that difference. I don't think there is a universally agreed-upon name. I think of it as a hard page load versus a soft page load. But there are all kinds of differences.

But it is an important difference because they're different kinds of things. In a hard page load, the JavaScript environment, the browser environment, is discarded and rebuilt, which means any sort of memory issues, any sort of state that existed, is completely discarded and then rebuilt from whatever storage system might exist. Whereas a soft page load, all that really is is a UI affordance.

Nothing was discarded, we just rebuilt part of the DOM, and so the user might feel like, "Oh, this is a different thing." For our purposes we record that: hey, this is a soft page load, the user sees a different thing. But there are no real objective numbers around performance to record for how fast that happened, because essentially you just redrew part of the UI.
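
One way to sketch that distinction in code is to treat the window load event as a hard page load and wrap the History API to catch soft ones; where the events go is left as a console.log stand-in:

```javascript
// A minimal sketch of telling hard and soft page loads apart. A hard load
// rebuilds the whole JavaScript environment; an SPA route change only
// redraws part of the DOM, which we can observe by wrapping pushState.
function record(event) {
  console.log(event); // stand-in for whatever telemetry backend you use
}

window.addEventListener("load", () => {
  record({ kind: "hard-page-load", url: location.href });
});

const originalPushState = history.pushState.bind(history);
history.pushState = function (...args) {
  originalPushState(...args);
  record({ kind: "soft-page-load", url: location.href }); // nothing was discarded
};

window.addEventListener("popstate", () => {
  record({ kind: "soft-page-load", url: location.href }); // back/forward within the SPA
});
```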

Winston: This is why every few years, Google redefines page performance.

Todd: Continues to.

Jessica: With the latest version of Google Analytics.

Winston: Well, no. With Core Web Vitals and before that it was speed curve first tested.

Martin: Wasn't Lighthouse a thing?

Todd: Lighthouse is still a thing.

Winston: Lighthouse is still a thing.

Martin: Which means everybody is in charge.

Todd: I mean, Google is de facto in charge because of their largess, right? None of the things that Google defines around performance are in any way standardized or approved or anything. They just say, "Hey, the Core Web Vitals are a thing," and they're big enough that they can put it in Blink, and they say that your search rank is going to be dependent on it.

Jessica: Yes, they control your search rank which means they control a significant part of your business.

Martin: "It's our sandbox, so if you want to play in it, it's our rules."

Winston: Yeah. This has nothing to do with what we're talking about, but this is the whole detour of AMP, Accelerated Mobile Pages, which anybody who worked on the frontend is probably very aware of, just because it was a proprietary subset spec that Google created that was supposed to be fast web pages. They were like, "People are doing too many terrible things to page load. We've noticed data that says it's bad, and so we're going to create a spec that only allows for certain things to happen." And over the course of a few years it became bloated, because everybody who had leverage at larger companies was like, "Well, we need to be able to do this," and so they had to rewrite HTML in AMP.

Jessica: Wow.

Winston: It's been a mess. It's very deprecated now. But it just goes to show that when you're dealing with subjective measures, which is how humans experience things, we're often looking at live indicators of, "Over the past 30 days people have reacted negatively to something on our website, and now we must figure out what it was," and Google had the biggest set of data about that. They tried to codify that, and then of course whatever you measure becomes what people optimize for, and so they keep rewriting it.

I think for me, when I was first exposed to the observability world, what was exciting was instead of trying to come at it from a metric point of view of like, "Here are the three key metrics that show that a user is having a good experience," you can ask a holistic question of, "What are people actually encountering on our website?"

We're no longer looking through these simple data collection tools of page loaded in .5 seconds and UI showed up in 1 second. Instead we're able to think more comprehensively of, "This page wasn't really legible for a broad variety of users until 5 seconds. It seems like we should be giving them that faster, let's dig into causes and then maybe multi variables."

Now it's not going to be, "The JavaScript didn't load." It's going to be, "The CDN was having a bad day that day, we shipped a change where the component was making too many API requests sequentially before it showed up." You're just going to find much more complex reasons because you now work with a much more complex set of data.

Martin: I think context as well, this all comes down to context, doesn't it? The more context you've got, the more questions you can ask. The richer set of data, all of that kind of stuff helps you to be able to ask more insightful questions about why did something go wrong. If all you've got is, "Here's how long the first paint took, here's how long the last paint took, and here's when it became usable," you haven't really got a huge amount of information.

You've got maybe the query string, but you want tons of context to be able to say, "Well, was it the user who was in, I don't know, Norway with a French language pack, running on a Galaxy Fold?" That that's the reason why they didn't convert. Because nobody cares about my Galaxy Fold phone. I'm still bitter about that.

Jessica: It's almost like boiling everything down to one number isn't the best.

Winston: If only we had some hard science around that.

Jessica: Unfortunately, hard science needs one number, needs it. Todd, I'm wondering, you said that Request Metrics is like you drop one thing into your page and you get a lot of the things that otherwise you'd have to install a bunch of tools for. And, yeah, we're also talking about how every page has different considerations, every site has different considerations, and different things that are important. Are there ways for people to customize the events that go to Request Metrics to express that?

Todd: Yeah, of course. So what I meant by "you drop one thing in the page and you get this level of visibility" is that having to figure out how to put six different agents onto a page, or being unable to deploy agents to your backend stuff or your cloud, or needing to make code changes to add events to the system, can be intimidating for some organizations.

And so what we mean is, "Hey, if you're stuck in analysis paralysis of getting any kind of observability into your system because it all seems so overwhelming, maybe we can get your foot in the door or get you something by pasting in this script." Then it's not necessarily perfect, but you have something. Now, obviously every site is different.

There are some core things about how the web works that we capture automatically, so connection types, user agent strings, what Ajax requests are going off; all these are fixed things that I can record and capture, clicks, inputs, scrolls, those sorts of things. But of course you can set up custom events to talk about, "Hey, here's something that's unique to me. The user starred my podcast, the user paused, the user did a checkout or changed the color of the shirt that I'm trying to sell," or whatever it is for you.

Martin: The user tried to hit the call settings button.

Todd: Yeah. Or you want to describe your events in some way. You want to say, "Hey, this is a VIP user. Here's the user's set of properties. As far as my server side session concept, here's their session properties." You could send all of that up to us as arbitrary buckets of data, which then you can filter and sort and we can look for commonalities with that as well if the basics aren't there. But what we're really trying to do here is make it so that there's as little thought to get started as possible because we don't want you to sit and overthink the perfect observability solution for you and end up implementing nothing.
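
To make that concrete, here is a hypothetical sketch of the workflow; the agent object and its method names are invented for illustration, not Request Metrics' actual API:

```javascript
// Hypothetical sketch -- the agent object and its methods are invented, not
// Request Metrics' actual API. It shows the shape Todd describes: automatic
// capture from one pasted script, plus custom events and metadata on top.
const agent = {
  queue: [],
  track(name, props = {}) {
    this.queue.push({ name, props, at: Date.now() });
  },
  identify(props) {
    this.track("identify", props);
  },
  flush() {
    // "/collect" is an illustrative endpoint.
    navigator.sendBeacon("/collect", JSON.stringify(this.queue));
    this.queue = [];
  },
};

// "This is a VIP user", server-side session properties, custom events...
agent.identify({ userId: "u_123", vip: true, serverSessionId: "abc-456" });
agent.track("podcast.starred", { episode: 60 });
agent.track("shirt.colorChanged", { color: "teal" });
window.addEventListener("pagehide", () => agent.flush());
```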

Jessica: And when it becomes compatible with OpenTelemetry, it'll be even more expandable later.

Martin: I think the thing that I'm really interested in is easy mode buttons. The how do we make it really easy? It's something that we do with OpenTelemetry is-

Jessica: And also expandable.

Martin: Yes. Because you said it yourself, it's about getting started. It's about how do I get started really quickly and get an on-ramp into getting some rich information? But then what we've got to try and stop is people going, "Right, I've got the on-ramp, I've got that information right now. I need to rip that thing out and put something else in that's way more complex once I get to that stage." We want to be able to just go, "Right, we'll add a few bits of information," and ramp people up gradually and bring people on that journey of going, "Here's more context. Wouldn't it be great if you actually had this information? And this information? And this information?"

And then they get this really rich set of data and they start to really understand what their applications are doing, and that is that really Holy Grail, if you like, of how do we get people onboard with observability is about easy modes. That one line of JavaScript, just add this, include, and then this little initiate thing. Just do that and then let's see what you've got.

We've had it a lot with the backend stuff where people start with that really small bit of information, that "Let's just ingest nginx logs." And then they get this, "Oh, wow! Here's a load of information that I didn't realize I had." It's like, "One backend is down and I didn't realize that." But then they say, "Oh, well, let's start adding more information and more information."

Because they get that bug, that observability bug which sounds a little bit wrong, but that buzz of going, "Oh, I've got more information. Oh, I can take the user, I can work out who the VIPs are, I can make sure that I always get all the data for the CEO because the CEO is the one that I want to keep happy." I've totally not seen somebody put the CEOs requests onto their own server so that they always get a good experience. I've totally not seen that ever, anywhere.

Todd: I think you're absolutely right. Internally we refer to that as helping developers fall into the pit of success. Even when they screw up, even when they do something that we don't technically support or whatever, how do we make sure that we understand what they meant and try to help them down this path? We take a little bit of a different approach; we tend to opt for simplicity versus control in that balance.

Simply because we're concerned about giving people too many options upfront, it leads to that, "Oh my god, this is too big, too hard, too whatever." So we fully expect and hope people will outgrow us at some point, that, "Hey, you can start your observability story with us. We can get your foot in the door, we can give you this thing really easy and fast."

Then as you grow, as you become successful, as you specialize and deepen your system architecture, you're probably going to outgrow us, you're probably going to need to go to some more complicated tools, and that's awesome. We're happy we helped you on that journey. But for the company that is too small to have ever really even... Honestly, most of them have never even heard the word observability. For those companies, there's not a whole lot of options for them today.

Jessica: And that gives you the freedom to not try to be the Best at page load performance, and The Best at product funnel metrics, and instead make it work for a wider range of people who aren't experts.

Todd: Right, exactly.

Jessica: And wet their whistle for observability.

Todd: Yeah.

Martin: So, the talk that Winston mentioned earlier, that he gave at a conference a while back, was all about measuring page load performance, and it was very specifically about Core Web Vitals and how you can use observability to get even deeper into those Core Web Vitals. Which I found really, really interesting because it was my first bit of insight into how frontend developers actually think about these things: they'll run the page load for Core Web Vitals and they'll see what Core Web Vitals says on the Google Analytics pages and that kind of stuff.

But then they don't have information beyond that, and that was a bit of a revelation for me, that people were blind. They were blind to being able to go deeper, and I think there are the angles that you're looking at, Todd, around observability, but there's something even more fundamental than that around Core Web Vitals that I think has been missed.

Winston: The wild thing to me is that for most people, and it's a lot of smaller orgs who are going to encounter this, the widely accepted best practice for Core Web Vitals, for how to go about debugging them, is to add a script that pushes your Core Web Vitals scores to Google Analytics. It's just a clear signal that this is a frontend observability gap: this is a technical metric, but you track it in your analytics tools because that's the easiest way for most orgs to get access to it.
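
That script typically follows Google's published web-vitals example, reporting each metric to the GA tag already on the page (adapt the event parameters to your own setup):

```javascript
// The pattern Winston describes: push Core Web Vitals into Google Analytics
// using Google's web-vitals library. Assumes the GA tag (gtag) is already on
// the page; event names and parameters follow Google's published example.
import { onCLS, onINP, onLCP } from "web-vitals";

function sendToGoogleAnalytics({ name, delta, value, id }) {
  gtag("event", name, {
    // CLS values are tiny, so they're scaled up before rounding.
    value: Math.round(name === "CLS" ? delta * 1000 : delta),
    metric_id: id,        // groups events from the same page load
    metric_value: value,  // the full metric value, not just the delta
    non_interaction: true,
  });
}

onCLS(sendToGoogleAnalytics);
onINP(sendToGoogleAnalytics);
onLCP(sendToGoogleAnalytics);
```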

Todd: So Core Web Vitals is where Request Metrics started. When we first started building it out, our initial goal was to start capturing performance and user experience data, and so we pulled that in. Everything else kind of came after that. When we would see information about the Core Web Vitals, like somebody would see their page has a slow Largest Contentful Paint score, the two questions that everybody would ask were, "Does it matter?"

First, like, "So it slowed down, do I need to fix it? Who cares?" Which we couldn't really answer without diving into that product and analytics space. Then second, "Okay, it slowed down. Why? I can't recreate this in my own dev tools, I can't recreate it in Lighthouse. Why is it slow for this percentage of users?"And we couldn't really answer that either by just capturing that data.

We really needed to broaden our perspective. And so, yes, Core Web Vitals is a really important set of metrics, really because Google says it is, as we talked about before. But the why doesn't really matter; it's still important and we still need to think about it, and not a lot of organizations are yet.

Martin: But yeah, I think it's about, then, yes, Core Web Vitals is important, but what are the things that are affecting it? And I suppose I want frontend engineers to demand better and not settle. That's not just the users that you're talking about, the ones who build line-of-business applications because they're told to build them and who are like, "Well, I'd kind of like to know a little bit about the frontend but I don't really care that much."

I want all those frontend engineers to really, really care about this and say, "I want more information because that isn't good enough." And I think we did that on the backend, and people have started to get on that bandwagon of, "No, no. No, no. I need more. I need correlation, I need causation, I need high cardinality, I need high dimension."

And people are getting on that bandwagon, and I want frontend engineers to start saying the same thing, to start using a tool like yours as an on-ramp to go, "Oh, this is what more information looks like. Oh, I really like that. Oh, this helps me. Oh, yes, can I have more than that?" That's what I want frontend engineers to take away from stuff like that, and to say, "I want more. How can I have more?"

Todd: Yeah, I think we're on the cusp of a big change in a lot of observability tooling, where we expect the tools to do more of the thinking for us. It's not about being able to write custom queries and get at my data and make charts and stuff like that. It's more about expecting our tools to figure out where the anomalies are, what things I should pay attention to, and to let me spend less time thinking about how to answer a question and more time being told which questions are interesting. Where are weird things happening? And how do I not spend all of my time here, and more time moving my product forward?

Winston: Yeah, and I think that becomes a stepping stone into questions as a way of making sense of things, especially in a production environment. It's like suggested questions, sifting through rich data sets and contextual data, and then eventually getting to the point where, yeah, you want to ask newer questions, better questions, more niche questions, questions more relevant to your current ticket. I think that's what becomes fun: observability is a continual opening up of the possibilities there.

Jessica: Observability is a continuing opening up of the possibilities. I love that.

Martin: It's profound.

Jessica: It's a pretty good place to end a podcast.

Martin: Indeed. It's been amazing to hear about Request Metrics. I get really excited about where frontend observability is going and all that kind of stuff, so it's been amazing to talk to both of you about where you're going and what you see as the way forward for frontend o11y.

Todd: Yeah. Thank you so much for having me. I loved this conversation, it was fun.

Winston: This was a blast.