In episode 19 of O11ycast, Liz and Charity speak with Shelby Spees of Honeycomb. They discuss Shelby’s diverse engineering background and how the software community has gradually pulled back the curtain on observability.
Liz Fong-Jones: So, Shelby, how'd you learn about observability? What got you really excited?
Shelby Spees: Actually, I was having coffee with one of my old managers, and he told me about Charity and told me to go and follow her on Twitter, so I went and followed her.
And I was already spending way too much time on Twitter, so all of her rants and all of her threads and amazing stories, just totally clicked with me.
I had a lot of frustrations around, like, I want to understand my systems better and I'm fighting against the tribal knowledge on my team, I'm fighting against, you know, this is the way it's done.
And so to encounter someone who's just like, "Throw that all out, here's a whole new paradigm," it was really refreshing.
Liz: That makes a lot of sense. It kind of brings to mind this idea of you can't really fight something until you actually name what the problem is.
And that's something I've experienced a lot in my work, in my personal life.
Giving something a name helps you talk about it in a way that you couldn't really do before.
And so, I still didn't feel like I understood observability until I read all the White Papers I could get my hands on.
But there were several months that I was like, "This seems like a direction that feels really important." And I've learned to sort of trust my gut on that sort of thing.
I have a document somewhere where I just have, like, a bunch of links to Charity's old tweets that I've been meaning to comb through.
Charity Majors: Oh dear! Sometimes I look through my old tweets and I'm like, "I said what?" Well, this feels like a good time for you to introduce yourself.
Shelby: Yeah! So I'm Shelby Spees, and I'm a developer advocate at Honeycomb. And I started last month.
Charity: Yay, welcome! Woo!
Shelby: Thanks. And I'm really excited.
Charity: And you're fairly new to being an SRE, aren't you? Like, how did you end up here?
Shelby: Yeah, so, I went back to school, I actually studied linguistics for my bachelor's degree, went and taught English, moved back home, and decided to study computer science.
And I don't love school, I'll be honest.
So as soon as I had a GPA that I could put on my resume, I went and looked for internships and hustled and snuck into tech job fairs that didn't check ID.
And I told them, like, "I'm an online student, please hire me, I just want to work."
So I got a couple internships, and one of them turned into a full-time job at the Aerospace Corporation.
And because of, like, accidents of history and stuff, I ended up being the person with the strongest software background like a year into working at this job, two years into studying CS, so I became owner of the build and release process, but I didn't know what a build and release process was.
So that got me just sort of, you know, I totally dropped the ball on this job. We didn't release for, like, six months, and people were asking about the new version--
Charity: Sounds like a typical build and release project.
Shelby: Yeah, I remember feeling like I was disappointing everybody because I also just didn't have hours allocated to do this work.
Liz: They threw you into the deep end, and then they didn't necessarily give you the time and space to work on all of the existing things and do a really great job of it.
Shelby: Yeah, and I realize that I was bending over backwards keeping these projects alive, as the most junior person on the team, and the only person with this, like, vision for--
Charity: What was everyone else doing?
Shelby: I mean, like, because everyone's responsible for, like, all these different projects at the same time, and also they sort of just thought, like, because I seemed confident, I had all these ideas, I'd ask for help and they'd be like, "No you're doing a great job."
Liz: Oh, that's the worst, when you don't get feedback, right? Like, when your only feedback is, "You're doing a great job."
Shelby: And I don't think, like, I won't blame everybody, like, there's a thing in these large organizations where everyone's spread thin, everyone's, you know, doing the best they can, and also just the way projects are funded, it ended up falling on me.
So when I decided to leave that job, I was looking at all different jobs on the backend, like, full stack engineer, different things, and I was like, "I don't know how to map my experience to these jobs."
And I saw one for an associate SRE position, I was like, "I don't even know what SRE is."
So I applied anyway, the manager seemed to like me, talked to me on the phone, we had this great conversation, then he went over to his internal recruiter and said, "I want her on my team, get her into the office as soon as possible."
I really just stumbled into it, he wanted someone with a development background more than an ops background, and he had just been promoted to manager and wanted to invest in growing the team, and really invest in juniors.
Charity: Oh, that's awesome! It's very rare in this industry.
Shelby: Yeah. I got lucky. And I think it was his first role as a manager, so it was very ambitious.
Liz: Very idealistic, at least.
Shelby: Yeah, and I really appreciate that, like, what he did for me. I learned a lot in that role, but also there was an acquisition and there were layoffs and I ended up not staying in that job for very long.
Charity: So why did you stay as an SRE?
Shelby: I wasn't planning to, because when I got laid off I was looking for work and I said, "Well, I have five months of being an SRE, I'm not qualified for any of these roles," so I was applying to all different stuff that I also didn't feel qualified for.
Charity: You hadn't yet figured out that none of us are, none of us are qualified for these roles.
Shelby: Totally! I mean, that's the other thing, like, what does it mean to be a junior SRE? Like, there's almost no such thing.
Charity: It means you're willing to figure stuff out.
Liz: Yeah, it's this interesting thing where large organizations, Google sized, have basically said, "The path to a junior SRE is that we pluck you out of a CS program."
And then, in other places it's like, you know, a junior SRE is someone who's spent a long time as a sysadmin, right?
But there's no blueprint or template to it, it's just kind of people figuring it out.
Charity: It is really a grab bag of oddballs as a profession, I say that with love.
Shelby: Totally, and that's been my experience with all the, just, people from all different backgrounds on the teams I've worked on.
And so, my manager at the job I ended up at was looking for someone for the DevOps team who had a stronger software development background, again.
And so both of these managers wanted someone with the dev work that I'd done to come in and improve the developer-facing tools, and sort of keep things clean and keep things better designed than what someone with an ops background might have the skill set to do.
So I was learning Linux sysadmin from scratch, but I could go and maintain the code and keep it, at least keep it sort of nicer quality.
Liz: So, kind of the impression that I'm taking away from this is that you started off kind of being a professional learner, right? Like learning everything. And eventually that path brought you towards being an SRE and learning the ops side, and also learning about observability at the same time.
Charity: Yeah, because it turns out, you didn't actually stay in SRE for very long, 'cause now you switched careers again. Welcome to your first career as a developer relations person.
Shelby: Yeah. And that was something I was nervous about in this job change, is, like, I'm a perpetual beginner almost.
And it gets a little bit frustrating, every new job I've taken has been such a steep learning curve, but I'm taking solace in the fact that because of all my experience with learning, I can smooth out that learning curve for other people.
Charity: Yeah. Totally. This is what I've really enjoyed about, like, your blog posts and stuff already.
Those people who are listening who haven't read Shelby's blog posts on the Honeycomb blog should go and read them, they're so empathetic and they're so, like, you're clearly, you're comfortable in the role of beginner to the point where you don't freeze up, right?
A lot of people get terrified and overwhelmed when they don't know things.
And you're just, you feel very comfortable there, it's clear, you're just, you're led by your curiosity and you figure things out step-by-step and then you're able to, like, share your knowledge with people, instead of getting, you know, turned off or afraid by the vast quantities of unknown information.
Shelby: Yeah, and that process wasn't easy either.
On a blog, like, five years ago I wrote a post about, like, stop trying to be smart and embrace being a learner, because it's so easy to feel that gut punch when you don't know something and it shatters that, like, image of yourself.
So I had to, like, go through that whole, like, ego destruction to be able to have a career like this.
Liz: So, what's really surprised you the most in terms of helping people on board into understanding observability?
And what are the steps in that journey that you went through, and now you're seeing other people go through?
Shelby: It's been actually really interesting talking to Honeycomb users and talking to the observability community overall because it's a new topic for basically everyone in the software industry.
So, we're all at different stages of the observability, like, learning process.
And, not only that, it's not linear, and that was the surprising part for me, and it really clicked the other day, I was talking to someone who had instrumented their application and they had never used this one feature, but they had this really sophisticated dynamic tracing going on, and they spent all this time on it.
So it's like there's all these different axes that you can go down to understand, like, what observability is and how to incorporate it into your work.
Charity: Yeah, for sure.
It's like that quote from Tolstoy, or whatever, you know, "Every unhappy family is unhappy in its own unique way."
Every app is unhappy in its own special way.
And that's where understanding comes in, like, you can't just, like, you know, as much as vendors try to simplify and just be like, "Do this," you know, "Install this library, whatever, you're done," like, anything that's that simple is going to be simplistic and it's going to be a crapshoot.
Is it actually going to solve your problem or not?
Like, at the end of the day, someone somewhere is going to have to understand this code. And the sooner you embrace that, and you build with that expectation, the better off everyone who has to understand that code is going to be.
Liz: That reminds me a lot of the conversation around resilience, right?
Like, the idea of we need to embrace the idea that things are going to break and think about how we're going to fix it, and, you know, preventing things from breaking.
Charity: You know, abstractions are great, we couldn't get through the day without, you know, these abstractions, these simplifications, you know, but we can't confuse them for the real thing.
And I think that a lot of people get into trouble when they confuse the two.
Shelby: This kind of parallels what we've learned from Accelerate and the DORA reports, that you want to focus on what your business cares about, and then outsource the things that aren't your core business--
Charity: Otherwise you're going to be distracted.
And, ironically, like, you're accepting them as abstractions, right?
Anything that you're outsourcing to another company, you're like, "Okay, this is the abstraction. I don't have to know how it works."
But whatever is your core business, you have to know how it works, and you need the tools that will help you understand how it works.
Not tools that will tell you that you don't have to understand how it works.
Shelby: Absolutely, and so, the point of observability is to get a better understanding of that core business logic, what you actually care about.
Shelby: And what your users care about.
Liz: So, Shelby, how did you get to the point of being able to understand that yourself and advocate for it on your own teams, kind of what was that moment for you that really helped?
Shelby: I remember, I read the Observability Maturity Report that you and Charity wrote last year, and I remember seeing, like, that first section on clean code and being like, "Well, you know, we don't have that!"
And it felt a little bit like you must be this tall to ride.
And then later when we started our Honeycomb trial I was working on cleaning up our deploy tool that's, like, 12 years old, and, you know, super old, rigid Rails, and it had a bunch of endpoints that hadn't been used, and I was just instrumenting it in dev and just seeing, like, what the code paths look like to understand what the code's actually doing, and I remember being like, "Oh!"
Liz: Yeah, it's this idea of, like, you can't refactor what you don't understand.
Shelby: Yes, absolutely, and so that's where I felt like, "Oh, okay, you don't have to have clean code to get value out of instrumenting the code that you have."
You can get answers from wherever you're at, and then iterate from there.
Charity: I think of it like a headlamp. Like you put on a headlamp, like, you don't start instrumenting--
And this is a mistake that a lot of people make when they're starting with observability, if they're like, "Oh, I don't want to touch, you know, the stuff that's broken. I want to go instrument something that's safe and out of the way, and working."
And that's exactly the opposite approach that you should take, because you should look at the thing that's the biggest painful thing in your day, right?
The thing that's sucking up your time and energy, the thing that's breaking all the time, that's what you need visibility into, you know, you need to put the headlamp up so you can see what you're doing with your hands in front of your face.
Right? And then as you move around, you know, to other parts of the system, you know, fix what you're working on, you go to some other part, you instrument as you go, right?
You instrument two feet in front of yourself so that you can see what you're doing. 'Cause the fact is that you move so much faster if you start by instrumenting, if you start by making it so you can see what you're doing.
I think that this is a story that we don't necessarily do a great job of telling, because we tell a lot of the, like, you know, smash success stories, right?
But not necessarily the, like, "I did this tiny thing, and it made this, you know, incremental difference in my life." People are always looking for that kind of shiny.
Shelby: My colleague at my last job, who is basically, like, the main Honeycomb user and the Honeycomb and observability evangelist on the team, he's been talking about, before he knew the word for observability, he was talking about being able to ask those novel questions and be able to look back at history and stuff.
And so I introduced him to the White Papers I was reading, he was like, "Oh, yeah, that's exactly what I've been talking about!"
And so he really embraced that. But he definitely felt that, like, fear about, you know, "I'm just going to instrument the code that I've written to be instrumentable, and I'm not going to bother sending data from our legacy systems that we were trying to replace anyway."
Charity: You know, sometimes it's worth doing, sometimes it's not.
Like, only you know your systems, but, like, the Stripe Developer Report showed that, like, over 40% of most developers time is spent on bullshit, stuff that doesn't move the business forward, stuff that, it's the work you do so that you can get to the work that you need to do.
It's the stuff that, you know, you're trying to orient yourself, you're trying to figure out where the thing is, you're fixing something and then you realize it was the wrong thing, like, half of our time as engineers, like, that is an enormous amount of time and money just wasted, and I think it contributes to, like, so much of the burnout that people feel.
Liz: It really is so weird that people dread bugs rather than looking forward to the challenge that a bug poses, right? Like--
Charity: Well, and it's because they associate it with so much frustration, you know?
Just like trying to figure out what to fix in the first place!
And it's because in my mind, like, I just picture them debugging with their eyes closed, and they're in there with a blindfold on with their hands tied behind their backs. 'Cause that's what it feels like!
Shelby: Yeah. And that's exactly what my colleague was experiencing, where they spent three weeks trying to make sense of this bug, and then he spent an afternoon adding a bunch of spans in Honeycomb from just that service for just that part of the logic, and debugged it in, like, a day.
Charity: Most of these problems are not hard problems, it's you can't see what's going on, so you're trying to build this castle in your mind.
The funniest thing in the world to me is when I see people try to debug code by reading it.
You know, when they're like, "There's a problem here," and they just sit there intently looking at the source code.
And you're just like, "Do you think that that's reality? Do you think that what it says it's doing is what it's actually doing?" Like, it just breaks my brain!
Shelby: And I mean, we're sort of coming up against teams where they don't have the support, they don't have the leadership or management behind, like, you should have tests, and you should have builds that aren't broken, and--
Charity: Or worse yet, it's not even necessarily leadership or management support, it's that they don't know any better.
They haven't seen how much better the world can be if you embrace these things, they see it as yet another layer of process, or a layer of bureaucratic bullshit or something, they don't understand that the reason people encourage them to do these things is because their life will be better, because it will take less time to do this stuff, 'cause it will be easier.
Liz: I think the other challenge is that these problems sneak up upon you, because in the olden days where you could attach a debugger to a system, it absolutely made sense to debug one system.
But as your systems grew and split apart it just makes it harder where those tools stop working at a certain point, and identifying when you've crossed over that is kind of challenging for some people.
And it's almost like a leap, or like a hurdle, to jump where you have to be confident in your individual components being correct before you can worry about the interactions between your components being correct. Except you don't. You can get the former from observability.
Liz: Yeah, it's kind of this idea of debugging from the first principles, right?
Like the idea of, "If I start from symptoms of user pain and I can trace it down through my system, I can figure out which components are broken. Or which interactions are broken."
As opposed to starting by verifying the correctness of each component, right? Like, that mentality shift is hard for people to adjust to sometimes.
Shelby: What I find really exciting about being able to observe systems as I instrumented them, is I was jumping into code bases that I had never worked on.
I instrumented in, like, two or three days of, you know, my spare time, a Python code base that I had never touched.
And then, you know, I sat and I was reading the traces and I'm like, "Oh, okay, I get how this works now, that talks to that!" And so it's even useful as a learning tool.
Charity: Yeah, it's the only way to understand what your systems are doing. Like, in my mind--
Liz: Yeah, kind of any pre-drawn system diagram gets out of date, right? Like, you have to actually understand based on--
Charity: It's out of date, or it's wrong, or it represents some assumptions that the person who was drawing it made and didn't realize were assumptions.
You know, the only thing you can trust is what your instrumentation actually says.
You know, I feel like we went through this huge push in the software engineering community to, like, get people to "comment their code," you know?
And to write docs and all this stuff, and I'm still waiting for the same sort of energy to get put behind "instrument your shit," you know?
Like, instrument it, and, like, I feel like we're so behind where we ought to be as an industry when it comes to instrumentation because--
And I blame the vendors for a lot of this, you know, because vendors have been out there saying, you know, "Ah, just give us tens of billions of dollars and you'll never have to understand your code, we do all the instrumentation for you! You'll never have to think about it again."
Which was a lovely fairytale. But it turns out to be false! And now, we're like a decade behind where we ought to be in terms of, like, having--
Liz: And people are selling you, in addition to ingesting all this data that you don't need--
Let's sell you an AIOps tool to help you analyze all the data. It's like a self-generating problem.
Charity: Oh my God.
'Cause now you're emitting so much of the automatically generated data that, like, is useless, so you need AI to, like, sift through it for you?
I mean, come on! No, you need to actually understand your shit, you need to debug, you need to like instrument.
But, like, we don't have these broadly shared conventions for what it looks like when something's been instrumented well, you know?
People don't have that gut instinct, they don't have that intuition that comes from having done it for years and years.
You know, and I'm kind of resentful about that!
Liz: But then that also creates the problem, though, right?
Like, it feels like an insurmountable hurdle, right?
If all we have is examples of it being done horribly, or examples of it being done well, right?
Like, where are the examples in the middle?
Shelby: There's a couple things here that you reminded me of, one of them is that when we're coming from metrics, and we're coming from, like, you get one axis of information, that doesn't tell you much, and so, people are scared to instrument things if they don't know if it's going to be useful.
Liz: If they don't know if it's going to be useful, or if they think it's going to inflate their bill because they're charged by the distinct time series.
Shelby: Yeah, and so people are scared of high cardinality, people are scared of just having things instrumented all the time, even though that's what's going to give you the answers.
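Editor's note: the billing worry Liz and Shelby describe can be made concrete with a little arithmetic. The sketch below is illustrative only. The tag names and counts are hypothetical, and it assumes a metrics backend that stores one time series per unique combination of tag values, which is a common model but not a description of any particular vendor.

```python
from math import prod

# Hypothetical tag dimensions for a single metric, with the number of
# distinct values each one can take. These figures are made up.
tag_cardinalities = {
    "endpoint": 50,
    "status_code": 5,
    "region": 4,
}


def series_count(cardinalities):
    """Upper bound on distinct time series: one per combination of tag values."""
    return prod(cardinalities.values())


baseline = series_count(tag_cardinalities)

# Adding one high-cardinality field, such as a user ID, multiplies the total.
with_user_id = series_count({**tag_cardinalities, "user_id": 10_000})

print(baseline)      # 1000
print(with_user_id)  # 10000000
```

Under that assumption a single extra field turns a thousand series into ten million, which is the fear being described. In an event-based store the user ID is typically just another field on each event, so the number of stored events does not multiply this way.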
Charity: You know, I talk a lot about instinct, but, like, I feel like we have these very emotional ways of reacting to building our systems based on scar tissue, you know, based on things we've experienced in the past.
And so, like, a lot of the reactions that we have to new things that, they're not really based on any reasoned argument, it's just like, you know?
So, like, I feel like a lot of the reactions that people bring to the whole topic of instrumentation come from their time writing logs.
Shelby: I wonder if that's why observability clicked for me in a way that it might not have for more experienced people, is because I was still very early in, like, you know, this is how to Linux sysadmin--
Charity: I think that there's real truth to that, I think that people who are early in their career, this clicks for them much quicker, much easier, than people who have spent a decade or two or three, like, wrestling with time series, and, like, log lines, and, like, you learn all of these quote/unquote things about instrumentation that aren't true, or shouldn't be true, or don't need to be true.
And we've kind of like wiped the slate clean in a bit of a way, because we're not really derived from the heritage of, you know, the last few decades of time series.
It's in my mind, not just my mind, it is objectively an easier, simpler, more truthful way of interacting with your systems.
But it is not intuitive to people who've spent a career, you know, learning these other--
Liz: Working around the limitations of other systems, right? So I think that if you don't have to unlearn those things, you have a lot easier of a time progressing with observability.
And I think also, one thing that's really exciting for me about observability versus these tools that promise to, like, give you the answers, is, it's actually really exciting and empowering for engineers to take ownership of what their code is doing in production.
And that's something that really clicked with me in all of your tweets, Charity, especially, is that, you know, see what's happening with real users and real data, and, you know, real systems, not just, like, the toy version, or some, like, singular metric that we're tracking.
And I've seen it, I experienced it just with the handful of things I instrumented at my last job, and I've seen it with my colleagues, and with users.
Charity: This is where the whole socio-technical thing becomes so interesting, because I feel like we talk a lot about empathy, right?
Developers should have empathy with their users, blah, blah, blah.
But if that empathy is being mediated by tools that aren't rich enough, like, how much empathy can you have with your users if you have, like, these five metrics that you're allowed to track?
Or if you only really interact with them, you know, you don't interact with them using your tools at all, you only interact with staging, and then there's a hop, and everything you hear from your users has been mediated or, like, triaged by an ops team, or by support, or whatever.
Like, I feel like that connective tissue of empathy between you and the users becomes so diluted, and so, you know, it's been transformed by, you know, all of these stages before it gets to you. And, like, the tooling matters, right?
Like, I feel like it's been great in the last few years in this industry we've really started talking about the impact of culture. But the impact of tools on culture is, I think, something that we haven't even really begun to develop a language for. Like the ways that tools that you're using and the data that you consume, the way it distorts or elevates, or enriches the actual social connections.
Liz: It's kind of similar to Conway's law, right? Like, that if you have an organization, your organization will build software that matches the organization.
Liz: And I think that the same thing is true for our viewpoints about how we interact with our users, and if we interact with our users as individual data points in a time series, rather than as, kind of, rich user journeys, then everything we do is going to be tainted by that.
Shelby: Yeah, and the same thing with, you know, having a machine learning model between you and the user, is it's a layer between the feedback loop, and, you know, your model's only going to be as good as your data, and your instrumentation generating the data, you can't iterate on that if you're not understanding the output.
Charity: But AI will fix it all.
Shelby: Oh, surely. I actually have a question for both of you, especially coming from the, like, Charity and Liz fan club, and now, you know, being able to work together.
My question is, like, within the observability community, and SRE community, and even like the software resilience community, you're sort of these pillars and these leaders on all, like, what is observability and how do you improve your systems.
And so, how's it feel to, like, be paving the way for observability in software?
Charity: Well, it's pretty easy to stay humble, because if you ask literally any of our competitors, they're the leaders.
And they have orders of magnitude more money and people than we do.
I am so glad that what we say resonates with so many people, I feel very grateful that I'm not edited, I can say whatever I think or feel and it's nice.
Like, there's nobody at this stage who's going to be like, you know, glaring at me and saying that I'm misrepresenting something. I don't know.
Liz: I think the thing that I have sensed the most profoundly is that, like, Charity has been out ahead of the pack for a very long time in kind of being that lone voice in the wilderness.
And therefore, like, you know, on one hand it's super empowering to, like, kind of lead the charge forward.
At the same time there are a lot of people who are really resistant to that, right?
And that kind of opposition, you know, results in a lot of, you know, "Oh Charity's, like, off on her own, like, no one understands what she's saying," right?
And that's not true, right?
But I think that kind of finding a community and coalescing this community around observability is really, really important to have it not be a singular voice, right?
Instead, being a community that builds up broader, kind of, stories out of everyone's experience.
Charity: There is such an amazing community that's forming around observability, like, the folks in the Pollinators channel, all of the people who are starting to write.
And, like, it's been so cool, like, watching people build careers out of starting to teach other people about this stuff.
Liz: And you, our listeners, we really appreciate that you are here listening to this podcast, you're a part of our community.
Charity: Really appreciate you.
There's so much energy here, like, a few years ago when we started ranting about this stuff, like, it was considered very controversial that software engineers might ever be put on-call to support their own code, right?
Liz: Because everyone was used to just, you know, on-call sucks, therefore we're just going to push all the on-call onto the ops people, right?
Charity: Yes! Yes!
Liz: And it's like, no!
Charity: Because they're lower on the totem pole they have less power.
And like, I love that, like, that curve has bent, and now, like, that's not even a question.
Of course everybody expects that software engineers are to be on-call for their stuff, and we've started to, like, lift the low, you know, the flip side of that is that it's not allowed to suck, right?
It can't be something you have to plan your life around.
Watching people, like, get increasingly empowered to own their stuff, and figuring out that that is not a responsibility that has to suck, but it can be something that brings deep meaning to your life.
That brings you much more closely in connection with the people who are impacted by your work, that has been, just, fucking unbelievable.
I'm really, really stoked to have had a front row seat to that part of our history.
Liz: Yeah, on my part, kind of, I wasn't originally focused on observability, I was focused on SRE first.
And I still very much consider myself an SRE, right?
Like, I think that my role here is to help make the SRE practices more accessible to people who aren't Googlers, to people who don't have fancy degrees, right?
And to bring those improvements to quality of life to every software engineer, and to every ops person.
That, I think, is kind of my mission.
And it's been really, really great for me to get outside of the Google bubble and kind of figure out what works, what doesn't work in that context.
So, to kind of round things out, we're really excited that Shelby is joining us as a co-host of O11ycast.
Liz: We're really excited to be doing O11ycast more often as well.
Shelby: I'm really excited, it's such an honor to join, and I hope that our discussions today and going forward will help a lot of people really grok observability.