Ep. #58, Game Development with Brenna Moore of Second Dinner
Brenna Moore: I think like a lot of probably intro game studios or startup game studios, we had some folks who were great at writing game code and weren't infrastructure engineers so we wanted to solve the problems we were good at solving and ignore, or at least foist off on other people, the problems that we weren't good at solving.
Jessica Kerr: So AWS is a good bunch of people to foist problems onto.
Brenna: I mean, if you're going to build an infrastructure product that lets people write code and deploy it and not have to worry about where it's running or how it got there quite as much, then yeah.
Liz Fong-Jones: So when we talk about writing game code and specializing in the things that game developers do best, what are some unique things about game developers as opposed to other kinds of software developers?
Brenna: Depending on the part of the game stack--
I think there's a lot of undeserved mystique that the game industry has around things that are required to be a game developer. If I redescribed a lot of the stack as we have a client that's full of art assets which uses HTTP and web sockets to connect to a backend and make database calls, I could be describing basically anything in the tech industry.
But for whatever reason, if you say that it's a game client there's this mysticism that shows up about how that came to be and, to be fair, I think any section in the tech industry has things that are mysterious or complicated. I don't know a lot of how some of the DBAs get extreme performance out of queries that I just cannot make work, and I would consider that just as mystical as how game designers develop things that are fun as opposed to tedious.
Jessica: Yeah. That is a standard I hold games to and not most apps.
Liz: I would argue that it's not just about fun and tedium, right? But I think there's also this element of continuity in long-lived sessions that is not necessarily true in other places. If I go and instantiate a shopping cart, that's just one transaction, maybe the state lives in a cookie or something. So it feels like there is something that's a little bit challenging about the kind of state involved in games.
Jessica: Especially in serverless.
Brenna: That's true. If we're going to talk about the serverless aspect, then yeah, having to refresh state in memory and then use it and then abandon it, it definitely gets into... You develop a regular pattern, make a request to the database, make the mutation, store it, and it's pretty similar to how you might write any code that's involved in handling a one off request.
But you have to get that state from... It's guaranteed to be a cache miss in memory, right? You have to go to the database to get that, so the form of writing it is actually pretty similar to what I'm used to seeing in virtual machines or containers or anything that we would write as a full application, I guess, beforehand. But now it's just that we're making a database request instead of just getting something out of memory.
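The fetch-mutate-store pattern Brenna describes might be sketched as follows. This is a hypothetical Python illustration (the real backend is .NET on Lambda), with a plain dict standing in for a managed database client; the function and key names are invented for the example.

```python
# Hypothetical sketch of the stateless handler pattern: every invocation
# is a guaranteed cache miss, so state is loaded from the database,
# mutated, and written back within a single request.

DATABASE = {}  # stands in for a managed store such as DynamoDB

def handle_play_card(player_id, card):
    # 1. Hydrate: state is never in process memory, so always read it.
    state = DATABASE.get(player_id, {"hand": [], "played": []})
    # 2. Mutate: apply the request to the freshly loaded state.
    state["played"].append(card)
    # 3. Persist: flush back immediately; this process may be gone
    #    before the next invocation arrives.
    DATABASE[player_id] = state
    return state
```

The durability trade: every request pays a round trip to storage, in exchange for compute that can disappear between invocations without losing player state.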
Liz: That's interesting. So essentially what you're saying is that in the past people have used persistence in memory in the games industry and not had to shovel it off to a database, whereas for you you're just always flushing it and always hydrating it.
Brenna: Yeah. Memory is obviously one of the most performant things that we get to use as a tool, but it's not durable, so we build a lot of protections: "Oh, if we lose this, we have to be able to get it back or the player loses some kind of state." We've already written it, it's definitely there. We just have to get it again. We're definitely trading storage and compute around for how fast things need to be, and not every game is going to be able to take advantage of that architecture.
Jessica: Ooh, ooh. So what games are taking advantage of that architecture then?
Brenna: We make a game called Marvel Snap which is a turn-based card game. So it's a turn-based genre, especially on mobile where you're already expected to have a little bit of latency through mobile clients and the internet. We get to take advantage of a lot of things that have this built-in latency tolerance. We wouldn't be able to do this on a first person shooter or a platformer that we had to run physics on, so the design definitely lends itself to this whereas others certainly wouldn't.
Liz: That makes a lot of sense, that architecture is very context aware, that the business requirements drive what you can choose to do and the trade offs that you can make.
Brenna: Yeah. I think that's true of anything. We go through those decisions for any technology that we pick, regardless of our niche in the industry.
Jessica: Is Marvel Snap multiplayer? Are you matched with someone on the internet?
Brenna: Yeah. Marvel Snap is, I guess, a PvP, 1v1 card battling game, and so you queue up with a deck of cards and matchmake with some other person somewhere on the planet who is also doing the same thing, and play a match together. Matches take about three minutes, we use a persistent WebSocket connection to keep the two of you connected to the same service for that duration, and as the two of you play cards, those things get reconciled on the service and we keep that game state for as long as we need it.
Jessica: The same service?
Brenna: So since we were talking about serverless earlier, that WebSocket connection is actually going to AWS API Gateway, which is holding that connection open for us.
Liz: Ah, I see. So it's not that you have one lambda that's running for three minutes, it's that the API gateway will call your lambdas if the players take an action.
Brenna: Yeah, exactly. And that allows us to ignore any time that people spend idle and only invoke our stuff whenever someone actually sends a message.
Liz: I definitely see now why we're talking about outsourcing things to AWS. When I originally heard that you were using a serverless architecture, I thought you were just using Lambda, but now it's clear that you're using other AWS primitives too, like the API Gateway, like the database. All these things play a role, it's just that the thing that your developers interact with is the lambda functions themselves?
Brenna: Yeah. Our backend engineers spend most of their time writing code that gets executed in lambda, and the rest of us with cloud and infrastructure focuses make sure that everything else is in place for communication and storage and things like that.
Jessica: So how do you know it's working?
Brenna: Arguably that Twitter and Discord aren't yelling at us.
Jessica: The real metric.
Brenna: Yeah. Before either of those things happen, we also keep records of a lot of the same things that anybody running another application would keep track of.
We keep track of function invocations and duration and input parameters and output results and exceptions thrown. It's really not that different of an approach from what you might find or need on a VM or in a container.
It's just that we're paying less attention to things like CPU and memory and infrastructure concerns. We actually get to focus on application metrics instead of other things.
Liz: And how do those application metrics map to user outcomes? So just to maybe clarify for listeners, if you see the latency of one of your lambda functions going up, that means that people are not getting their moves confirmed, right? If they press a button in the UI and it's unresponsive?
Brenna: Yeah. Or that it appears to take a little bit longer for cards to resolve or things to happen on screen. That's, yeah, absolutely true. So obviously there's a little bit of time in public internet that we're not recording, at least from the backend perspective. So we always have to kind of be aware of that, but yeah, in general as latency goes up on the calls, the player experience is a little worse or a little degraded. And so keeping track of overall duration for a request or a type of request, or a series of requests is pretty indicative of what the player experience is for playing a game or exploring the shop.
Jessica: When you say "a series of requests," do you mean across the session or lambdas calling lambdas calling lambdas?
Brenna: We've shied away from lambdas calling lambdas calling lambdas and we tend to have everything resolve in a single handler if possible. It's certainly a thing you can do, but we kind of made the decision pretty early on that we didn't want compute invoking compute so that we weren't paying for the same time twice. If we're going to spend the time, we can do it in the same invocation rather than having two invocations.
Liz: That makes a lot of sense that it's a little bit different from the other microservice patterns that you might see where, with microservices that are persistently running, you want to specialize each microservice in one thing, you want to have them call each other. Whereas there is overhead in lambdas, so that totally makes sense that you want to have each lambda function fulfill one type of request, rather than branching out, indexing on the functionality that's being invoked.
Brenna: Yeah. I would say that we treat things really similar to a microservice architecture as far as our code is laid out, and so we have things dedicated to handling the game or matchmaking or account or your shop state or your collection and decks and stuff. We deploy those things pretty independently so we have a collection of lambdas that are related to game and we have a collection of lambdas that are related to matchmaking, we have a collection of lambdas that are related to progression. So it feels very much like a microservice approach, it's just that each maybe API endpoint is going to a particular handler rather than a whole service worth of things.
Liz: Right. And those handlers are empowered to do anything that they need, right? They don't need to talk to a separate database management service, certainly I've seen patterns where people are decomposing a monolith into a bunch of microservices where every read and write to the shared database has to go through one service, right?
Brenna: Yeah. In our case we keep separate tables, so there's a table for games, there's a table for matchmaking, there's a table for your collection, shop, progression and things like that. And so those services are empowered to talk to the table that's relevant to them, and there are other AWS primitives for cross service communication, things like SNS topics and SQS queues, that we can use because they're event driven. So we're not waiting on one compute to finish while we're... We're not sitting there waiting for a message to come back or directly talking to another thing. We're just sending a message off and then eventually a message comes back.
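The fire-and-forget messaging Brenna describes could look like this hypothetical Python sketch, where an in-process queue stands in for SQS: the publisher enqueues an event and returns immediately, and a separately invoked consumer drains it later. All names here are invented for illustration.

```python
from collections import deque

# A deque stands in for an SQS queue between two services.
event_queue = deque()

def on_game_finished(game_id, winner):
    # Fire-and-forget: enqueue and return without waiting for a reply,
    # so this invocation never pays for the consumer's compute time.
    event_queue.append({"type": "game_finished",
                        "game_id": game_id,
                        "winner": winner})

def progression_worker(scores):
    # Event-driven consumer: drain whatever events have arrived
    # and update win counts.
    while event_queue:
        event = event_queue.popleft()
        if event["type"] == "game_finished":
            scores[event["winner"]] = scores.get(event["winner"], 0) + 1
    return scores
```

The key property is that neither side blocks on the other, which is what avoids paying for two invocations of compute at once.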
Liz: Oh right. That makes sense because if you finish a game, then updating your score, your global score, your achievements, that doesn't have to happen synchronously. That totally makes sense, that event driven architecture and serverless go so well together for your use case.
Brenna: Yeah, exactly.
Jessica: And also that it would be something of an anti-pattern to have lambdas calling lambdas synchronously, because then you're paying for the compute on both of them at the same time. Not just the spin up time, but actually the waiting time even. And so it sounds like you have a lot of bounded contexts for the different subdomains, and at the same time each handler can do what it needs to do to handle that entire request.
Brenna: I think that's a pretty fair description.
Liz: So in this context where we're observing this or trying to understand how it's working, does your architecture actually require tracing or does it just require flat wide events? How are you thinking about getting the telemetry out that you need?
Brenna: So we've actually done a couple of different approaches to this, and we started with something that was very close to tracing, where every function invocation had the duration and the name and the input arguments and the output arguments, and whether it threw an exception or had an error, and this created very, I guess, narrow and deep traces. Because we were actually tracing by .NET function call rather than by service to service call-
Jessica: Not like the lambda function, but a function in your code?
Brenna: Yeah. So a lot of the same patterns that you would use if one handler is calling another, or if one service is calling another, we just apply that at the code layer instead of the service layer. So when one .NET function invokes another .NET function in a lambda, then we want a lot of that same information so that we know where time is being spent.
Jessica: So you're approaching profiling?
Brenna: Yeah. We ended up getting very close to, or basically implementing, what effectively was profiling, and that gave us a lot of events that didn't have a ton of data. But for things like tuning the lambda runtime it was super useful to know how things were running in this environment, which is otherwise very difficult to get information out of. And so we turn on things in that mode when we're exploring new features, or we're making changes that we think are going to affect a lot of things, or if we're using something that's very outside of what we've come to think of as standard.
Jessica: Wow. So you have a mode where you can get really deep traces with a whole lot of events, where you can get super granular information on where time is spent?
Brenna: And what we typically run in production, not near a deploy, is something that's a lot more similar to just one single, very wide top level event, where we're looking at: this is the function that was invoked, this is the time spent. We've effectively implemented a way to bubble up attributes from lower level spans to make them appear on this top level root span, so that things that are super important, that are normally captured maybe on one of those really low level, very specific function traces, actually bubble all the way up to the root span so that we can get them when we're not in a full profiling mode.
Liz: Oh, interesting. So you're kind of dynamically dialing the visibility. What does that look like from an API perspective? Are you using OpenTelemetry? Are you basically just not creating the trace spans and then it just parents to the current root context? How does that all work under the hood?
Brenna: We are using OpenTelemetry, with OpenTelemetry's tie-in to .NET's System.Diagnostics.Activity. Yeah, we're dynamically creating the activities, and if an activity is created, anything below it reports to that scope. .NET normally lets you access that thing through Activity.Current, which is whatever the current activity scope is. And if it's your local scope, then great, you're reporting something that's very close to profiling. But if it happens to be a scope that was generated three, four function calls above you, then you and all of your sibling functions just end up contributing to a very wide event at the top level.
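The ambient-scope behavior Brenna describes around Activity.Current can be approximated in this hypothetical Python sketch (the real system is .NET with OpenTelemetry; the names below are invented). The point is that attributes always attach to whatever scope is current, so when child spans are suppressed, everything lands on the root as one wide event.

```python
# Hypothetical sketch of the Activity.Current pattern: attributes attach
# to the "current" scope, so disabling child spans makes all
# instrumentation roll up onto the root span.

current = None  # analogous to Activity.Current

class Span:
    def __init__(self, name):
        self.name, self.attributes = name, {}

def start_span(name, profiling):
    # In profiling mode every function gets its own span; otherwise the
    # existing (root) scope stays current and absorbs the attributes.
    global current
    if profiling or current is None:
        current = Span(name)
    return current

def set_attribute(key, value):
    # Instrumentation code is identical in both modes.
    current.attributes[key] = value
```

Because `set_attribute` never cares which mode is active, the same instrumentation code serves both the profiling-style deep traces and the wide-event mode.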
Liz: That's really, really powerful. Ordinarily I'd say if you're just making these wide events, you don't have to use the heavyweight activity tracer method. But being able to share that instrumentation code is so, so critical for not having to rewrite. I'd imagine it introduced some challenges though, around naming of things because the names can potentially collide, and around the runtime control of it. How are you managing schema and handling the toggling?
Brenna: Yeah. Conveniently there's some things that don't really collide. There's things that are pretty similar. For example, account ID. It's pretty much always only ever going to be account ID. There's generally one actor because of how everything else is implemented so for a certain set of things we only need to worry about... There's not really any duplication in those tags.
For things where there is some kind of duplication, it does require a little bit of manual effort to make sure that we're either specifically creating a differently named attribute, so maybe opponent instead of account, which implies the other person in the game.
Or that we're reporting in a data structure that scales a little bit, so maybe we're not indexing off of that once we're looking at it but maybe it's just useful information so we could potentially report it as an array or a list. It might be slightly harder to query against, but we get the information if we're querying against something else and we just need to see this field once we find it.
Liz: Got it. So kind of the prefixing approach and searching your code to make sure that you're not overwriting someone else's attribute.
Brenna: Yeah. For things that are very function specific, the names of those attributes are usually defined locally, and so that string is available for whatever the attribute name is within that function. For things that are more generic, things like account ID, game ID, where we always want that attribute to be named the same thing, we just declare those in a shared library, outside of each of the services, that all of the services reference, so that it's easy to look for references to that thing, because you can look for references to the property as opposed to just searching for the times that someone wrote "Game ID".
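The shared-library convention might look like this hypothetical sketch: well-known attribute names live in one module as constants, so call sites reference the constant rather than retyping the string, and "find all references" works in the IDE. The specific names and the `annotate` helper are invented for illustration.

```python
# Hypothetical shared conventions module, referenced by every service.
class Attrs:
    ACCOUNT_ID = "account.id"
    GAME_ID = "game.id"
    # A distinct name for the other player avoids colliding with ACCOUNT_ID.
    OPPONENT_ID = "opponent.id"

def annotate(span_attributes, account, game):
    # Every handler tags the well-known keys the same way, via the
    # constants rather than raw strings.
    span_attributes[Attrs.ACCOUNT_ID] = account
    span_attributes[Attrs.GAME_ID] = game
    return span_attributes
```

Function-local attribute names can still be plain strings defined next to the function; only the cross-service vocabulary needs to live in the shared module.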
Jessica: Both useful. So you have some shared semantic conventions for specific fields that you know have meaning in your context and also you let functions name their own fields?
Liz: It's really great that you have a shared library for that. I assume that's the difference between your team versus the backend developers teams, is that you focus on those shared common libraries and let them focus on the business logic?
Brenna: We both end up sharing responsibilities for some of those things sometimes. But you're correct, in that we on the infrastructure side get to focus on patterns and practices, and making the other devs' lives as simple as possible when it comes to interacting with any service dependencies, whether they're AWS or Honeycomb.
Liz: I think I asked earlier about what the pattern is for controlling the verbosity. Is it at build time or is there some runtime mechanism?
Brenna: There's a configuration that we can deploy with, so it's not build specific, it is runtime. In the same way that regular sampling would elect to sample or not, we have two knobs to turn for sampling. We have one knob to turn for how often the root is sampled, so the top level span, and a second knob to turn for how often child spans are sampled. So setting both of them to one gets you all the way to profiling.
Setting root spans to one and child spans to zero gets you wide, top level events with no children, and some mix of those two gets you not always super relevant data because you're potentially sampling out things that are really important. So I guess one thing I didn't mention was that we also implemented a way for when exceptions or errors are caught at a particular level, we can turn on capture for that specific trace.
So everything between wherever the error happened and the root will come back as actual spans instead of being rolled up. Which means that for errors we automatically get the detailed view, but for regular day to day operation, if nothing is going on, then we just get single wide top level spans.
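The export-time decision Brenna outlines, two sampling knobs plus automatic promotion of error traces, could be sketched like this hypothetical Python processor. Spans flagged as errors keep full detail, while sampled-out children have their attributes merged up into the root; the data shapes and names are invented for the example.

```python
import random

def process_trace(root, children, root_rate, child_rate):
    # Knob 1: root sampling. A rate of 1.0 keeps every top-level event.
    if random.random() >= root_rate:
        return None
    # Any error in the trace forces full detail for the whole trace.
    has_error = any(c.get("error") for c in children)
    kept = []
    for child in children:
        if has_error or random.random() < child_rate:  # knob 2
            kept.append(child)
        else:
            # Sampled out: bubble the child's attributes up to the root
            # without overwriting anything the root already recorded.
            for k, v in child.get("attributes", {}).items():
                root["attributes"].setdefault(k, v)
    return {"root": root, "children": kept}
```

With `root_rate=1.0, child_rate=1.0` this is full profiling; with `root_rate=1.0, child_rate=0.0` it produces single wide top-level events, except for error traces, which come back with their children intact.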
Liz: Whoa, that's so cool.
Jessica: Yeah. So that way when people do start complaining on Twitter and Discord, you have stuff to look at?
Brenna: Yeah, and we don't have to retroactively go scramble to turn on a particular sampling ratio and try and get data.
Jessica: Yeah. That's different from the tail sampling that I'm used to, because with that you'd either send the child spans or you wouldn't, but here if the child spans aren't being sent, you're getting all those fields rolled up to the root span. So the root spans that you're keeping alone have a lot more data than if the child spans were created and then discarded later?
Brenna: Yeah. It's actually still tail-based sampling, but it's happening basically at, I guess, export time.
Jessica: In process in the lambda?
Brenna: Well, it's an OTLP processor that's run when the export happens, and so we can evaluate if any of the things in that trace could be thrown out or merged up, and if anything says, "I'm super important. I was an error. Do not throw me out," then we keep it.
Liz: That makes a lot of sense, this is something that's unique to your architecture because everything is mono-process, right? Each request flows through exactly one lambda, that lambda invokes multiple .NET functions, but at export time you can do that control. Whereas if it had already gone through another process, you wouldn't be able to communicate that state, so that's where you differ from a lot of our other customers.
Brenna: Exactly. And there are things that we missed out on, in that we don't really use spans to make a very large trace across multiple invocations, so we're not doing something like using the game ID as a span so that we can search everything for that game ID. But we do record an attribute for the game ID, so we can still search everything for that game ID. Yeah, it's very much a byproduct of how we've written the lambda handlers.
Jessica: Nice. So when you have the asynchronous events going in a queue and getting picked up later, how do you track the connection between the lambda that's started that event and the one that ran it?
Brenna: In general, any time a lambda gets triggered, at the beginning of whatever that path was, it's because a user did something because without clients interacting, there's no reason to do anything. So at some point an account triggered an action and we basically just keep the account ID as an attribute on everything that happens.
It can be potentially a little tricky if you do two things back to back that both invoke the same kind of very asynchronous event based flow in that we don't have a great mechanism for something like a request ID. It's one of the things that we've been looking at and thinking on, but for the most part just by nature of how the architecture works and the design, there aren't a lot of cases where you'd be in a position to do that right.
You're not finishing two games immediately because you can only be in one game at a time, and for things that are game related, we're also generally attributing the game ID there somewhere and so we can look for both account and game, rather than one particular request ID. I guess compound keys is the answer to that really.
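The compound-key lookup Brenna describes might be sketched like this: events carry both account and game IDs as attributes, so filtering on the pair reconstructs one match's story without needing a request ID. This is a hypothetical illustration; the attribute keys and event shapes are invented.

```python
def events_for(events, account_id, game_id):
    # Filter wide events by the compound key (account.id, game.id);
    # sorting by timestamp reconstructs the timeline for that match.
    matched = [e for e in events
               if e.get("account.id") == account_id
               and e.get("game.id") == game_id]
    return sorted(matched, key=lambda e: e["timestamp"])
```

Grouping by the pair of attributes gives the same "story on a timeline" view a request ID would, for as long as an account can only be in one game at a time.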
Jessica: Right. So you can look at everything that happened for an account and game ID, and you can still see the story on the timeline?
Liz: Yeah. I'm definitely thinking by analogy to what we do at Honeycomb with user events, where we measure similarly these sessions that our users initiate and we can filter by user ID and then we can group by the name of the action and then you can see in the timeline, "Okay, this user clicked the button to expand a trace five times in sequence." Each of those things might be a discrete event, but they're all joined by this common key of user ID that we can still use to filter and group.
Brenna: Yeah. And I'm willing to bet that from your side they're also joined by the particular trace that they were expanding or their query results that that thing was related from. So there's multiple ways that you can group that together, rather than just by making assumptions on time spans.
Liz: Yeah, exactly. They don't all have to share the same request or session ID because often, in fact, yes, technically the user events dataset has one root span per page load, but often people open a thing in a new tab, right? So the trace root is not necessarily always going to be the default current unit of granularity.
Jessica: In the end it boils down to events.
Liz: Yeah, it boils down to events and flexibility of schema and flexibility of analysis.
Brenna: Yeah. Not having a static schema has been really useful, just because we can be so freeform in, "Oh, I need this attribute. I'm just going to add it."
Liz: So walk me through what happens, you've mentioned you have a whole bunch of these different microservices like lambda functions, how do you manage deploys? How do you tell whether a given build is healthy or working correctly?
Brenna: We would largely consider a build something that we would group by a given version, application version, data version, et cetera. So we have a collection of all of these little numbers that represent version and one unique set of them is that deployment.
One of the things that we definitely take advantage of in lambda is the ability to just deploy another lambda next to the first one, so we often have multiple sets, multiple versions deployed in a single environment at a time.
I think most people would end up in the same state if you are supporting a... It's still production, but you maybe have two or three different versions, maybe they're not all out, maybe some of them are only open for QA.
And so, again, just adding additional attributes, we always add the application version for that deploy into at least every single root span. I think those are actually on every span, period, so we can always group by those things to compare one to another, or compare them potentially across environments if we really wanted to.
Liz: I see, so basically application version or commit hash maps to a single number that gets associated with that build. That build potentially goes out, that gets exported so you can compare across. Interesting. So I guess that means that API Gateway is handling the direction of traffic of which lambda, different lambda versions are active and receiving traffic at a time?
Brenna: Yeah. Those things actually also appear in the route, so we're routing specific requests to specific versions. It does mean that obviously the client has to be aware of those things.
Liz: That's actually something that we've struggled with at Honeycomb because right now when we deploy a lambda version, that lambda version just becomes the default version and just goes out 100% when we say so. It's segregated by environment but we kind of don't do partial rollouts of lambda right now, and that's something that I'm wishing that we were slightly better at.
Brenna: Yeah. We definitely put several iterations into what the API routing looked like before we found this, and previously we learned something very similar, where you would make a deploy and it would be live. There's not a whole lot that you can do from there, other than roll forward, even if into a previous version.
So there are a few ways that you can add API versioning into routes: you can go for minor versions, you can go for major versions and try aiming for minor, or you can add everything. We just opted to add, basically, everything, so it does limit us in how we can do certain fixes that might require a client update. But there's other things that we're working on in that space.
Jessica: Okay, okay. You said the client is aware of the version, does it find out about that dynamically? Do I have the same version of the handler's API throughout a single game?
Brenna: Application version is actually baked into the client when the client is originally built, which in our case means App Store approved and distributed. We can't just update the website to automatically change what APIs are being used on the backend. So delay in client distribution has honestly been the thing that was most unfamiliar to me moving from more of a tools and web background into actual live game service hosting.
Jessica: So does this mean you're able to use your observability to figure out which clients are still in production, and therefore which versions of the lambdas you still need to support?
Brenna: Yeah, absolutely. Because we can just group by the version and see how many requests are coming in for that thing.
Jessica: So you can measure your uneven client distribution?
Liz: Okay. So that's relating the client analytics, tying client versions to the specific API endpoints they call, and then using those API endpoints to basically say, "OK. This one is discontinued, less than 1% of users are using it. We're moving on to the next one." That's really awesome. That's definitely the kind of thing, as you were saying, that we don't think about as web engineers, where you expect, "Hey, if it doesn't work people will refresh the page and get a brand new version."
Brenna: Yeah, it's a little bit different.
Liz: What does your typical workflow look like for analyzing that wide event or deep trace, depending upon the situation? What does that workflow look like? How do your engineers interact with that data?
Brenna: Yeah. So it depends on the reason that they're going to look for something. If there was a particular bug that they're focusing on, if we have a report from a player or maybe someone in QA or internally that can get their account ID, then we might be able to look for things like account ID and time span or game ID.
Liz: Those are just dimensions that your developers are querying for inside of Honeycomb?
Brenna: Exactly, yeah. Where account ID equals blah, or if we're looking for something a little bit more generic, we're just trying to look, maybe we have vague reports but we're not totally sure what's going on. Maybe we're grouping by function name and region, and just looking to see if there are outliers, maybe we're just looking at overall duration or invocation count.
There are certain things where we expect them to always trend together and so if some of those lines diverge, it means that at some point in that event driven workflow, one of those events is falling off or potentially one of those events is being recorded twice for some reason. That would make the two or three lines that are supposed to be very congruent, one of them would slice off in a new direction.
Liz: So it sounds like what you're describing is that your developers have gotten some familiarity with the Honeycomb UI, they've gotten some familiarity with writing queries in the builder, rather than clicking around. How long did it take people to get that level of comfort?
Brenna: I think for those of us that were also responsible for writing the trace system, we were really kind of cheating. We were in there from the very beginning. Other folks that are maybe less familiar with how the trace system itself works, and have only been exposed to the Honeycomb UI, can find either internal documentation or just the shared code library of, "Hey, here are all the shared attribute names. Go search for these."
So people have taken to the UI pretty quickly. I remember when, "Hey, there's a dashboard. I didn't write that dashboard. Who wrote this dashboard? Neat, other people are writing dashboards." That happened, I think, relatively early on after we started sharing, even with production and folks outside of the engineering team specifically. They had questions that they wanted answers to and conveniently this was a tool that they could use to get those answers.
Liz: It turns out that an engineer who's motivated to solve a problem will get through any barriers in their way, whether they be big or small. It sounds like the barriers were relatively small, especially given you've focused on internal enablement, playbooks, guides.
Brenna: Yeah, there are some documentation pages on, "Hey, here's how tracing works, here's what you can expect to find," for developers that are actually working in the backend system. That might also include, "Here's how to extend it." And for folks who are reading documentation about a specific feature, that might also include, "Here are the relevant attributes or here's a query that we used in development," or things like that.
Liz: That's really awesome. I'm really glad to hear that.
Jessica: Yeah. You mentioned earlier that there's not as much difference as we think in game development. What would you say to a person who's currently developing business apps or web apps, and would love to go into games?
Brenna: I think the game field is just as wide as any division of tech that you might be in now, and if you're, "Hey, I just want to work in games," cool.
If I were to describe a bunch of engineers as .NET engineers writing HTTP web services that happen to be deployed to lambdas, that's a pretty big Venn diagram overlap with people way outside of the games industry, so there's a lot of space for that.
If you are specifically looking at, "Hey, I want to do mobile client stuff." Or, "I want to do platformer and physics things." That's where particular skillsets start to be more relevant, and I will readily admit that I am not a client engineer and so my complete knowledge of those things is very, very fuzzy. But there's more overlap, I think, than people realize.
Jessica: So our experience could be relevant if we just want to do something that makes people happy all day?
Brenna: Yeah. And like any industry, there's probably some overlap of transferable knowledge, and maybe there are some things you have to learn a little bit new. But I think it's probably more accessible than people give it credit for.
Liz: I think the other thing that people have heard stories about in the past, because this definitely was true in the past and no longer is, is it used to... Before indie game studios, before there was more diversity in the kinds of companies working on games.
There was a period of time in which salaries in games were lower compared to other tech jobs, and there was the expectation that you would join these large megacorps where you would put your nose to the grindstone for long hours, and I don't think that's true anymore and I'm really excited for what that means for the games industry to be available to more kinds of folks.
Brenna: Yeah, I don't think that's true. At least not nearly as prevalent as it used to be, so I think we're definitely moving in the right direction as an industry. I think there's probably always going to be work to make things better or the best that they can be.