SF Metrics: Richard Waid & Ben Hartshorne

Published DEC 20, 2017



Ted Carstensen, Platform Director, Heavybit

Richard Waid, Director of Monitoring Infrastructure, LinkedIn

Ben Hartshorne, Software Engineer, Honeycomb

Heavybit member company Librato hosted this SF Metrics Meetup in the Heavybit Clubhouse on October 25th. If you’d like to attend future SF Metrics meetups in person, sign up here.

Richard Waid, Director of Monitoring Infrastructure, LinkedIn

In the past 6 years, the Monitoring team at LinkedIn has dealt with an explosive change in scale from 12k to 850M individual metrics, as well as a migration from NOC-based escalation to direct remediation and escalation. In this talk, Richard gives a brief overview of how the team accomplished that, how it fits into the overall engineering ecosystem, and what it is doing for the next major evolution in its journey. Along the way, Richard covers a few of the team’s major lessons: protecting the ecosystem against well-meaning users, planning for explosive scaling, and the global vs. local optima challenge of self-service tooling.

Ben Hartshorne, Software Engineer, Honeycomb

The two main methods of reducing high volume instrumentation data to a manageable load are aggregation and sampling. Aggregation is well understood, but sampling remains a mystery.

In this talk, Ben starts by laying down the basic ground rules for sampling—what it means and how to implement the simplest methods. There are many ways to think about sampling, but with a good starting point, you gain immense flexibility. Once you have the basics of what it means to sample, Ben looks at some different traffic patterns and the effect of sampling on each. When do you lose visibility into your service with simple sampling methods? What can you do about it?
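As a rough illustration of the simplest method, here is a minimal sketch of constant-rate sampling. It is not taken from the talk: the function names, the `sample_rate` field, and the dictionary event shape are all assumptions made for this example. The idea is to keep roughly 1 in N events and record N on each kept event, so that totals can be estimated later by weighting each kept event by its rate.

```python
import random


def sample_events(events, sample_rate, rng=None):
    """Keep roughly 1 in `sample_rate` events.

    Each kept event is tagged with the rate it was sampled at
    (the "sample_rate" field name is an assumption for this sketch).
    """
    if rng is None:
        rng = random.Random()
    kept = []
    for event in events:
        # randrange(n) returns 0 with probability 1/n
        if rng.randrange(sample_rate) == 0:
            kept.append({**event, "sample_rate": sample_rate})
    return kept


def estimated_total(kept):
    """Estimate the original event count: each kept event stands in
    for `sample_rate` original events."""
    return sum(event["sample_rate"] for event in kept)
```

With a rate of 10, about a tenth of the events are transmitted, and `estimated_total` recovers an approximation of the original volume. The estimate gets noisier as a traffic category gets rarer, which is where simple sampling starts to lose visibility.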

Given the traffic patterns of a modern web infrastructure, there are solid ways to rethink sampling so that you keep visibility into the most important parts of your infrastructure while still transmitting only a portion of your volume to your instrumentation service.
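One way to picture this is to sample different kinds of traffic at different rates. The sketch below is an assumption for illustration, not a method confirmed by the talk: it groups requests by HTTP status class, and the specific rates, the `status` field, and the function names are hypothetical. Rare, important traffic (server errors) is kept in full, while high-volume successful requests are sampled heavily.

```python
import random

# Hypothetical rates: keep every 5xx, 1 in 10 4xx, 1 in 100 2xx.
RATES_BY_STATUS_CLASS = {"5xx": 1, "4xx": 10, "2xx": 100}
DEFAULT_RATE = 10  # assumed fallback for any other status class


def status_class(status):
    """Map an HTTP status code such as 503 to a class such as '5xx'."""
    return f"{status // 100}xx"


def maybe_keep(event, rng):
    """Return the event tagged with its sample rate, or None if dropped."""
    rate = RATES_BY_STATUS_CLASS.get(status_class(event["status"]), DEFAULT_RATE)
    if rng.randrange(rate) == 0:
        return {**event, "sample_rate": rate}
    return None
```

Because every kept event carries its own rate, events sampled at different rates can still be combined into one estimate by weighting each event by its `sample_rate`.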

Taking it a step further, you can push these sampling methods beyond their expected boundaries by using feedback from your service and its volume to affect your sampling rates! Your application knows best how the traffic flowing through it varies; allowing it to decide how to sample the instrumentation can give you the ability to reduce total throughput by an order of magnitude while still maintaining the necessary visibility into the parts of the system that matter most.
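To make the feedback idea concrete, here is a minimal sketch of one possible scheme. It is an assumption for illustration, not Honeycomb’s actual algorithm: the `DynamicSampler` class, the `target_per_key` parameter, and the windowing approach are all hypothetical. The sampler counts traffic per key during a window, then sets each key’s rate for the next window so that roughly `target_per_key` events per key get through.

```python
import math
import random
from collections import Counter


class DynamicSampler:
    """Adjust per-key sample rates from the volume seen in the last window."""

    def __init__(self, target_per_key, rng=None):
        self.target_per_key = target_per_key
        self.rng = rng if rng is not None else random.Random()
        self.counts = Counter()  # traffic observed in the current window
        self.rates = {}          # rates derived from the previous window

    def sample(self, key):
        """Return the sample rate if this event is kept, else None."""
        self.counts[key] += 1
        rate = self.rates.get(key, 1)  # keys with no history are kept in full
        return rate if self.rng.randrange(rate) == 0 else None

    def end_window(self):
        """Recompute rates: busy keys get sampled harder, quiet keys stay at 1."""
        self.rates = {
            key: max(1, math.ceil(count / self.target_per_key))
            for key, count in self.counts.items()
        }
        self.counts = Counter()
```

In this sketch a key that sees 1,000 events per window with a target of 10 ends up at a rate of 100, while a key that sees only 5 events stays at a rate of 1, so low-volume traffic remains fully visible. Calling `end_window` on a timer (for example, every 30 seconds) is one plausible way to drive it.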

Ben finishes by bringing up some examples of dynamic sampling in Honeycomb’s infrastructure and talks about how it lets them see individual events of interest while keeping only 1/1000th of the overall traffic.

