November 18, 2014
Earlier this year, Heavybit member company Librato hosted their regular SF Metrics Meetup at our San Francisco Clubhouse. The event featured talks from Twitter’s Megan Kanne and Bugsnag’s Emily Nakashima. Videos of these talks are included below. Sign up here if you’d like to attend in person next time.
As client-side app frameworks like React and Ember keep growing more popular, we’re shipping more and more application logic out to users’ browsers. But we don’t always know much about what happens to it after we send it out to the client.
In this talk I take you on a fast-paced tour of all the strange cases we’ve looked at since shipping our new dashboard, from overseas proxy sites to rogue browser extensions to out-and-out clones of our UI. Finally, I’ll talk about how to cut the noise and focus on monitoring and mitigating the cases that really matter to your users’ experience.
Twitter’s Observability team provides Twitter developers with a monitoring infrastructure, including real-time dashboards and alerting for their services. Twitter has grown by orders of magnitude since it debuted at SXSW. The monitoring infrastructure has followed suit, ingesting orders of magnitude more metrics, from millions to billions.
As the company grew, so did the challenges of providing an always-available alerting system. Design decisions made when Twitter had only one datacenter and a monolithic architecture created unresolvable scale and reliability issues for its alerting service.
In this talk Megan walks us through these challenges and describes Twitter’s solution: a next-gen alerting system that provides reliable, realtime, multi-zone alerting at scale.