WelpConf: Striving for Stability

Published MAY 26, 2016

Greg Poirer

On May 12th, 2016, Heavybit member company Opsee hosted their first-ever WelpConf. The event featured guest speaker Andy Smith from Wercker and a panel moderated by Dan Turchin of BigPanda, with Andy Smith and Greg Poirer from Opsee as panelists. Videos of the talk and panel are below, along with further thoughts from Greg.

Have you ever been on-call? Has your phone beeped and buzzed at 2 a.m.? Have you ever looked at an alert and thought to yourself, “Welp?”

Striving for Stability

Systems are a representation of reality that strive, and fail, toward determinism. They are only perfect in our minds. While we make every effort to translate our desires into reality, we often fail to consider every possible input or address every possible failure scenario during implementation, and this leads to instability.

Every action between building locally on a developer’s laptop and deploying to production should be reproducible.

Only that which is reproducible can be stable. The unknown may never be fully known, but 99.99% availability is a laudable achievement. By working toward reproducibility in our actions, we can make decisions with the assurance that the system will behave predictably. Docker is a powerful tool and a step toward reproducibility, but there are right ways and wrong ways to do things.

The IT Ops Journey: Then, Now, and Next

Computer operators, systems technicians, systems administrators, operations engineers, systems engineers, developers. Throughout the history of computers, the only constant has been operations. Someone is responsible for making sure things are doing what they should be doing. When the fire went out, Homo erectus would rekindle the flame.

Playbooks, battle plans, cron job failure e-mails, and scheduled restarts of services have been the tried and true methods of reacting to operational incidents since time immemorial. The sysadmin’s handbook of twenty years ago is still very much the monitoring of today.

Monitoring has been the Big Five for as long as most of us can remember: ping tests, process checks, CPU utilization, memory utilization, and disk utilization.
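As a rough illustration of how little the Big Five ask of a system, here is a minimal sketch in Python. The Reading type, the evaluate function, and the threshold values are all hypothetical names and numbers chosen for this example, not anything from Opsee or a particular monitoring tool. Only the disk figure is actually measured here, using the standard library; the other readings are assumed to come from whatever collector you already run.

```python
import shutil
from dataclasses import dataclass


@dataclass
class Reading:
    """One snapshot of the Big Five for a single host (hypothetical type)."""
    host_reachable: bool    # result of a ping test
    process_running: bool   # result of a process check
    cpu_percent: float
    memory_percent: float
    disk_percent: float


# Hypothetical alert thresholds; real values depend on the workload.
THRESHOLDS = {"cpu_percent": 90.0, "memory_percent": 90.0, "disk_percent": 85.0}


def disk_percent(path="/"):
    """Return disk utilization for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def evaluate(reading, thresholds=THRESHOLDS):
    """Return a list of alert strings for every check that fails."""
    alerts = []
    if not reading.host_reachable:
        alerts.append("ping: host unreachable")
    if not reading.process_running:
        alerts.append("process: not running")
    for name, limit in thresholds.items():
        value = getattr(reading, name)
        if value > limit:
            alerts.append(f"{name}: {value:.1f} exceeds {limit:.1f}")
    return alerts
```

Checks like these report what is happening and where, but nothing about why, which is the gap the rest of this post is concerned with.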

In the past few years, intrepid developers have made the things under observation and the ways that we observe them increasingly sophisticated. We’ve gotten exceedingly good at being able to answer the question, “What is happening and where is it happening?” We have not, however, taken to going a step further.

Our distrust of the systems we build has left us in a position of constant manual intervention and ad hoc automation of remediation.

We need goals for IT operations and monitoring. We need to be able to build trust in the systems we build. Anything emitted by a system should be recorded, catalogued, and understood. Relationships between components should be well-defined and discoverable. The people building and deploying systems should have a comprehensive understanding of their failure domains, and the systems themselves should help inform that comprehension.

