WelpConf: Striving for Stability
- Greg Poirer
On May 12th, 2016, Heavybit member company Opsee hosted their first ever WelpConf. The event featured guest speaker Andy Smith from Wercker, and a panel moderated by Dan Turchin of BigPanda, featuring Andy Smith & Greg Poirer from Opsee. Videos of the talk and panel are below, along with further thoughts from Greg.
Have you ever been on-call? Has your phone beeped and buzzed at 2 a.m.? Have you ever looked at an alert and thought to yourself, “Welp?”
Striving for Stability
Systems are a representation of reality that strive, and fail, toward determinism. They are only perfect in our mind, and while we make every effort to translate our desires into reality, we often fail to consider every possible input or address every possible failure scenario during implementation, and this leads to instability.
Every action between building locally on a developer’s laptop and deploying to production should be reproducible.
Only that which is reproducible can be stable. The unknown may never be fully known, but 99.99% availability is a laudable achievement. By working toward reproducibility in our actions, we can inform our decision making with assurance that the system will behave in a predictable fashion. Docker is a powerful tool that is a step in the direction toward reproducibility, but there are right ways to do things and there are wrong ways to do things.
The IT Ops Journey: Then, Now, and Next
Computer operators, systems technicians, systems administrators, operations engineers, systems engineers, developers. Throughout the history of computers, the only constant has been operations. Someone is responsible for making sure things are doing what they should be doing. When the fire went out, homo erectus would rekindle the flame.
Playbooks, battleplans, cron job failure e-mails, and scheduled restarts of services have been the tried and true method of reacting to operational incidents for time immemorial. The sysadmin’s handbook twenty years ago is still very much the monitoring of today.
Monitoring has been the Big Five for as long as most of us can remember: ping tests, process checks, CPU utilization, memory utilization, and disk utilization.
In the past few years, intrepid developers have made the things under observation and the ways that we observe them increasingly sophisticated. We’ve gotten exceedingly good at being able to answer the question, “What is happening and where is it happening?” We have not, however, taken to going a step further.
Our distrust of the systems we build has left us in a position of constant manual intervention and ad hoc automation of remediation.
We need goals for IT operations and monitoring. We need to be able to build trust in the systems we build. Anything emitted by a system should be recorded, catalogued, and understood. Relationships between components should be well-defined and discoverable. The people building and deploying systems should have a comprehensive understanding of their failure domains, and the systems themselves should help inform that comprehension.
Sign up here to attend the next WelpConf, and check out our events page for a schedule of upcoming developer focused events at our San Francisco Clubhouse.
Subscribe to Heavybit Updates
Subscribe for regular updates about our developer-first content and events, job openings, and advisory opportunities.
Content from the Library
How to Generate Financial Reporting for Board Meetings
How to Get Financial KPIs Ready for the Board Meeting Your startup just closed its first major round after a long and arduous...
2023: DevTool Industry Year in Review
Though it may be cliche to say, this year has been a rollercoaster for the devtools space and tech industry at large. The last 12...
A Guide to SEO in the Age of AI, Plus Modern Best Practices
How Will Generative AI Change SEO as We Know It? Search engine optimization, the art of optimizing your startup’s website and...