about the episode
Niall Murphy: Hello, everyone. Welcome to this session where we will be talking about Twitter. Now, there's a lot that could be written about the Twitter acquisition and a lot that already has been and a lot that will be, et cetera, et cetera.
You could talk about it under financial, cultural, economic headings, I suppose, for want of a better term. We're not going to talk about that though. Instead, we're going to think out loud about what is happening in the industry as a result of the Twitter acquisition specifically, to people who think about reliability, people who do incident management, people who think about failures in sociotechnical systems and all of that kind of thing.
Nora Jones: Yeah. Thanks, Niall. So, yeah, today we're going to talk about Twitter. It's been a favorite topic among site reliability engineers since Elon stepped in as CEO of the company and it's been a favorite topic because reliability has clearly gone down as a priority internally. We can make a lot of assumptions about why that is but it's not been seen as the reliability product I think it once was.
Twitter has always held a paramount position in the industry of being something that SREs look up to, it's a frequent case study in books throughout the industry about scaling highly distributed systems and data intensive applications. The fact of the matter is that Twitter wouldn't have taken off from the beginning if it wasn't as scalable and as well designed as it is, from a reliability perspective.
And now we're reaching a point where Twitter can cash in on that reliability it's had for many years. We're not going to notice the effects of it overnight, but the effects of it will slowly start to emerge. It won't happen today, it won't happen tomorrow, but it will happen in the coming years unless something is done about it now. So that was what we wanted to talk about today, and Niall, I'm curious on your thoughts on some of that too.
Niall: Yeah, I have so many thoughts. I think the first thing to say is we're partially talking about this as you pointed out in some of your previous paragraphs because Twitter is so notable. It's not just notable in the wider world, for all of the cultural valency reasons you might understand, but it's even a little bit more notable within our niches. Precisely because of the folks that worked in it, some of the open source work they did and so on and so forth.
So it even, historically I suppose, punched a little bit above its weight, however you would put that. That's my intuition anyway. But we're obviously also talking about it because of what has happened to it, or specifically, to introduce in some sense the main topic of this conversation, because of what has not happened to it.
Here we probably have to talk about the... I think it's the New York Times, Megan McArdle article in the past couple of days, which basically says, "Elon came in, fired 75% of people and Twitter is just fine. Therefore, he was right." Which is the underpinning driving so many of these conversations.
Yes, for folks working in capital. Yes, for folks who are the C Suite folks and so on and so forth. It's an industry conversation. If you can come in and do this, and it's totally fine, the gates are completely open to massive layoffs and so on and so forth.
So I'd just like to dig in a little bit to that essential proposition, which is Elon did this and Twitter is totally fine. Therefore, we can do this elsewhere. Really the nuances around the quotes, "Twitter is totally fine," piece. I think there's some contrary evidence there.
Nora: Yeah. I think you're calling out what's important and what's the elephant in the room, which is that at a very high level, the public knows that Twitter has had a lot of trouble monetizing itself throughout its history. And so Elon comes in, focuses on the monetization, because it's a business at the end of the day, it needs to drive money and revenue in some way, shape or form, and if you're not familiar with reliability or the benefit to the product or the benefit to the business, it can be thought of as a cost center.
And so that's what we were seeing at a really high level, was, "Oh wow, reliability is a cost center. It's not something that's actually benefiting our product." I think some of the challenge that we have in the industry is also reframing the conversation and educating executives in a better way because I think you and I know and all of our colleagues in the SRE industry know that it is not a cost center.
It's a necessary part of doing business and at the beginning of Twitter, if tweets weren't loading or if I sent a tweet and it wouldn't happen, no one would've used the product. That experience will dwindle down over time. No, Twitter might not blow up and completely go down. But things will happen over time and those little, subtle shifts will impact the product and it will impact the culture internally too.
I think what I would be curious to see with Twitter specifically is whatever their new monetization strategy is internally, how they're thinking about reliability as a part of that new strategy, versus reliability of the overall product of Twitter. Because, if they're looking to get more folks doing ads and ads is a terrible experience, they're not going to come back and do that again if it's not really shown to anyone. I'm curious what you think of that, and I'd love to hear more about the article you were referencing as well.
Niall: Yeah. There's a lot to respond to in what you just said. I think the three things off the top of my head would probably be, first of all, the actual economic argument, for want of a better term, behind this takeover and behind what Elon wants to do by all reports. It's not have ads, so pushing to move to a subscription business. Subscription business has different kind of customer relations implications, and when you're invested in a subscription business you're a bit more likely to come back to it on a day by day basis, or so the thinking goes, than if you're coming to an ad supported business.
For the people who are engaged, you increase their engagement. For the people who are not engaged, you decrease theirs. So in the act of moving to a subscription business, there's a huge question mark about how feasible this is and many views exist on both sides. Which brings me to my second point, which is the fascinating thing in my opinion, right now is the extent to which Twitter lives in a superposition of states.
It is both a failure, according to some people, and also a success, according to some people. They can both quote the same evidence to support their point of view, which is fascinating. What do I mean? So I think it will come as no surprise to listeners to this podcast who use Twitter, that the website and the mobile app and so on, and so forth, have all been considerably less reliable than they were in the past.
Not just, "I refresh the page and I don't see the things that I was expecting." But also blocked accounts unlock, searches don't work, random outages. There was that wonderful one, apparently about rate limiting all writes so no one could post for ages and a bunch of other outages, et cetera. So the ship has definitely become rockier. It's more uncertainly moving through the sea of misfortune.
But people who are saying, "Oh, this is totally fine," can go, "You chop 75% of the staff and of course you're going to see some kind of unreliability, but it basically still works." Whereas the people on the sociotechnical end of things are saying, "You have removed 75% of staff and therefore this thing is just one significant issue from blowing up totally and you can see the minor disintegration on a day by day basis as it's falling apart."
Nora: Do you think it's going to fall apart? Do you think it's going to come down completely?
Niall: I'd say Centrist Dad here is going to say that there is enough evidence to suggest it could have a pretty significant catastrophic outage eventually. But actually part of the interesting thing about the Twitter app as such, is how simple it is. I mean, it's obviously not simple in distributed systems terms and there's a lot of interesting and engaging and effective engineering work that made it the resilient system it is today, which is partially failing. But also partially working right all of the time.
The thing about the Twitter app is that you have built something that can in fact partially survive all of these things, if you swapped out the Twitter app for something which was launching VMs, say. That is just a use case which happens to be on the top of my mind for some reason, but if you're looking at that and people couldn't access their VMs with the level of unreliability that they're getting from Twitter today, I think that would be business existential. Or bank account access.
If you experienced different numbers in your bank account with the same frequency that you do when you hit reload on Twitter and things just don't run, I think that would be business existential. So I think that Twitter as an app can survive more things because of what it is, but it does still face, I think, a very significant problem if anything really big goes wrong elsewhere, if you see what I mean. If there's an earthquake or if there's a serious DDoS or if there's a confounding factor which pushes the system beyond its now constrained envelope of operation, I don't know if the 25% of people who are left will be able to restore it from that metastable, negative state.
Nora: Yeah. It's interesting to me, over time Twitter as a product hasn't changed very much. They have a lot of folks working on it, and I've had a few conversations with folks in the learning from incidents community about is there ever an example of a product that is done? Are we done working on it? And what does that mean? And what does that mean from a reliability standpoint?
But what's happening with what Elon is doing, is he's actually coming into a product that hasn't changed in several years substantially, and he's adding more features to it. He's adding a lot more features to it, which inherently is going to change the complexity of the product as well. And so I don't think it's going to come down all the way, but I do think he's going to end up in a new cycle where he has to hire reliability engineers or folks that are studying these complex systems because he's inherently making the system more complex.
I would love to be a fly on the wall internally right now just to see what's going on, but that's a question that's been coming up for me recently. But I wanted to shift topics a little bit, because outside of the revenue and the monetization of Twitter as a business, Twitter has had a huge role in society around natural disasters and getting information out to people that are involved in natural disasters.
I've actually been in situations before where I've been in locked down buildings and using Twitter to understand and coordinate what's going on. It is faster than 911 dispatching out to people because people are just posting live what is happening. It's used in storms, it's used in natural disasters, it's used in all sorts of things to get information out quickly when time is of the essence.
And so the impact of not focusing on the reliability from that perspective has a huge impact on society and that's where I see a potential worry. We have had natural disasters since Elon took charge in the company and people have used Twitter to get insight into what's happening and communicate with each other, and that's where some of my worry comes in.
There's actually companies that exist based on that, that are helping coordinate emergency management systems based off of what people are tweeting too. And so I'm curious how the APIs between Twitter and those companies are being impacted too. A company called Dataminr comes to mind, which I learned about several years ago, that scrapes tweets to coordinate with emergency management systems as well. So that's something that I've also been thinking about as well, just with the inherent reliability of it overall.
Niall: Yes, again there's a lot to say there. You may or may not be aware of this, but my understanding is some of the emergency efforts in responding to the Turkish earthquake or earthquakes over the past while, were actually impeded by Twitter's reliability problems. And so I don't know if it's necessarily completely correct to think about it this way, but you could argue that people have died as a result of certain things breaking.
Now, in order to make that argument successfully you would have to show that there were no substitutes or yada, yada, yada. I don't know that you could make that claim quite that directly, but it seems to me to be at least adjacent to the truth in various ways. So that's an issue, obviously. Just back to the API piece, rather famously of course you're not going to be able to get API access without the cash anymore.
So if the companies in question don't cough up, I think that particular avenue of data exploitation is gone. I have no idea about the relative economic benefit provided or due to either side of that equation right now, except to note as I've said previously, I'm from a European Social Democrat background and so what's good for society is a question which occurs to me which might not occur to other people who have different backgrounds. But I will say that the reliability of all of that again is very much perhaps an unintended consequence, or I don't know actually. I haven't been that fly on the wall, maybe it's an intended consequence.
Nora: Yeah. It's less what we should do about it, and more, "This is a thing." And I think many emergency management systems, we rely on Twitter to get the word out about certain things. I'm sure there's stuff to shuffle through, but I am wondering what responsibility Twitter holds in those moments, if any, and if there ever will be a world where it holds more responsibility. Because emergency management systems have been incredibly clear about the role Twitter plays during these situations, and so that's where a lot of my worry comes in and fear comes in, as it's not just a fun social network. It's a safety tool.
Niall: Exactly. There's an analog, actually, of a thing in the computer science or software engineering world called Hyrum's Law, which basically states that, "For any particular API that you happen to offer, people are going to end up depending on the nuances of how it behaves because over time people just depend on more and more things." I think there's probably an analog of this, maybe it's not quite in the software engineering world, maybe it's more in the social domain.
But if you offer a service, people are going to eventually use it for all kinds of things that you hadn't foreseen, particularly if it's good at those things and it means you can use it without thinking about it, which are two crucial properties of adoption of systems. So it very much seems to me as if the barnacles are going to have to be sliced off the ship, one way or another, if that's the way you think of it. Or it's maybe an opportunity for use cases which are accidentally served by an existing system or setup, however you'd put that. Transitioning that to a situation where they are deliberately satisfied by something which is explicitly designed to do that.
Nora: Also, I've noticed Twitter has been playing around with how it displays which tweets you're seeing too, and I'm wondering about the implications of that because we talked about, in the beginning of our recording, how designing Twitter is a popular interview question. I've been asked it by several different companies, and the implications of it, and how to get the highest performance when loading your tweets, versus showing the most recent ones, versus batching them.
But I am noticing changes where I'm getting shown ads a lot more, and I'm curious of your thoughts on if you're noticing differences and how you're viewing your home feed too. Any guesses you have on the implications of the benefits of the product, based on some of those changes?
Niall: Well, I use it a lot less, is the first thing I would say. Partially on foot of some things we've discussed, partially not. But I will say I've made the following observations, that as other people have said, I've noticed really weird things happening with respect to the use of the app. Which I reflect psychologically, has meant I've gone, "Oh well, if it's happening to this account, it could happen to my account as well."
Or similar observations where I go, "This thing is behaving oddly over here, and showing me things it shouldn't, or not showing me things that it should." And so I've come to question the rest of the product, even if there happens to be other subsystems which are working totally fine. I don't know if there's a name for this effect, but very obvious, immediate problems in one directly user facing thing have caused me to tar everything else with the same brush.
So I don't know if that's fair of me, necessarily, but it's definitely true. Related to that, there is an effect where when a customer base is coming into contact with a product and potentially adopting that product, you often see a thing called an S Curve, which is where a thing starts quite slowly, ramps up very quickly in the middle, possibly with exponential growth and then tails off at the end as you reach some notional threshold for the maximum number of adopters that you can have.
So S Curves are pretty commonly found in a bunch of things, but there is... I've never read about this, but I'm going to assert without any evidence that there's an S Curve in the reverse direction as well, where it's maybe quite hard to shake people off a system that they're already deeply engaged in, but once some critical threshold is passed, then actually that might go down quite rapidly. Like that famous Ernest Hemingway line about bankruptcy and how it happens, which is to say slowly and then suddenly, or gradually and then suddenly. So we might yet be seeing the death throes, but we might also be just at the top of this curve.
Nora: Yeah. I think it'll be a slow burn. I think potentially by design it's being changed on the type of user Twitter wants on its platform, with the subscription based model, with subscription based 2FA now, which is highly problematic. Yeah, subscription based verification model too, it seems less of a trying to become a place where people communicate and form community, and more trying to be a place where people are marketing their businesses or whatever they are marketing. That's how I'm reading into this situation right now, it's driving away folks that are there for community. I'm curious if that is by design.
Niall: Though I will have to say to you that the set of events which have transpired so far with respect to Twitter doesn't give me the strong sense of a plan or a huge amount of directional intent, shall we put it that way?
Nora: It seems like they're treating a pretty large company like a startup. A lot of experiments run on a very large number of users right now, and just trying to see what sticks to the wall, which I think is going to really backfire for them at the stage they're at.
Niall: Well, it was a large company. Now it's considerably less large, for good and for ill, possibly, depending on your point of view. But actually, coming back to your point about community and coming back to the point of being a much smaller company, it's, I think, certainly appropriate to talk about the effects, not just the direct effects of lots of people being let go in a, I might say, very, very disrespectful and bad way. So it's important to talk about that kind of effect, but it's also important to talk about the doubt that it puts in the minds of senior decision makers about what it is precisely you would say you do around here, Mr Murphy? Or equivalent. Do you have any views on what other people have said to you about this?
Nora: I think this is not a great thing, but do think Elon's decision making and how he's treating layoffs are influencing the rest of the industry. Like I was saying, it's not going to bite folks overnight, but there is a substantial amount of research done, that laying off folks and doing so in really cold ways has really detrimental effects. I think a lot of the time senior executives use it as a signal to investors that they can make decisions and save the business and put the business first, but it actually ends up hurting the business over time.
And so I think some of the onus is also on the investors too, to understand the implications of some of these things, so that it's not always seen as a good sign for the business that things like this are happening. I think what Elon did at a high level shows that, "Oh, maybe we don't need reliability," which is not the case whatsoever.
It's costing them a lot of money internally, I'm sure, due to the conversations that are happening, due to disgruntled employees. All of that becomes expensive and it becomes expensive in a really exponential way. That's not good for organizations either, and it prevents them from growing and from learning, and if you're not in an organization where you're learning or where learning is seen as paramount, you're not going to be a high performing organization either. So those are some of my rough thoughts on that.
Niall: I do want to call out and acknowledge, I suppose, what you said about the relationship between investors and the folks in the C Suite, et cetera. I do acknowledge that that component of a relationship between investors and the C Suite and the business there, that there is certainly some social signaling because as we know these things are very often socially constructed in a way where they try and signal that it's not socially constructed. Or try and frame it like it's not socially constructed, but it is.
Anyway, I do agree that there's a component of this which is you have to signal that you're willing to be cruel to people for the health of the business (As if business had nothing to do with the people). Anyway, it's complicated, but I do think that that signaling exists. The weird thing is, is that at least according to my understanding when you go and look at what happens to businesses after things are let go or divisions or people or whatever are let go, almost all of those businesses end up at more or less the staffing level they were at previously.
It's just a matter of time. And so if that is true, then what is the difference between 1,000 people on January, 2023 and 500 people the day after, and 1,000 people on January, 2024? Is there a specific benefit? I think many of these things are complex enough that there isn't an easy answer. Some people have claimed that layoffs in this manner are a social contagion as opposed to socially constructed, from a value point of view. Yeah, absolutely that's definitely easy to see.
One of the things which has struck me as potentially underpinning this, though, is something that I think other distributed systems engineers, SREs and the like will need to think about very carefully. I put it this way, in 2008, Paxos is known to very, very few people. The number of people who could write a leader election library for distributed systems is very, very small. In 2023, Paxos is in loads of places. There are loads of implementations of it. I'm not saying all of those implementations are equally good or proved correct or any of those things, but I'm just saying it's a much less unknown science than it was previously.
I think you can make a corresponding argument that many of the things that the 'multinationals', for want of a better term, including Twitter, the tech multinationals... Many of the things they did and released as open source and the knowledge that osmosed between them as people left to join other companies and so on, all of that has contributed to a situation where actually the difference between what you were contributing in 2008 and what you were contributing in 2023 is very different.
Actually much of the immediacy of the support that you provided in 2008, it had a very existential character to what you were doing. In 2023 you can go and buy a ton of stuff that does this for you. Now, maybe it doesn't do it to five nines for everyone, maybe it does it to three nines or two nines. But actually a lot of this is way more available than it was previously, and I suspect that if you actually removed all of the people looking after a thing, all of them, 0% left, I have thought many times in my life, "How many minutes, seconds, et cetera, et cetera, hours, days, weeks, would this thing last if all of the people went away?"
And I think that, coming back to my Twitter use case point, some bits of Twitter are sufficiently simple that actually it could persist in a moderately degraded state for longer than if you took all of the people away from, just to pick a random thing, GCP. So perhaps that's another difference as well.
Nora: Yeah. The value changes from being the person that writes the thing to being the person that maintains the third party thing that has been bought. You still need a human. It's just a different level of expertise, and I think one thing, Twitter has been around a while, right? And I'm sure there are engineers there that have maybe been there since the very beginning, and I'm sure their jobs have changed a lot.
Part of, I think, being an individual at these companies from the early days is understanding where you need new expertise as the business shifts and evolves too, and part of the business's responsibility is giving that context to the individual contributor. Not every human is right for every stage. Sometimes you have a human that is really great at the first stage and is not going to be great at later stages, and that is on both the business and the individual to figure out if that is the case and how to come up with an exit plan.
But it shouldn't be done in the form of a mass, mass exiting of individuals. It's kind of ignoring the problem until it becomes super pervasive, and then doing something really inhumane, like you said before too. So I'm making a lot of assumptions here, but it is disappointing to see and I hope our industry changes a little bit in these regards so that we don't go through these cycles every several years or so.
Niall: Well, when we manage to fix the whole economic boom-bust cycle thing, I think we'll both be in line for several Nobel prizes. I'm wondering where I'm going to put them, I'm going to have so many of them. Anyway, the other thing I wanted to touch on, particularly from the reliability, SRE, yada, yada, yada point of view, is there is so much about the obligations that Twitter currently, in theory, must provide.
Including, but not limited to, they have a consent decree, they have to be able to process GDPR requests, a bunch of other things that are related to legally operating in the environment that they are operating in, for want of a better term. You could look at what has happened with respect to the layoffs and so on, and say, "This is a gigantic walking away from obligations."
And in fact, I don't know and possibly I shouldn't speculate but it's a podcast and we're here, perhaps not all of the GDPR obligations, for example, are effectively fulfillable right now. So let's say there's some difficulty about that. What role should sociotechnical folks, reliability folks, et cetera, et cetera, be playing in these kinds of things? Should you look at this as a reliability issue or a feature issue? Or a business issue? Or does it touch all of them?
Nora: I think it touches all of them, and I think part of the problem is thinking of those things as separate. I think they all need to be thought through and there's stuff that we leave on the table a lot of the time too, but I think the role sociotechnical system experts would play and should play in these companies is understanding, really understanding, what matters to the executive.
For better or worse, they are the executive, right? Going against them and pushing against them in the organizations is not going to help you if they're trying to change. But understanding why they're trying to change what they're trying to change can help you. It helps bridge trust, but it can also help you gain influence to have a more learning focused organization as well. Those are just my two cents on that.
Niall: It is not clear to me that today's Twitter is terribly interested in becoming a learning organization in that sense. Yeah. Perhaps I do it a disservice by saying that, but I think this is, in some sense, way more existential. There is an existential question mark hanging over the company now, which was not previously the case. I often think of learning organizations as they are sometimes interpreted as things that are luxuries in some sense.
Yes, you can learn after we've done all of these other things. Of course that's just a completely wrong characterization. You have to be learning in order to do all of these other things as well, it's just a question of how far that learning has propagated.
Do you yourself end up knowing a sad fact or do you get to tell other people about the sad fact and hopefully prevent them from knowing further sad facts in future?
Nora: Yeah. I think it's little efforts first, and I think the folks in this organization have to start from scratch. They have a brand new executive that doesn't know the history, that might not be interested in receiving some of the history, so understanding what the executives are interested in receiving, if you are planning to stay at the organization, is just going to help you improve it.
So I think that's some of the ways I would put it. Having a full fledged learning organization, yes, might be a luxury, but I think there's little things that you can do, little questions you can ask to help influence hearts and minds too. Rather than telling folks that they're doing it wrong. Any closing thoughts, Niall?
Niall: Just to say for the folks that remain, I hope that things work out well. I genuinely do hope that Twitter, the corporate entity as distinct from anyone in it, manages to build a sustainable and reasonable business model and still exists. I would hate, in some sense, for this wild and precious thing which was created some time ago to just fold up its tent and go home.
I think that would be a terrible loss to world culture. But I do think that some of the things it ended up getting used for are perhaps not quite the right things for it to be used for in that context in the future. And of course my thoughts go out to the folks who have been treated rather cruelly in many cases, and I hope you find additional employment or whatever you're looking for in the future very soon.
Nora: Awesome. Well, thank you for covering this topic with me, Niall. It's something I see SRE communities talk about almost every day, so I figured it was good for us to cover this at some point in time. But yeah, we will see you next time, folks, and thanks for joining us today.