In episode 35 of o11ycast, Charity and Shelby speak with Mark Ferlatte of Truss. They discuss government software system failures and successes including Healthcare.gov and the COVID-19 vaccine rollout.
About the Guests
Mark Ferlatte: I saw it on the news that this huge, very important component of healthcare policy launches and falls flat on its face.
And I had a reaction that I think a lot of technologists had, which was like, "Whew, better them than me," like, "How awful," like, "That's terrible."
And then about a month later, I got a phone call and it was from a friend, and I answered the phone and he was like, "Hey, we're getting some people together to help on this healthcare.gov thing. Can you help?"
And I was like, "Uh, sure." I mean, because what can you say to that, right?
You have to say yes when someone throws something like that in front of you.
And did all the stuff, had to tell my co-founders that I was going to leave for a while and go do this thing.
When I told them they were both like, "Yeah, that must happen."
The whole thing was wild. I still had this email that was titled, "Logistics," that just said--
This was all of the prep.
It was, "Hey, get a hotel room in this hotel. We're on the third floor. If we're not there, we're at the building where all the action is. Just say you're with the ad hoc response team and we'll let you in."
That was the logistics.
Shelby Spees: Wow.
Mark: And so, fly out there, show up and what it was at that point was a diving save to take this enormously complicated system and try to get people insured through it.
And it was an enormous effort.
Charity Majors: How many people a day were getting insured when you showed up?
Mark: Some. It was better than-- I think three people got through the first day.
Charity: Nice, nice. Not zero.
Mark: And by December 24th, which was kind of the end of one of the first phases, we got over 100,000 families through.
Because it's really hard to measure these things, because it's measured in families, not people.
But it was a brute force effort for sure.
Charity: Sobering. So you came home from this catastrophe slash triumph with this firm conviction that, what?
Mark: Oh, when we came home? I'll be honest with you.
So we were doing rotating six-week tours, and when I came home from that, I was like, "Well that was an event. That was a thing I did. Let's go back to doing what I was doing before."
Charity: Your regular life.
Mark: Yeah, let's go back to regular life.
And then not too much longer after that, there was a team of people--
So I was on the team that was trying to keep the existing thing running.
There was another team of people that were trying to rewrite part of it.
And those people reached out and were like, "Hey, we could use some help."
And that's when it occurred to me that, hey, the government needs external parties to play with to make all of these systems work, that makes the system that is our society work.
That's what got us at Truss interested in working with the government, and trying to make systems that work for everybody, which is one of the reasons why government systems are hard to build.
Charity: The government is us, literally.
Mark: The government is us. No one is coming.
Charity: No one's coming.
Shelby: This is a great opportunity for you to introduce yourself.
Mark: So I'm Mark Ferlatte, I'm the CTO and Co-Founder of Truss Works, which is a software consultancy that works with government agencies, healthcare companies, large organizations.
And I think the pithy way that I would describe what we do is we build software products for organizations that don't build software products for themselves.
Shelby: And so historically government, it sounds like it was sort of siloed away from the rest of the tech industry.
Is that sort of how you describe it?
Charity: I hate the term siloed.
It's one of those words that means almost nothing, and I would like to ban it from polite society, but yes, you're right.
It's like, there's this whole government, GovTech that has grown up.
It's almost like there's been very little crossover between tech-tech and GovTech.
Mark: What's really fascinating to me about this is, that didn't use to be the case at all.
So the US government used to be capable of building things and-
Charity: The internet literally came from the US government.
Mark: So the internet came from the US government.
Here's one that blows people's minds. Most medical record systems.
The most popular medical record system came from the VA.
Charity: Oh yeah. VA has got their shit done.
Other countries send their healthcare muckety-mucks to come learn, not from the rest of the healthcare system in the US, but from the VA. It is the most successful healthcare system in the world, or was.
Mark: Well, and so this is the thing that happened: over our lifetimes, the US government decided to stop doing things for itself and to depend on the private sector to do it for them.
And there are a lot of ways where that makes sense. You don't want to manufacture your own pencils, right?
It makes sense to buy pencils.
I think it makes a lot of sense, given the size and the complexity of most government projects.
I think it does make sense to lean on the private sector, but they went too far and we got into a situation where the people inside the government didn't understand anything about what they were buying. Right.
And that caused a lot of problems.
The thing about healthcare.gov that's really wild: it is not the largest technology failure that the US has had, by far. It was just a very public one.
Charity: What is the largest?
Mark: It depends on how you define it. Cause if you look at-
Charity: The F-35 notwithstanding.
Mark: I was going to say. I think there are a lot of large defense projects that could fall under that category.
But if you want to talk about a software system, the one that comes to mind is--
There was a project to build an FBI case file system, and if I'm remembering correctly, it was $1.1 billion and then got canceled.
That's the largest one that I can think of. But that's the thing. There could be others-
Charity: Right, how would we know?
Mark: The nature of most of these-- Yeah, how would we know?
Charity: That brings this to the vaccine rollout.
Mark: Yeah. A lot going on there, huh?
Charity: There's a bit. There's a bit.
Shelby: I'm interested in actually hearing if you can give us a nutshell version, from your perspective, knowing how government agencies work with software consultancies and stuff.
What might people not be seeing in the current rollout situation?
Mark: Honestly, I think what people are seeing is this is one of the clearest lived experiences of how complicated, and I'm going to tease Charity a little bit here, siloed the various parts of government are.
Because for a vaccine rollout, we're talking about interactions between multiple federal agencies and 50 states plus territories, plus local municipalities, county and city.
None of those are designed to work together and that's on purpose.
None of them share trust. And that's also on purpose. Like we wanted it this way.
Shelby: The Federalization and local municipalities having like that local authority.
It's good for a lot of reasons, but yeah, when you need something to roll out across the entire country, it adds a lot of complexity.
Charity: I just want their money for my fiefdom.
I don't want to topple my fiefdom, I just want money for my fiefdom.
Mark: So what you're seeing is, A, getting logistics done across the entire country is wickedly, wickedly hard.
And that's just getting the vaccines distributed, right?
The other half of this, which is the part I feel a lot more confident speaking about is the-- someone's like, why can't I get a sign up?
Why are the websites so hard?
What about that guy in the New York Times who made a sign-up website and it only cost $50? Which is one of the most irresponsible things that I've seen the New York Times publish in a while.
Shelby: That was journalistic malpractice just straight up. Like we can get into that, but continue.
Mark: Because all of these software systems are built on top of policies and these interactions and it's Conway's law at national scale, right?
There is no like vaccine.gov because structurally there is no single place to go get a vaccine.
And so you see a lot of volunteer projects that have kind of popped up to try to fill this gap.
There's the, what is it?
It's VaccinateCA, I think, is the one that I'm familiar with, where they literally just have volunteers calling all of the places where vaccines might be, and then they update a website.
I got to tell you, brute force works, right? Like that's a fine way to do it. The fact that we have to do it that way is indicative of the way the US government works kind of all the time. Like this is how it operates.
And to be clear, this isn't because California doesn't have a technology arm. They have a very, very solid one. Like, they got covid19.ca.gov up quickly.
They've maintained it correctly.
They've integrated with Apple and Google's exposure notification work.
All of these systems have come online.
That one was done entirely by government folks; that wasn't a contract job.
And it's, I think a good example of places where the government can and does do good work, but-
Charity: But does it matter if it's not in concert with everyone else with the rest of the system?
Mark: And that is the difficulty.
And this is also one of the places where the administration matters.
It really does. Like, having people who are running things and paying attention and trying to make things work matters-
Charity: Say more about that. Why would anyone not want things to work?
Mark: You know, Charity, I honestly don't know.
When I think about what government is for, it's to reduce the number of things that we need to worry about.
Mark: Right. And it's the stuff that we need to do collectively together.
Like that's what it's for. And as the world gets more and more complicated, we need those things.
Charity: We shouldn't all have to be experts in the protocols of how the FDA approves what medication or whatnot, or seeking out our own hydroxychloroquine on the internet or any of this shit.
It's actually too complicated for us all to keep track of.
Shelby: But even if you ignore bad actors, there's so much complexity.
And I wonder if this is something that's different about the vaccine rollout versus your experience with healthcare.gov, where there's so much complexity at the local level.
Like you're actually trying to get vaccines into people's arms.
And so it goes beyond entering your information on a website that can be a single website across the entire country.
Did you run into that sort of thing with healthcare.gov that you had the local level complexities despite it being nationwide?
Mark: Yeah. Because remember, the way that healthcare exchanges work, there is one exchange per state, and healthcare.gov is the exchange for states that didn't build their own.
So the idea is you would go to healthcare.gov and fill out some information and it would tell you, Oh, you're from California.
You need to go to the California one because they have an exchange and they're running it themselves.
And so it was this kind of delegation thing.
But if you were in one of the states that didn't build their own, then healthcare.gov would go and collect all the information and calculate all of the things and make sure that the business rules were all right.
Then package up this occasionally gigabyte-plus XML document that represented your application and all of that.
And then ship that to that state's insurance company.
Charity: Since we're on the top-- Since this is an observability podcast after all.
And since it's one of my favorite stories, can we just detour a little bit to talk about the cutting edge observability used by the ad hoc team to diagnose when healthcare.gov was up or down, and how that matured over the eight months?
Mark: It did mature. It did mature.
Charity: How did it begin?
Mark: It began with having CNN up.
Charity: Oh wow.
Mark: Or C-SPAN.
Shelby: Oh my gosh.
Mark: And then it progressed pretty rapidly.
Charity: Did the monitoring check out next?
As something end to end, or just like a ping check or something?
Mark: No, we had New Relic in there, and we were using that and we were paying attention to end to end.
And the reason we were paying attention to end to end is because trying to pay attention to anything else was just madness.
There was too much complexity inside the system to like track.
And so the primary indicators were what's the total error rate on the site and what's the response latency. And we flew by those.
Mark: We would recognize the patterns and we sort of knew what would happen.
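As a rough illustration of the two end-to-end signals Mark describes here, total error rate and response latency, the sketch below computes both from a batch of request records. It is not the response team's actual tooling and does not use New Relic's API; the `Request` type, the field names, and the choice to count only 5xx responses as errors are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Request:
    """One end-to-end request as a user experienced it (hypothetical shape)."""
    status: int        # HTTP status code returned to the user
    latency_ms: float  # total response time in milliseconds


def error_rate(requests):
    """Fraction of requests that ended in a server error (assumed: any 5xx)."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return errors / len(requests)


def latency_percentile(requests, pct):
    """Nearest-rank percentile of end-to-end latency, in milliseconds."""
    if not requests:
        return 0.0
    ordered = sorted(r.latency_ms for r in requests)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up sample: two healthy requests, two failures, one of them very slow.
sample = [
    Request(status=200, latency_ms=100.0),
    Request(status=200, latency_ms=200.0),
    Request(status=500, latency_ms=300.0),
    Request(status=503, latency_ms=4000.0),
]
print(error_rate(sample))              # half of the sample failed
print(latency_percentile(sample, 95))  # the slow request dominates the tail
```

Watching just these two numbers for the whole site, rather than per-component health checks, is what lets "all the lights are green" disagree visibly with what users are seeing.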
Charity: And they didn't have those when it launched, they didn't have those?
Mark: They thought they did.
Charity: Oh, okay.
Mark: Yeah. I don't know what the system was because it wasn't in use by the time I got on the project.
But when the thing launched, it was monitored.
I mean, there was no concept of observability at that time, but they were like, "No, you've got monitoring and all the lights are green."
Even though CNN is saying that the site's down.
Charity: I see. Well, who are you going to believe me or your lying eyes?
Mark: Yeah, that was something that I think does reflect the kind of observability push is getting that in place was basically the first thing that the response team did.
Mikey Dickerson pushed that through.
And then that's what let everything else happen.
Charity: Right. If you can't see what the fuck you're doing, you're just going to waste a lot of energy and effort and yeah.
It's so basically like closing that feedback loop of just the more you know, the better you can act, but when you know nothing, everything is wasted.
Mark: And when you start bringing that back to like, how are we doing the vaccine distribution?
Think about the monitoring and observability problems of that, where you have each state, which may or may not want to tell you what their numbers are.
Mark: You know, we've seen Florida do some shenanigans-
Charity: Coupled with the privacy concerns and all the people who are like paranoid about their information.
I think almost the hardest part of this is that there's a double dose thing.
I think it's so much harder to keep track of.
Okay, but did they already get their first dose? Is it within this window?
Are you eligible for the second one?
I think that that seems like an incredibly hard problem, given the infrastructure that we have.
Mark: Oh yeah, no, it's bad. I think the dosage and the dosage spacing is actually--
I haven't seen medical professionals talking about that particularly being the problem as much as they're talking about well, they're all worried about the variant problem.
And again, I'm out over my skis. I'm not a medical professional, right?
Like I'm trusting what they tell me.
And when there's a chance to get the vaccine, I will be very happy to receive it.
But when we're talking about the systems that we're all interacting with to know like, are there vaccines available?
Can I get my mom a vaccine?
Why is it so hard for somebody to sign up to get an appointment? These are all reflective of the-
Charity: Lack of observability.
Mark: Well, it's observability and it's also organizational complexity.
This is Conway's Law made real with pretty unfortunate consequences.
Shelby: I think those kind of go hand in hand. It's something I've seen a few times, and I've heard other people discuss it: when your organization is bad at talking to itself, when parts of your organization are bad at talking to each other, it's that much harder to get observability into your production systems.
People have the little fiefdoms and there's a lot of like territorial wars and stuff.
And so observability is part of that socio-technical problem we're trying to solve, which is just doing a better job of what our organization's supposed to be doing.
And when people block your ability to just ask and answer questions, that affects everything from the customer signup process and business intelligence all the way down to like system level, knowing how well we're doing.
This is something that I've just been sort of thinking about.
And I've been curious about. Is this something we see often in organizations where you sort of have that command and control, like fiefdoms, everyone in their individual castles and moats and stuff?
Is that related to a lack of observability?
Mark: I think so. The way I think about this and the way I try to explain this to people is, think about stakeholder management.
Like if you're in an organization, you have to manage stakeholders.
What most people have a hard time understanding is just the scale of US government organizations.
And so we're not talking about managing five, we're talking about managing like 50 or 150 independent stakeholders who, if you wanted to like understand what's going on with all of those people, you have to convince them that you have legal access to get that data that you want to observe.
You have policy reasons to have that access, and you have to manage all of that all the time. And some of those stakeholder groups are external to the government. They are the constituents that you are trying to serve. And something that's an enormous difference when you're thinking about why government technology costs more money, or why it's expensive, or what's going on: it's not just that stakeholder management, it's that our government, to its credit, believes when it offers something, it has to serve everybody.
And so you're required by law to have everything be accessible out of the gate.
It has to work for people who can't hear or can't see, it has to work for people who have cognitive difficulties.
It has to work for whatever language that community speaks.
As much as we all like to pretend, we don't have a national language.
So there are government websites that are translated into 30 languages.
Shelby: You can't take shortcuts before shipping.
Mark: Right. You can't take shortcuts when you're to be done. Yeah. Right.
So a lot of the work that we do, and a lot of the work that the kind of civic tech people do is to get the government comfortable with--
Look, we're going to do a small thing first, and it's not going to maybe cover everybody, everybody, everybody, everybody, but we're going to learn how to make sure we can cover everybody, everybody, everybody, everybody, without waiting until the end.
And that's a lot of the conversation.
That's why there's a lot, you try to get people comfortable with the idea of we're going to do a small prototype.
We're going to work with just this one community to start, and we're going to learn some stuff or we're going to work with just this one state.
And I mean, and that, that should tell you when they think prototype and pilot, the scale that they're thinking, because yeah, one state, that's fine.
It's just a state. You're just like, that's an enormous amount of work still, but it's better than for the entire country all at once.
Shelby: Yeah. It's mind boggling just thinking about just the--
Especially for the healthcare.gov or the vaccine rollout where you have all the nightmare of HIPAA privacy laws and all the nightmare of government privacy protections.
Then each individual state and each individual city has all of their special local laws and regulations and requirements.
And that's the stuff--
I mean, I find it super fascinating and it also just like blows my mind that we're solving these problems, and it makes me realize no wonder things cost so much.
We shouldn't be surprised that government software is expensive, because even if we're approaching it in sort of the more agile way of picking one state instead of covering all 50, at the end of the day, the government has to deliver a more complete version of the product than any Silicon Valley startup would ever have to.
Mark: Yeah. You can't just ship on iOS.
There's this weird thing, and I've never fully understood it.
We devalue government effort.
We think that government efforts should somehow be cheaper than the private sector.
When I look at private sector, nobody bats an eye when someone says, "Oh, wow."
Engineers at a place like Apple or Google or Amazon or product managers or designers.
Yeah, they get paid a lot of money, those problems are so complicated, right?
That's just kind of taken. And then they look at a government one and they're like, "Oh, why would we spend so much money on a website?"
I mean, how much money do you think google.com costs?
Probably like quite a bit if you total it all up.
How much money do you think a vaccine program that covers the entire country should cost?
Probably quite a bit.
Shelby: And I'm also interested in, because you do government projects at Truss and have a long history of monitoring and observability, and in trying to bring observability into these projects, what kinds of walls do you come up against, especially when it comes to like government data and health data and things like that?
Is it significantly more of a challenge than working with private sector clients?
Mark: It is, although it's often a challenge for good reason.
The thing that I think people should feel really good about is the US government takes the privacy and safety of your data more seriously than probably any organization that you give your data to on a regular basis.
They have multiple categorizations and they care about all of them, right?
Like that is what they care about.
So when you're talking about introducing modern tooling and you're talking about introducing observability tooling, and when you're just talking about doing something different, that's the first question: "Well, this thing you want to use, is it going to take care of the data as much as we need it to?"
Mark: And that process gets all kind of bogged up, and there's this thing called an authorization to operate that every government website has to go through, it's literally, "Can you run the thing or not?"
It's one of those things where the concept is pretty healthy and the implementation is really heavy and very painful and kind of gets in the way.
And I think there needs to be a balance there, but when we're introducing tools, we have great conversations with our stakeholders and we're like, "Hey, like, this is what we're going to get from this tool. This is how the data is protected. This is how the data is going to be used by us and by you."
And it's a dialogue. Sometimes the government says, "Well, you can use that thing, but you need to use a hosted version because we want to run the thing, we want to hold onto it."
Charity: How often do you feel like the people on the other side of those conversations do understand what they're doing?
Mark: They understand what they're doing.
Their priorities are not necessarily the same.
So that ATO process, going back to that, the people who are responsible for that part of it, the "do you get to ship your thing or not," they are not incentivized or penalized.
If they prevent you from shipping, that's fine.
Mark: Right? They aren't trying to like make it go.
They're trying to make sure that it's safe.
Charity: Right. Do you think that separation of motivations is healthy or ultimately too restrictive?
I can see the argument either way, you want them to have different loyalties than the people who are trying to make things go, go, go.
But on the other hand, if they don't share the common goal of fixing the problem, then you're just going to be mired in stasis. Witness Congress.
Mark: That's the root of the problem.
The structure, as it stands now, you are asking people to accept risks in order to make progress.
And they're not incentivized to accept risks.
And in some cases, and this is something that I've learned over time, it's easy to be like, "Oh, well, they just don't want to accept any risk at all."
And like, "Poo on them." Here's the thing. If you're in the civil service, you're a bureaucrat, and I mean that in like--
Just a definition, you're working for a bureaucracy. In the US you have a pension, you have a retirement pension. That's great.
You do this and you serve your time and then you get a retirement. You lose that pension if you get fired. You lose everything. And so, the incentives for someone who's in that like signature position to accept a risk are real tough because they're potentially accepting a risk to their future.
It's not like, Oh, you're going to get dinged on your performance review.
It's like, Oh, well, you, you lose all this thing you've been working for the last 15 plus years. Yeah.
And so the incentives, once you understand the incentives, you see all of these patterns happening, changing incentives is something that I, as a small civic tech contractor out on the edges, that's not a thing we are in any position to do, but it's not serving us.
It means that things go slowly.
Shelby: You see that sometimes in the private sector as well, where I guess it's not necessarily that they'll say no to any possible risk, but you hear about it with testing in production or chaos engineering, where people hear about anything that sounds a little bit scary and they just shut it down, even though not doing the thing is riskier than doing the thing well.
It's like nobody got fired for buying IBM sort of thing.
Sticking with the status quo and sticking with the known sort of approaches. It's a lot less scary.
I think I understand-- I briefly got to work on some government funded work and it's super scary when people have been working on this for 20 years and they've been doing it that way for 20 years.
And they're all subject matter experts and trying to introduce some new process or something throws an entire wrench in this complex machine.
It's been working for them.
Mark: And it might be working.
I mean, that's the other thing.
I think it's okay for most of the government to not move too fast, most of the time.
We've seen what happens when government fails and it gets real bad for people real quickly.
The middle ground that we're pushing for, and that I'd like to see, is where the government feels comfortable making experiments and trying things and learning from them.
And then incorporating that back into these very large bureaucracies that are very used to doing things in certain ways.
Those ways eventually stop working because the world changes around them.
We as citizens, we as like people who live in this country, what we need has changed, what we need from people, what we need from the government in 2021 is very different from what we needed from the government in 1971.
And that's going to continue to happen and that's fine. That's healthy.
Right now, it's still very, very, very hard to get that willingness to try, and it's been that way for a long time.
Charity: Failures get broadcast to the front page of the New York Times.
And nobody ever sees your successes because they're just quiet.
And this is like that dialed up to 11 on the national scale, especially when you've got a whole party that might be invested in making sure that everyone understands that government doesn't work.
Mark: Yeah. Can I share a thing in the spirit of like raising a success that nobody would ever see?
My last interaction with the California DMV was totally fine.
There was a website.
I had to renew my license, right?
I went to the website, it told me the documentation I needed to have.
I uploaded a copy of the documentation.
They validated it in advance. They gave me a ticket.
And then once I had that ticket, they're like, "Go to your nearest DMV, show them this, and they'll tell you what to do next." I did that.
They scanned it into a thing and they were like, "We will text you when it is your turn."
And this was under COVID, so they had all the distancing stuff in place.
Charity: What. There's a text message?
Mark: And they're like, "Go wait out in the parking lot, not near anybody. We'll text you when it's your turn."
The whole thing probably took me less than two hours total.
There was like 20 minutes of wait time. It was very smooth. It was like not a big deal. During COVID, even.
Charity: During COVID even.
Mark: People still hate the DMV, right?
People are still mad at the DMV, but I was like, this is fine.
This is a good experience. There was nothing about this that was-
Charity: This is fine. This is an amazing, this should be the slogan of the resistance, "Make everything fine."
Mark: That would be very good.
Shelby: Yeah. The good enough experience.
The good enough customer experience or good enough constituent experience.
Charity: iPhone cameras, cameras,
Shelby: But fine.
Mark: But where we're at right now is we're in the like, "You're going to stab me in the eye and then give me the thing that I need."
Charity: Yeah, absolutely.
And I say this jokingly, but I'm totally serious.
I mean, this is like ops work forever.
If it's fine, if it's not noticed, you're doing your job, get a raise. Except that nobody notices, so nobody ever gives you that raise.
They don't notice until you're gone and everything falls to pieces.
Mark: And now you understand why the funding problem is a problem.
Charity: Yeah, exactly.
Shelby: I think my biggest takeaway from all of this is, we already put a lot of trust into government programs.
Charity: Who's we?
Shelby: Anyone with a social security number, I guess.
Anyone with a driver's license.
Citizens or people who use government programs and like most of the government is doing the best that they can with the resources they have at their disposal.
And that was my other question I wanted to ask is, since healthcare.gov, have you seen a change in the government type culture and approaches and are more organizations willing to adopt new approaches in the last decade or so?
Mark: Yes. So the two clear signals of this are that the United States Digital Service and their kind of counterpart 18F are both still at it.
They're still doing their thing.
And they're the people on the inside who are trying to find agencies that are, that want to do this work and are looking to make these kinds of changes and trying to help them.
Those both came out of the healthcare.gov debacle. I think they have both been--
We've been way more successful with them than if they hadn't been there, but it's a huge, huge problem space.
And so the fact that we've been at this for not quite a decade and we're seeing some progress, I think is amazing.
I remember several years ago there was a presentation I was at, and someone's just talking about the scale of government and like how it's hard to work with government.
Then the speaker like stopped and just said, but here's the thing.
In government, if you make a 0.1% positive change, you have affected millions of people with that change.
And that's the game. It's just going for those 0.1% changes year over year, and they stack up.
Like my DMV experience: things are just like a little better for people.
Charity: That's really motivating.
Shelby: I love hearing, even in these big organizations, these big government agencies, when you try, when you convince people and when you show them that new approaches do make a difference, they do adopt it. We're not stuck.
Charity: Newsflash: people generally don't want to do a bad job, in government or out.
But if you can't actually show them the impact of their work and this ties it all back to observability, they might accidentally do the wrong thing or do harmful things because they can't actually see the consequences of their labor.
Shelby: Well, thank you so much, Mark.
Charity: It's really nice to have you.
Mark: You bet. Thanks for having me on.