The Kubelist Podcast
64 MIN

Ep. #48, Unpacking Software Supply Chain Security with Justin Cappos

about the episode

On episode 48 of The Kubelist Podcast, Marc Campbell and Benjie De Groot sit down with Justin Cappos, professor at NYU and a pioneer in software supply chain security. They explore the origins of modern package manager security, the real-world limits of SBOMs, and why systems should be designed assuming compromise. The conversation spans CNCF governance, in-toto, TUF, Git security, and the emerging role of AI in securing software.

Justin Cappos is a professor of computer science at New York University and a leading authority on software supply chain security. He is the creator or co-creator of widely adopted open source projects including TUF and in-toto, and plays a key role in CNCF and OpenSSF security governance. His work underpins security for cloud infrastructure, open source ecosystems, and even government systems.

transcript

Marc Campbell: Hey, welcome back to The Kubelist Podcast. Today Benjie and I are here with Justin Cappos, professor at New York University.

I'm really excited about this episode because I spend a lot of my time thinking about software supply chain and Justin's been working in this space, has some really cool CNCF projects and open source contributions that we're going to spend some time talking about. Welcome Justin.

Justin Cappos: Thank you for having me.

Marc: So before we jump into your work that led you to become the father of software supply chain security, I'd love to learn a little bit more about your background, tell us your current role at NYU, what you're doing.

Justin: Sure. So I'm a professor, which usually means that you're supposed to be doing a lot of paper writing and interacting with other academics and doing things like that. But one of the great things about NYU in particular is they've been very supportive of having broader impact in the real world.

And so I really view my community as much closer to the Linux Foundation or the open source community rather than a bunch of academics that are writing papers that are mostly only read by other academics. So I get excited to go and you know--

I got to give a keynote at one of the prior KubeCons and had about 10,000 people in the audience listening to stuff I was saying. It's just been a tremendously positive experience to be able to have a lot of impact across a lot of open source.

Marc: And so you get to interact with folks. You're building software, you're not just writing papers, you're building software, hearing about problems and trying to help people solve these.

Justin: Exactly, yeah. And I also do a lot of other service work inside the Linux Foundation or the broader open source community. So for instance I created and run the process by which the CNCF assesses the security of software projects.

I'm a tech lead inside of TAG Security. I was also elected to the governing board of the OpenSSF and, you know, try to pitch in and help out with efforts and things like that that they have going on.

So really, when I see an opportunity, a place that I can step in and try to help out, I'm really just trying to make the world a better place.

Marc: So you're actually working not just on building software but with incubating and graduated projects, the process that they're using, Kubernetes, etcd, like, to get their security reviews and making sure that these are holding the bar as high as it's supposed to be for a graduated project, right?

Justin: Absolutely. So basically the way it turns out is, they first go through a process called a self-assessment, which is basically them going and trying to threat model and describe their system: what happens when different parts of it fail, who the different actors in their system are, and what things should be considered out of scope, like things that their users need to handle.

And then they do that first part independently, and then we go forward and work with them to get that threat model, that assessment, to be high quality, and also give them recommendations about how they can improve.

And this is a process that, you know, I had been doing on a smaller scale individually with a bunch of startups and things like that in the New York area, as part of the incubators at NYU, and it's the sort of thing that we did for a lot of our projects.

We basically do this sort of security analysis and things like that for the software and projects we produce. And then I worked with the SPIFFE/SPIRE folks. They were actually our first kind of real guinea pig for doing this. And we developed a nice process for the CNCF that then turned into the process that we have today.

And to make it easy for people to see how to do this, we even wrote a book about it that's quite accessible. It describes how you do this for your own project and what all the thinking is behind it, and about 10 or so people from the security community have given quotes and suggestions and anecdotes that we used in parts of the book, so they could also give their impression on the best way to do risk management, or this part of threat modeling, or their hot take on whether STRIDE is actually useful, things like that.

Marc: Cool. Well, I think we got a lot to talk about. Bad news is software supply chain security is not a solved problem right now. So we have a lot of work to continue to do and I think AI is just changing the threat of that landscape completely. We're going to spend a ton of time talking about it. I think Benjie has something.

Benjie De Groot: Yeah. Justin, it's great to have you on. Thanks for the time here. I want to back up.

I think it's pretty cool to have a computer science professor on and kind of talk about, especially with how long your tenure's been, how you got into this field a little bit. And specifically, you know, a big focus of Kubelist is about the open source stuff.

And so tell us how you found out about open source, maybe the first project, forget SBOMs for a second, but the very first thing you worked on. Just a little background, you know. For me, for example, the Commodore 64 was my first computer.

And yeah, I don't remember what my first open source contribution was, ironically, but I'm asking you what yours was. So I'd love to hear a little bit about where you grew up, and how you got into this game.

Justin: Sure. So when I was 7, my parents bought a Commodore Plus/4, not a Commodore 64. So it seems like it should be better, but it actually meant it couldn't play a lot of Commodore 64 games and do things like that. It was more made to do spreadsheets and other things.

But I was really fascinated by Commodore BASIC and would sit and write my own little programs that were really horrible, as anybody who's trying to self-learn as a 7-year-old is likely to do. And then my parents continued to buy other PCs.

I did a lot of programming in GW-BASIC, you know, for DOS 3.2, and then eventually graduated to QBasic, which was nice because for the first time there was an editor. I didn't have to just type in a line number in front of everything to tell it the order in which the lines of my code would go.

And eventually I started to learn about things like, you know, how to actually use subroutines and functions and not just put GOTOs all over my code. I got a little bit of structured programming experience, but really there were a few eye-opening things.

So I went off to do an undergraduate degree at the University of Arizona. Actually, out of high school I was active duty military in the U.S. Army, and I was doing networking and IT, but in a much broader sense than people usually do IT now.

I basically had an entire Internet service provider in a van that I was running. And so I had to manage everything from, like, Windows domains and DNS and UNIX servers and BGP. I had to set up Cisco routers and interface all of that and make sure everything worked across, you know, a substantial area. So it gave me a lot of breadth in those sorts of things.

And really, to that point I had thought I knew basically everything there was to know about programming. I viewed computer science and programming as being almost the same thing. And then I started to take classes at the University of Arizona, you know, to get a degree, which I'm very thankful that my mom and my dad, especially my mom, had kind of pressured me to get.

And the first class that I took was actually the, like, capstone course in operating systems, usually the last class you would take as a computer science student. And I went and talked to the instructor there and said, "Oh, you know, I've written device drivers," because I had written a mouse driver and did a bunch of other stuff like this.

Like, "I know operating systems and I know things well and I just kind of want to be done with this. Let me take this really advanced class that has like, you know, all of the prerequisites from the program."

And the instructor, you know, became a great friend of mine, Patrick Homer. And he was convinced I would fail out in a week or two and was surprised that I stayed in. And for me it was the opposite.

My eyes were completely open to how deep and interesting and how much there was inside of computer science.

Like the fact that different algorithms had different run times and that you might not just want to stick everything in an array. And I started to really understand why people use linked lists and stuff like that, things that, you know, people today would get in a very early class.

But that really got me hooked on this idea of there's so much interesting knowledge out there and so much depth in computer science, it made me very passionate about it. And so I went off and pursued a PhD also at the University of Arizona.

And the first piece of software, like the main piece of software I wrote for my dissertation, was a package manager called Stork. And at the time the research community had this big network of computers where, like, every campus would usually spin up two of these computers on this network called PlanetLab.

And you could get an OS VM there on each of those and be able to run your experiments. And this is what networking researchers were using to do research at the time. This is when, like, distributed hash tables and stuff like this were just first starting to come out. And people were really starting to do a lot of network measurement, and there were a lot of advances in CDNs and stuff like this inside of the research community.

And so at the time people were just running an individual package manager that would redownload the same files and reinstall the same files in every single one of these virtual machines, which is really disk-inefficient and really network-bandwidth-inefficient. So I created the first package manager for this, designed specifically to address these problems. And this architecture, of having a bunch of distributed systems that have their own OS VMs, is of course what we now call the cloud.

And so I really built, like, the first package manager designed for that. And after I had used this for a while, of course, being a fairly security-minded person, I tried to think very carefully about all of the security issues that could occur and what could go wrong. So the security system in Stork was very resistant to attackers who had compromised a mirror or were otherwise acting as a man in the middle.

And then when I went to kind of write things up for my dissertation, I looked at other Linux package managers and found they were all like very vulnerable to mirrors, to an attacker setting up a malicious mirror. And you could basically compromise all of the users who came to your mirror without compromising any keys or anything due to vulnerabilities in the way that they used package metadata.

So if I can take you on a slight tangent for a minute and I'll come back to it. The packages have dependencies, as we all know. So, you say, hey, I want to install package foo. Well, in order to make package foo work, you have to have package bar.

And it turns out that at the time, all of this dependency information wasn't actually being protected correctly. And so you could go and you could manipulate the dependency information and cause package managers to install a bunch of stuff that wasn't relevant.

And the keys that they used to do these signatures had been the same, basically unchanged, over many years, six years, eight years, something like that. So you could say, oh, in order to install this thing you want, you need a, like, six-year-old version of sendmail to run, one that has all kinds of known remote code execution vulnerabilities in it.
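To make the attack concrete, here is a hedged toy sketch, not any real package manager's metadata schema, of why unsigned dependency metadata is dangerous: the package contents may verify fine, but a malicious mirror can rewrite the dependency list to drag in a known-vulnerable package.

```python
# Toy illustration (hypothetical metadata format, not APT/YUM's real schema):
# a resolver that naively trusts whatever dependency list the mirror serves.

def resolve(package, metadata):
    """Return the install set by following dependency metadata."""
    install, stack = [], [package]
    while stack:
        pkg = stack.pop()
        if pkg not in install:
            install.append(pkg)
            stack.extend(metadata.get(pkg, {}).get("depends", []))
    return install

# Metadata as the honest repository publishes it.
honest = {
    "foo": {"depends": ["bar"]},
    "bar": {"depends": []},
}

# A malicious mirror serves altered metadata: every package still carries a
# valid signature, but "foo" now pulls in an ancient, exploitable build.
tampered = {
    "foo": {"depends": ["bar", "sendmail-8.11"]},  # hypothetical vulnerable version
    "bar": {"depends": []},
    "sendmail-8.11": {"depends": []},
}

print(resolve("foo", honest))    # ['foo', 'bar']
print(resolve("foo", tampered))  # also includes 'sendmail-8.11'
```

The fix the podcast describes amounts to signing the metadata itself (and expiring it), so a mirror cannot substitute its own dependency lists.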

And it would install it and run it on the system. And so this was obviously very problematic. And so when I found these, I wrote a paper about it, I reached out to the CERT, which at the time meant making a phone call, or at least getting on a phone call after sending a couple emails.

Benjie: Wait, so Justin, just for reference, this is like 2006, 2007, maybe?

Justin: It's like 2008. Yeah.

Benjie: 2008, okay. Yeah, just to give folks that are listening context, I guess technically AWS started, I think, in 2006 with S3. So I don't even know if EC2 was a product yet when you were working on this, just to give people context of how early this Stork stuff was.

Justin: Yeah, we started Stork in 2002.

Benjie: Okay.

Justin: So, at that point at least, no one had used the word "cloud" or anything like that.

Benjie: Was it Git? Did you use Git or was it like SVN?

Justin: No, I mean we didn't use Git at that time. I don't recall when Git was created, but I think we used SVN to manage our software. But it might have even been CVS, which is, like, so dead.

Benjie: I feel like Git actually came later, in 2005. But that's not the point. I didn't mean to distract you from here, but I just think it's interesting that you're talking about problems that we have today, or similar ones, and you were trying to address these starting back in 2002.

Marc: But when you created Stork, Justin, if I understand, you weren't like, oh, supply chain is a mess right now, I gotta solve that. You were talking about efficiencies in package delivery. And then in the process you focused on security and just realized this is exploitable if we don't address this problem a little more broadly.

Justin: Yeah. And that was actually very surprising to me. I had just sort of thought that someone at one of the major distributions would have thought through all of the things that could go wrong and really focused on that.

And so that was actually a bit of a wake-up call. And the second wake-up call is, when we went through the CERT to disclose this, the response that we got back was, "Well, you know, that's not actually a security bug. It's just our design that we just, like, trust these mirrors, and no one bad would do something."

And so I went and rented hosting from a hoster that was renting to botnets and spammers and set up mirrors for five of the most popular distributions. And it got added instantly. They all just automatically added me. And I was serving packages to banks and military computers and all kinds of places that obviously you wouldn't want bad people to get into.

And, you know, we reached out again and they still were like, no, no, no, this isn't part of our design, our system works fine. And so, you know, what did you do back in 2008 if you wanted to get the word out? You got an article on Slashdot, which I think is basically a dead website now. Or, I don't know, it's probably not very relevant.

Benjie: I think it still lives. For those that don't know, it was basically Slashdot, then Digg, and now basically Hacker News. But, like, Slashdot was the original one. We would all talk about stuff.

I remember specifically one of the only websites that was working during 9/11 was Slashdot. I don't know if anyone else remembers that. So for those that don't know, we're all dating ourselves, but Slashdot was the cool place to be.

So you literally are making your own mirrors and you're injecting the entire ecosystem with--

Justin: Oh no, no, no, we didn't inject anything. We did our own testing on our own separate instance to show we could compromise all of these. And then we went and got this kind of really nice snarky article out on Slashdot, on the front page. Of course, at the time there was also the phrase of being "slashdotted," which meant that you had so much traffic going to your site that it went down. Which, for a lot of websites, was usually, like, more than a few users at a time.

Benjie: Yeah, I think it was like 10 plus users at once, right?

Justin: Yeah, yeah. Anything more than what a typical web server could handle. And so our site got clobbered, but people got enough of the info from there that all of a sudden all of the package managers were very concerned about this as an actual problem and recognized it.

And so APT, YUM, YaST, and Pacman, just to name the major ones, adopted fixes from the security model that we designed.

And what that really also taught me is sometimes you do have to get people's attention a little bit. But also that as an academic, I had done all the right things.

I had written a paper, and it was in a good conference, and no one cared. No one had read the paper outside of my peers, and no one really cared. And so then I really started to think about, like, how do I get the stuff that I'm doing out into the real world?

Because one of the things I've always thought about is, like, I try to think about my retirement party and what I want them to say about me when they're talking about, like, my career and summing it up and then work backwards from there. How do I make those things happen?

And so really, you know, I've been focused on how do I have impact in the real world? You know, if they say I have 50 papers and good conferences, or 40 papers and good conferences or 80 papers, I don't really care that much.

But if they say I built these systems that mattered and are used to underpin important parts of society, and I have all these wonderful students that have continued that legacy in other ways, as I've been fortunate enough to have, that's what I want to have said.

Marc: Yeah, that's cool. Fast forward to now, sure, package managers understand security at this point. I think we can attribute a lot of that to the early work that you did there to kind of push them, like, this is actually an exploitable problem. You can't say, "Our architecture doesn't lend itself to security, so it's not a bug." That's no longer a thing.

But the way we build software in general today, it's not just a package manager for Linux, right? You end up with the simple-looking website you go to pulling in, you know, a thousand, ten thousand npm packages, and a CI pipeline pulling in all these different GitHub runners that are coming from who knows where.

And it basically blows the threat model wide open. Anything along the way, one exploit, that's all you really need. And then you end up with remote code execution on your servers and people exfiltrating data.

So you work closely with the CNCF process and creating projects that are part of the CNCF and staying pretty active in there in some of the tags too. I'd love to hear a little bit about, let's start with just generally the work that you're doing with the CNCF and then we'll move over to some of the projects if that's okay.

Justin: Sure. So I am a tech lead inside of TAG Security and Compliance. Tech leads are-- So the chairs tend to be the responsible people that need to be in all the meetings and need to do the things to keep things going. At least that's my perspective.

And the tech leads are kind of like the house cats that tend to go out and find really interesting things and bring them home. At least that's my perception of the way that the group has historically worked.

So for instance, you might get excited to do a security assessment process as I'd mentioned before, or write a white paper on a specific topic like when you should use SBOMs and when you should use attestations or create a software supply chain security catalog of past incidents and things like that.

So there's all sorts of little one-off things that people can go and do and that our tech leads and others have done an excellent job doing. And you know, I really like that.

One of the reasons why I like being an academic is I have a tremendous amount of freedom. Other than showing up to teach a class periodically, I basically get to plan and work on the things I want to do.

And so we often do fun things like have mock debates about, you know, is the security of MCP servers for AI in a good state now? Or have mock debates about like CrowdStrike, you know, was the company doing reasonable things or not? You know, where someone gets to take one side of that argument and someone takes the other side and we each try to make the strongest points we can, which in some cases is harder than others. But yeah, it's always, I think, a fun experience.

Marc: Right? Yeah, Some of those examples I'm like, well, I know which side I'd prefer to argue on that one. But it'd be fun to have the argument both ways and listen to the other side.

So you basically, as a tech lead in TAG Security, you're kind of thinking not just, like, what's the next project I can build, but you're actually looking at other projects that are in the CNCF. Is it mostly graduated projects, or do you work all the way down to the sandbox level?

How do you actually do that? And what do you actually help them do?

Justin: It's a bit of everything. So there's, I think, six tech leads now in TAG Security, and we are very much, like I said, like the house cats who can just wander where we want.

So we'll have some discussion about like, hey, wouldn't it be nice if we put out guidance for people about a good way to do this? Or wouldn't it be nice if we curated a data set that had this information? Or wouldn't it be nice if this or that happened?

And really what being a tech lead lets us do is release it as a work product under the CNCF, and it goes through a bunch of steps after we initially say, hey, this is a good thing, like, this would be a good blog post, this would be a good whatever, and so on.

But it then goes through a process where it gets approved at multiple levels, and then it becomes, like, a Linux Foundation thing that gets released, and that of course has a lot more reputational weight and things like that than just something that I put up on my lab's website or similar.

Marc: Cool. You said a lot of terms specific to this industry, like SBOM and attestation, and there are other ones too, like CVE and signing and SLSA compliance, SLSA levels, things like this that are actually really important.

We have a pretty technical audience generally, but I think it'd be good, if you're up for it, to give a couple of minute quick overview of what these mean and why they're actually relevant, just so that we're all sharing a vernacular for the rest of the conversation.

Justin: Okay, so at a high level there's a lot of different steps that happen when you make software. There's things that happen when, well, the developer's going and using their IDE to create software. There's things that are happening when Git and other version control systems are committing software. You usually have some kind of build step or build process, you do some packaging, and you also do testing throughout.

And some of this gets rolled together into things like CI/CD systems and can get kind of combined, where maybe your IDE might also interact with your Git repository or do signing or things like that.

But in general there's all these individual different pieces of the software supply chain, and there are individual projects I've worked on that target most, or I guess all, of those individual pieces in varying levels of detail.

However, there's a couple of things I want to talk about that go across different levels and are helpful to understand. So one of which is, when you think about things like SLSA or you think about--

So SLSA is basically trying to give you guidance about how you should do different steps in certain ways to get different security properties out. And SLSA itself uses an attestation framework called in-toto. In-toto is actually one of the projects that I created out of my lab along with Santiago Torres-Arias, and it's a way of providing, like, a cryptographic, computer-checkable kind of signature about the fact that these are the things that happened at this step.

So the way you can think about it is that normally when you sign something, you just sign and say, "I am saying this is okay."

Marc: Yeah, the signing is basically saying who's behind it. Right?

Justin: Yeah, it's like putting your signature on a document to say, you know, usually you also mean you agree with whatever is in the document in some way. And an in-toto attestation in addition to that also does things like says, "these are the files, these are the files and the secure hashes of them that came into, let's say my compiler and these are the files it touched while it was running. And then also here's what it produced out, here are the binaries it produced."

And the advantage that gives you is that if there's some tampering or some other problem that's happened, you know what came in, you know what went out and you know what happened. And you can more easily figure out either what went wrong or you can check things.

Like if the compiler's only supposed to be building things that came out of your version control system and there's a mismatch where the things out of your version control system were substituted like new files were substituted in, then you will be able to detect that there's something wrong, that the outputs of your like source control, your version control system, are different from the things that came into your compiler.

Marc: And that's because in-toto is looking at-- Like, you're not talking complete reproducible builds, where given these inputs, this is the exact output every time. But you're just saying, look, I can go to Git and I can say I built this commit, this tag, this release. And so I know the checksums, I know the list of files that were in that. And you're able to basically independently inject that in and create an attestation that says, "You said you built tag 2.0.4. We looked. Here's the inventory of what was in the tag that was actually on the file system in the build environment."

Justin: Yeah, I think that's a good way to look at it is that it tracks things as they go through your supply chain and gives you this cryptographically verified, provable information about it.

And it's very simple, it's very non opinionated about like what it's tracking. So you can say like, my lawyer signed off on this open source license and they can write that as an attestation, for instance.

You will have things like your test runner go and attest that the unit test passed or failed or it met whatever threshold or things like that. And then in the end, you know all the right things have happened for making the software. That's like in-toto attestations at a really high level.
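The "materials and products" idea Justin describes can be made concrete with a minimal sketch. This is not the real in-toto library or its metadata format, just an illustration: each step records hashes of what came in and what went out, and a verifier checks that one step's products match the next step's materials.

```python
# Toy sketch of the in-toto concept (not the actual in-toto metadata format):
# record materials (inputs) and products (outputs) per step, then verify the chain.
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def attest(step_name, materials, products):
    """Record a step; materials/products are {filename: bytes}.
    A real attestation would also carry a signature over this record."""
    return {
        "step": step_name,
        "materials": {name: digest(data) for name, data in materials.items()},
        "products": {name: digest(data) for name, data in products.items()},
    }

def verify_link(upstream, downstream):
    """Check every material of the downstream step against the upstream step's
    products; a mismatch means something was substituted in between."""
    for name, h in downstream["materials"].items():
        if upstream["products"].get(name) != h:
            return False
    return True

source = {"main.c": b"int main(void) { return 0; }\n"}
checkout = attest("vcs-checkout", materials={}, products=source)
build = attest("compile", materials=source, products={"app": b"\x7fELF..."})
assert verify_link(checkout, build)          # clean chain: VCS output == compiler input

tampered = attest("compile", materials={"main.c": b"evil"}, products={"app": b""})
assert not verify_link(checkout, tampered)   # substitution detected
```

This is exactly the mismatch check from the conversation: if the files entering the compiler differ from what the version control step produced, verification fails.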

And then as a comparison to this, they're quite different things. But there's also SBOMs, which stands for software bill of materials. And the analogy that everybody uses for SBOMs is it's like the ingredient label on a candy bar or something like that to tell you like what went in it.

And SBOMs are usually generated today by taking a finished thing and then trying to look at it to figure out what's in it. So it's a little bit like you baked a pie and now you're going to smell and taste the pie and write down the ingredient list.

Whereas in-toto attestations are more like, as every ingredient comes into your kitchen, you take a photograph of it and then you say, look, these ingredients are the ones that came in. Right?

And so one of the efforts that I'm involved with now is trying to work attestations into the SBOM format in a way that they'll have provably correct information about their ingredients, because one of the dirty secrets in the SBOM community is that SBOMs are actually not very accurate in practice.

They're really good for compliance because they give you someone else to blame. You know, if someone tells you this is what's in my software and they're wrong, then it's kind of their fault. But from a security standpoint, you care a lot more about it being actually correct.

Like, you want to know, is there something else in here that shouldn't be there?

The difference between being 95% correct and 100% correct is actually really substantial in security.

Marc: Yeah, if you're Coinbase and you accidentally have some malicious software running inside there and somebody steals a bunch of Bitcoin, it doesn't matter who's to blame, you still lost the Bitcoin at the end of the day.

So, like, security for compliance's sake, but not actually that accurate, is probably not the thing to put a bunch of time and effort into. Instead, get it to be actionable, get it to be able to block this malicious software prior to distribution or running.

Justin: Yeah. And just to be clear and fair to everything there, it's not necessarily that most of the time it's malicious software, it's just there's little things that get missed by doing this process. Like, imagine you smell a pie or you taste a pie, you might not realize that, you know, hey, it had powdered sugar inside of it as well as granulated sugar or brown sugar as an ingredient.

Marc: So in software that could be, like, it's not malicious software, but maybe this thing I'm running has some GPL library that wasn't detected. And hey, I can't actually run GPL. And not knowing doesn't really help, like, I'm still violating the license if I'm running the software.

Justin: Right, or worse is you have like Log4j or something in there. Or you have something in there that you don't know about. And then some event happens and you're like, okay, well I have all these SBOMs for all my software. Let me see what's vulnerable.

And you go and you check and after you check you say, oh, I'm okay. I've cleared things up. I've taken a look at these pieces of software and they're fine, but then you missed a couple of things.

Marc: Yeah, so that makes sense. Let me say it back, make sure I understand. So, you know, a lot of times we're generating SBOMs using another open source tool out there called Syft. It's pretty popular, but that's generating an SBOM from an artifact that somebody else produced. Often needed, right?

Because if the developer of that other piece of software is not producing SBOMs as an artifact, the only way to really get one is to introspect that binary, that container image, whatever it is, and figure it out. But you're saying instead what in-toto is doing is pushing that responsibility, not to generate an SBOM of what was produced, but to generate it along the way, because we can know everything.

Even if it was, like, a transient dependency, the attestation shows that it's there. You may decide it doesn't need to be in the SBOM because it's not delivered in the final artifact, but it actually puts more of the responsibility for SBOM generation on the developer, not on the consumer of the software.

Justin: Yeah, I think that's very fair. The SBOM community has also kind of recognized that they have a problem here, and there have been pushes to try to make SBOMs at different parts of the process. They have a bunch of smart people, and we've been working with them to integrate this, because attestations are also a much more machine-checkable format. They're much easier to work with in that regard.

They're just streamlined for things like that. They have all kinds of like key management, ability to rotate keys and do all sorts of other stuff like that all built in. So that's also a big advantage of that.
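As a rough illustration of the machine-checkable format Justin describes, here is a toy sketch of an in-toto-style attestation statement. The artifact name, digest, and builder id below are made-up placeholders, and the check is deliberately simplified (a real verifier would also validate signatures):

```python
# A minimal in-toto-style Statement wrapping a provenance predicate.
# All names, URLs, and digest values here are hypothetical placeholders.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{
        "name": "myapp.tar.gz",
        "digest": {"sha256": "aa11bb22..."},  # placeholder digest
    }],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        "runDetails": {"builder": {"id": "https://example.com/builders/ci"}},
    },
}

# Because the attestation is structured data, a policy check is mechanical:
# "was this artifact produced by the builder I trust?"
def built_by(stmt: dict, builder_id: str) -> bool:
    return stmt["predicate"]["runDetails"]["builder"]["id"] == builder_id

print(built_by(statement, "https://example.com/builders/ci"))  # prints True
```

Contrast this with prose release notes or an after-the-fact SBOM: a consumer can evaluate structured attestations automatically across thousands of artifacts.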

Marc: Cool.

Benjie: So wait, Justin, you were talking about SLSA. What's SLSA?

Justin: It's a set of effectively opinionated actions you should take to improve different parts of your supply chain. So in-toto just tracks what happens. You can say, like, I'm just going to wget this thing off of this random website over HTTP and run the script. And in-toto's like, "okay, I've captured that for you. I now expose to the world that this is what you did."

And in-toto doesn't judge you and say this is good or bad. SLSA covers different parts of the software supply chain. Historically it focused early on the build parts of the software supply chain, but now it has, like, a source track and other things with it. It says if you want to be SLSA level one compliant, you should do these things; for SLSA level two, you do this; and so on, as ways to harden it.

SLSA was an effort that had been going on at Google prior, you know, for quite a while. But when it came into the CNCF, it actually started as a sub project of the in-toto project and then after that it spun out into its own thing in the OpenSSF.

Marc: But is SLSA just definitions of these levels, or is there actually tooling behind it to verify it? Or is it just self-attestation that, like, I'm on level three?

Justin: No, there's tooling behind it. Like, you generate in-toto attestations as part of it, and they can get checked and validated.

Marc: Oh cool. So in-toto sounds like a cool project. I think we're going to talk a little bit more with Santiago too about that project really soon.

What other projects in the CNCF have you been contributing to, working on, started? Anything like that?

Justin: Yeah, so I created a project called TUF. We talked about those kind of steps of the software supply chain. The TUF project, which stands for The Update Framework, focuses on securing things like registries or repositories, any kind of software repository, from very serious attackers: from, like, nation state actors or other people that reasonably could be expected to be able to break in and compromise at least some of it, you know, the server itself, some of the signing keys, and do other things.

And so one thing that it's really designed for is it's designed with the assumption that some parts of your system will be compromised, that someone will be able to register malicious projects on there or will compromise the CDN node or will even break into your registry.

And it minimizes the damage that attackers can do based on their access and also gives you secure ways to recover from this. And so because of this it's actually quite widely used not only in the cloud space, but also like across things like outside of this.

So, for instance, there's an automotive variant of it. Lots of the major OEMs are either directly using it or have changed their architecture to incorporate parts of it. And that project's name is Uptane.

This is also a standard, an IEEE-ISTO standard, where basically the entire server side is almost entirely TUF, and the client side is basically TUF but has a bunch of other stuff to work well in automotive and embedded devices. It's actually used on millions of devices that are not automobiles too, like medical devices and factory controls and things like that.

And it's used for things like Sigstore: Sigstore's root of trust is distributed using TUF. And TUF is also used to protect things like, actually, the law: the legal code in Washington, D.C., in Baltimore, in lots of other cities.

In fact, hopefully by the time this goes live, we'll also be live in the state of Maryland. That'll be our first US state that's rolling it out to protect their legal code. But if you want security that is meaningful even when an attacker breaks in, TUF is really the only solution today that does it for that part of the supply chain.

Benjie: Wait, sorry. This Maryland thing sounds super cool. Can you explain to me how this applies? I don't know if I made the full connection.

Justin: Sure. So in fact, if it's okay, I'll talk more about DC because they're live and they were the first adopter of this.

Benjie: Well, we know that D.C. is not a state and there's no representation there, but we'll allow it. Go ahead.

Justin: So, so basically, governments have ways of making laws. And there's a process that they go through to say like, this is the law. So they have some legal code a lot of times that they produce periodically.

And I also should say, I'm not a lawyer, by the way, so please, treat this as a high level explanation, not a detailed treatise on law or how to do this. And then when they want to go and they want to publish something, they want to change a law, they want to make an amendment and things, they need to get that information out.

And the way that this historically is done lots of places is they literally send it to someone who prints these big books, and those books get shipped out to all the law libraries and all the lawyers who subscribe to them.

And that is like the actual real copy of the law is whatever got printed in those books. And of course that's unwieldy and expensive and problematic. And so there's been an effort to try to put all these things online digitally. So what we have is we work with a nonprofit called the Open Law Library, where their mission is to go and help to produce like, open, accessible versions of the law that anybody can go and look at across different organizations.

And so they worked with us, they took TUF, and they built a system called TAF, which is The Archive Framework. TAF is basically TUF plus Git. And they also have to have a little handling of how we know things like when laws are passed and when they go into effect. Which is actually very complicated in the legal sense, because you can pass a law today that says a law that expired two months ago should be treated like it never expired, and then people need to be able to know things like: what would somebody a month ago have thought the law was?

Even though that isn't what the law actually was because the amendment was passed later. But TAF is something that's being used across more than a dozen, couple dozen governments now and is used to protect the way that they produce this legal code so that if you go and you get charged with some violation in D.C. or Baltimore, San Mateo or a bunch of other places, you can go and look up and see what the actual law is yourself, your lawyers and others can look at it and you can know you're getting an authentic, correct, auditable version of it, even if a nation state actor tried to break in and change the laws, as has happened in the past in Europe.

Marc: That makes sense. I actually like that TUF doesn't just focus on, hey, I'm going to secure this and keep bad actors out. I think the defense in depth is really, really critical. And so saying, like, yeah, secure your server, but you have this distributed system, so let's assume something's happening you don't know about. How do we ensure the integrity of that thing so that we can mitigate against that?

How does TUF do that? Is TUF actual software I'm running? Is it fair to say that I could use it to help secure, like, Docker distribution? The Docker registry that's containing OCI images, right?

Justin: Yeah. Actually, one of our first major adoptions was Notary V1. That was the Docker folks; they had heard about some of the work we were doing. We were working with folks in the Python community and elsewhere to try to push TUF out with them. And the Docker folks ran with it, and it was terrific. They were great partners.

But yeah, you can, you absolutely can use it to do that at a high level. It uses a bunch of different principles. But one of the key principles is the idea that you have different roles that do different things and so your server might need to say things like, "has there been an update in the last 10 minutes? Yes or no."

And your server also might need to say things like, "this hash is the hash of the most up to date version of this package."

And the level of trust you have is very different for those two things. Like if someone is able to say there hasn't been an update, when there has, and they're able to delay you from realizing there's an update for a few hours or days, that's really not that big of a deal compared to being able to give you an arbitrary package that has whatever software they want in it that you shouldn't be trusting. Right?

So TUF has different roles that work at different levels and protect against different types of attacks. At the lowest level, the kind of weakest thing it does is timestamp, which is: is there an update or isn't there?

Above that there's protection for things like rollback and replay attacks and also a type of attack called mix and match attacks where you're able to take different versions of different packages or files on a registry or things like that and cause them to be available at different times.

And then above that you have, like, the signing of the actual images, or the packages or the laws or whatever it is themselves. And then at the highest level you have the root role, which is able to do things like revoke trust in other keys, and is really used when there's a very serious compromise, but otherwise isn't that heavily used in day-to-day operation.

So that's a really quick primer. I'm happy to go into more detail.
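Justin's role-separation primer can be sketched in code. This is a toy model only, not the real TUF metadata format or signature scheme: it uses HMAC as a stand-in for per-role signing keys, and all names and key values are made up. The point it illustrates is that a client verifies every layer, and compromising a low-impact key (timestamp) doesn't let an attacker forge packages (targets):

```python
import hashlib
import hmac
import json

# Stand-in per-role keys (real TUF uses asymmetric keys and thresholds).
KEYS = {"root": b"root-key", "targets": b"targets-key",
        "snapshot": b"snapshot-key", "timestamp": b"timestamp-key"}

def sign(role: str, payload: dict) -> dict:
    """Wrap a payload with a stand-in 'signature' made by that role's key."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"signed": payload,
            "sig": hmac.new(KEYS[role], body, hashlib.sha256).hexdigest()}

def verify(role: str, envelope: dict) -> bool:
    body = json.dumps(envelope["signed"], sort_keys=True).encode()
    expected = hmac.new(KEYS[role], body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["sig"])

# Each role attests to a different thing, at a different level of trust.
package = b"hello-1.0.tar.gz contents"
targets = sign("targets", {"hashes": {"hello-1.0": hashlib.sha256(package).hexdigest()}})
snapshot = sign("snapshot", {"targets_version": 1})
timestamp = sign("timestamp", {"snapshot_version": 1})

# A client checks every layer. Forging only the timestamp role could delay
# update discovery, but could not substitute a malicious package, because
# the package hash is signed by the separate targets key.
assert all(verify(r, e) for r, e in
           [("targets", targets), ("snapshot", snapshot), ("timestamp", timestamp)])
```

The design choice being modeled: the blast radius of any single key compromise is bounded by what that role is allowed to say, and the rarely-used root role exists to recover trust when lower keys are lost.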

Marc: Cool. I think that makes sense. Benjie, does that answer your question?

Benjie: I think so. I need to go talk to Claude about this after this podcast, I think. But yeah, it's really cool. And I think these different types of attacks that you're talking about are illuminating for me. You don't always, you know, there's the obvious stuff like you know, the SQL injection, all this stuff, but there's all kinds of weird surface areas that you don't think about that obviously you've been thinking about for your entire career and that's why these frameworks kind of exist.

Justin: Yeah. One more thing I want to say is that one thing a lot of people miss is that rotating keys or revoking keys are actually really painful operations.

So in pretty much all other non-TUF-derived systems out there, if there were a compromise, the way you would handle it is you'd have to try to push out a new update that says: replace the key you're trusting now with my new key. And the problem is that you have to do this over your normal update mechanism.

So if the attacker did that first, then they now have a key they control and they can control all your users.

Marc: Right. You're likely using a compromised system.

Justin: Yeah, and this has happened historically. This is what's happened when Debian and Red Hat and others have been compromised in the past. They just say, well, you know, come get an update signed with the key that was compromised. We have our new key there.

Benjie: So from the very beginning it's really important. You need to have these things in place ahead of time, or you just end up building on top of the already insecure thing. In my brain, my mental model, it's kind of like, well, you can't ever commit any type of key to a Git repo, because technically you can still go back to the history and see it, and once it's done, you're done.

Justin: You know, it's interesting you should mention that. We have another project called gittuf, which is actually a layer on top of Git. The TUF part of the name maybe overstates the connection with TUF. It uses some TUF design principles, but it doesn't just have TUF plugged into it.

But we did a bunch of work on Git security and source control, like version control system security. And, you know, some of that we actually upstreamed into Git itself, through, like, the way they do tag signing and stuff like this. Because it's actually possible to do these types of attacks, called branch teleportation attacks, even if everybody signs all their commits on repositories.

A branch teleportation attack lets you go and move branch pointers wherever you want, in a way that, we found, other developers will just not notice. And you can use this to do things like take experimental features that never should have been merged and have them get merged into master, or have security fixes get omitted from your master branch.
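A toy model (with entirely hypothetical commits) of why commit signing alone doesn't stop the attack Justin describes: the commits are verifiable, but the branch-to-commit mapping is just an unsigned pointer.

```python
# Toy repository model: commits are "signed" (verifiable objects),
# but branch refs are plain unsigned pointers, as in stock Git.
commits = {
    "c1": {"msg": "security fix", "signed": True},
    "c2": {"msg": "experimental feature, never reviewed", "signed": True},
}
refs = {"master": "c1", "experiment": "c2"}  # nothing signs this mapping

def all_commits_signed(repo_commits: dict) -> bool:
    return all(c["signed"] for c in repo_commits.values())

# Attacker "teleports" master to the experimental commit.
refs["master"] = "c2"

# Every commit still verifies, so a signature-only audit passes,
# even though master now points somewhere it never should.
print(all_commits_signed(commits), refs["master"])  # prints: True c2
```

A system like gittuf closes this gap by recording and signing policy over the refs themselves, so moving a branch pointer becomes a detectable, authorized (or unauthorized) action.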

Marc: That's interesting. That definitely feels very much kind of in that category of, you know, you could say, well yeah, that's kind of the way Git's designed; branches are mutable in Git. So how do you solve that? It's an interesting problem.

Justin: Well, it turns out the branch information isn't signed in the way that you would think. Like, commits themselves are signed, but branch pointers are not signed in a way that makes them immutable. And actually tags weren't either, until we made the change so that tags are signed.

But at the level of this podcast, I won't go into detail. I will say we have a paper that talks about all of this, and we have an implementation available that people can look at.

But the long and short of it is that there are a lot of concerning things you can do even if you're signing your Git commits. And so for that reason we created a project, gittuf, which has all of the security protections and controls that we think should really be there inside of Git. It does things like track, you know, whose keys you should be trusting, like who the other Git developers are.

And usually you use something like a forge for this. You use GitHub or GitLab or something like that in order to find out this information. But you know, this information, you just have to trust that GitHub gives it to you correctly. It's not something your client verifies or knows about.

Which if you think about it is a little bit odd because when you go to pull files from like GitHub, you check the SHA hashes of it to make sure that you're getting the files correctly and you check commit signatures if you have the key.

But for some reason we don't check any of the metadata about who are we supposed to be trusting? And we don't check the code owner's policy locally and we don't check other things like that. Most of the verification that these forges provide are done server side and we just have to trust that the managing entity has done everything correctly.

And, you know, they have bugs in that software. So by putting that information and pushing those checks and things client side, you get a lot of benefits. One of course is that you can see all the checks and do all this validation. It also works across different types of forges.

So now you can much more easily migrate and do things in there, do things across different types of forges. And actually to one of Benjie's points that teed me up in this whole direction is we're also adding a bunch of things that are really not very possible to do in a good way on forges by putting them on clients.

So one of the things we're doing now is we're adding, like, enforced client-side hooks that do stuff like secret scanning. Because what happens today is, even if you have a secret scanner running on your repo, the thing is still committed to the Git repo for a moment and it's out there, right?

Benjie: Yeah.
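The client-side check Justin describes can be sketched as follows. This is not gittuf's actual hook implementation; it's a minimal illustration with two deliberately simple patterns (a real scanner uses a much richer rule set), showing why running before the commit object is created means the secret never enters history:

```python
import re

# Illustrative patterns only: the shape of an AWS access key id,
# and a PEM private key header.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"-----BEGIN (RSA|EC) PRIVATE KEY-----"),
]

def scan(staged_text: str) -> list:
    """Return potential secrets found in staged changes. Run as a
    client-side pre-commit check, before any commit object exists."""
    hits = []
    for pat in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pat.finditer(staged_text))
    return hits

staged = "aws_key = 'AKIAABCDEFGHIJKLMNOP'\n"  # fake example value
if scan(staged):
    print("refusing to commit: potential secret found")
```

The difference from a server-side scanner is timing: a forge-side scan fires after the secret is already in the pushed history, while an enforced client hook can reject the commit outright.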

Justin: And so we can stop that from ever happening by enforcing those hooks to run on the client side. To another point, we're also working on things like having a transparent way to encrypt and hide some data inside of a Git repo.

So if you do have configuration files or keys or things, you could actually put them in your Git repo, and other users that go and look at it are going to see an encrypted blob there. But you and others that have access can use it normally. And you can, by the way, limit who has access: you could have access but decide that Marc, for instance, doesn't. You wouldn't notice anything different on your side, but anyone who doesn't have access would see encrypted data there.

Benjie: So this is just like an extended git client.

Justin: It's an extended Git client, and in fact it's already being used today at Bloomberg, and a bunch of projects in the Linux Foundation are already using gittuf. We'd love to have more people go and play with it.

Some of the features that I mentioned here just a moment ago are things that are not out yet. But it does provide you with a lot of policy support and the ability to manage users and things very well out of the box today, which is why we have a bunch of users already.

Benjie: So what's the centralized piece for the-- Say, I've decided Marc's allowed to see my secrets blob now. How does that work?

Justin: So what you do is, when you set up a repo to use gittuf, you say, oh, here are the people who are going to control it and do things with it. And that information gets written into the Git repository itself. This is, by the way, not in a place you'll see if you just, like, browse on GitHub or GitLab or whatever, because it's written where the Git metadata store is.

But it's written in a way that's also backwards compatible with Git clients and also backwards compatible with existing git forges. So you can have a mix of users that use gittuf and git and you can have a mix of users, you can be pushing things to GitHub, to GitLab, to just a normal vanilla Git repository. It all works and does that.

Benjie: At the end of the day though, Marc with his going to Starbucks and leaving his laptop unlocked thing is still a problem.

Justin: It is, but now you can recover.

Benjie: Yes, sarcasm for those listening. That was a joke.

Marc: Yeah, I don't drink Starbucks.

Justin: Yeah, he leaves it unlocked other places.

Benjie: He's like, "I don't drink Starbucks."

This is super interesting. I personally, gittuf, to me, is solving an issue that I have had with git forever. I've had these conversations forever.

We're kind of running a little low on time, but while we have you, there's this new topic some people are talking about. It's called AI. I don't know if you've heard of it.

Justin: Is that A and then I? Okay, got it.

Benjie: Yeah. No, but in all seriousness, like I will say that for the listeners, my curmudgeon-ness has started to wear off and I'm a Claude Code user now. It's crazy. These things are moving very fast.

I do think that as someone that's been looking at supply chain since the early 2000s, 2002, if not even sooner, I've read some articles and they say, oh, security is going to be better because you're going to have more surface area that can be covered by the automated systems.

I've read things that say, oh, well now you're going to be able to find a lot more zero day and other weird things because of these systems. What's your hot take on this? Like, is it going to get worse? Is the battle just going to move to the next level?

Justin: Yeah, I think it's going to be both, but I think medium term it will probably be better.

Right now with AI, a lot of this is a disaster because it's all moving so quickly and everybody's doing land grabs. I think MCP is really problematic in ways that people don't fully understand.

I'm not as worried about software supply chain security as I am worried about more general people hooking it up to their email and hooking it up to other things and all the various types of prompt injection and other scamming and stuff. I think that's going to be really bad.

A lot of the code generation stuff and other things has gotten better as the technologies evolved. So I think there's some hope there. I mean people overly trust it for sure. And if it's trained on bad patterns, which I think some common patterns are pretty bad, it'll replicate those. But I think there's also hope that it can be retrained not to use those patterns and not to do certain things.

Marc: Yeah, I was reading a take just last night actually around, you remember, like, left-pad with npm and that whole story, right? Like, there was a package in npm that got taken over and everybody was using it to do some trivial little thing, and then they ended up with some other functionality in their applications.

But with AI, there are folks talking about a decrease in using these kind of trivial packages, because with AI you just don't need all these libraries; AI can just kind of build it for you. That's not necessarily going to shift the bugs around. Bugs are still going to exist because code is code. But I don't know, maybe it changes this left-pad package that has millions and millions of--

It becomes like this attack vector. It's like a sweet target you want to go after, because you have so much surface area. And now you kind of change that model a little bit. You still have bugs, maybe you have more software bugs, but you no longer have to worry about... you know, like you talk about gittuf, right?

Like, the thought that was going through my mind on it is: you push this client side, and fundamentally you don't remove the risk here. Risk continues to exist. But GitHub becomes less of a sweet target. Like, oh, if I get into there, I get everywhere; you know, you don't have to target each of these individual orgs' supply chains separately.

Justin: Right? Yeah, I think that's absolutely true.

We're in interesting times and I think anybody who says they have a crystal ball that lets them see more than a few months ahead is lying. It's all moving so rapidly.

Benjie: I will say this: maybe people should have in their Claude markdown or whatever, pin packages. Like, if that was just one of the things that the agent did. I mean, I can only imagine, forget security, just how many bugs that would avoid.

Marc: Benjie, how many of us just have, like, Dependabot running in a repo? Like, there's a security vulnerability, so we have Dependabot pulling the latest package. Unit tests pass, we merge it, cut a release, and we don't really think about supply chain there. We think about solving, like, a CVE. But you're just pulling in some latest patch release, right, Justin? I mean, that's challenging.

Justin: Yeah, this is going to devolve into another one of these discussions, but we actually have a bunch of work in this area doing almost exactly what you're saying and actually trying to tell you like with a very high degree of assurance if this is a completely safe thing to do or not.

Marc: Great.
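The pinning idea Benjie and Marc are circling can be sketched simply: pin the exact content hash of a dependency, not just its version, so a swapped "patch release" with different bytes is rejected before use. This is the same idea behind hash-pinned lockfiles; the artifact bytes below are fake placeholders:

```python
import hashlib

# Fake artifact standing in for a downloaded dependency.
artifact = b"fake package bytes for left-pad 1.3.0"

# The pin would normally live in a lockfile, committed to the repo.
PINNED_SHA256 = hashlib.sha256(artifact).hexdigest()

def verify_pin(data: bytes, pinned: str) -> bool:
    """Reject any dependency whose bytes don't match the pinned digest,
    even if its version number looks right."""
    return hashlib.sha256(data).hexdigest() == pinned

assert verify_pin(artifact, PINNED_SHA256)
assert not verify_pin(b"tampered bytes", PINNED_SHA256)
```

A version pin alone still trusts the registry to serve the same bytes for that version; a content-hash pin removes that trust entirely.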

Justin: But back on the AI front, one other thing I want to say is AI actually might make things better. I don't know if you paid attention to the DARPA AIxCC effort, but they had AI go and try to find bugs in a bunch of widely used software projects. They put a bunch of artificial bugs in the commit stream that worked in different ways, and the AI competitors were actually quite good at finding almost all the bugs.

I think they found all but one of the bugs and they even proposed patches that fixed most of the bugs, like the vast majority of them. But what's interesting is they also found 18 bugs that were not put in there by the folks running the study.

And this is the first year of something that I think a lot of people expected would yield almost no results. In fact, from what I've heard, one of the contest winners was very surprised. They had thought, we'll compete in this, but we don't think we're going to be able to do anything or find anything. They were very skeptical, is what I have heard through the grapevine.

Marc: I'm not super familiar. We'll include a link to the DARPA AIxCC challenge here. But that's, like, real money for these prizes too.

Justin: Yeah, it was a super interesting effort.

Marc: So that's cool.

Justin: Maybe AI will solve a lot of our low level, you know, buffer overflow, whatever style bugs. It seems like it can find those types of things quite well. If it can find and fix them then great.

Benjie: I mean, the SQL injection stuff-- I will say the other interesting security trend that I'm seeing because of the adoption of coding agents is typed languages, for example. You know, these coding agents are better at that type of stuff.

Now Python is, I don't think Python's going away, but it's interesting nonetheless. So there's a lot of positives, a lot of negatives. I wonder, is there something at the foundational model level that these guys--

And I'm just pulling this out of left field, but is there something that Anthropic and OpenAI should be doing to, like, bake in some type of anti-malicious something? I mean, I don't think that's very realistic, but I'm just curious if anyone in your field is working on, like--

Justin: I mean there's actually a huge subfield of researchers working on exactly the type of thing that is happening here. I don't know how many of them are plugged in enough. I'm sure that some of them are plugged in enough to these companies to be doing things.

But one of the advantages of academia is we get to point ourselves at a problem that we think is important but that may not have a direct financial benefit for an organization to do correctly and we get to work on that.

And so actually, myself, one of the things I've been interested in for the last, I don't know, three months or so is MCP security. I've been thinking pretty hard about that and how we could improve it. And that's something that I don't know would make anybody any money in the end, but it would make the world a better place.

Benjie: Have you come up with anything? Because I got nothing on that.

Justin: I do actually, but I don't know how much we want to unveil. We're probably going to need another podcast if we get into something like that.

Benjie: All right, so MCPtuf? I'm just gonna call it.

Justin: No, it is actually very different.

Benjie: We'll save it for another one. Justin, I don't mean to put you on the spot. We'll save it for another one.

Justin: Yeah, save it for another time. So I'll come back in six months or a year or something and talk all about it.

Benjie: Well, you better do it in two weeks at the rate things are going.

Marc: Yeah, I will say though, like you're right, like I'm not going to make money securing my MCP server. But like the flip side of that though is if I have an MCP server that's just exposing credentials, nobody's going to use it. I'm going to lose a bunch of money. Right?

Like, I think if you go back to the Slashdot story you told at the beginning of the episode, the public visibility was the motivation for folks to realize, like, we have to solve this problem. Sure, I didn't want to go prioritize thinking through how to better sign images and prevent malicious mirrors from potentially distributing malware. But, like, thanks, Justin. Now you made everybody realize this is a thing, so we have to do it or else we look bad.

Justin: Well, that's why we give off-the-shelf stuff people can use. We don't want to shame people. We want to say, hey, look, here's a solution. A bunch of people are using it. Please use it as well. And come to our community meetings, ask us questions. Always happy to talk.

Benjie: Speaking of which, when are your community meetings if I want to contribute or check out your CNCF projects? I know we've listed a lot, but if I wanted to learn more about this stuff, where do I find you, Justin?

Justin: We have community meetings. It's usually the first Friday of the month. And the TUF community meeting is from 10 to 11am, New York time. And the in-toto meeting is 11 to noon, New York time. And they both can be found on the CNCF calendar. The gittuf meeting usually follows them as well and would be on the OpenSSF calendar.

Benjie: Well, I mean, I feel like I've got another seven hours of questions for you, but unfortunately we're out of time.

Justin: I do want to mention one quick thing about the left-pad example. It wasn't somebody who put malicious software in. It's that the developer pulled the package down.

Marc: Right.

Justin: And that caused everything to crash.

Marc: That's a good clarification.

Justin: It may not matter to your point. I just wanted to mention it, because occasionally I say something wrong like that, and I'm really happy when people correct me so that I don't repeat it, like, a thousand times while everyone in the audience is like, wait a minute.

Marc: Yeah. I mean, I think the point is the supply chain. You're counting on these things to exist in the supply chain when it wasn't there. Your software is broken.

Justin: I'm not trying to nerd snipe you or anything.

Benjie: No, no, no. First off, thank you, but I do think it's an interesting thing because it's literally naming. It's like talking about, like, the point of the left-pad thing is that, like, for example, Shipyard. We got GitHub.com/shipyard. And the way we did that was I found the guy who had it before he canceled the org and then I immediately went on to GitHub and claimed it.

So, like, that's the problem with npm, right? It's like a name squatting thing. And it's not necessarily malicious, but it can cause all kinds of unintended consequences. So that's a different issue that we should probably get into.

Marc: Supply chain security is, there's a lot of facets here.

Benjie: Justin, thank you so much for coming on. There's so much stuff to learn and dig into. I've got seven different tabs open of things I'm going to be researching.

I really appreciate the time and we look forward to having you back on and we're going to talk to you about some of this other security stuff. MCPtuf.

Justin: All right, awesome. Well, thank you so much, Benjie and Marc, it was terrific meeting you as well.