In episode 27 of The Secure Developer, Guy is joined by Jeff McAffer, director of Microsoft’s Open Source Programs Office, who shares his insights on how to keep open source projects sustainable and secure for the whole community.
Guy Podjarny: Hello everybody, welcome back to The Secure Developer. Today I have Jeff McAffer from Microsoft with me on the show. Welcome, Jeff.
Jeff McAffer: Thanks for having me.
Guy: Thanks for coming on the show.
Guy: Jeff, before we begin, can you tell us a little bit about yourself? What do you do, a little bit of history of how you got there?
Jeff: My role at Microsoft is I run the open source programs office. We help drive policy, and process, and tools, and the culture change across the company. You might have noticed that Microsoft is changing its views on open source.
We try to help mature our viewpoints on open source and make the practices easy and smooth, because there's a lot to do when you're using open source. It's not free. You've got to do work to do it well, both our releasing and our consuming of open source.
That's what I've been doing for the last four years. Historically, I did a bunch of stuff in open source. I was one of the original guys on Eclipse, and I did a bunch of work in that space and spent some other time at Microsoft doing a few different things. But that's been my recent past, it's all been driving the open source program.
Guy: Does that qualify as going to the dark side? Going from doing open source to managing open source?
Jeff: I used to work for IBM doing Eclipse work. Of course when I ended up leaving and joining Microsoft, because this was seven years ago, that was the dark ages for Microsoft.
Guy: Pre-Microsoft culture shift.
Jeff: It was an interesting change, but it's a super exciting place to be right now with all the changes. There's a lot of stuff happening and a lot of evolution happening right now too.
Guy: Cool. OK, that's cool. What you do today is you manage these open source programs inside of it, maybe let's double click a little bit on how-- That's a Microsoft analogy there. Into "What does that mean?" Do you have a team there? How does that work, managing--?
Jeff: We've got a modest sized team. About a dozen people that are in my team directly, and we do everything from looking through the legal policies-- When we started this office a few years ago a developer at Microsoft had to answer 20 skill testing questions if they wanted to use a piece of open source. It was really prohibitive. We went through it with our legal partners, reviewed everything, all the policies and whatnot and just trimmed it down.
To make a long story short, we've gone from using dozens of pieces of open source a few years ago to using hundreds of thousands of different things across millions of different usage points within the company.
We have to track all of that, and understand all the license compliance issues, comply with all the licenses, understand all the vulnerabilities, and try to make our devs aware of that and track where they've gone and everything like that. That's one of the big challenges we've been facing in the last few years, and we do that in partnership with the security team, the legal team, and the product teams across the company.
Guy: What does it look like today if a developer wants to consume a piece of open source in Microsoft? What happens?
Jeff: We have streamlined that to the extreme. Our mantra is "Eliminate, automate, delegate." First we eliminate any policies or questions or friction that we can find. We work really hard to understand exactly what the risks are, what the opportunities and the tradeoffs are, and if we can eliminate any of the challenges there we just get rid of them.
We write policies now that are highly automatable. You can write policies that say "Jane has to review everything," and that's not automatable because there are only so many Janes. But we write policies that are highly automatable and can take in data and context and spit out an answer. That's our automate phase.
Then there's some things that when you get to a certain point there's a risk that we're unsure of, or whatever. You need to pop out and a human needs to-- A lawyer needs to engage or a business person or something.
We've gotten to the point now where with integrations into our build systems, and these automated policies, and good data, we're at the point where about 99% or 99.7% of our open source usages are automatically detected and automatically flow through our policy engines with no humans whatsoever.
Guy: If I'm a developer and I wanted to use whatever, some NPM package, or some NuGet package, I just do it?
Jeff: Just do it.
Guy: Then the build system will scrutinize--?
Jeff: The build system figures it out and detects it, runs it through our policy engine all in real time. I'll say at build time.
Guy: Build time, yes.
Jeff: We've moved away from a request and approve model, because that is a pessimistic model, to more of a register and review model. It just gets registered. We take note of the fact that you're using that open source, what version, etc., and try to figure out the scenario. "What product is it going into? Is it a dev dependency?"
All that stuff, and then run it through the policy engine. It comes out and says either "You need to answer some more questions and get a review," or "You're good to go."
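The register and review flow Jeff describes can be sketched roughly as follows. This is a minimal illustration, not Microsoft's actual policy engine: the `Usage` fields, the `AUTO_APPROVED_LICENSES` set, and the rule that dev-only dependencies pass automatically are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical policy data; real policies would come from legal and security review.
AUTO_APPROVED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}


@dataclass(frozen=True)
class Usage:
    """One registered use of an open source component, as seen by the build."""
    name: str
    version: str
    license: str
    product: str
    dev_dependency: bool
    known_vulnerabilities: int = 0


def evaluate(usage: Usage) -> str:
    """Return 'good to go' or 'needs review' for a registered usage."""
    # Automate: rules that need only data and context produce an answer directly.
    if usage.known_vulnerabilities > 0:
        return "needs review"
    if usage.license in AUTO_APPROVED_LICENSES:
        return "good to go"
    # Assumption for this sketch: dev-only dependencies are not shipped, so they pass.
    if usage.dev_dependency:
        return "good to go"
    # Delegate: anything the rules cannot settle goes to a human.
    return "needs review"


registry: list[Usage] = []


def register(usage: Usage) -> str:
    """Record the usage first, then run it through the policy rules."""
    registry.append(usage)
    return evaluate(usage)
```

The point of the shape is that registration always succeeds and the policy answer comes afterwards, which is what separates it from a request and approve model.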
Guy: Would it break the build? If it didn't get a full bill of health, would I find out because you e-mailed me out of band? Or is it like, my system will stop working?
Jeff: There's lots of different dimensions. Teams can dial the knob where they want. We can technically break the build, typically we don't.
We couldn't do that in a central way because teams are all at different points in their shipping, and some risks are tolerable, and all that sort of thing. We don't want to have a vulnerability come into our database and suddenly everybody's build breaks, and they can't ship anything, that's really disruptive.
We tend not to do that. You get in-experience warnings. Build warnings, build errors, that sort of thing. You also get alerts in the services that we offer, like in our git services. But we've also shifted left, if you will, all the way to the point where you get the warnings in VS Code.
If you're taking a dependency on something that has a vulnerability you get little red squigglies in VS Code that tell you about that.
But we've also gone further left and into the browser where when you're browsing NPM or NPMjs.com, or NuGet.org, or whatnot. You get a big red box if there's a vulnerability on that thing, and that's a Microsoft specific dataset that's feeding that. That will give you information about vulnerabilities or license issues or whatnot so that right when you're choosing the component you can choose wisely.
Guy: This is like an extension?
Jeff: Yeah. A browser extension.
Guy: It's something that's just installed by default on all--?
Jeff: It's optional right now, but it's available to everybody and it's got pretty wide usage.
Guy: It sounds like you've done this combination of-- You've built a lot, right? You embedded a lot into those processes. What are your criteria? You have a bunch of this tooling, so when do you choose to invest your own development resources and build those components versus taking off the shelf software?
Jeff: That's a really good question. If you pan back four or five years ago when we first started down this path, we were starting from a traditional workflow that was almost inherently point and click for users. There was no way that it was going to scale and there wasn't much available at the time that was going to scale.
We found the combination of that and a bunch of the quirkiness-es of the Microsoft code bases and engineering systems that there weren't a lot of tools that we would be able to integrate with. They simply weren't available.
We headed down a path of building pretty much everything. Could we have done that differently? Maybe, I'm not sure. There's been a lot of good advances in tools out there, and some of the things that we've got internally we are thinking "Could we make that a product? Should we make that a product?"
We'll see how that unfolds. Or in many of the things we're doing, we open source as well. In terms of my team, we've got a number of different elements out there that we've made open source because they're not going to ever be a product.
Guy: There's something, there's some good karma element of it when you're consuming-- Generally speaking, you shouldn't be open sourcing stuff, but specifically you would deal with the consumption of open source. There's something extra right about open sourcing that.
Jeff: The irony of the Open Source Programs Office not open sourcing things was not lost on us.
Jeff: We try to open source as much as we can, but some things we can't.
Guy: Let's dig into that a little bit. You teed it up, you said there's a bunch of these tools that you have or that you are open sourcing. Do you want to talk about some specific tools?
Jeff: Most of those are not security related, but things like managing our presence on GitHub. We've got tens of thousands of developers and repos on GitHub, across a hundred orgs, and stuff like that. Managing all of that is a real challenge.
Trying to keep all the cats in line and take care of all of that, everything from that to also monitoring GitHub. We've written a few things that harvest data from the APIs, and give us a good perspective on what our devs are doing and what the community is doing, how the projects are working, all those sorts of things.
A lot of stuff in that space is where we've been driving. I've got a project that we started called ClearlyDefined that is trying to crowdsource license metadata, because that turns out to be a real big problem, and there's potentially some security angles on that.
Guy: We can dig into that in general around community, because we're again shifting that a bit later on. But let's talk about the-- This is at the end of the day, we're talking about The Secure Developer, into the vulnerability handling aspect of the process. Walk us through a little bit of the system, and you have these components, what happens when a new vulnerability gets disclosed? What's the alarm bell?
Jeff: In something we're already using, you mean?
Guy: Exactly. There's some new Struts vulnerability, or the equivalent. Struts is our poster boy these days for something getting vulnerable. What happens next?
Jeff: There's two scenarios that we see there. One is it's in something that's shipped already and isn't being built again, it shipped last month or whatever, and maybe the team has moved on to a new branch or a next release or something. Then there's stuff that's actively being developed.
Most of our tooling is integrated at the build time. I mentioned the shift left stuff, so for people who were browsing and looking for things, that's when you're doing active development. That's an idea that will prevent you from getting the vulnerabilities into your code in the first place.
But in this case, we've already taken a dependency on Foo version 1, then some new vulnerability in Foo has been discovered. So if it's being built we'll get the alert either way, because it builds and it's already happened, and we know that Foo version 1 is there.
Guy: OK, so for starters you track, that's in bill of materials, elemental--
Jeff: As soon as we see it in the build, we track that out in a system and it's tracking millions of different use sites across the company's code base. As soon as we see that we end up with an alert being raised in the engineering system.
By alert there, I mean that's our user visible banner across their UI if they're going and looking at the website in Azure DevOps. Most of what we do is in Azure DevOps. Go figure, we sell these products, we use them too.
Guy: Indeed. Dogfooding.
Jeff: It's integrated into Azure DevOps, but we also have facilities for emailing and getting reports, and you as an individual can go to this dashboard and it's personalized to every individual that goes to see "What are all of the vulnerabilities in any repo that I'm responsible for?"
You could just go and click on that link, and it shows you "There are seven vulnerabilities with these criticalities, or severities, and here's the repos they're in." That sort of thing.
Guy: From an ownership perspective, whose responsibility is that? Is it on the dev team to go? You provide into that portal, it's their responsibility? Is it the security team that's trying to push it? Is it yourself?
Jeff: Yes, that's a good question. I skipped over that part. When we get a new vulnerability, it depends on the severity. All that happens regardless. You get the alerts, and all that sort of thing. If it's a high severity vulnerability then we have a whole part of the company that exists independent of open source.
That's the Microsoft Security Response Center, and they will engage on the high severity things too. Severity has a number of different dimensions, some particular to the actual vulnerabilities, some to the business case and "What product is it?" and stuff like that.
But they will engage at the higher severity levels and drive a whole incident response process where they've got hotlines set up, and we're figuring out which customers have it or which data centers have it. All that muscle gets invoked pretty much, I won't say automatically, but that's a very practiced process at the company.
If it doesn't fall into that category, then we do have a set of standards in our development process that talk about vulnerabilities and an SLA around them being fixed. It does go back to the dev teams. They are made aware through the notifications and alerts and whatnot that they have these vulnerabilities, then they have a time period to address them, and various dashboarding and reporting things help them stay in the SLA.
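The split Jeff describes, with high severity going to incident response and everything else going back to the dev team under an SLA, can be sketched like this. The severity labels and the day counts in `SLA_DAYS` are invented for illustration; the episode does not give the real numbers or the real process details.

```python
from datetime import date, timedelta

# Hypothetical SLA windows in days; the episode does not give actual numbers.
SLA_DAYS = {"medium": 30, "low": 90}


def route_alert(severity: str, reported: date) -> dict:
    """Decide who handles a vulnerability alert and when the fix is due."""
    if severity == "high":
        # High severity: hand off to a central incident response process.
        return {"owner": "incident-response", "due": None}
    # Everything else goes back to the dev team with a deadline.
    due = reported + timedelta(days=SLA_DAYS[severity])
    return {"owner": "dev-team", "due": due}


def is_within_sla(alert: dict, today: date) -> bool:
    """True if the alert has no deadline or the deadline has not passed."""
    return alert["due"] is None or today <= alert["due"]
```

A dashboard of the kind mentioned above would then only need to filter a team's open alerts with `is_within_sla` to show what is overdue.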
Guy: So the SLAs are-- OK, cool. The different levels are, one set of security components are about active development, when you add a component that has some security problem or license problem for that matter, then it would flag. You'd get notified, you'd engage, that's when you added it.
Second is when a new vulnerability gets disclosed, if it's a high severity, it goes to the Security Incident Response Team and they make a determination based on whatever information available to them. If they sound the alarm or not, and the health element of doing that as a long term bid comes down to that SLA definition.
Jeff: Sure. Like I say, for the lower severity things, that's just business as usual for the development teams there. We've always-- Independent again of open source, in that they've always had this heartbeat or drumbeat of "There's a vulnerability in something you're shipping or building, you've got to go and deal with it. "
It's almost business as usual, it's just that the volume goes up because we're using so much other code that the team themselves didn't write.
Guy: You own more code than you're able to write.
Jeff: Exactly. Absolutely. That's the wonder of open source.
Guy: Indeed. Cool. Let's shift gears a little bit. Thanks for that, this outlines the way that you manage, and you control open source. You deal a lot, you open source yourself, and you work a lot with the providers and with the maintainers, with people that are writing open source themselves.
How do you find those? Are there projects that make you happier than others? What makes you happy in an open source project that you see?
Jeff: It's interesting. It varies quite a bit from ecosystem to ecosystem, and there are certainly ecosystems where there's more trust because they've shown historically that they're more attuned to security issues. We feel more confident about that.
Generally speaking when somebody is choosing to use an open source component, we look at individual devs at their desktop and go and look at various aspects of it, and we hope we try to guide them to look at security things, security related topics. To hopefully a large extent they do. But the interesting thing, the producers-- We produce a lot of open source ourselves, so we're in the same boat.
There's some simple things that help folks understand what's going on from a security point of view, and it can be anything as simple as having a clear statement about how to report security issues. That signals a bunch of things.
One, obviously it tells you how to report security issues, but the other is it tells you that this project thinks about security and they understand that security is a topic. That they have put in place a process and it might be through their umbrella foundation, they might have done it themselves, on their own.
Either way, I feel more confident now as a consumer and somebody who is looking to engage with that project, that I can talk about security issues with them, that I have the means of doing that and that they're receptive to it.
Guy: We've observed in our State of Open Source Security report, not from this year but the one we did last year, where we asked "Do you have a disclosure policy?" The statistics clearly showed that if you have a disclosure policy then you're more likely to get reports. It sounds obvious, but it's statistically verified that you will get more reports because you're guiding people about where to go. You're right about the security consciousness. It's like you've taken a moment to think about security.
Jeff: In the interest of full disclosure to our listeners here, Microsoft has not done a great job in that regard. A lot of our repos don't have these kinds of disclosures on them, and that's one of the things that we're working on in the next few months.
Guy: Keeps you busy.
Jeff: Getting all that stuff into shape and really clarifying for people how to engage.
Guy: I love it. A lot of the Apache projects I've observed have a good SECURITY.md file there--
Jeff: Eclipse, Linux Foundation, there's a lot of good high quality projects out there that are attuned to security issues. One of the other things we found that's interesting is when you get these vulnerability reports in, oftentimes these days it's easy to get a thousand NPMs on your machine or in some Docker image or something like that. It's easy to consume tons and tons of open source, and the dependency graphs get really deep.
It's easy to have a vulnerability in something that's 10 levels deep in your dependency graph, and you've never heard of it because you never used this thing at the top and it brought in this thing at the bottom.
One of the things that becomes interesting is a lot of these vulnerabilities are things like DOS vulnerabilities and that sort of stuff. Whether you're vulnerable to it or not is something you need to go and look at. There's a couple of things there, one is if you understand a little bit about the architecture of the project.
If projects help people understand that, you can understand simple statements like "We don't take any regular expressions from outside, our APIs have no regular expression surface area." Then that whole class of vulnerabilities now is immaterial in general to that project.
You can know just because you're using something that has a vulnerability doesn't mean you're subject to that vulnerability. You typically have to use it in a vulnerable way.
There's almost no packages that we blacklist, that are just outright bad. It's only the ones with malware or something that are outright bad, they're all based on how you use it. Projects having a little bit of a security oriented architecture discussion is super useful, because as a consumer and somebody is looking to engage in a project I know how data is being treated, how code is being executed. I can come and help find vulnerabilities, I can be more confident in my use of it, etc.
Guy: We at Snyk-- Snyk is free for open source and all that, sort of a plug, but we have the ability to put a badge to say whether your dependencies are vulnerable, and it always felt almost counter-intuitive for people to do it. We put it out there and weren't sure if people were going to do it or not.
At this point there are many thousands of repositories that put this badge that says how many vulnerabilities they have, and it feels almost like "Hold on. Why are you advertising?" But really what they're saying is, "We aim for that to be green," and they can manage their vulnerabilities and state as much in the repository.
Presumably when you consume such an open source project then you're able to say "OK, they have a bunch of these dependencies, but to the extent of their ability to assess whether they're vulnerable or not they've stated that they're not. They've accepted their vulnerability, but it's better off than being on your own."
Jeff: It comes back to that thing we were talking about a little bit earlier.
If you have a description of your security policy it means that you think about security. You've already got half marks, right? It's an awesome statement, way above most of the other projects out there. You're already ahead of the game by doing that. That's super useful.
Guy: Cool. Those are two. Are there others that are proxies? Like security disclosure, inform on the way that you consume input? What risk factors might apply to you?
Jeff: Of course, proactively reporting your vulnerabilities. I don't have the concrete stats, but my understanding is that the majority of vulnerabilities out there today are not in the central database. They exist in some issue or pull request and maybe they don't need to be called out explicitly, it's just that the dev went and fixed the problem.
Actually reporting those things, whether it's through the standard CVEs and whatnot or through some other way, surfacing the fact that you had a vulnerability, doing it responsibly and respectfully. Surfacing that there was a vulnerability in some version, and that it's now fixed in some new version, is also something that clearly helps with the problem.
Back to our earlier discussion about dependencies and the badges and that sort of thing, going and proactively knowing what your complete chain is. Like, "What's your user's viewpoint?" As a project producing a component. People are going to go get it, they're going to do NPM install or whatever the verb is, and they're going to get 20, 30, 100 other things. Understanding the shape of that from a security point of view, from a licensing point of view, there's this term passive carrier or something. They use it in diseases.
It's asymptomatic carrier, that's the term, where you carry the disease but you're not actually-- So you could carry a vulnerability or a licensing issue, but not in yourself, you're just going to subject all of your users to it. So understanding what your users are going to get when they use you and what the security status of those projects are, a lot of times when we come across these deep vulnerabilities in deep dependencies it becomes hard for somebody to fix it.
You might be able to go and get a new version of the thing that's vulnerable, but the thing that's consuming the thing that's vulnerable, you need to up it. Then it's a new version and you need to up it, and you need to walk all the way up the chain or do some patching down to the low level, and there's some cool tools that do that.
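Walking "all the way up the chain" starts with knowing which chain of packages pulls in the vulnerable one. A minimal sketch using a breadth-first search over a dependency graph is below; the graph is a plain dictionary and the package names in the usage note are invented, so treat this as an illustration of the idea rather than how any particular tool does it.

```python
from collections import deque


def find_chain(graph: dict, root: str, target: str):
    """Return the shortest dependency path from root to target, or None."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Visit each direct dependency once, extending the path as we go.
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


def upgrade_order(chain: list) -> list:
    """Packages to bump, starting at the vulnerable one and walking up."""
    return list(reversed(chain))
```

With a hypothetical graph such as `{"my-app": ["web-framework"], "web-framework": ["template-engine"], "template-engine": ["foo"]}`, calling `find_chain(graph, "my-app", "foo")` gives the path from the application down to `foo`, and `upgrade_order` lists the same packages bottom-up, which is the order in which new versions have to appear.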
Guy: We try to help in that space. But I agree with you, this notion of own it. It almost comes back to ownership, as an open source maintainer you chose to use some open source components, you need to show some modicum of ownership for those components and understanding that you need to be tracking them and reporting them. Because for all intents and purposes you are distributing that code, and you don't want to be distributing vulnerabilities.
Jeff: By no means am I trying to offload all of the work onto the project team. We want to be able to come as a large consumer of open source, we want to be able to engage and help teams become more and more secure. You signaling that you're willing to do that is a good sign that we're going to come to help with that. It's an engagement, it's a bi directional collaboration.
Guy: All of those are great components and I do think that there is more awareness. I would bet, that there are more securities today than there were five years ago or ten years ago. So, the conversation is there. What other means? You mentioned ClearlyDefined and those projects. Let's maybe talk a little bit about maybe a more structured element of contributing or sharing such knowledge.
Jeff: I mentioned ClearlyDefined earlier--
Guy: Can you give us the general overview on what this is?
Jeff: The current focus is on crowdsourcing license data. The current focus is not on security, but the general premise is there: we've got tools and capabilities that we can run to do automated work on open source components. In this case, finding licenses and copyright holders and whatnot, and putting that out there for people to consume.
Right now it's hard for people to-- Just like it's hard to figure out what vulnerabilities there are in a component, it's hard to figure out what licenses, copyright holders, etc. there are. Complying with the licenses is hard.
We've tried to automate and put a bunch of tooling in place, the tooling is not perfect because humans are humans and tools are tools, and we don't always get all the data but we make that available for curation. So people can come and update the values, if the license is wrong, they can come and fix it.
That gets reviewed like any open source contribution and subsequently gets merged into the definitions of open source components, and hopefully upstreamed to the original components so that future versions are more clearly defined in the way we said.
Guy: That is done by the maintainers of ClearlyDefined?
Jeff: There's a curator community that you can come and go to. You can go to ClearlyDefined.io and see a component that you know and like, or maybe you're the owner of that component, and see that "Crap, in version 1.3 we forgot to put the license in the package file." So you can go and fix it there and submit that as a pull request, and it's all automated.
Jeff: So it's nicely done. Then the curator community will say, "That's really cool. We'll merge that in and it becomes part of the corpus of data." Then we want to upstream that back to the original project so that version 1.4, when the next one comes out, is more clearly defined.
That's all licensing. Taking that and trying to apply it in the security world, we have this notion of Clearly Secure, and it's very nascent. I'd love for your listeners to help us figure out what that could or should be. By all means, come to the site-- You can join the Google group or whatever and share information about it--
Guy: This is ClearlyDefined.io.
Jeff: Exactly. ClearlyDefined.io. But what we're thinking so far, and again very nascent, it's simple things. Like many of the people we work with including ourselves have developed mappings from component identities to the CPE identity that's in the database. It's not obvious to everybody, but it's not as easy as going, saying "I want to go to the database and see if Foo version 1 has a vulnerability." You actually have to do work to figure that out.
A lot of people have independently developed these mappings, so "Why are we doing this independently? We're all in open source, let's collaborate on that and have a central place where we can develop these things." Then of course upstreaming that, if you will, back into the databases and working with the database communities to make the data at source better.
But then there are other things, like we talked about the underreporting of vulnerabilities. How can we make it easier for projects to report in a very simple way? You could imagine a hash-vulnerability tag that gets put into your issue, or something, so that when you commit that pull request it's automatically hoovered up by ClearlyDefined and put into a database.
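To make the mapping problem concrete: a package manager knows a component as something like "npm package foo, version 1", while a vulnerability database keyed on CPE names expects a vendor and product. A minimal sketch of such a lookup is below. The `CPE_NAMES` table and its single entry are invented for illustration; only the overall layout of the CPE 2.3 formatted string is taken from the standard.

```python
from typing import Optional

# Hypothetical, hand-curated mapping from (ecosystem, package name)
# to the (vendor, product) pair used in CPE names.
CPE_NAMES = {
    ("npm", "foo"): ("foo_project", "foo"),
}


def to_cpe(ecosystem: str, name: str, version: str) -> Optional[str]:
    """Build a CPE 2.3 formatted string for a package, or None if unmapped."""
    entry = CPE_NAMES.get((ecosystem, name))
    if entry is None:
        # No curated mapping yet: this is the gap a shared dataset would fill.
        return None
    vendor, product = entry
    # "a" marks an application; the remaining seven fields are left as wildcards.
    return f"cpe:2.3:a:{vendor}:{product}:{version}:*:*:*:*:*:*:*"
```

The interesting part is the table, not the function: every organization that builds its own copy of `CPE_NAMES` is repeating the same curation work, which is the argument for sharing it.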
Now you can subscribe to that feed of vulnerabilities. We do not want it to be a new vulnerability store. We've got enough of those, and we just want to make it easier for people to use and manage and integrate with the data that's out there, as much as possible. There's a bunch of other ideas, but I'd really love to hear from your listeners--
Guy: That's a great call to action.
Jeff: As to what they might find interesting--
Guy: For those listening. Got some homework there. Go to ClearlyDefined.io. One of the challenges you're going to have with secure versus defined might be the sensitivity of the data, because there's the vulnerability store but there's also the-- If you're going to make a security conscious statement, or security impacting statement about a project for license, there's no--
Guy: There are no sensitivity aspects to it. For vulnerability, for instance, what you wouldn't want is for somebody to come along and say "I found this vulnerability over here in this project. Let's just add it to the list," when it hasn't been properly disclosed.
But all that said, it sounds awesome. It sounds like there's definitely a lot there, at the very least around the metadata and the curation, but maybe even the security properties around what inputs come along. All of those can very much be crowdsourced.
Jeff: The other area that's super interesting is more like assessment information. A lot of the vulnerabilities that we see, they come in and they've been produced by security researchers. They're great, they're super detailed, but it's like "Line 47 of file Foo.js," or whatever, " has this construct and it's going to cause this problem," and it's very detailed. From a consumer point of view, again, if you're like 10 levels up from that component you've already lost them at the line number or whatever.
What's interesting is to look and say, "What better assessment information can we have? How can I tell as a user if I am vulnerable? What are the characteristic usages that are vulnerable, and not just 'This is a DOS attack.'"
It's like "If you call this function with a third argument being this way," and I can easily take that and ideally write some tools to check for it or get some tools, but even if I have to do a manual inspection, that's much easier to do than trying to dissect it.
Guy: There's an interesting question there, around the crowd sourcing bit of some of these things versus the technology bit. Should that information be gleaned by crowdsourcing the community, or should that information be gleaned through runtime observation of data or the likes? And maybe even then there's a crowdsourcing element of contributing that data.
Guy: And making it available.
Jeff: Sure. We have absolutely no desire to get a bunch of people to do things if we can tool it. Back to my mantra before, about automate, right? If you can eliminate, automate, or-- delegate is the last one.
Jeff: That's where humans get involved. If you could do this in any automated way, seek to improve the tools, and that's maybe another thing that happens in Clearly Secure, is people helping to put together-- I don't imagine it becoming a place where we develop security tools, but security data aggregation tools or something that could be useful and interesting.
Guy: Jeff, this has been a great conversation. Before I let you off over here, I like to ask every guest that comes on the show one last question, which is if you have one bit of advice or a pet peeve or something you want to tell a team that is looking to level up their security expertise or their security posture? What's the one thing you would advise that team to do?
Jeff: There is lots of different angles on that. I'm going to take the angle of the group of people consuming open source.
Jeff: It's to engage. A lot of people still think, "It's open source. I'll just take it, I'll use it, and that's it." But you really have to treat it like it's your code. Even if you're not going to write any, even if you don't know the language, you have to treat it like it's part of your system and you do need to care about the security of it and you do need to engage with the producing teams, the project teams, and say "How can we make this together? How can we make this a secure project so that we can all consume it in a secure way?"
There's just not enough people doing that. We see that across the board. It would be a lot more sustainable and a lot more secure if more people were more deeply engaging with the projects that they consume.
Guy: That's an excellent tip. That's great. Jeff, thanks a lot for coming on the show.
Jeff: Sure. Thank you.
Guy: Thanks everybody for tuning in, join us for the next one.