February 28, 2015
Thank you very much. Hi, I'm Alex and we're talking about security process for your company, for your project, for whatever it is you're working on.
Sort of more relevant to this talk, I'm one of the developers of PyCA Cryptography, a member of the Python Security Response Team and the Twisted security contact. Doing security for open source is an interesting parallel to doing security for a startup in that absolutely nobody has any time to devote to working on anything besides the core of what you're doing.
Open source runs on volunteer time. Startups run on startup time. There's no time to waste, so all of this process is designed around being incredibly lightweight and not at all intrusive.
So, why do we care? I've watched most of the talks here at Heavybit online, and they're about how to do marketing for your product, how to hire, how to scale up your engineering, and how to scale out your operations. That's not what this talk is about.
This talk is about how we sleep at night. Our users place an incredible amount of trust in our ability to handle their data and to serve them and we have an obligation to handle that responsibly. This is an ethics talk as far as I'm concerned.
When you've got a security bug, there's a malicious adversary. You don't know who they are and they can take advantage of your mistakes, and that's what makes them different. We want to account for the malicious actors.
This talk is about how to deal with the process of finding out you do have a bug.
How do you look for reports from your users, from security researchers, and from people who are just reading your code, and how do you go from that report to having a patch in your hand?
To be clear, you want people to find bugs in your software and you want to hear reports about that. You can't just pretend they don't exist so hearing about them from your users is much better than finding out via somebody exploiting them.
The first place to get started is the first touch. Someone finds a bug in your software, maybe part of your software is open source, maybe they're just using your web application, but they hit a security bug.
They want you to know because they're a user and they would much prefer nobody would be able to take advantage of it. People will find vulnerabilities.
The first thing to do is make sure you have yourwebsite.com/security and security@yourwebsite.com as resources for people. Between them, those should get people what they're looking for.
What information do they need? Somebody finds a security bug. The first thing they need to know is how to report it to you. In particular, if you've got an open source project or an existing bug tracker, you probably don't want them to report it there.
Your existing bug reporting infrastructure is totally public. Anybody can see the bug when it's filed. That's not what we want for security. For security, we want to be able to handle the bug and for people to find out only when we're ready to share the final patch with them.
So, you're going to want a description of how to report the bug and of the exact process the reporter will go through. We'll go through what that process should look like.
Next you're going to want to have a PGP public key so folks can send encrypted e-mails when they report it. This is basically just a shibboleth to certain elements of the security community that yes, we're taking this seriously.
And finally, you want a clear policy stating that if folks disclose responsibly and don't try to compromise your users, you won't try to prosecute them or anything like that. Again, it's a gesture of good will to reporters: yes, we take this seriously and we want to work with you.
Here's Slack's page as an example. It's pretty good. It makes very clear that they want reports, that they handle them responsibly, and that they're willing to work with members of the security community.
This page actually isn't found at the /security URL but it is very clearly linked from there. When I went looking for how to file a report, very easy to find. So, they get high marks for their reporting page.
Somebody sends you an e-mail, gets in contact with you, there's a bug. That's great. The first thing you want to do is just review this ticket like you would any other. See if you can reproduce it quickly.
Ideally, you want to get back to reporters within 24 to 48 hours, just to let them know that you're looking at it even if you can't reproduce it directly or need to spend more time investigating.
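If you want to track that "24 to 48 hours" target rather than rely on memory, a few lines are enough. This is a minimal sketch, assuming reports are logged with a timezone-aware received time; `ACK_TARGET`, `ack_deadline`, and `is_overdue` are hypothetical names, and the 48-hour value is simply the upper end of the range mentioned above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed target: the upper end of the "24 to 48 hours" guidance.
ACK_TARGET = timedelta(hours=48)

def ack_deadline(received_at: datetime) -> datetime:
    """Latest time to send the reporter a first "we're looking at it" reply."""
    return received_at + ACK_TARGET

def is_overdue(received_at: datetime, acknowledged: bool,
               now: Optional[datetime] = None) -> bool:
    """True if an unacknowledged report has passed its deadline.

    Expects timezone-aware datetimes (UTC is used when `now` is omitted).
    """
    if acknowledged:
        return False
    now = now if now is not None else datetime.now(timezone.utc)
    return now > ack_deadline(received_at)
```

The point is only that the first reply gets sent even when the investigation itself takes longer.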
If you can't confirm it's a bug and you're thinking it might not be real, it's really important to ask the reporter for more information. Try to figure out if maybe there's just a misunderstanding.
It's not uncommon for folks working hard to try to figure out a bug to report it in a way that's just completely incomprehensible to anyone besides them so you're going to want to try to pull at the strings a little to figure out what they're saying.
Throughout this entire process, you're going to want to stay in constant contact with the reporter, particularly if they're experienced security professionals and existing security researchers, not just random folks who found a bug.
They frankly have a low opinion of most companies' handling of reports, so staying in contact with them and keeping them up to date on what you're doing about the bug is really important for giving them confidence.
The next part is where we write software and we do the computer stuff. You actually have to fix the bug. Again, if it's taking time to figure out how to fix the bug, if there are complex interactions with say, product design issues, maybe it's just technically complex, keep in touch with the reporter. Let them know why it's taking a while.
If your product is just software as a service and someone found, say, an XSS (cross-site scripting) exploit, just go ahead and fix it, deploy it, and you're basically done. Unless you think somebody besides the reporter actually knew about it or used it, that can be the end of the day for you.
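For a sense of what such a fix often looks like: a typical XSS bug is user input placed into HTML without escaping. The sketch below uses a hypothetical `render_comment` helper and the standard library's `html.escape`. In a real web app you would normally rely on your template engine's autoescaping rather than hand-built strings.

```python
import html

def render_comment(author: str, body: str) -> str:
    # Escape user-controlled values before interpolating them into HTML.
    # html.escape converts &, < and > (and quotes, since quote=True by
    # default) into HTML entities, so "<script>" is rendered as text.
    return f"<p><b>{html.escape(author)}</b>: {html.escape(body)}</p>"
```

With this change, a comment body of `<script>alert(1)</script>` is shown to other users as literal text and is not executed.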
If your software is open source, or it's run on premises or shrink-wrapped, then the story gets a little more complex. You don't want to just commit the patch and push it out like normal, because you want to coordinate issuing the release, make sure all your users get it, and handle the PR that goes with it, so that the fix reaches your users as quickly as possible once it's public.
We also typically, for open source projects, will run our patch past the reporter. We want to make sure that our understanding of the scope of the issue matches theirs. It's fairly common for them to report it as, "Hey, it's a bug in A," when they look at the patch, they realize, "Oh, actually, this has much wider implications and you actually didn't cover everything I thought I was describing."
Next, we have to deal with shipping it. A bunch of users have our code and we want to absolutely minimize the time between when we share that there's a security bug with the world, and the time that they have it running on their servers or wherever they run their software.
There are a few steps to this. First, you want to get a CVE number. CVE stands for Common Vulnerabilities and Exposures. It's a system for keeping track of security bugs and giving people a common way to refer to them.
This is 3466 and you can say, this software is or is not vulnerable to that issue. It gives a common lingo to refer to stuff.
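A full CVE identifier has a fixed shape: the prefix `CVE-`, a four-digit year, a hyphen, and a sequence number of four or more digits. Heartbleed, for example, is CVE-2014-0160. The sketch below validates and splits such IDs, which can help if you keep them in release tooling. `parse_cve` is a hypothetical helper.

```python
import re

# "CVE-" + four-digit year + "-" + sequence number of four or more digits.
CVE_ID = re.compile(r"CVE-(\d{4})-(\d{4,})")

def parse_cve(identifier: str) -> tuple:
    """Return (year, sequence_number) or raise ValueError for malformed IDs."""
    match = CVE_ID.fullmatch(identifier.strip().upper())
    if match is None:
        raise ValueError(f"not a CVE identifier: {identifier!r}")
    year, number = match.groups()
    return int(year), int(number)
```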
There are basically three ways to get a CVE. You can e-mail distros@openwall. This is a private mailing list for operating system distributors: Linux distros, BSDs, and so on. So, if your software is shipped in, say, Debian, this is a good place to report it.
They do require that issues sent there be held under embargo for a maximum of two weeks. You don't want to e-mail this list until you're absolutely ready, but it is private, so the issue will stay private for those two weeks.
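Because the two-week clock starts when you e-mail the list, it helps to check a planned release date against that limit before sending anything. Here is a minimal sketch that takes the limit as exactly 14 days, following the talk's "maximum of two weeks"; the function names are hypothetical.

```python
from datetime import date, timedelta

MAX_EMBARGO = timedelta(days=14)  # "a maximum of two weeks", per the talk

def latest_public_date(notified_on: date) -> date:
    """Last day the issue can stay embargoed after notifying the list."""
    return notified_on + MAX_EMBARGO

def check_planned_release(notified_on: date, planned: date) -> None:
    """Raise ValueError if the planned public date breaks the embargo window."""
    if planned < notified_on:
        raise ValueError("release date is before the list was notified")
    if planned > latest_public_date(notified_on):
        raise ValueError(f"embargo would exceed {MAX_EMBARGO.days} days")
```

If the check fails, the fix is usually to wait before e-mailing the list, not to rush the release.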
OSS Security, by contrast, is a totally public mailing list. Some of the people who read it are able to assign CVE numbers, but basically the only reason to use it to get a CVE is if the issue is already public for whatever reason: you forgot to get one, or it was reported irresponsibly, so you didn't find out before the general public did.
This is a pretty interesting place to follow along for security issues just because lots of folks are reporting stuff there.
And lastly, you can get in touch with MITRE directly. MITRE is the organization that runs the CVE program for the US government, issuing and managing these numbers, and they will negotiate the embargo time and all those details with you one-on-one.
Once you have a CVE, you're going to want to coordinate with anyone who redistributes your software.
If someone resells it, or it's in the Debian repositories, or somebody else is taking your software and getting it to users, you're going to want to coordinate the release with them. That way you don't end up with, "Oh, we released our fixed version, but the version you're shipping to users is still vulnerable."
And of course, the last step is actually issuing the release. Scheduling is an important question. Issuing a release at 5 PM on a Friday, or on Christmas Day, is a little unfortunate; try to avoid those if at all possible.
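A release script can catch the obviously bad slots named above before anyone announces a date. This is a sketch only: the rules (Friday from 3 PM on, weekends, Christmas Day) are illustrative assumptions extrapolated from the talk's examples, and `release_slot_problems` is a hypothetical helper.

```python
from datetime import datetime

def release_slot_problems(when: datetime) -> list:
    """Return reasons a proposed release time is a bad idea (empty if fine).

    Illustrative heuristics only; adjust to your own users' time zones.
    """
    problems = []
    # datetime.weekday(): Monday == 0 ... Friday == 4, Saturday == 5, Sunday == 6
    if when.weekday() == 4 and when.hour >= 15:
        problems.append("late Friday: few people around to help users patch")
    if when.weekday() >= 5:
        problems.append("weekend")
    if (when.month, when.day) == (12, 25):
        problems.append("Christmas Day")
    return problems
```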
There are a couple of things you're going to want to make sure you're putting in your release announcement. SaaS products very rarely do any sort of release announcements for security issues unless they know it was exploited or it's otherwise very public like a password dump or something like that.
But for other pieces of software, the release announcement usually contains a precise and complete description of the issue. I'll talk about why you want to have a complete description.
Next, the CVE number and the release artifact: what your users should download, whether that's your Ruby gem, your JAR, the EXE, whatever it is. For open source, include the raw patches themselves, because some users will have modified your software or will be running it in some unusual environment where they just want the patch.
You want to credit the reporter and thank them for reporting it. Remind folks to disclose responsibly in the future.
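The checklist above maps onto a simple template, which makes it harder to forget an item under time pressure. This is a minimal sketch. The `SecurityAdvisory` class and its field names are hypothetical, and the wording of the rendered text is only an example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SecurityAdvisory:
    # Fields mirror the announcement checklist; names are illustrative.
    cve_id: str
    summary: str            # precise and complete description of the issue
    affected: str           # which versions are vulnerable
    fixed_in: str           # the release artifact users should install
    patch_urls: List[str]   # raw patches for users who can't simply upgrade
    reporter: str           # credit the reporter

    def render(self) -> str:
        patches = "\n".join(f"  - {url}" for url in self.patch_urls)
        return (
            f"Security release: {self.fixed_in} ({self.cve_id})\n\n"
            f"{self.summary}\n\n"
            f"Affected versions: {self.affected}\n"
            f"Patches:\n{patches}\n\n"
            f"Thanks to {self.reporter} for reporting this issue responsibly.\n"
            "Please report future security issues privately to our security contact."
        )
```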
A lot of companies and open source projects have a hall of fame for people who found issues. It's a way to give credit, which is a sort of currency within the security reporting community.
All this was kind of the good case. Someone reports a bug to you. You find it, you fix it. You get the news out to all your users. You make sure everyone's aware. They get patched. That's the good case.
The bad case is what's called the 'Zero Day' because you have zero days to prepare for it. This is when your issue is known publicly, maybe even being exploited before you have any chance to work on it.
It happens for all sorts of reasons. Sometimes it's accidental: somebody sends a message to a public mailing list saying, "Hey, I think I found an issue. Is this a real bug?" It happens. Sometimes it's malicious: somebody starts exploiting it, and that's the first sign you have.
Basically, all well-planned coordination goes out the window, and speed is absolutely a factor, particularly if the bug is being exploited. I originally had a bunch of material here on how to handle disaster management. Then I found out that Blake Gentry gave a talk here a few weeks ago on coordinating incident response, so instead, you should just go watch that and learn a bunch about handling emergency response.
Just to give you an example of what this looks like, we had an issue maybe a year and a half, two years ago in Django where somebody sent an issue to a public mailing list saying, "Hey, I noticed that if I send moderately large passwords, it's possible for the website to take a really, really long time. Should we put a cap on that?"
The reporter didn't realize that this was actually a perfect denial-of-service vector. Anybody could send just a few small requests with hundred byte passwords and suddenly, the website is just spinning, churning, trying to deal with them.
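The general defense is to bound the input *before* doing the expensive work. The sketch below caps password length ahead of a deliberately slow PBKDF2 hash from Python's standard library. The 4096-byte cap and the iteration count are assumptions for this sketch, and the helper names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

MAX_PASSWORD_BYTES = 4096   # assumed cap for this sketch
ITERATIONS = 100_000        # illustrative work factor

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    raw = password.encode("utf-8")
    # Reject oversized input before the expensive key derivation, so a
    # request can't force the server to do an unbounded amount of work.
    if len(raw) > MAX_PASSWORD_BYTES:
        raise ValueError("password too long")
    salt = salt if salt is not None else os.urandom(16)
    return salt, hashlib.pbkdf2_hmac("sha256", raw, salt, ITERATIONS)

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    try:
        _, candidate = hash_password(password, salt)
    except ValueError:
        return False
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```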
Thankfully, we realized pretty quickly: "Oh, this person just publicly disclosed a DoS." So we basically spent the rest of that Saturday trying to get enough Django core developers online to agree on a patch and ship a release.
Our release manager, it turned out, was in Denver judging at a Magic: The Gathering conference and didn't have his regular laptop with him. It was completely unplanned. Later that evening, I got a call from friends saying, "Hey, I thought you said we were going to do this thing?" And I was like, "Oh, right, no. I'm sending an e-mail to all of our largest users right now, telling them we're doing a security release in half an hour. That's not going to happen."
You can't plan for these unfortunately. All you can do is have good procedures in place for 'when an emergency happens, how do we communicate?'
To recap: getting reports in. You want folks to give you reports directly. If they don't think you're interested in reports, they'll either leave a vulnerability lying around unreported or disclose it publicly.
Then figuring out the patch while staying in constant communication with the reporter, coordinating with your downstream distributors, and finally issuing the release and making sure your users find out through whatever channels you use to communicate with them: e-mail announcements, blog posts, whatever you use.
There's a couple of common mistakes that happen when trying to deal with these. The first is incomplete disclosure - not sharing the full details of what the security issue is with your users.
This is a big problem because it gives power to attackers, particularly with open source software. There was a security issue in Node.js about a year ago where they issued a release saying, "Hey, there's a security fix in here," but didn't announce what it was.
So users were left scrambling to figure out, "Hey, what is going on? Am I vulnerable? Is this important?" Node.js has a wide variety of users. "Does this apply only to the HTTP server? Does it apply to anything I'm doing with it? If I'm using parts of Node.js on the front end, does it matter?"
There was no information for users. Attackers, by contrast, particularly ones trained in dealing with security issues, are able to look at the patch and extract the vulnerability, and in fact they did: in under five minutes, folks had figured out what it was.
But the way those folks were discussing it, in random back-channel conversations on Twitter, doesn't empower users. You might think users will go, "Oh, I don't know what this is, so it gets the highest priority." That is not the case.
Lots of organizations are extraordinarily risk-averse. For them, risk means changing running software that makes money, not, "Oh, hey, maybe we're vulnerable to something." The lack of information disempowers them and gives attackers a real leg up on them.
Next issue that can happen is poor coordination, even within your team. We had this issue just a few weeks ago on the Twisted project. A security report came in, responsibly disclosed. We developed a patch, everything is going well. We all agree this patch is correct. Okay, we should start getting to the release phase.
The release manager didn't realize that there were a few steps between "we've agreed on the patch" and actually shipping the release. So I went to bed thinking, "I'm going to send an e-mail to MITRE, we'll get the CVE tomorrow, and then we can start this process. Hopefully we'll do a release in a week."
The release manager, in Australia, saw the agreement and thought, "Okay, we've got the patch. Time to release, let's go!" I woke up and the release had been done. There was no CVE. Debian hadn't been notified, so they weren't ready to update their packages. A small mess.
It's much smoother when everybody is appropriately informed and you've agreed on the embargo date.
And finally, if you don't clearly tell your users, particularly when you have a public issue tracker, that they should file security bugs somewhere else, they will almost certainly get this wrong.
Django once had a security issue sit closed for six months before we realized what it was. Somebody filed a bug saying, "Hey, the e-mail validator is super slow with some e-mails," and I closed it saying, "That sounds like a bug in Python's regex implementation. Maybe you should file a bug with Python."
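"Super slow with some inputs" is a classic sign of regular-expression backtracking. Patterns with nested repetition, such as `(a+)+`, can take exponential time in backtracking engines on crafted input. A minimal defensive sketch bounds the input length and keeps the pattern free of nested quantifiers. The 254-character limit, the deliberately loose pattern, and the `looks_like_email` name are all assumptions for illustration. This is not Django's actual validator or its fix.

```python
import re

MAX_EMAIL_LENGTH = 254  # assumed upper bound for this sketch

# Deliberately loose "something@something.something" check with no nested
# repetition; the length check bounds the matching work in any case.
SIMPLE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(value: str) -> bool:
    if len(value) > MAX_EMAIL_LENGTH:
        return False
    return SIMPLE_EMAIL.fullmatch(value) is not None
```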
Six months later somebody starts exploiting that and using it to DoS Django users. Not the way that should have gone.
A big question that comes up is pre-notification, particularly if your users run your software directly, whether it's shrink-wrapped, on premises, or open source. You're probably going to get to a point where some users or customers want advance notification about security issues so they can get patched in time. Or maybe Debian or some other redistributor wants to know in advance.
Redistributors are easy, in my opinion, as long as they're trustworthy and hopefully they are. They're easy opt-ins. The big question for a lot of folks here is probably, "We have customers with support contracts or some other relationship with our business. Do they get special treatment?"
My answer: I tend to come down on the side of no. For Django, we have a specific policy on how you get on the pre-notification list.
If you follow certain rules, you can get on the list. We require that you have a clear security contact. We're not getting in touch with a certain random employee. You have your own security contact to handle these reports.
You have a clear description of how the information will be used. For example: "We want to use this to make sure we're patched in time," or, "Our customers all deploy web applications, so we want to blacklist this version of Django and notify them."
How do you want to use this information? Is it good? Are you going to deal with this responsibly? If we report to you that there is a security vulnerability, is it going to go out to your entire 5,000 person organization? That's not great. How are you going to handle this information?
And finally, you need to be a large organization or have high visibility. Large means you're world famous: a top-100 website that uses Django, yes. Obviously, you're an easy or popular target.
High visibility is a little more complex. We have websites that are listed on the homepage of Django. Here are some fancy Django users. Look at how great they are. That's the example of visibility for us.
We have the PolitiFact "Are-You-Lying" website. It's not necessarily super high traffic, but it's very high visibility because we promote it on our front page. Someone looking to exploit a bug in Django is going to go to that list. It's an easy list for people asking, "Who can we exploit that's kind of important?" Governments, folks like that.
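Writing the criteria down as an explicit checklist keeps decisions consistent and non-commercial. The sketch below encodes the four requirements described above. The `PrenotificationRequest` fields and the `missing_criteria` helper are hypothetical, and deciding whether a request actually satisfies each field still takes human judgment.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PrenotificationRequest:
    # Illustrative fields matching the criteria described in the talk.
    security_contact: str          # a dedicated contact, not a random employee
    intended_use: str              # how the advance notice will be used
    limits_internal_sharing: bool  # won't go out to the whole organization
    is_large: bool                 # e.g. a top-100 website
    is_high_visibility: bool       # e.g. featured on the project's homepage

def missing_criteria(req: PrenotificationRequest) -> List[str]:
    """Return the unmet requirements; an empty list means eligible."""
    missing = []
    if not req.security_contact.strip():
        missing.append("clear security contact")
    if not req.intended_use.strip():
        missing.append("description of how the information will be used")
    if not req.limits_internal_sharing:
        missing.append("responsible handling of embargoed information")
    if not (req.is_large or req.is_high_visibility):
        missing.append("large organization or high visibility")
    return missing
```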
If folks meet these requirements, we give them access. Turns out, it's maybe a dozen folks who have been interested in getting pre-notification and I definitely think it's better to have a clear statement of, "Here are our requirements to get this notification," rather than letting it be a commercial relationship. Particularly if your product is open source, it can get really complicated when you try to introduce the commercial relationship there.
Another big topic lately is 'bug bounties', basically, paying folks to find bugs or paying folks if they find bugs. Personally, I think it's a really good idea. Lots of companies are doing this. It clearly produces results.
There are two really good services for managing these: HackerOne and Bugcrowd. They're both hosted services. They give you a standard form, reporting guidelines, and ways to manage the interaction with the reporter.
I'm a big fan of either one of these. My friends use both of these in their companies. Big fans. The small downside of these is that they cost a little bit of money, not a lot. Much less than the cost of a security audit.
The far bigger downside is that it costs you a bit of time. Bug bounties frankly attract nonsense reports. In particular, we get a lot of reports from folks for whom $100 or $250 is a large amount of income. You get a lot of noise.
I think the pinnacle of noise I would say is somebody reported to Mozilla that Firefox's source code is available. Of course, that's intentional. Firefox is open source. You get noise like that, nonsense reports.
Folks run automated scanners and say, "Hey, look, you respond to this HTTP method," even when absolutely nothing bad happens. I'd say don't really dive into this until you have a little breathing room to let folks take time reviewing reports, because some of them are frankly difficult to identify as nonsense, and separating them out takes effort.
Lastly, I want to talk about security impact ratings. This is a practice that tries to apply a score to a bug to indicate how important it is. This is remote code execution. Super important. This is SQL injection but it's against an uncommon configuration of the software. Less important.
Lot of folks are big fans of this. I am not. There are a couple of reasons. The first one is we are really bad at estimating what the impact of a bug is.
Some folks at MIT did a study and they looked at the Linux kernel and found that 14% of vulnerabilities were not actually identified as security issues until eight weeks after the patch landed. Patch lands, more than eight weeks later, somebody figures out it's a security bug. Of course, even within things we recognize as security bugs, sometimes it's hard to see what the exact impact is.
Django had an issue where we thought an XML issue was just a DoS but it turned out that under certain configurations, it could also be used to leak files on your disk. Trying to guess at what the exact impact is can be complex.
The other issue is that we're often not well positioned to decide how important an issue is for our users. "Oh, this leaks some data; that's less important than a remote code execution." It turns out that if the data is really important, it can be just as bad.
So, I tend to shy away from trying to assign ratings and instead focus on how we can get the patched version of our software into our users' hands by informing them and letting them be the judge of if they're at risk or not.
To wrap up, to get started, going from zero to having a full policy - get a /security page up, security@ e-mail address. Give folks a way to know that they should get in touch with you.
Document how you're going to handle vulnerabilities and how you'll work with reporters. Consider creating a bug bounty program once you're ready to deal with a certain amount of reports.
And, as a sort of freebie: the three most common things I've seen CVEs issued for are poor password storage, HTTP clients that don't verify certificates, and SQL injection. Make sure you're storing passwords responsibly: no plain-text passwords or anything like that. Make sure your HTTP clients properly verify their peer's certificate and host name. And make sure your application doesn't have any SQL injection vulnerabilities.
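Here is a minimal sketch of the last two points using only Python's standard library. It shows parameterized queries with `sqlite3`, so user input is never spliced into SQL, and a TLS context from `ssl.create_default_context()`, which requires a valid certificate and checks the host name. The `users` table and the helper names are hypothetical.

```python
import socket
import sqlite3
import ssl

def find_user(conn: sqlite3.Connection, username: str):
    # "?" placeholder: sqlite3 binds `username` as a value, so input such as
    # "x' OR '1'='1" is compared as a literal string, never parsed as SQL.
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cur.fetchone()

def make_tls_context() -> ssl.SSLContext:
    # The default context sets verify_mode = CERT_REQUIRED and
    # check_hostname = True; don't switch either of those off.
    return ssl.create_default_context()

def open_verified_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    ctx = make_tls_context()
    sock = socket.create_connection((host, port), timeout=10)
    try:
        # server_hostname enables SNI and is the name the certificate is checked against.
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
```

Higher-level HTTP libraries generally verify certificates by default as well. The usual mistake is someone turning verification off to silence an error.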
Three easy things to keep your app more secure. So, thank you very much.
The question was, "Where within your organization should responsibility for security live?"
For most small companies, the answer is going to be: with somebody who's interested in the subject or passionate about your users' security, typically, but not necessarily, on your engineering team. You want someone who's passionate about it, since you don't have time to dedicate a full person to it.
Once you're larger and starting to think about whether to have a person dedicated to security, I think the right initial place for that person within your organization is as a roving resource: they're part of the engineering team, and they work with whichever product teams need them at a given moment.
Yes, the question was, "Is there any sort of good process for helping support folks who are getting requests about a security issue, automatically flagging those and bringing them to the attention of developers?"
I don't think there's a good automated tool other than something really low-tech: if a ticket contains the word "vulnerability" or "security," make a note of it. I think the best thing you can do is train support folks on what to look for: "Hey, a security vulnerability report that accidentally comes through this channel might look like this. If you see one of these, just flag it for us." So I think it's much more of a cultural question than a technical one.
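The low-tech keyword check really can be just a few lines, as long as it only routes tickets to a human and doesn't replace the support team's judgment. In this sketch the term list is an assumption, to be tuned to how your users actually write, and `needs_security_review` is a hypothetical name.

```python
import re

# Hypothetical keyword list; "vulnerab\w*" also catches "vulnerable" etc.
SECURITY_TERMS = re.compile(
    r"\b(vulnerab\w*|security|exploit\w*|xss|csrf|sql injection|cve-\d{4}-\d+)\b",
    re.IGNORECASE,
)

def needs_security_review(ticket_text: str) -> bool:
    """Flag a support ticket so a developer takes a look."""
    return SECURITY_TERMS.search(ticket_text) is not None
```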
The question was, "If your company has a bunch of third-party-maintained SDKs, how do you deal with the implications around that?"
I think the first thing to recognize is you're going to get reports for those SDK's and you should probably try to treat them almost as if they were reports directly against you. This is because the implications are about the same, it's your customers.
You probably want to have a relationship with the folks maintaining these SDKs, which is a good idea for a bunch of other reasons anyway. That way you can act as a sort of person in the middle, funneling the report from the reporter to the SDK maintainer and keeping everybody in the loop. Depending on how you message things, your blog or wherever you usually post might be the place you announce the security release for the SDK.
So the question was, "How do you manage urgency encouraging your users to upgrade promptly but not freaking them out generating panic?"
That's part of the good thing that comes with full disclosure. Users can evaluate for themselves, "Do we need to panic?" Or is this merely, "This doesn't affect us but as a part of good procedure, we should make sure we're upgraded in due time."
I think full information is definitely a key part of that. Talking about the known properties of it is another component. If folks know it's remote code execution, that gives them information. They're not going to audit your patch to understand what is the implication of an accidentally dereferenced pointer.
You want to tell them what the known implications are, and folks are generally pretty quick to pick up on this stuff. We see different responses to major OpenSSL vulnerabilities where folks recognize "we've got to patch everything today" versus ones that don't get as much attention. Folks are pretty sharp.
Yeah, the question was, "What happens when your security issue becomes a PR thing and hits the top of Hacker News?"
This actually happened to Slack last week. Just by knowing the domain of a company using Slack, you could see the names of all its chat rooms if the company had user signup set to a certain mode. The PR doesn't necessarily have to be damaging, but I think it does affect the way you handle these.
Slack's original position on the issue was this is kind of a feature of our product. This is how the workflow goes if you allow this signup option for your users.
I think a clear component of bad PR is it can give you feedback that you weren't necessarily getting otherwise. Slack saw this feedback and changed the signup procedure for these users. Basically, because they were able to re-prioritize based on the feedback they got.
If you decide you want to dig in your heels and say, "This is a feature of our product design," I'm not a public relations person, I don't know how you make that one look good.
All right, thank you very much!