In the latest episode of The Secure Developer, Guy is joined by Aren Sandersen. They examine the current state of access control systems and discuss the need for better education and tooling to support time-bound dynamic access control.
Aren also explains why most startups consider security too late and reveals the minimum mindset that all early stage startups need to adopt to manage their attack surface.
About the Guests
Aren Sandersen is founder of Foxpass, a cloud-hosted LDAP, RADIUS, and server access control service. Aren has over 15 years of experience in software engineering, architecture, technical operations and IT at companies like Pinterest, Microsoft, Bebo, Oodle, and Danger.
Guy Podjarny: Hello everybody, welcome back to The Secure Developer. Today we have Aren Sandersen with us. Hi Aren.
Aren Sandersen: Hello, thanks for having me.
Guy: We're going to talk about a bunch of things, including access controls and how do you control all these pesky users you have on your team and still stay secure. So Aren, thanks for coming on, can I ask you just as we get started to say a little bit about yourself. So what's your background, how did you get into security?
Aren: Absolutely. I'm Aren and I'm the founder of Foxpass. I started out doing software development on server-side teams for about 10 years, and then I migrated more and more toward the operational side, helping manage data centers, and got really in-depth in that space. I ended up running tech ops and IT teams for several social networks. Then I did a period of DevOps, scalability and security consulting, where I worked with a lot of startups struggling with these areas.
And from my experience at Pinterest, which is one of the companies I was at, and at all those startups, I realized there's a big gap around access control that a lot of companies face. So we created Foxpass to solve all of those access control and identity needs for the infrastructure. Who can be on what servers, VPNs, wireless networks, what can they do, how long can they be there, that sort of stuff.
Guy: Okay, cool, sounds like you're personally going through the Dev to DevOps to DevOpsSec revolution, evolving your skillset. So let's go back a little bit, you worked for a bunch of companies, maybe we start around there. You mentioned you worked at Pinterest.
Aren: That's right.
Guy: I know in that role you were saying you were more in an Ops role. What was that experience as it came to security, and what type of security activity was going on.
Aren: Yeah, I would say when I got there, there was nothing wrong with it, but there's always room for improvement, and especially the rate at which the company was growing around when I got there. There was a lot of improvement on "how do we onboard people quickly, give them access to the things they need quickly". If someone needs to change their SSH key, you know, how can we do that quickly, and just so we really can focus more on getting people up and running when they onboard, getting them working on the product, and trying to enable them to work quickly.
Guy: Was there a dedicated security team at the time?
Aren: There was not. Like at many young companies, it really fell on the developers early on, and then as we added more DevOps-type folks it became part of their role. A dedicated security team didn't come until years later.
Guy: And the security responsibility was just generally shared, or did it fall more like you said as the ops thing came up it kind of became more ops responsibility versus dev?
Aren: Security's a very broad brush. I would say that there's usually a pretty clear division early on.
When there's no overarching security team, AppSec falls on the more senior app developers, and OPSEC on those who are very senior devs or have had more DevOps or DevOpsSec experience.
Guy: It's funny that, to an extent, young companies, because of a lack of alternatives, really operate in the holy grail mode of the large companies. The large companies go through this crunch where you build a security team and you want them to give guidance, and then much of that security team's time is spent trying to get engineers and ops to actually embrace and own the security practices as part of their core competence. When there's somebody with that role, people may take that to mean, "it's not my job, it's that person's job".
Okay, cool, so you're at Pinterest, I know you have security kind of run as a part of those teams, you're not in a security title at the time, so you're dealing more with OPSEC at the time?
Aren: That's right, I was OPSEC, DevOps, I wrote some app code as well. You know, in early days at a company you wear a lot of hats.
Guy: Yeah, anything that's needed. So that sort of evolved, and then after that you went on to do kind of ops and security consulting?
Aren: Yeah, I mean based on many of the challenges we faced scaling up Pinterest, especially on the operations side, I went and helped a bunch of young startups figure out their operational challenges, whether that was scalability, whether that was security, whether that was just foundational things, and I did that for a little over a year.
Guy: How much was security a component in the ask, like when people went out to, you know when they found you, and they sort of told you what they wanted, what they believed they needed help with. How big a role, if at all, was security in those project definitions?
Aren: I would say it was rarely asked for upfront. It was rarely, "hey, come in and tell us what we're doing from a security perspective". More often what I heard was, "can you just come in and look around and give us improvement suggestions across the board". And so I would always make a point of homing in on security, and highlighting areas for improvement, as much as I would all the other operational challenges that a company might have.
Guy: Cool, so you'd be sort of engaged on the ops front. Come help us operate better, but just like, from your perspective, that includes the security even if it wasn't stated in the bullet items?
Aren: That's right.
It was pretty obvious from the get-go if security was an issue for a company, because I would onboard as a contractor, and get to go through the flow of how do employees onboard. How do they get access to the production systems. What other systems they would have access to that didn't go through any sort of access control.
So it was pretty easy to write that report, because I was living it.
Guy: What were the insights you had? I mean maybe now you can contrast a little bit, you've seen Pinterest at a certain size, you've seen a bunch of companies that you worked with at sizes that were similar, maybe smaller, as you're building out. What did you see was typically handled well from a security perspective, versus things that weren't?
Aren: I would say almost all these companies go through the same arc, which I think is pretty common for all companies of this size. So you start out writing your app, somebody spins up a couple of Amazon machines, gets their app running, and then you add more people. Normally they're all sharing the same credentials until you get to this certain point where someone decides to bring in some sort of configuration management, that's usually fairly late, and then the company keeps growing, then you just keep iterating security slowly, and then at some point, they'll bring in a dedicated security team.
Guy: So you know, you talk about giving access to the systems and things like that, and we're going to sort of deep dive into that topic shortly. From other practices you're sort of seeing in terms of operations security, what were typical types of mistakes that you would walk in and you would say, you know, quite obviously, like pretty much all the other companies I've worked with, I can see you are doing X wrong.
Aren: Yeah, I mean there are many companies where I would come in to help out and I would ask for access to the production systems, and I would get an email with the private key for the Ubuntu or the EC2 user. Right, this happens a lot. It's easy for many listeners to sort of scoff at that, but it's extremely common because that's the way Amazon sets you up, right? They don't really tell you that this public/private key pair that they gave you is kind of just for you, because there's no way to add more users.
In many ways, the cloud has been a setback from an operational security perspective in that regard: there's a lot of foundational security that is missing when you get that public and private key pair from your first Amazon machine.
Guy: That statement about the setback is an interesting one. You know, cloud is obviously a productivity boost, right? I think there's probably no arguing that it made us produce more, faster.
Aren: That's right.
Guy: So you know, the notion about access is the ability to access all of these machines, and whether we're sort of more fast and loose with it. Like, do you see that as something that's cloud-specific, or do you think that's a symptom of the speed? Is cloud the thing that you sort of feel is setting us back, or is it just about moving fast?
Aren: I wouldn't say it's the cloud specifically, I just think it's the current process of what developers go through when they're spinning up a cloud server. I think the cloud overall, as a whole, actually makes things more secure, but I think the current process sets people up where they think they're doing everything right because they're in the cloud. But in reality there's a big gap between what they have in their hands when they set up that machine and where they need to be to make a secure environment from an access control perspective.
Guy: It's interesting. Security and usability are oftentimes at odds: if you make something easy, you sometimes make it less secure, or the other way round, if you want to harden access controls, to harden the ability to do actions, then that comes at the expense of usability. We see this across the board, right? We see this with the consumption of packages and publishing a new source code package in many of these platforms, in npm and the like.
You don't need a password for it, so if somebody got on your machine, they can now publish stuff on your behalf, and to an extent that reduces security. At the same time, it's easy to publish, so the community thrives because you publish a lot of this type of code, similar to cloud access, you know, you want the entire team to be enabled.
It's DevOps, it's everybody's responsibility, or everybody's empowered, if you will, to ship stuff, to access these machines, to grow them. In the process of making it more usable, you lose maybe some of those controls that you had, as opposed to you've got a physical machine, and you're guarding access to it. It's slower, but it's safer in some respects.
Aren: And I think security and ease of use are always going to be at odds. You have to put up some gates, you can't have a process that anyone can do anything at anytime without any checks. I think even some of the more basic things that are must-haves, two-factor auth, for example. I think that process has to exist. It's a little annoying when you're trying to do something and you're stopped and you have to dig out your phone, or to enter the token.
But you know, I think the right balance is important, and the right balance oftentimes depends on what stage of the company you're at, what are the security requirements of your particular company at your particular stage.
I think ease of use and security are probably going to be at odds for a while until we come up with something a lot different than what we have now for verifying identity.
Guy: Yeah, I think, playing devil's advocate a little bit here, just to sort of flesh out the thought, on one hand, when everybody has access to everything, so in this sort of extreme, and very common case in platforms, then anybody can make bad mistakes. Also if anyone's machine gets compromised, then people would have very broad access. At the same time, oftentimes when we talk about the improvements in security that continuous processes have, you are fast to fix, right, when there is a problem, everybody has the ability to fix it.
You're probably right that it's just about the balancing act. Maybe that's the security angle: it's not just usability that pulls you to the other side. You still want to make security sufficiently easy that you can respond quickly, so it's an actual security demand.
Aren: Yeah, we talked before about the arc that startups go through around security, and you know, in the early days every engineer does have access to everything because things break and you don't know who's around who can fix it. But as you split up into separate teams, which companies do when they're mid-sized, then it might be time to start restricting which engineers can push code to which servers, and who can access which servers.
If you're a company where you have auditors or you have very sensitive data, it's going to be a requirement that you have role-based access control to start limiting who can do what and most of the companies, they struggle to build those tools.
Guy: And so let's sort of dive in, indeed into access control, so before I have a bunch of questions that dig in, let's try to sort of give it a definition, can you kind of scope this for us a little bit, we're talking about, you know, accessing the machines, what do you see as being included under that title of access control?
Aren: Access control covers a huge spectrum. I mean you've got companies out there that control access to third-party websites. Those are the Oktas and OneLogins, and Bitiums of the world, where your company credentials control what other outside websites you can get to. Then you've got sort of your internal access control. Who can be on your wireless network, who can access a VPN, who can be on your admin portal, who can be on your servers for the engineers? So there's a huge spectrum in the access control world.
Guy: So, kind of defining this a little further, so this is kind of the broader access control. When you talk about a startup, what would you say are the sort of primary areas that I need to think of? I'm starting a company, you know, a young company, it's in the 10, 20, 30 person range, or even earlier on. Like as they just started, just a handful of people, what should I think about and what should I explicitly not think about right now?
Aren: I think you should think "if this person left my company today, what would they take with them?"
So, if this person left now, if they're an engineer, would they still be able to access your servers from their home? Would they still be able to access your DNS portal, where they could cause some problems with your name servers if they wanted to? You know, some of those have a week-long TTL, so those can be very painful to fix. Even if that means at a very small company just writing this stuff down. What's your offboarding plan, just so you know that there's nothing this person can do to cause you problems later.
I think as the company grows, an offboarding plan on paper doesn't work as well as having more formal processes, and that's when you bring in companies to make one-click offboarding or role changing much easier.
Guy: Yeah, so there's less human error in the process. Probably something to keep in mind in those cases is it's not just a matter of trust, it's not just about whether that person is nefarious and is going to do something bad. It's also about your attack surface. So if you have somebody that left the company, they may accidentally give access to those systems to another party, they may get infected with some malware, something that gives access to those machines. So it's about reducing your attack surface.
Aren: I think that's a really, really good point. I mean, you tend to think of it in terms of what could this person do if they were nefarious, but really the question is, "what could this person do if their account was compromised, or if their credentials were compromised". If they were phished, humans are the weakest links. So if this person was phished, what access do they have and how can you create policies where their access is low until they need higher access?
So if they don't need to be on your credit card database server, they probably shouldn't without prior approval, whether that's from a peer, whether that's from a JIRA ticket being open or whatever, because if those keys were stolen, it's nice to know that, hey, well they didn't have access to anything really sensitive, so the attack surface was much lower.
Guy: Okay, so I think that's a really kind of good way to think about it when you just get started, is to say what happens if somebody leaves, you know, maybe there's a little bit more of that as well as you sort of figure out even your sort of team culture and structure as well, just sort of understand it. So step one, think about what happens if that person leaves, even if you're fully trusting of the employees as they're sort of still in the team.
And then sort of step two sounds like write down an offboarding plan, and then later on structure it, automate it, so that there's no human error or omissions when that happens. Those are I think good steps. What's the next level, so like you've done that, that's about people joining and leaving. What's the next step up in sort of access control?
Aren: Yeah, so it's something we've mentioned just briefly, which is what we call dynamic access control.
So your access is zero or very low until it needs to be increased. For example, you don't have sudo access to any machine unless you happen to be on call that week. And then your access is automatically adjusted as needed.
Or when you do give access, it has to be approved, like we said, by a peer or a manager, but it's always time-bound. You can only be on that server for one hour, then that access is reduced back to normal. So what you don't end up with is what we call the master key. So if you've been at a company a year, two years, your key gets you in any lock. If that key gets compromised, then that's very bad. So if instead your access always was lowered when you're off a project or when the time expired or whatever, then it keeps the attack surface much, much lower, and that's really where we see the future of infrastructure access control. Everything is time-bound.
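The time-bound model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Foxpass or any other product implements it: the names `Grant` and `AccessBook`, the one-hour default, and the rule that the approver must differ from the requester are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Grant:
    """A hypothetical time-bound grant: one user, one server, one expiry."""
    user: str
    server: str
    approved_by: str
    expires_at: datetime


class AccessBook:
    """Holds grants. Access is denied by default and lapses on its own."""

    def __init__(self):
        self._grants = []

    def grant(self, user, server, approved_by, now, ttl=timedelta(hours=1)):
        # Assumed policy: sign-off must come from someone other than the requester.
        if approved_by == user:
            raise ValueError("a grant must be approved by someone else")
        new_grant = Grant(user, server, approved_by, now + ttl)
        self._grants.append(new_grant)
        return new_grant

    def is_allowed(self, user, server, now):
        # There is no revocation step to forget: an expired grant stops matching.
        return any(
            g.user == user and g.server == server and now < g.expires_at
            for g in self._grants
        )


book = AccessBook()
start = datetime(2024, 1, 1, 9, 0)
book.grant("alice", "db-1", approved_by="bob", now=start)
inside_window = book.is_allowed("alice", "db-1", start + timedelta(minutes=30))
after_window = book.is_allowed("alice", "db-1", start + timedelta(hours=2))
```

Here `inside_window` is true and `after_window` is false. The point of the design is that lowering access is the default outcome of time passing, so nobody has to remember to undo a change.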
Guy: I think that's a really, really good kind of destination, and it sounds like one where it really is about tooling. Because technically you could have implemented a human process to do everything you've just described, except that's very impractical, it slows everybody down, but also you need a certain size company to do it. While on the flipside, if this was sufficiently easy, if it was just like the two-factor auth, it's just, you know, a 10% overhead on the process of giving somebody the master key, then why not, right, assuming the costs are correct as well, but I think a lot of it is about that effort.
Aren: Yeah that's right, I think access control's a manual process. Updating someone's access, updating someone's SSH key means updating a Git repo, pushing that up, setting that to the Chef server or Puppet server. And then if it was supposed to be temporary access, you'd have to remember to undo that, which is again, a lot of overhead, it's a lot of people's time. If you build automated processes you get better security, because you have these tools in place, you don't forget to change permissions if they're time-bound because it's automatic.
I think there's a lot of room for improvements over the current state of the art tools in this area around access control.
Guy: Many of these controls we talked about are really good when you have some sort of central user management portal. Oftentimes you have multiple user management systems, right? You might have a GitHub account, and maybe you use Google Apps, and maybe that is sort of a central focus for a bunch of the other things, and maybe you have some JIRA account, or there might even be a Twitter account and stuff like that if it's about social media access. Not all of them connect together. Is there a good practice that you can follow to try and wrangle these things?
Aren: There are a lot of tools out there. I think on the low end is a common list of passwords in a spreadsheet, not really recommended, but at least you know where to look. There's also password management tools that have group features, LastPass is one of them, 1Password is another. Then you can move up to things like Meldium, which is very similar to those previous tools, and then you're looking at the federated access systems, like Okta, OneLogin. They really have the ability to tie all those systems together, so they'll hold onto the passwords for accounts that don't have federated access.
They'll provide federated access for sites that have it. They also will do provisioning, so you add a new user to the marketing team, they're going to get access to Twitter, they're going to get access to your ad management portals automatically, but someone on another team won't have that access.
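The team-based provisioning described here boils down to computing a difference between two sets of entitlements. The sketch below is a minimal illustration and does not reflect the API of Okta, OneLogin, or any other product; the `TEAM_APPS` mapping, the application names, and both function names are hypothetical.

```python
# Hypothetical team-to-application mapping; the names are illustrative only.
TEAM_APPS = {
    "marketing": {"twitter", "ads-portal"},
    "engineering": {"github", "jira"},
}


def entitlements(teams):
    """Union of the apps granted by each team a user belongs to."""
    apps = set()
    for team in teams:
        apps |= TEAM_APPS.get(team, set())
    return apps


def provisioning_diff(old_teams, new_teams):
    """Return (to_grant, to_revoke) when a user's team membership changes."""
    before = entitlements(old_teams)
    after = entitlements(new_teams)
    return after - before, before - after


# A new hire joins marketing: everything marketing uses is granted.
grant_on_join, revoke_on_join = provisioning_diff([], ["marketing"])

# The same person leaves the company: an empty team list revokes it all.
grant_on_exit, revoke_on_exit = provisioning_diff(["marketing"], [])
```

Treating offboarding as a move to "no teams" means the same code path handles joining, changing roles, and leaving, which is what makes one-click offboarding practical.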
Guy: That sort of evolution is a good listing and probably the key question would be what's the right time for me to invest in that type of system, and maybe it's when the pain is sufficiently high?
Aren: It always turns out to be when the pain is sufficiently high. The right time is actually day one. I don't think anyone actually does that, and I really can't fault them, I think product market fit is probably a startup's first priority, if you don't have that, then what's the point of all the security?
But security's almost by definition brought in later than it should be.
Guy: You need to build assets that are worth protecting sometimes before you can invest in that protection, and at the same time you have to manage that risk, about saying you know, what could shut you down tomorrow? For instance losing your customers' data, or things like that could make those assets dissipate very, very quickly.
Aren: I'd say security is all about trade-offs at every stage, and that's one of many.
Guy: Cool, so I guess describe, just to echo back some of these flows of evolution in your sort of access control level-ups, going from understanding offboarding and what happens if that person leaves, to documenting and then automating that process, to simplifying onboarding. And then having dynamic or as fine-tuned as possible a process with minimal operations.
Maybe if I understood this correctly, there are actually two tiers here, first there's minimum permissions, and then there's temporary permissions on these different systems, which probably go hand in hand with tools. And this is sort of an area where Foxpass kind of can help you out, or is this in the OneLogin space?
Aren: The temporary access specifically around servers, because really our focus is all infrastructure. So temporary access or temporary group membership on servers is really something that we push. We believe that no company should have those master keys floating around, so we give you all the tools to make that happen.
Guy: Yeah, cool, so that definitely is a good path, and then we talk about the future a little bit, in the dynamic path. Is there a lot of evolution as well in tooling around understanding that it happened? In many security conversations today, there's the prevention aspect, but then there's also detecting quickly when a problem has occurred. Is there an equivalent to that in the world of access control, of just being focused on anomalies and the like around access?
Aren: I think you have to combine a bunch of different sources, which is the nice thing about I think the current DevOps ecosystem is there are APIs everywhere. So you would make sure that products you use have login output that you can get your hands on, so you can look for failed logins or odd-looking logins.
You can use a lot of cloud-based tools to aggregate that data and then build reports on it.
I think there are a lot of silos out there, you know, we'd be a silo of login information, there'd be another silo of network traffic to look for anomalies, and there are a lot of companies out there who are doing a good job of mining that data from all these different sources and looking for patterns. So I think it's really about the combination of all the pieces.
Guy: I really like that idea, you can probably sort of create a dashboard, just like you measure a whole bunch of the kind of ops metrics, just some of these access-related stats. So you can spot anomalies about failed logins around whatever, how long between logins or things like that, that just draw attention to it, so you can inspect if that seems legit or if there's a problem there.
Aren: Yeah, and if you see a user trying to access a machine that they should have long lost access to, that looks pretty weird, right, something that could be flagged in one of these alerting systems.
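The two signals mentioned in this exchange, repeated failed logins and attempts on machines a user should no longer reach, can be checked with very little code once the login events are in one place. The following is a rough sketch under assumptions: the event tuple format, the `allowed` set, the threshold of three, and the function name `flag_logins` are all hypothetical, and a real setup would read from whatever log sources the tools in use expose.

```python
from collections import Counter


def flag_logins(events, allowed, max_failures=3):
    """Return (reason, user, server) tuples worth a human look.

    events:  iterable of (user, server, succeeded) tuples, a hypothetical
             normalized form of whatever the login sources emit.
    allowed: set of (user, server) pairs that currently have access.
    """
    flags = []
    failures = Counter()
    for user, server, succeeded in events:
        if (user, server) not in allowed:
            # An attempt on a machine this user has no current access to.
            flags.append(("no-current-access", user, server))
        if not succeeded:
            failures[user] += 1
            if failures[user] == max_failures:
                # Flag once, at the moment the threshold is first reached.
                flags.append(("repeated-failures", user, server))
    return flags


events = [
    ("alice", "web-1", True),
    ("bob", "db-1", False),
    ("carol", "web-1", False),
    ("carol", "web-1", False),
    ("carol", "web-1", False),
]
allowed = {("alice", "web-1"), ("carol", "web-1")}
flags = flag_logins(events, allowed)
```

With this input, `flags` contains one entry for bob's attempt on a server he has no access to and one for carol's third consecutive failure. The output is meant to feed a dashboard or alerting system for inspection, not to block anyone automatically.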
Guy: Yeah, so I think that's really, really good education around access control and where we're evolving. One tidbit that does come up in my mind when we talk about access controls and these keys is just the relationship of access control to secret management, right? Because with secret management oftentimes the secrets are tokens to access the different systems. How in your mind do you delineate those two worlds?
Aren: It's a great question, so you know Foxpass itself doesn't do anything with secrets management per se, normally secrets management is API keys that your application needs to talk to third-party services. There are a lot of those that are afloat out there. For our world, the biggest secret is actually on the user side, which is that SSH key on their laptop, how do you protect that.
You know, we could talk about disk encryption, a lot of things on the user side, but in terms of secrets management in a production system, you know, you want to look at tools like HashiCorp's Vault. There are a lot of companies who built systems around the key storage systems built into AWS. There's a big world out there of those.
Guy: Yeah, so it would be the sort of keys that are not necessarily associated with an identity, they're just access, well I guess they're still access control, they're authorization means, but they're not associating to a person. Maybe that's where terminology fails us a little bit around that, because they still control access, but they don't represent a person behind it.
Aren: That's right.
Guy: Cool, well before I let you off here, I often ask my guests if there was one thing today that's kind of your pet peeve, or your key recommendation if you're talking to a sort of security team that's trying to up their game in the security space, what's your one recommendation for them to do so?
Aren: So when I'm talking to developers about operational security, my biggest tip is just know your tools. You have to know where your cloud provider leaves off in terms of security, what's missing. You know, a lot of them have documentation about, okay, we've left you here but we recommend a bastion host, we recommend a VPN, we recommend you do all these things, your work is not done.
You know, Amazon has this pretty fine line of "we're responsible for everything below this line, you're responsible for everything above". So you've gotta know those, and then you have to know what tools will fill those gaps. You know, it's an art to figure out when the right time to bring these tools in is, but you gotta know that they exist, and I think a lot of security does boil down to knowing how to use your tools and where their weaknesses are. Otherwise you'll just be blindsided.
Guy: Yes, an excellent recommendation. I think much of software development today is assembly: you're pulling together all these different services and tools and packages, and you put them together. When you put them together you need to understand the seams, and understand how the eventual puzzle fits together well. That sometimes almost goes without saying from a functional perspective, because otherwise it wouldn't work, but it's very easy to overlook from a security perspective.
Aren: Yeah, and there are no silver bullets. You have to know the limitations of what you're using.
Guy: Yeah, for sure. Cool, well Aren thanks a lot for coming on the show, this was fun.
Aren: Thanks so much for having me.
Guy: If somebody's trying to find you on the Twitter feed or online what's the best way to contact you?
Aren: Just search for Foxpass. Our Twitter handle is @foxpass.
Guy: Cool, well thanks everybody for listening in, hope you found this useful, and tune into the next one.