October 21, 2014
In this episode of The Secure Developer, Ben Bernstein from Twistlock joins Guy to discuss container security. Are you currently using containers, or thinking about moving to containers in your stack? You won’t want to miss this episode.
With containers, developers control the entire stack. While empowering to developers, this can also open up new security vulnerabilities. Ben and Guy discuss the tools and processes you’ll need to put in place to ensure your containers are compliant and secure.
About the Guests
Ben Bernstein is CEO and co-founder of Twistlock. Twistlock delivers Docker container security for every stage of the DevOps workflow – with seamless CI integration, extensive API support, and dev-to-production security controls that deliver consistent policies across the container lifecycle. Before Twistlock, Ben worked in various roles at Microsoft with a focus on enterprise software security.
Guy Podjarny: So hi everybody, and welcome back to The Secure Developer. Here today with us we've got Ben Bernstein from Twistlock. Thanks for coming on the show, Ben.
Ben Bernstein: No, thank you for inviting me.
Guy: I think maybe before we dig in and start talking about all things container security and microservice security and the like, maybe Ben, do you want to give a quick intro to yourself, your background. What do you do?
Ben: Sure, I've actually been a developer throughout most of my career, at Microsoft, working on different security suites and the OS security of Windows.
Recently a friend and I started Twistlock, which has to do with the change that is happening in the world of developers and how it affects the security world, and this is how Twistlock came into being.
Guy: Cool, again, I think the container security space is hot and new, but also entirely imperative to the adoption of containers, which is growing probably faster than the security controls on it are.
Ben: Absolutely, and for us it's really an opportunity, and we were pretty amazed with the reception of the concept that you just outlined, yeah.
Guy: Yeah, so to level set a little bit, we're talking containers and security, right? This is probably going to be a theme this episode.
Can you give us a bit of a baseline about what should you care about, what should you think about when you think about security aspects of containers?
Ben: So, it's really interesting, because a lot of the people that read about containers, they read the theoretical material.
People come to the conclusion that the most fundamental issue about containers is whether they're as secure as VMs or not and whether you lose something between moving from VMs to containers, but that's not what I've seen in practice.
In all the customers and organizations that move to containers, and especially the enterprise ones, the core thing about containers is the detachment of software from the actual physical, or now virtual, machine.
So the interesting thing is not whether they're as secure as VMs, but rather how you could control the mess and the empowerment of developers to do so many things.
The developers now control the entire stack. Security's one of the most important things about software and developers can make mistakes.
How do you make sure that everything is compliant and it is as safe as possible? In the past you had IT people, it will be their safety belt. Now you need sort of something else to help them.
Guy: Yeah, I guess the location of where some of these decisions are made changes, right?
And, hopefully outdated but still probably very much alive in certain systems worlds, where in order to run something you would have to ask InfoSec to provision the server, or ask IT to provision the server.
And that would require an InfoSec inspection, then you know, it's not a great world in many ways, but in that world at least there is that security inspection.
That just completely disappears in a world where as a developer you put in a Dockerfile, and voila, you know, the entire operating system just got stood up.
Ben: Absolutely, and here lies also the opportunity, because when you think about it, as a developer you'd probably not want to wait until the IT person knocks down your door and says like, "What the hell did you just do?"
What you'd like to do is to use the CI/CD tools in order to push them into some staging mechanism or something that reviews it and pushes back to you if there's any issue.
And that's actually really a good opportunity for you as a developer to get the feedback right away, and so, in the past a lot of the things that used to be issues would only be discovered very late in the process.
And then you had to find out which developer did what and why, and here, once you do something that is wrong, this is an opportunity to actually push back things.
But just like you said, you have to choose a different location to do these things and the CI/CD tools would be probably the best location to give feedback to the developers about anything you can do.
For example, with vulnerabilities, you'd rather know about a vulnerability as you check in your container or your code rather than wait for it to happen in production and be picked up by some mechanism.
So getting closer to the developers always has made, and always will make, more sense.
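The check-in-time feedback described above can be sketched as a small CI gate. This is a minimal illustration under stated assumptions, not any particular product's implementation: the advisory table, the package names, and the functions `find_vulnerable` and `ci_gate` are all hypothetical, and a real pipeline would pull advisories from a vulnerability feed and the package list from the built image.

```python
import sys

# Hypothetical advisory data (an assumption for illustration):
# package name -> set of versions treated as vulnerable.
KNOWN_VULNERABLE = {
    "examplelib": {"1.2.0"},
    "examplessl": {"0.9.1", "0.9.2"},
}


def find_vulnerable(packages):
    """Return sorted (name, version) pairs from the image that match an advisory."""
    return sorted(
        (name, version)
        for name, version in packages.items()
        if version in KNOWN_VULNERABLE.get(name, set())
    )


def ci_gate(packages):
    """Return an exit code for the CI step: 0 lets the build continue, 1 fails it."""
    findings = find_vulnerable(packages)
    for name, version in findings:
        # The message goes straight back to the developer who pushed the change.
        print(f"FAIL: {name} {version} matches a known advisory", file=sys.stderr)
    return 1 if findings else 0
```

A CI job would call `ci_gate` with the packages found in the image and exit with the returned code, so the developer hears about the problem at check-in rather than in production.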
Guy: Yeah, I definitely agree with that perspective. I think containers sort of do two or three things in the process.
One is the technical one, which, as you pointed out is not that important around the fact that technically your operating system's going to run under or within a container versus within a VM.
And there was a lot of conversation in the container security world earlier on about what, you know, really was just shortcomings in the Docker Engine.
And some of those are still there, but questions about whether a container is isolated or not are less interesting. They're interesting today, but long-term they're secondary.
What's more important is the revolution of how a software is built and who is building it.
So maybe we should split those two a little bit. There's the technical aspect of it.
Or it's not that technical, but it's around how the software is being built. It's now built as part of a CI/CD process, and maybe we lost some security gate that we had before in asking the InfoSec person whether they can do it. But we've just gained access into this CI/CD world and this opportunity to run tests early on.
So I guess we can have the whole thing be a net negative in terms of security if we don't capitalize on that opportunity.
Or we can turn it into an advantage, if we do tap into it.
Ben: Absolutely, and it almost goes back to the question of tests and a lot of the other stuff.
And you can look at some of the companies who have been doing it right, like Netflix and Google, and the way they did their CI/CD, and the way they did their staging, and their chaos monkeys, and all that kinda stuff.
I mean, just trying to figure out about things as early as possible and doing it in an automated way is really important. So you must develop new tools that don't exist today that enable you to do all of that.
Today's tools were not built for the CI/CD world.
They were not built in order for the security people to set the policy and then for that policy to be enforced in a dev friendly way.
So when you're thinking about how you're going to build your dev to production environment, this is definitely something that you want to keep in mind.
Guy: Yeah, and I think the notion of how security tools were built for security audits, more so than any sort of continuous testing is a bit of a recurring theme.
In the show here we've had it come up several times because the tools were built for the present in which they're being used.
And in fact, even today's present, right, the majority of security controls today happen outside of the container world in the continuous world.
And that's increasingly changing, but that's still the case, so you need tools that focus on the use case of building in that CI/CD, again, capitalizing on an opportunity.
Because you just lost something, but you gained something all that much more powerful, if only you had the tools to take action there.
Ben: Absolutely. Honestly, this is almost half the story. The other half is actually the containers in themselves.
The fact that they're minimalistic, the fact that they're immutable, the fact that they're more declarative lets you get better indication of compromise and anomalies.
And get to better baselining based on machine learning, and a lot of the good things that security is about. But I guess it doesn't have to do with the developer space.
But I'm just saying because developers move to this, you actually get more information and you're able to protect them better at run time, so not only do you get better feedback to the developers, eventually the security pros would also find this system more useful, so it's sort of a win-win.
Guy: Right, in this case I would say that containers are just one manifestation of infrastructure as code. And infrastructure as code, as a whole, implies predictable hardware, or predictable deployments.
Again, barring bugs, but it's predictable deployments, and therefore you can go on and you can check controls. Netflix actually have, I think it's called the Conformity Monkey as part of their Simian Army.
It goes off and lets you deploy stuff, I think as is, I'm not sure if the Conformity Monkey's engaged or not, but then it goes off and it randomly finds systems and it just checks to see whether they conform to what they should be conforming to.
So developers can go on and do whatever they please, but they may be caught if they've done something wrong by the Conformity Monkey.
You know again, giving them sometimes opportunities while showing them the responsibility that they need to address.
And tools of that nature, they don't have to be containers, but they have to be in that context of infrastructure as code.
Ben: Absolutely, I actually had an interesting discussion with one of the people at Netflix. And they mentioned to me that they even have a new monkey that tests access control.
So you don't typically think about that, but developers now have not only the power to create code and to create the entire stack, but there are some things that typically were taken care of by the IT people.
And suddenly developers have full control over it, and you suddenly don't have the extra safety belt. And one of those things is identity and how many privileges your services have.
Because you as a developer, you're testing in some environment, you might create some kind of an authentication and authorization policy, which is good for your environment but maybe it's not good enough for production.
So they actually have this monkey that tests least privilege, and that's really interesting that they came to this conclusion so early.
Because, they did it based on practice because they saw the developers sometimes make mistakes and you need some kind of staging tools, the monkeys, to sort of check whether they did everything correctly or not.
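A least-privilege check of the kind discussed here can be sketched as a comparison between what a service is granted and what it was actually seen using. This is only an illustration under stated assumptions: the permission strings, the service names, and the function `audit_services` are hypothetical, and it is not a description of how Netflix's tooling works.

```python
def audit_services(policies, usage):
    """Map each service to the permissions it holds but was never observed using.

    policies: service name -> iterable of granted permissions
    usage:    service name -> iterable of permissions observed in staging
    Services with no excess permissions are left out of the report.
    """
    report = {}
    for service, granted in policies.items():
        unused = sorted(set(granted) - set(usage.get(service, ())))
        if unused:
            report[service] = unused
    return report


# Hypothetical example data: "billing" was granted more than it ever used.
example_policies = {
    "billing": ["db:read", "db:write", "queue:admin"],
    "web": ["db:read"],
}
example_usage = {
    "billing": ["db:read", "db:write"],
    "web": ["db:read"],
}
example_report = audit_services(example_policies, example_usage)
```

A staging tool could run such an audit automatically and push the report back to the service owner, so the development-time policy gets tightened before it reaches production.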
Guy: So, I guess I'm trying to enumerate a few examples, just sort of give people some things to tee up. Let's focus on containers as opposed to the broader concept of infrastructure as code.
You need to test for something, have some security controls as part of your CI/CD process. We touched on two examples here, you can look for vulnerable artifacts in those containers you're deploying.
And the notion of sort of least privileged users so you can audit probably the user that systems are running with. What other examples do you encounter?
Ben: Sure, so the most common one, or the most basic one, would be a golden VM. That used to be sort of a way for IT people to enforce certain OS hardening rules.
And so anything you could imagine about OS hardening. A simple example would be there shall be no SSH daemon in production, right? I mean, that's just one example.
But anything that you'd expect a base OS to have, anything that, and when I say base OS I'm just thinking about the user mode, right? Because the kernel, you know, the Linux kernel is shared with the host.
But still there's so much damage you can do by accidentally slipping something into the OS layer that's not protective. And then you basically need to make sure that it conforms to certain standards.
And then you go to stuff like devices, right? You could write something that looks at some attached device for some reason, and as a security person you probably want to limit these capabilities.
Because you, in development it made sense to you to attach this device, you probably don't want to attach any device in production, you know? So a lot of these slips that could happen need to be actually checked before something is being put into production.
On top of that, there's something specific to containers, something called the CIS benchmark, which has to do with whether in the container you defined a user or did not define a user.
And it's based on which version of Docker you used and whether you used certain restrictions or you didn't. So honestly, even the biggest experts could get something wrong.
Not to mention a standard user who's just trying to get around to writing a hello world program and may not have restricted everything that should be restricted.
So the CIS benchmark has about, I think, 90 different things that could go wrong and you want to check for, ranging from the daemon configuration and the host configuration to the specific containers.
It could be things that the developer did wrong, or something that the DevOps or the IT person that set up the host on which the Docker is running has done wrong.
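Two of the checks mentioned above, no SSH daemon in the image and a user defined for the container, can be sketched as a tiny Dockerfile policy check. This is a simplified illustration, not the CIS benchmark itself: the function `check_dockerfile` and its rules are assumptions made for the example, and it ignores line continuations, multi-stage builds, and users inherited from the base image.

```python
def check_dockerfile(text):
    """Return a list of policy violations found in Dockerfile text."""
    violations = []
    last_user = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        instruction, _, args = line.partition(" ")
        instruction = instruction.upper()
        if instruction == "USER":
            last_user = args.strip()  # the last USER instruction wins
        elif instruction == "RUN" and "openssh-server" in args:
            violations.append("installs an SSH daemon (openssh-server)")
    if last_user is None:
        violations.append("no USER instruction")
    elif last_user.split(":")[0] in ("root", "0"):
        violations.append("final USER is root")
    return violations


# Hypothetical Dockerfiles used only to exercise the check.
bad_dockerfile = (
    "FROM debian:stable\n"
    "RUN apt-get update && apt-get install -y openssh-server\n"
    'CMD ["/usr/sbin/sshd", "-D"]\n'
)
good_dockerfile = (
    "FROM debian:stable\n"
    "USER app\n"
    'CMD ["/app/server"]\n'
)
```

Run as a CI step, a check like this catches the "temporary" development shortcuts before the image is promoted, without a person having to read every Dockerfile.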
Guy: Yeah, those are really useful. We can throw a link to them in the show notes. The concept of enforcing, or testing for some, basically policy violations, right?
That sometimes sounds like a heavy concept, but in fact it's very straightforward to check that you're using the right operating system. And I can totally see that happening, and in fact have sort of seen it happen even, you know, personally.
I've done it, which is when you're local and you create a Dockerfile or you create some environment, your bias is just to get it to work. And the inclination is just add things.
And then by the time, the distance of time between the time you've just done that and you've made the decisions about whatever operating system, whatever you installed on it, and the time in which you commit that and have that deployed.
There's a lag there, and during that time you don't remember those decisions that you've made earlier on that you entirely intended to make temporary, except, you know, nothing's more permanent than the temporary.
So, yes, those are really useful, and I came across this interesting Dockerfile linter earlier on that does some of those checks. We'll throw a link to that.
These are our tools, maybe this is the technical side of the fence, right? The tooling you can put, and the audits or the checks that you can add as part of your CI/CD piece.
I think the other part is the people piece, because what also shifts in what you've described of the process is that it's not just the tests that get run.
It's also the people that run them that change. It's not the InfoSec person that does whatever inspection on the check. It's the developer that is adding a test to the CI that does the inspection.
How have you seen that interact? You work with all these companies that are adding container security components.
What do you see works from the interaction between the people coming in with the security inputs and the developers or DevOps teams that need to apply them?
Ben: So, it's interesting because it's sort of bottom-up. The whole approach to DevOps seems to revolve around smart people who own the dev space and smart people who come from the ops space.
And they technically work together in order to create some kind of a legitimate infrastructure that the entire organization can follow. And so, the end result is that the SecOps people, or security pros, they would like to set certain standards and have them applied.
And they need the DevOps people to actually implement all the mechanisms. If you go back to how application security used to be in the older world, in the VM world, you always had the security ops people working with the networking guys in order to put in all kinds of IPS and IDS mechanisms, so it's almost the same to some extent.
They work with the DevOps people, but here the DevOps people have a lot more responsibility because they're dealing with a lot of delicate things such as the development process.
So they need to be very professional about it, the toolings, you know, the tools are still new, there's a variety of things.
So they need to be experts in that, and sometimes you get to a situation where you run into a security pro person who actually is so good that he learns about the development process.
He learns about the CI/CD tools, and he's comfortable implementing some of these things himself, but, that's kind of the exception rather than the rule.
Guy: Yeah, I think the, maybe one delta between the network ops people and the DevOps people is just the pace of change. The network world did not change faster than the security world.
Or in fact probably the other way around, while the development world, especially in this sort of continuous versions of it changes very, very quickly.
So to an extent I think you're entirely right, I entirely agree with the importance of having the security team and the development team, or the SecOps team and the DevOps team communicating.
But I would also say that this is a case that resembles a little more DevOps, this sort of this, no, not just blurring, but entire elimination of the line almost between those components, where those teams work very, very closely and very much hand in hand.
DevOps did not eliminate ops teams, or make all developers Ops experts. There are still people within the majority of companies that operate it that are predominantly dev or predominantly ops.
It's just they're not 100% anymore, they're 5% of the other thing, or 10% of the other thing, and either way they're sort of a part of the same teams, cohorts, you know, goals, and working together.
Ben: Absolutely, and it's sort of even elevated the level of policies that the security people put into the picture.
Because in the past you know that the security people used to be involved in every little thing that the developers did before they actually put it into production.
Now it's no longer manageable because the scale is so big, so it actually forces the security people to think about this meta policy: there should be no this, and everything should be that, and apply it.
Because they can no longer go to every person who owns a microservice and ask him to describe in 10 pages what he's going to do and then read these 10 pages and the next day he's going to change it slightly.
So actually, it sort of elevated their level of policy making and also required them to understand the DevOps space much better in order to understand what they can and cannot do, so I absolutely agree with you.
Guy: Yeah, and that process has actually happened in the ops world, the notion of write it down and then write it down in code.
Ops systems were also voodoo. The flow of actions you might do during a security audit were in somebody's head. Or they were written in some outdated document.
And then as systems and the deployment of those systems became more automated and more touched, then those had to, first of all, be written down in code so that they'd be predictable and not go out of date because they represent what's on the system.
And later on even be written down or edited by people that are not in Ops. So, I guess it's the same process that security needs to look into.
Ben: Absolutely, and we're actually taking advantage of that, like you pointed out. When we get to actually see what's running, you need to understand the full context.
You need to understand the infrastructure that was there from the hardware all the way to the actual last bit of software configuration that you did, and like we said, you can't do it on a manual basis.
So actually, infrastructure as code is actually very helpful in the process of protecting software all the way to run time, so this is a blessing for the security world.
Guy: So, these are really good topics, and when you talk about containers, we talked about both the security implications within the containers, you know, thinking about what's in them.
And the fact that they get created differently. We talked about the opportunity to integrate testing and which tests you could do as part of the CI/CD process and the people that run them.
Maybe one last topic we can touch on, which is also kind of top of mind for many people is not the containers themselves, but rather the microservice environment that they enable.
Containers as a tool offered us the opportunity to now deploy many different systems because it's that easy to create them and to create lightweight versions of them eliciting this new microservice environment, right?
Suddenly you have 100 servers, or maybe 100 is a little extreme, but 20 servers that perform the same action that a single server would've had before, that also introduces a whole bunch of security concerns, no?
Ben: Yes, absolutely, it does, and it goes back to our talk about the scaling. By the way, we've seen customers running it on hundreds of hosts. And we have some customers that plan to go to thousands.
And when thinking about security, like we just said, you need to take into account the host stack. But you also need to think about the scheduling of these microservices in different environments.
And on one hand understand the full stack, which could mean different hosts. On the other hand you need to understand the software piece, the specific container that you have, and if there's an issue with it you want to flag it and say that this was the container.
It wasn't the actual host, so it has to do with how you analyze threats, it has to do with how you report the threats, and it has to do, again, with the fact that you need to do everything automatically.
So when something comes in you need to analyze it automatically because there could be thousands of the same container, it could be a thousand different ones, some of which could go up for three seconds and go down and you'll never see them again.
So everything needs to be automatic, you need to think about scale, and you need to think about the different pieces, about the orchestration and about this new stack that's not exactly just VMs and software which you run setup on.
So that goes back to everything we've talked about, including the fact that these are microservices, which just make things sort of worse.
Guy: And I guess here there's also the two-fold version of it, right? When we talk about container security, one topic that often comes to mind when you run those containers is the fact that the container is run on a machine.
And many cloud services, like AWS, would have their security policies around which network ports were open, or which VPC you're a part of, be an aspect of the machine, while the containers run on those.
And there's probably, there's merit in them, again, and sort of security concerns and, that run today, but there are once again shortcomings of the current ecosystem that is just adapting to it.
Probably the bigger concern is in the changes that are here to stay, which is the fact that now you have all these different microservices and have to think about how they interact.
What happens when one of those services misbehaves? What type of exceptions might bubble up to the user or to the outside organization? What type of network monitoring do you do to identify whether one of those components was compromised?
Ben: Actually, that's a huge opportunity again, because suddenly you got infrastructure as code.
And suddenly you got the person who developed this service sort of imply to you, or even explicitly tell you, depending on whether he's talking about the inbound traffic or the outbound traffic.
But he's sort of implying to you where each microservice might need to go. And then if you baseline it correctly, and you understand the orchestration mechanism.
Then you have this new type of firewalling where you could, instead of just looking at static IPs, or you know, FQDNs, you suddenly understand this is a service, he's trying to do one, two, and three.
And if he's doing four, which doesn't necessarily translate into a different IP, maybe it's the same IP that you had before. But now it's a different host.
Or maybe it's a new IP but that's okay because it's talking to a microservice that it should talk to at this point.
This actually presents a challenge, and again, an opportunity for tooling companies and firewalling companies and security companies to create a different type of firewall. Or a more elevated and container friendly type of firewall.
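The service-aware firewalling idea above can be sketched as a baseline keyed on service identity rather than on addresses. This is a minimal illustration under stated assumptions: the baseline `DECLARED_FLOWS`, the placement map, and the functions `to_service_flows` and `unexpected_flows` are all hypothetical, and in practice the placement would come from the orchestrator and the baseline from what developers declare or what is learned in staging.

```python
# Hypothetical baseline: which services each service is expected to call.
DECLARED_FLOWS = {
    "frontend": {"orders", "auth"},
    "orders": {"inventory"},
}


def to_service_flows(connections, placement):
    """Translate (src_ip, dst_ip) pairs into (src_service, dst_service) pairs.

    placement maps an IP to the service currently scheduled there; IPs the
    orchestrator does not know about are reported as "unknown".
    """
    return [
        (placement.get(src, "unknown"), placement.get(dst, "unknown"))
        for src, dst in connections
    ]


def unexpected_flows(service_flows):
    """Return sorted service pairs that are not part of the declared baseline."""
    return sorted(
        (src, dst)
        for src, dst in service_flows
        if dst not in DECLARED_FLOWS.get(src, set())
    )


# Hypothetical snapshot: the same check still works after containers are
# rescheduled, because only the placement map changes, not the policy.
example_placement = {
    "10.0.0.5": "frontend",
    "10.0.0.9": "orders",
    "10.0.1.7": "inventory",
}
example_connections = [("10.0.0.5", "10.0.0.9"), ("10.0.0.5", "10.0.1.7")]
example_alerts = unexpected_flows(
    to_service_flows(example_connections, example_placement)
)
```

The design choice here is that the policy never mentions an IP: a new address is fine if it belongs to a service the caller is supposed to reach, and a familiar address is suspicious if a different service now lives behind it.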
Guy: Right, each of these services now are much easier to understand, and if you understand them better, because they're doing something much more pointed.
Then it's easier to differentiate right from wrong and be able to monitor it in the right way.
Ben: Absolutely, that's exactly what we think at Twistlock. And this is what we believe: that the security world is actually going to revolve around that, around pure software and around understanding the developers when making the security decisions.
Because now developers are actually telling you more and you need to listen to that.
Guy: I think that's maybe where the communication needs to indeed start going the other way, right?
In the deployment process, the gates that have disappeared have now moved into the developer's hands.
The developers now control what gets deployed, how it gets deployed, what tests run on it before it gets deployed, and that opportunity was lost in the gate, but, and it was sort of gained now in running these far better tests in a far more continuous and efficient fashion.
Now that that's deployed, you know, security is never static. The fact you deployed something that you believed to be secure at the moment does not end your security work.
Now you need to monitor these things in production, and that's where the information needs to come in the opposite direction.
Again, like in DevOps, a lot of the concept is, if it moves, measure it, if it doesn't move, measure it in case it moves, right?
And like this notion of building operable software, you need to build, you know, it's probably not a word, but securable software that has the right outputs to enable a security professional looking at and probably monitoring the system in production to distinguish right from wrong.
Just like they would a service that is just about to hit its capacity threshold and you're going to have an outage.
Ben: Absolutely, and I see it as almost a thread that goes from the dev, through baking to staging, all the way to production. And it could go both ways, and this is really the biggest change.
It's the big, it's the change in development, it's the change in IT. It's the change in responsibilities, it's the change in security, and it's the whole opportunity for the ecosystem and specifically security, absolutely.
Guy: So, this is, was a really good conversation. Thanks again for sort of joining me in it.
I think it's amazing to me every time how often you come back to the analogies between the DevOps world and the security evolution that needs to happen for us to sort of secure this world.
Before we part, can I ask you, if you think about a development team or a DevOps team that is running right now and wants to improve their security posture, right?
Want to sort of up their game in terms of how they handle security. What's your sort of top tip, right? What's the one thing you would suggest that they focus on?
Ben: So, if I had to say one thing, I would say that you should really start designing security as early in the process of moving to DevOps as possible.
Because you need to think about the tools, and you want to put them in as soon as possible. It's much harder to implement changes in the process later down the road.
So it sounds simple, but that's what all the people who implement best practices have done that we've seen so far.
Guy: No, that's really sound advice, and also I guess containers give you an opportunity to do that, because you're probably restarting or rethinking some processes.
So you know, that's your opportunity to build security in. Thanks a lot again Ben for joining us on the show.
Ben: Thank you for giving me this opportunity. I really appreciate it.