In episode 4 of The Kubelist Podcast, Marc speaks with Sunil James of Hewlett Packard Enterprise. They discuss the SPIFFE and SPIRE projects, focusing on identity control for distributed systems.
About the Guests
Marc Campbell: I'm here today with Sunil James, a senior director within HPE's newly formed security engineering organization, to learn more about SPIFFE and SPIRE. Welcome, Sunil.
Sunil James: Thanks, Marc. I appreciate it.
Marc: To help us get started, Sunil, help me understand the path you took to getting into HPE and working on SPIFFE and SPIRE before we dive into the details of the projects.
Sunil: Yeah, I'm happy to. Scytale was a company that my co-founders and I formed in early 2017.
We started the company because we were spending quite a bit of time with enterprises that were beginning to adopt or look at technologies like Kubernetes and container orchestrators, and runtimes like Docker.
They were starting to rethink how they were going to re-architect some of their application development platforms for use with these kinds of technologies, and one area of particular interest to us was this idea of authentication.
Specifically, how do you authenticate one service to another service and how that would have to change in the future?
So we started to work on building a company to tackle that problem, and along the way we also helped to bring to life a number of great open source technologies such as SPIFFE and SPIRE, which we're going to talk about today.
We sold the company to Hewlett Packard Enterprise in Q1 of 2020, and inside the organization our mission is to continue to build trust into all of our products, and through our products into our end customer offerings.
SPIFFE and SPIRE remain essential components of all that.
Marc: Great. Let's dive in a little bit. I'd love to understand more about SPIFFE and SPIRE.
There are two different projects: one is a spec and one is the implementation of the spec. Could you--?
At a high level, let's start there, just explain what the SPIFFE spec is trying to declare and what its intent is.
Sunil: Yeah, sure.
The SPIFFE specification is fundamentally a specification that's designed to help create this idea of a universal identity control plane for these distributed systems that we're talking about here.
SPIFFE itself is an acronym, it stands for Secure Production Identity Framework for Everyone.
It's a play on some naming conventions inside of Google that I was privy to when I was at Google prior to starting Scytale.
You could think of it as this set of specifications to create a framework that is capable of then bootstrapping and issuing identity to services across heterogeneous environments and different boundaries.
There's a number of specifications, but the one specification that is at the core of it is this idea of a short lived cryptographic identity document called the SVID, or the "SPIFFE Verifiable Identity Document."
What ends up happening is that this document is what captures and holds the specific SPIFFE identity that can then be used to identify a workload when it's authenticating to other workloads, or if it needs to establish TLS connections, or it needs to be verified using JWT tokens and things of that sort.
So SPIFFE is basically, fundamentally, that, and maybe we'll start even further up. An identity in the world of SPIFFE is called a SPIFFE ID, and that is a string that is designed to uniquely and specifically identify a given workload.
These IDs can also be assigned to intermediate systems such as a group of virtual machines, but they can also be used to identify any given individual instance of some workload, like a container or what have you.
These SPIFFE IDs come in the form of URIs, so they have a naming convention that looks very similar to DNS, except in our case we start off with spiffe:// and then the rest of it is broken up into two parts.
If you have something like spiffe://acme.com/billing/payments, acme.com defines the trust domain for the identity itself: the domain within which these identities are being issued and verified. Everything after that first slash is considered to be the workload identifier, which uniquely identifies that workload within the trust domain. That hierarchy can go as deep as you want; we didn't make it as prescriptive as others might have expected, because we didn't know what types of conventions were going to be used across the enterprise landscape to define it.
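To make that two-part format concrete, here is a minimal sketch in Python of splitting a SPIFFE ID into its trust domain and workload path. The `parse_spiffe_id` helper and the acme.com example are illustrative, not part of any SPIFFE library:

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID URI into (trust domain, workload path)."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id}")
    if not parsed.netloc:
        raise ValueError("missing trust domain")
    # netloc is the trust domain; the remaining path identifies the workload.
    return parsed.netloc, parsed.path

domain, path = parse_spiffe_id("spiffe://acme.com/billing/payments")
print(domain)  # acme.com
print(path)    # /billing/payments
```

The real SPIFFE ID specification adds further rules (allowed characters, length limits, no trailing slash), so treat this only as a sketch of the trust-domain/path split described above.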
OK, so that's what the SPIFFE ID is. The SPIFFE specification goes into much greater details about that format and the use of SPIFFE IDs itself.
Now, that ID is then encoded into this thing called the SPIFFE Verifiable Identity Document.
It's the document that a workload uses to prove its identity to a resource or some sort of a caller.
An SVID is considered valid if it has been signed by an authority within the SPIFFE ID's trust domain.
There's a one-to-one mapping between an SVID and a single SPIFFE ID, and as I said beforehand, that SPIFFE ID comes in the form of that URI naming convention.
When we were early on in defining these specifications, we needed to make sure that there was some sort of ideally universally understandable transport mechanism through which we could actually ship these IDs from system to system.
One of the two currently supported formats is an X.509 certificate.
Inside of an X.509 certificate there is a subsection called the subject alternative name field, and that is where we actually place the SPIFFE ID itself.
We chose that location because X.509 is generally supported across the various SSL and TLS libraries out there, so we didn't really have to change any implementations.
They would just pick up and parse that part of the X.509 transparently, so we get to go along for the ride and make sure these documents are carried back and forth.
Then we also have an encoding within JSON Web Tokens as well, because there might be situations where you can't terminate the connection with the X.509: you have to actually punch through something like a load balancer or an API gateway.
So the JWT itself actually carries that identity document all the way through to the back end, wherever it might be.
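As an illustration of where the identity travels in the JWT case, here is a toy, unsigned token built in Python. Real JWT-SVIDs are cryptographically signed and must be verified before being trusted; this sketch only shows that the SPIFFE ID rides in the standard `sub` claim:

```python
import base64
import json
import time

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# Toy payload resembling a JWT-SVID: the SPIFFE ID travels in "sub".
payload = {
    "sub": "spiffe://acme.com/billing/payments",
    "aud": ["spiffe://acme.com/api-gateway"],
    "exp": int(time.time()) + 300,
}
token = ".".join([
    b64url(json.dumps({"alg": "none"}).encode()),  # header (unsigned toy)
    b64url(json.dumps(payload).encode()),          # payload
    "",                                            # empty signature segment
])

def spiffe_id_from_jwt(tok: str) -> str:
    """Extract the SPIFFE ID from an (unverified) JWT payload segment."""
    seg = tok.split(".")[1]
    seg += "=" * (-len(seg) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(seg))["sub"]

print(spiffe_id_from_jwt(token))  # spiffe://acme.com/billing/payments
```

In production you would never parse a token without checking its signature against the trust bundle; the point here is only that the identity survives hops through load balancers and gateways as token data.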
Let me stop there and see if that makes sense.
Marc: I think so. I have a Kubernetes cluster and I have microservices, so a request comes in that requires us to go out to various different microservices to fulfill the request.
Some synchronously, some asynchronously.
The SVID is encoded in that protocol, whatever the protocol is: it could be HTTP-based, it could be a JWT token, it could be whatever.
In order to identify and authorize that workload at the various layers of the microservices. Is that what you just described?
Sunil: Yeah. You used the word "Authorization," which is the third rail in our world.
One of the early things that we worked on with the open source community that we had back in 2017, probably for the better part of 2017, was to create a hard delineation between authentication and identity, separate from authorization.
For us and for SPIFFE, SPIFFE is about making sure that we can uniquely identify and attest the provenance of a given workload, and then assign to it a strongly secured identity that's universally understandable.
That document can then be built upon by other downstream authorization systems to be able to define what types of privileges should be associated with that identity.
So, SPIFFE, and SPIRE as its implementation, do not necessarily reason about the authorization aspects of it.
We leave that to other systems of which there are many inside the enterprise landscape today. But generally speaking, that is correct.
The idea here is that these SPIFFE standards create a common and ubiquitous way to identify a workload regardless of the infrastructure you're deploying it on, whether it's private cloud behind your data center or whether it's public cloud in AWS or Azure or Google cloud, and also regardless of the platform that you're using to orchestrate your workload.
So you can choose to use SPIFFE with bare metal instances, you could choose to use it with vanilla VMs, you could choose to use it with containers.
We're actually just spinning up a subgroup to start evaluating the use of SPIFFE in the context of serverless and functions as well.
Marc: OK, so maybe a really basic question.
I have my Kubernetes cluster running and I've implemented microservices, and for handling and sending the requests back I have a service mesh and whatever else it is that I have, and I'm handling all these requests.
But these back end microservices are only available inside the cluster, there's no ingress into them.
Why is strong identity important in that world?
Sunil: A couple of reasons why. Even the presumption that there is no back end access, I think, is a bit of a misnomer.
Because even in a world where you've got Kubernetes clusters that are instantiated, egress mechanisms exist, and you still have communication patterns, container to container and node to node, that are going to have some level of discourse and conversation happening.
That's not always going to happen entirely within the cluster itself. In an ideal world you'd have everything running inside of a Kubernetes cluster.
The Kubernetes cluster is implemented with all of its own unique security capabilities, and then it's surrounded by an increasing layer of security control points that hopefully do a decent job of ensuring that an attacker cannot find their way into a workload and somehow compromise it to usurp data.
Keys, privileged information, PII or whatever it might be.
One of the things that enterprises are trying to do in this world of microservices architectures is to implement a set of more granular, key-based, secret-based authentication mechanisms between any given service talking to any other service.
In a legacy world, a lot of financial services and government customers and the like would have utilized technologies like Kerberos. For many people that have been around the block and have worked with Windows NT 3.5 and 4.0 systems going back many years, Kerberos is the underlying ticketing system that can be used to grant time-bound, scoped tokens for a given system to be able to make a request to another system. Then when that token effectively comes to an end, the system has to go back and fetch a new token, so there is always some sort of a last-mile check to ensure that this thing is allowed to communicate with something else, because of the token it holds.
That model is a model that's now baked in and embedded within a lot of industry standards that still-- As enterprises move into the cloud they'd like to be able to continue to use that.
So with SPIFFE, it's effectively giving you the ability to emulate those same types of systems, except it does so at scale and it's pretty much completely automated.
It's designed to serve as a centralized platform identity service that can allow for you, as a central security engineering team or DevOps team or whoever is responsible for defining some of these security controls, a centralized way to almost whitelist "What are the types of workloads that should or should not be running in any given environment?"
And then defining the-- Then using the set of attributes available from the infrastructure to attest to that identity.
Then use SPIRE to basically automatically run this system so that any time a workload spins up on any platform, rather than have it preloaded with some pre-shared key or some standard token or some static secret or whatever it might be, it can be born almost naked.
Then the first thing it does is it can go and initiate a set of calls down to the SPIRE infrastructure, so that it bootstraps trust from the get go and it gets its credentials or whatever it needs, and then it uses that to initiate authenticated communications to any other system.
The good thing in microservices architectures is that instead of just relying on that authentication flow to happen in back end client libraries that are embedded within the application itself, you can take advantage of things like Envoy.
Envoy is a popular open source service proxy that's widely used to provide abstracted, secure, authenticated and encrypted comms between services.
Instead of having this logic embedded in back end client application libraries, you can instead teach Envoy, which sits in front of these services, to think about these identities so that the back ends rarely have to even be changed.
All of those identity, attestation, authentication and encryption flows can happen from one Envoy instance to another Envoy instance, so it's a nice way of injecting strong forms of very granular authentication without necessarily having to disrupt your back end services in a material way.
Marc: That makes sense. I think that's a good transition here. Like, SPIFFE is a spec.
I think I understand what the spec is describing and how it works with two supported formats, X.509 and JWT tokens.
But I want to continue to dig into SPIRE a little bit more. So SPIRE is a separate CNCF project right now, but it is an implementation of the SPIFFE spec.
When you talk about using Envoy to help bootstrap, that's all work that lives in the SPIRE project, correct?
Sunil: That's correct. When we launched the effort, we launched the effort by making sure that we had a separate spec and an implementation of the spec.
Because we felt as though the world is pretty darn big, and there's going to be plenty of organizations and plenty of people with great ideas that might not necessarily all be convergent.
We wanted to make sure that the specification was out there for people to rally around and decide which implementation made sense.
We, as a team, when we started Scytale and built our initial community, all started working together on SPIRE.
Which, as far as I'm concerned, is still the de facto implementation, but there are others that are issuing and implementing the SPIFFE specification in various forms and fashions.
That includes projects like Istio. Istio has a component called Citadel which uses these ideas for all of its workloads.
In the case of HashiCorp, you've got the Consul Connect offering.
It's the service mesh offering that uses SPIFFE to establish service identities.
You've got something called Kuma as well, which auto-generates SPIFFE-compatible certificates to identify services and workloads running in their mesh.
These are some projects that are actually using SPIFFE, and we also have folks that are increasingly consuming SPIFFE.
Envoy we've talked about; we've been able to adapt Envoy so that you can use SPIFFE IDs to establish mutual TLS connections between Envoy proxies.
Pinterest has a project called Knox, where customers of Knox can authenticate to Knox using SPIFFE identities.
Then we also have the Ghostunnel proxy, which comes out of the Square engineering team, where they can use SPIFFE IDs to establish mutual TLS connections between Ghostunnel proxies as a whole.
So we have an increasing number of issuers and an increasing number of consumers, of varying forms, adopting the SPIFFE standard and also building upon the SPIRE implementation.
Marc: So if I want to add SPIRE into my application, it sounds like you mentioned earlier that I don't have to go make a bunch of code changes and add a new library in and implement all of that. Envoy is one method.
Is it as easy as just literally adding a sidecar in, and that'll bootstrap the identity services for my microservices?
Sunil: Yes, there is Envoy right there.
In the case of Envoy, one of the components of Envoy is this thing called the Secrets Discovery Service, or SDS.
Envoy uses the SDS to retrieve and maintain its updated secrets from an SDS provider.
So basically, what we're doing is we're making SPIRE one of those secret providers, so that transparently whenever Envoy needs a secret to be able to utilize for authentication or for encryption, it can just lean on the underlying SPIRE infrastructure and then you're off to the races.
If you've implemented Envoy correctly and with standard formats, you can pick up some of the integrations that we've done with the Envoy project, so that you basically get that for free.
If you don't use Envoy, you've got the idea of sidecars, you can run these things as sidecars as well and get the same benefit instead of having to necessarily roll it into the back end itself.
Marc: Can you help explain a little bit more about what's involved in that bootstrapping process?
Bootstrapping crypto and identity is hard, and that's what SPIFFE, and SPIRE in particular, are doing.
Like, what happens if I have the sidecar injected into one of my pod specs and spin it up in Kubernetes, what does it actually do?
Sunil: Let me take you through the basic bootstrapping process that we're going through itself, because I think that's probably a worthwhile exercise to talk people through as a whole.
First and foremost, what we're doing is just-- We've got this documented on our website, so you can obviously follow along, but it's a good way to understand what's happening here.
Here's how it basically works. First and foremost, you're going to launch a SPIRE server on some host that you have running inside your organization.
That SPIRE server is then going to automatically generate a set of "Trust bundles," and these trust bundles basically define the world of valid identities in a given trust domain, so that when any given workload receives an SVID from some other service, it can query against them to determine whether or not this is a valid identity and what the authentication and attestation flows should look like.
So, first you've got this SPIRE server that's installed.
Then there's an assumption that you've got SPIRE agents running on whatever nodes you plan on using. Your nodes could be plain vanilla VMs, you could be running a sidecar proxy, or pods themselves; it doesn't really matter to us.
When the agent turns on the first thing it's going to go do is it's going to go perform a process called "Node attestation."
What node attestation is, is a process to basically prove to the SPIRE server the identity of the node that the agent itself is running on.
Before the SPIRE server starts communicating and sharing privileged information with the SPIRE agent, the server is basically saying, "Who are you? Prove to me that you're running on a legit system," etc. So for example, if we had an agent performing node attestation on an AWS EC2 instance, it might tap into the AWS instance identity document, which is a privileged API that Amazon produces. Then it sends that over to the server, over a TLS connection, using a bootstrap bundle that's preconfigured on the agent itself.
We had to ship something with the agent so that it can establish an initial TLS connection back to the server, so that we didn't have that going over open, unencrypted communications.
So once it sends whatever proof it needs to the server, the server says, "Cool. I need to go verify this."
In this case, the SPIRE server would say, "I see this as an AWS instance identity document. I know how to go read that. Let me go ask AWS to go validate this."
It sends a request down to the AWS API to acknowledge the document is valid, and if that document is valid the server basically says, "Cool. I see that this is legit. Let me go see what else I can actually glean from the platform in terms of being able to do any of these node resolution activities."
And then it then says "OK, agent. Now I've confirmed you're legitimate, here is an SVID."
Which basically is the identity of the agent itself, we're not even talking about workloads at this point.
We're just talking about bootstrapping the underlying systems, so now we've bootstrapped the identity of the agent on the host itself and it now has that legitimate document, because we verified that through this back end attestation flow.
The agent then contacts the server using that SVID and its new TLS client certificate to obtain whatever registration entries it needs to be able to serve and sign any given workload SVIDs.
Now at this point, the system is fully bootstrapped.
What happens is the agent then turns on something called "The workload API."
This is the interface, the northbound interface, to any workload that's going to spin up on that host itself or on that node itself.
So now, let's assume that a workload spins up on that host.
What would need to happen in the back end application is that there would have to be some configuration that says "Upon boot, as part of a config script or boot script, one of the first activities has to be to call this workload API."
It can do so in an unprivileged manner.
You don't have to authenticate to the workload API, so when the workload calls down to the workload API from that agent running on the same node, that request is basically saying "Agent, I'm a new workload. I don't know who I am. Tell me who I am."
What the agent then does is, it says "Hello."
It begins this workload attestation process by calling its different workload attestors and providing them, for example, with the process ID of the workload process itself.
It goes through this iteration where it says, "OK. I see you are a new workload running on the same node that I'm on. Let me go see if I have a match against the set of identities that have been predesigned and predesignated by the centralized team," and it then goes and runs through a variety of matches.
If it doesn't find a match, it doesn't give it an identity.
It says, "Sorry. You don't have an identity."
Now the operator can say, "OK. This thing didn't get an identity from the underlying system. Do I want to keep booting this thing up? Do I want to raise a flag? What do I want to do there?"
And that's really up to the operator to decide what they want to do in terms of allowing that workload to continue running or not.
But let's assume we do find a match, when we do find a match what ends up happening is that the agent will then basically issue to the workload the correct SVID itself.
And that SVID can either be delivered to it, as we said beforehand, in the form of either that JSON web token or in the form of an X-509 certificate, and then we would deliver that to whatever system is requesting it.
In this case here, if there was an Envoy proxy sitting in front of the service, it would basically be a transparent transaction with the actual Envoy service itself, where all of those SVID X.509s are held.
If there was no Envoy and it was held directly in the back end, the back end client library would be aware of it; or in the case of the pod itself, the pod or its sidecar would be given that X.509 certificate as well.
Marc: So, you describe the bootstrapping process really well.
I get that there's a lot of complexity that SPIRE is handling for us.
What about rotation? Do SPIFFE and SPIRE start thinking about how these are short-lived credentials?
At some point, if my pod happens to live for a long time, that's going to expire.
I need to handle rotation of the certificate. How does that work?
Sunil: That's a good question.
The general idea of rotation is that this is something that we want to make configurable by the operator.
Based on whatever threat modeling you've done, whatever type of scenario you have in place for a given workload, the geography, the sensitivity of the information it's processing, and whatever other factors you care about.
That's the key point, is that we don't know what you care about.
We've built a system in place such that there is a detailed process so that any given SVID that's generated can be predesignated with a certain TTL that the entire SPIRE system is aware of.
At the point where that SVID reaches the end of its TTL, it would then effectively force a reattestation of that workload.
So that we have to go back and say, "OK. This cert is no longer valid. If we want to keep serving up legitimate comms, we need to go make sure and reattest the system."
What it will then do is it will go right back through that attestation process.
The agent will go back and check that process ID, it will go check its table to see what corresponding identities it has, and if any of those factors of attestation have changed it will just automatically pick it up, reverify those factors against that requesting workload, and if the match is still found it will then issue a brand new certificate.
If a match is not found, it basically doesn't have a legitimate certificate anymore.
Then again, out-of-band it can issue some sort of an alert or some sort of secondary notification that says, "We've got something wrong here. There's not a match."
Then we can actually go triage and debug whatever is happening after that, but that's a second order consideration.
Marc: OK, yeah. It's a complex problem that you're attempting to solve here.
I'm curious why, when you created the project, the team decided to donate it to the CNCF and make it a CNCF project.
Can you help explain that back story a little bit?
Sunil: Yeah, no. For sure. So, maybe just take a step back--
Technologies like identity or concepts like identity, specifically around service to service identity, are not for the faint of heart. They're very complicated in terms of the implementations, the standards, the technology, the crypto.
There's a lot going on and there's a lot going on at a layer of the infrastructure that not a lot of enterprise developers really understand intimately, and not a lot of security folks understand intimately.
So we realized that if we were going to be proselytizing newer concepts born from older ideas, similar in nature, we felt like the most appropriate thing to do was to make sure that this was going to be built out in the open.
First and foremost, we wanted what we build to be in the open, so we could be very transparent with anybody that has questions about how we're doing things: what kind of crypto we're adopting, how we're doing key exchange, how we're doing bootstrapping.
So that we can have the best and the brightest come poke and prod and contribute to make this as robust and bulletproof as possible. So first and foremost, we wanted it to be open.
After we decided that we were going to make it open, we decided that it needed to be held in an organization that we thought represented and embodied the characteristics of the stewardship of great open source projects like Kubernetes.
So that naturally led us to the Cloud Native Computing Foundation.
So, that's why we ended up bringing this project to the CNCF and offering it to them, and then along the way this CNCF project through its organization and leadership and its community realized that these two projects were pretty critical for continuing to promote some of these cloud native models, like container orchestration and beyond.
Marc: Can you explain the governance model that you're currently adopting for SPIFFE and SPIRE?
Sunil: Yeah, sure. Right now there are two layers of governance inside of the project. There is something called the SPIFFE Technical Steering Committee.
This is an organization comprised of a number of individuals whose responsibility is to provide broad governance of the project itself.
The TSC has responsibility over the direction of the project. It's got oversight in terms of contribution policy, things of that sort.
It is not necessarily responsible for the day-to-day review and approval of any given PRs that are submitted to the project.
Instead, you can think of it more like a broad general oversight group whose primary job is to make sure that the entire project is staying on the rails and directionally heading in the direction that we plan for in the long term.
The TSC is comprised of at least five members, and we have a whole bunch of rules around what that structure looks like.
Voting, diversity, representation of organizations, term, things of that sort that you can read all about as well.
So in addition to the Technical Steering Committee, we have maintainers, like any other project.
This set of maintainers has a variety of activities, and we are looking for people that are active SPIFFE contributors.
These folks have to respond to PR review requests, they're responsible for ensuring that any code submissions meet any of our coding conventions that are consistent with our goals and directions of the project, and then they're also responsible for merging those PRs in as well.
We have a set of maintainers as well, some that come from the Scytale team and some that come from outside it.
We're always looking for more maintainers to come on board and help share the load.
Marc: You mentioned a couple of times there, ensuring that the project is moving directionally where you want it to go, where is that?
Like, what is on the roadmap and what are the next things that you're trying to work on and deliver for SPIFFE and for SPIRE?
Sunil: With something like this, what we've worked on for the last two years is really making sure that we had a solid core engine: making sure the specs are looked at a thousand different ways, open for a lot of interpretation and analysis, and then making sure that we see some level of stabilization on what the specification calls for and what it doesn't call for.
We've done the same thing with the initial implementation in SPIRE by making sure we build a strong, scalable, flexible and expandable SPIRE attestation engine that people can then build upon and build around to do something with these SPIFFE specifications inside of their enterprise as a whole.
One of the things that we're spending time on as a community over the next 12 to 18 months is to continue to focus on scale, to continue to focus on "What does it mean to run SPIRE in production?"
And "What does it mean to expand the set of integrations of SPIRE into third party systems that might need to become SPIFFE and SPIRE aware?"
Those are three top level goals that we have here.
The first one in terms of scale comes primarily from some of our implementors.
We run community days every quarter, and if you go to any of them you'll see this.
We open them up to everybody in the community, and it's a good opportunity for everybody to showcase what they're doing with SPIFFE, what their use cases are and where they're going.
It's no shock or surprise that Uber is one of the participants in this SPIFFE community.
They're obviously moving forward with some of the technologies, and I can't give too much of the details but they're operating at a scale that is going to be really pressing us as a project against the boundaries of thinking about, "What does it mean to deploy SVIDs at the scale that Uber operates?"
There's a number of things that fall out from that; if you go into our backlog you can see how it instantiates in terms of the types of PRs and the types of chokepoints and scaling components we need to deal with. That's one part of it. On the production side, a lot of it is tooling: I want to be able to plug in my existing operations tooling and understand what's going on with our security systems, I want granular reporting, and I want metrics that I can actually start understanding and analyzing to determine if I'm seeing any funkiness or flapping or any kinds of failures that might indicate something I need to go look at.
You've got to remember that if you're going to adopt this, SPIRE sits at the base of your entire platform.
There are a lot of dependencies on this thing working really well, and if anything ever goes wrong we need as early a signal as possible to identify the issue, so that we can address it and SPIRE can continue to serve whatever downstream systems and processes an enterprise has in place.
Again, you'll see PRs in the backlog that correspond to some of these production oriented issues.
Then lastly, as I said, expanding the support: expanding the integrations into third-party systems, third-party clouds I should say.
Being able to grow and simplify the set of plugins that we can offer for orchestrators like Kubernetes, expanding the coverage across more service proxies, and looking at direct back-end database integrations so that you could perhaps even terminate and use SVIDs inside of a database itself.
There's a number of existing systems that need to become SPIFFE and SPIRE aware, and that's being driven by where our community is taking us as a whole.
As I said at the earlier part of this conversation, one area in particular that everybody seems to be coming at us with right now is serverless, so we're spending quite a bit of time looking at "What does it mean to provide SPIFFE SVIDs to serverless instances running on any of the public cloud providers as a starting point? How does that work? How do you authenticate there? How do you attest something that is even more ephemeral than the idea of a container?"
So, that's another area of major work that you're going to see coming forward from us in the next 12 months.
Marc: I'd love to understand a little bit about that, because that's actually a great question that you just asked.
If I'm going to run a bunch of Lambda functions on AWS, could I do it today with SPIFFE and SPIRE?
Or is it just in the design phases right now?
Sunil: It's just in the design phases right now.
We just actually spun up a working group inside of the project to look at this more specifically.
Actually, Square just released something that talks about utilizing SPIFFE and SPIRE in the context of AWS Lambda functions.
Of the organizations that we know of, they're the ones that have taken the furthest steps in terms of thinking about how to make attestation work in the context of AWS Lambda functions.
So I can point you at some of that as well, but the working group is--
Their job is to take a couple of steps back and see, "What else do we need to be thinking about more broadly?"
Are there fundamental common elements or characteristics of what we as a community are calling a serverless function, that we can then encode and say, "This is the baseline implementation to attest serverless?"
Then that might be more nuanced based on implementations from platform provider to platform provider.
Marc: Cool. The last question that I have for you around the roadmap: SPIFFE and SPIRE have been around for three and a half years, and they're pretty mature, pretty stable, ready to run in production.
Are there specific use cases that you're looking to see more of, like other than integrations and things like this?
Are there any new use cases you would love to see more of that you're waiting on, or any thoughts around taking it out of the incubation phase and applying for graduation in the CNCF?
Sunil: We have a whole bunch of use cases that we can go through, and I can point you to a link to that.
One is being able to establish a mutual TLS connection between two SPIFFE-identified workloads, where there isn't necessarily a load balancer or some sort of a proxy in between.
So, direct back end to back end communications, which we're seeing at a lot of enterprises.
We want to be able to support that with middleware in between, we want to support that with Envoy more richly, we want to support authenticating to back-end data stores more richly, and we want to expand the scope of the cloud platforms that we're supporting as well.
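At a high level, that direct back-end-to-back-end mutual TLS case comes down to each workload presenting its X.509 SVID during the handshake and the peer checking the spiffe:// URI carried in the certificate's subject alternative name. Here's a minimal Python sketch of just that authorization check, with the peer certificate mocked as the dict `ssl.SSLSocket.getpeercert()` would return; the expected SPIFFE ID and trust domain are hypothetical:

```python
# Hypothetical allow-list: the SPIFFE ID this back end expects its peer to present.
EXPECTED_PEER_ID = "spiffe://example.org/frontend"

def peer_spiffe_id(peercert):
    """Extract the SPIFFE ID from a peer certificate's URI SAN.

    `peercert` has the shape of the dict ssl.SSLSocket.getpeercert() returns;
    an X.509 SVID carries its SPIFFE ID as a spiffe:// URI in subjectAltName.
    """
    for kind, value in peercert.get("subjectAltName", ()):
        if kind == "URI" and value.startswith("spiffe://"):
            return value
    return None

def authorize(peercert):
    """Accept the connection only if the peer presents the expected SVID."""
    return peer_spiffe_id(peercert) == EXPECTED_PEER_ID

# Mocked getpeercert() result for illustration:
cert = {"subjectAltName": (("URI", "spiffe://example.org/frontend"),)}
print(authorize(cert))  # True
```

In a real deployment the handshake and this check would typically be driven by a SPIFFE library such as go-spiffe or java-spiffe, which fetches SVIDs and trust bundles from the SPIRE agent's Workload API rather than from static files.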
We want to support more secrets management systems. We shouldn't assume that the world just all of a sudden woke up and was never doing anything before.
People have been using secrets for a long time, they've been using projects like Vault from HashiCorp and others to be able to still use static credentials.
If you wanted to bootstrap trust from a workload to Vault itself, "Can we use SPIFFE and SPIRE to do that?" for example.
There's a number of places where SPIFFE and SPIRE can actually provide value.
There's interesting use cases around authenticating to a message queue, like RabbitMQ.
There's things like being able to tap into notaries, so that you only attest workloads that have been signed via a notary, for example.
I think the point here is that within the cloud native landscape there's an ecosystem of technologies that go hand in hand, that have their own authentication and identity needs, but that can also be part of the authentication flow from one workload talking to another.
So, broadly speaking, we're following our community and seeing what components are being really used and then taking a cue from them to prioritize the work that we as a community do to better support and simplify whatever combination of open source and commercial technologies we need to support as a whole.
Marc: So, say I have a Kubernetes cluster running, or non-Kubernetes but I'm running containerized workloads-- This is still a big problem.
SPIFFE and SPIRE do a lot of the hard work, the heavy lifting, and give me best practices to follow.
Do you have any roadmap or guidance for somebody for how maybe they can divide it into a smaller chunk and get started, and deliver some value quickly?
And be able to not think about, "How am I going to take my 150 or 15,000 micro services and apply identity to all of them?"
But what do you recommend for somebody just getting started?
Sunil: Yeah, I think it's a daunting challenge, to say the least.
I think for most organizations, if you're going to get started with this, it's more likely that you've got some sort of an operations, SRE or security engineering organization that is probably 6, 12, 18 months out from really standing up an initial-- Almost like a test cluster in a public cloud provider.
There they can prove they can replicate existing controls, and where a control doesn't replicate, evolve the state of the art of those controls by utilizing new technologies, whether it's things like SPIFFE and SPIRE and beyond as a whole.
In those situations, it usually doesn't start with 15,000 micro services, it usually starts with one.
Start with one micro service that you intend to move out to a more dynamic computing environment, which may happen to be running on a public cloud, if that's your model of operation.
Then work through the simple scenario of figuring out how you would actually bootstrap trust into that single instance, so that the workload has an SVID it can then use to communicate with another system.
So, bootstrap trust into one system and then figure out where it needs to communicate to.
Maybe it's going to need to communicate something back out into the enterprise datacenter over a VPN, or something like that.
That's fine; then comes the next part of being able to teach the back-end system how to actually parse these SPIFFE identities as a whole.
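Teaching a back-end system to parse SPIFFE identities mostly means splitting the ID into its trust domain and path, following the spiffe://<trust-domain>/<path> shape defined by the SPIFFE specification. A minimal sketch of that parsing, for illustration only; real systems should use an official SPIFFE library:

```python
def parse_spiffe_id(spiffe_id):
    """Split a SPIFFE ID into (trust_domain, path), rejecting malformed IDs.

    Follows the spiffe://<trust-domain>/<path> shape; the path may be empty,
    in which case the ID names the trust domain itself.
    """
    prefix = "spiffe://"
    if not spiffe_id.startswith(prefix):
        raise ValueError("SPIFFE ID must use the spiffe:// scheme")
    rest = spiffe_id[len(prefix):]
    trust_domain, _, path = rest.partition("/")
    if not trust_domain:
        raise ValueError("SPIFFE ID must name a trust domain")
    return trust_domain, "/" + path if path else ""

# A back end can then authorize by trust domain, path prefix, or exact match:
print(parse_spiffe_id("spiffe://example.org/billing/api"))  # ('example.org', '/billing/api')
```

Once the ID is split this way, the back end can make authorization decisions such as "only accept workloads from my trust domain" or "only accept paths under /billing".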
Now, some of that work is being done in the open, a lot of that's being done on the commercial side.
Sitel was a company that was working on those commercial bits, and others, I think, are looking at this as well.
So there's a lot happening there to facilitate early stage trying and testing of these technologies, from cloud to on-prem and cloud to cloud as a whole.
But it starts with one implementation in the public cloud, and determining whether or not you're going to adopt the service proxy model; you have to answer some of these questions.
Then once you've answered those questions, you can join the SPIFFE Slack channel, visit the SPIFFE website, and get connected with a number of folks who have already done what you're probably going to do and can provide you with a couple of reference examples that tell you all you need.
We've worked hard from a documentation standpoint to try to provide as much knowledge to the community about being able to use SPIFFE and SPIRE in different environments with different combinations of technologies.
We keep going back to Envoy because Envoy tends to be the most obvious way in which people think about introducing service management concepts into their enterprise, and if you do so, we've got lots of documents that talk about how to plug SPIRE and Envoy together.
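As a rough illustration of that plumbing, Envoy can fetch its TLS certificates over SDS from the SPIRE agent's Workload API socket. The fragment below is a sketch in the spirit of the spiffe.io Envoy guides, not a complete config; the socket path, cluster name and SPIFFE ID are assumptions that vary by deployment:

```yaml
# Static cluster pointing Envoy's SDS at the local SPIRE agent
# (the Unix domain socket path is an assumption; adjust to your deployment).
clusters:
  - name: spire_agent
    connect_timeout: 1s
    http2_protocol_options: {}
    load_assignment:
      cluster_name: spire_agent
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  pipe:
                    path: /run/spire/sockets/agent.sock

# On a listener, request this workload's SVID by its SPIFFE ID (hypothetical ID).
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    common_tls_context:
      tls_certificate_sds_secret_configs:
        - name: "spiffe://example.org/backend"
          sds_config:
            api_config_source:
              api_type: GRPC
              grpc_services:
                envoy_grpc:
                  cluster_name: spire_agent
```

With this wiring, SPIRE rotates the SVID and pushes updated certificates to Envoy over SDS, so the proxy never touches long-lived key material on disk.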
If you don't want to go down that path and you want to do something else a little bit more bespoke, a little bit more tied to the client back ends, our documentation has a lot more details there as well.
Marc: Cool. SPIFFE and SPIRE started three and a half years ago at Sitel, and now it's part of HPE.
Since the acquisition, has that changed? HPE is a much bigger company than a startup.
Has that changed anything in the day to day, the operations, the roadmap?
What impact does that have on the project, if any?
Sunil: There are two things that have been impacted.
We should probably go back to why the acquisition happened. It happened because, as a company, HPE realized that there is a tremendous amount of opportunity to deliver value to the massive existing customer base that we have from years and years of selling amazing hardware and software, and to increasingly deliver value to operations and application developers in the enterprise landscape.
So when this acquisition happened, we very much believed there was an opportunity to help provide stronger roots of trust to the world of enterprise, tying together the software-centric attestation bits that we've been working on as an open source project and as an early stage company.
But then tying that together with the fact that we have a strong understanding of hardware roots of trust, and more, can change the game for the enterprise in terms of how they establish strong forms of rooted trust all the way down to chips and TPMs and things of that sort, and then surface that up to any given instance of a workload, whether it's running on HPE servers in the data center, in a private cloud, or even in a public cloud for that matter.
For us, SPIFFE and SPIRE continue to serve as a pretty foundational component of our security architecture as we move forward with technologies like the container platform and the data fabric and some of the other technologies we have coming out in the future, so that means that we're continuing to expand our support within HPE.
We're looking for great engineers, great operations folks who have the desire to step in and work on some bleeding edge technologies like this, but also to connect the bleeding edge with the existing.
In this world at HPE and even before HPE, it's not enough just to hope that the world wakes up and says, "We'll all evolve to containers eventually."
Or, "We'll all evolve to SPIRE." You have to light a path, and sometimes you have to hold people's hands along that path into that future as well.
So at HPE we're going to be spending some time and some energy and resources to do that with our existing customers as a whole.
As far as our work on the open source side, nothing stops.
In fact, we're going to be even contributing more so on the open source projects than ever before.
I'm excited that we get to do so with a bunch of other organizations that are also taking those steps forward, us and other vendors in the software ecosystem, other end customers that are adopting this.
It's becoming really interesting, and that in part is why this project was recently promoted to the incubating phase.
Because I think the community recognized that we're seeing the uptick that justifies moving it to the incubating phase, and hopefully to the graduated phase at some point in the future as well.
Marc: That makes sense.
The opportunity to come work for a mature company like HPE, but actually spend your time on this bleeding edge open source software and get paid for it--
It sounds like a great opportunity out there.
Sunil: Yeah, I think it is.
I think it's a great opportunity for a lot of folks that want to be able to push the envelope but do so within an organization that I think has a tremendous base to work off of.
That's what got me so excited, that's what got my investors and my co-founders so excited about the possibility of joining forces with HPE, so we're going to go do that now.
Sunil: I think it's difficult in a podcast to really effectively showcase the power of this technology. For people that have been tackling this world of security and service authentication as they move to more dynamic computing platforms, the ways in which they have historically done service-to-service authentication are absolutely going to show their wear.
Not because those ideas were bad by any stretch. Ideas develop and they evolve, they get better hopefully over time.
But I think what happened in this case is you saw a magnitude jump in the scale and the dynamism of computing that came with the introduction of containers and container orchestrators, and I think that technology leap is changing the way in which we think about the longevity of a given workload.
Going from spinning up a single-use server in a rack and having a one-to-one mapping of a workload running on that server for the next 12 months without ever changing it, to that thing lasting for five minutes and being torn down, and then 10 minutes later having a thousand instances of that same exact code spun up for another seven minutes and then torn down.
That volatility up and that volatility down, I think, is really where you're going to start to see the benefits of technologies like SPIFFE and SPIRE, because they go along for the ride. They do so automatically and keep you apprised of what's happening there.
That, I think, is something that's difficult to convey in this podcast, so I would encourage your listeners to take a look at our website, SPIFFE.io.
There's a bunch of information there: use cases and case studies, and videos of people that have been using this, which you can learn from as a community to understand which use cases might be applicable to where you and your organization are going.