Ep. #32, LocalStack with Waldemar Hummer
In episode 32 of The Kubelist Podcast, Marc and Benjie speak with Waldemar Hummer of LocalStack. This conversation focuses heavily on how LocalStack is emulating AWS services locally, speeding up the development cycle for cloud developers, and maintaining their large open source community.
Waldemar Hummer is Co-Founder and CTO at LocalStack. He was previously Tech Lead & Architect at IBM Research AI, and Team Lead at Atlassian.
transcript
Benjie De Groot: All right, as you've just heard we've got Waldemar Hummer with us today to talk about LocalStack. Waldemar is a CTO and co-founder at LocalStack. Very excited to have you on.
To get us started, Waldemar, I'd love to hear a little bit about your background, how you got started, where LocalStack came from, but even before where LocalStack came from, just tell us about the early parts of your career and how you ended up here.
Waldemar Hummer: Yeah, sure. It's a pleasure to be here. I'm Waldemar, and I'm the CTO and co-founder of LocalStack. Originally from Vienna, Austria, I started my career actually working in academia, so I did a PhD in computer science back in the day. I think it was 2014 that I defended, working on distributed systems, cloud computing and the things that were coming up at the time.
Then I decided to not continue on that academic track and started a position at Atlassian, so I worked for Atlassian in the data team, and I actually relocated to Australia. That was back in 2015-'16, and the first lines of code of LocalStack were actually written at Atlassian. We had this use case of essentially developing cloud applications and we wanted to make that more efficient, so we came up with this idea that you could actually emulate part of the AWS APIs on the local machine, and that would enable people to really do things like develop their applications offline or while they're commuting on the train.
So it started back then as an Open Source project and then it was growing over the years. I was the core maintainer for the last couple of years, I was treating it more as a side project to be honest.
I had been in different roles throughout my career since then, I worked for IBM in the States for about two years and then also at a large insurance company in Switzerland, and now basically the last year or so we've actually been all in on LocalStack, really built the company and the team around it.
Benjie: That's really cool. So this really started off as a side project and then you said about a year ago you went all in and now you're developing it all the way. By the way, we do have Marc here with me as always, and I'm Benjie. I forgot to say that at the beginning there, but always a pleasure to have Marc here. Hi, Marc.
Marc Campbell: Hey, Benjie.
Benjie: Okay. So tell us a little bit more real quick, what is LocalStack? Give me a quick pitch, I am a developer at a, let's call it, 50 person company and I'm looking at LocalStack. I do use AWS because I'm one of the everybody, but just tell me a little bit about the pitch. What is LocalStack?
Waldemar: LocalStack is essentially, in a nutshell, a local cloud emulation platform.
At its core what we're doing is we're basically taking the AWS APIs and we provide an emulated, mocked version of that on the local machine and that helps people actually speed up their development cycles quite dramatically.
Because whereas previously if you were a cloud developer, basically your day to day job looks like you're making some changes on your local machine, you push them to the cloud, deploy the changes, run a few tests and then you see if something breaks and you need to redeploy.
This cycle of deployment and testing is becoming quite slow and slowing developers down, so we have this local emulation where you can now literally run your serverless workloads, your Lambda functions, DynamoDB tables and all that on the local machine, which is really, really speeding up the development cycles for cloud developers.
Marc: So I have a question for you to kick us off here a little bit more, Waldemar, AWS has a lot of services. How broad of the emulation layer are you providing with LocalStack? How many of these services are you able to recreate?
Waldemar: Yeah, that's a great question. I think right now we have something in the order of 55 to 60 services that are supported in LocalStack. The services have different degrees of parity and depth of support, I would say, so some services are really very well supported, for example, Lambda functions, DynamoDB and a few other core services.
We also have a few more exotic APIs, things like the whole big data suite of Glue, Athena or even some database systems that you can spin up using LocalStack. We tend to focus on the core services that AWS provides. I think it's currently a suite of 200-ish services, and some of them are really pretty much out of scope, which we're not going to tackle in the foreseeable future. But we definitely take it with a very strong focus on the core services that most people are using on a day to day basis.
Marc: You mentioned that there were mocks and emulations for dev environments. I'm curious as to how deep the emulation goes? If you look at DynamoDB or S3, are they actually fully functioning databases? Or are they mock endpoints that you're providing that allow integration tests to complete?
Waldemar: Yeah, so for the most part they're real endpoints where we mimic and replicate the internal logic of the real APIs. One example is SQS, the Simple Queue Service that AWS provides, where we have essentially an almost 100% complete emulation. We also run what we call parity tests against AWS, so basically we have a set of integration tests that we first run against AWS, record the responses, and then compare them against the responses to the same requests run against LocalStack. So that gives us pretty high confidence in terms of achieving high levels of parity with the services.
Benjie: How do you stay on top of the changes that Amazon is making? Some of the services are relatively mature, so the API for S3, as an example, is not changing day to day, but they're releasing new services. They do add new functionality. These are proprietary, closed source services, and Amazon doesn't necessarily always give a lot of upfront notification that there's going to be a new version of the API available. So what tools and what methods do you use to stay on top of all that?
Waldemar: Yeah, absolutely. So that's a great question, and generally speaking when we look at the evolution of LocalStack as a platform, essentially the codebase as it looks today is very different from the early days when we just started emulating a few services here and there. Today we're much more systematic about the approach of how things are tested, how we keep up to speed with the changes of AWS.
One thing that we're now doing increasingly is using the API specifications that AWS is publishing for its services, and we work a lot with auto generated stubs of the service providers. That also allows us to basically track the changes of the API specifications and then do things like automatically creating pull requests against our repositories to regenerate the stubs and the service providers.
It's essentially a matter of being very much based on what the specifications provide, and this is very rich information that we can leverage in terms of all the types, input-output messages and schemas. When it comes to the semantic inner workings of the service, here we're working with things like parity testing and also leveraging frameworks like, for example, Terraform.
Terraform is actually something that we use in our own internal testing because they have a very comprehensive test suite for the AWS provider of Terraform, and we actually run that against LocalStack, again, to increase the parity and get very high fidelity of the services.
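The spec-tracking step Waldemar describes could look roughly like the sketch below. It is only an illustration, not LocalStack's tooling: it assumes a specification is a JSON-like document with a top-level "operations" mapping, and the two trimmed specs and the function names are hypothetical.

```python
import json


def operation_names(spec: dict) -> set:
    """Return the operation names declared in a service specification."""
    return set(spec.get("operations", {}))


def diff_specs(old: dict, new: dict) -> dict:
    """Report operations that were added, removed, or redefined between two specs."""
    old_ops, new_ops = operation_names(old), operation_names(new)
    changed = sorted(
        name
        for name in old_ops & new_ops
        if old["operations"][name] != new["operations"][name]
    )
    return {
        "added": sorted(new_ops - old_ops),
        "removed": sorted(old_ops - new_ops),
        "changed": changed,
    }


# Hypothetical, heavily trimmed specs for illustration only.
OLD_SPEC = {
    "operations": {
        "CreateQueue": {"input": {"shape": "CreateQueueRequest"}},
        "SendMessage": {"input": {"shape": "SendMessageRequest"}},
    }
}
NEW_SPEC = {
    "operations": {
        "CreateQueue": {"input": {"shape": "CreateQueueRequestV2"}},
        "SendMessage": {"input": {"shape": "SendMessageRequest"}},
        "SendMessageBatch": {"input": {"shape": "SendMessageBatchRequest"}},
    }
}

report = diff_specs(OLD_SPEC, NEW_SPEC)
print(json.dumps(report, indent=2))
```

A non-empty report is the kind of signal that could trigger an automated pull request to regenerate stubs.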
Marc: That's cool, so you're taking these tests that were made to test the actual upstream service and then using them to validate your mocks, your emulation layer still conforms to those tests?
Waldemar: Exactly.
Benjie: Yeah, you mentioned this concept of parity tests, I'm just curious, can you just pragmatically walk me through how that runs and how you do those? Because that seems like a really interesting thing.
Waldemar: Yeah, it's actually a great innovation that came out of the team in the last couple of months. There are actually two levels to parity testing. One is that we basically write all our integration tests in a way that the SDK clients, the clients that make the calls to the AWS API, essentially, are very easy to exchange, so we can configure the clients to run against LocalStack or we can configure the clients to run against real AWS.
What that basically means is that we just run the integration tests twice, once against real AWS to make sure that the actual functionality is covered and then we run the same test against LocalStack and all the test assertions are in there, and hopefully it's going to pass.
So that's the more simple version. An extended version of that is what we call snapshot testing, which is really in addition to running it against the two systems. It also takes very detailed snapshots of the responses and records them in a JSON file, and then we make detailed comparison of are we actually returning the right data types?
Are all the dates and all the integer formats in the correct form and so on? Obviously we need to do a bit of cleaning of these snapshot files, since auto generated IDs and timestamps need to be... We don't want to match against those, but essentially it gives us a very good mechanism to compare the core parts of the responses and make sure that they have good parity.
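The cleaning and comparison step can be sketched as follows. This is a toy version rather than LocalStack's snapshot library: the regular expressions, placeholder strings, and response fields are assumptions chosen for illustration.

```python
import json
import re

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def scrub(value):
    """Replace volatile values (generated IDs, timestamps) with stable placeholders."""
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, str):
        if UUID_RE.match(value):
            return "<uuid>"
        if TIMESTAMP_RE.match(value):
            return "<timestamp>"
    return value


def snapshots_match(recorded, actual) -> bool:
    """Compare scrubbed responses; serializing to JSON also catches type drift."""
    return json.dumps(scrub(recorded), sort_keys=True) == json.dumps(
        scrub(actual), sort_keys=True
    )


# Hypothetical responses: one recorded from the real service, two from an emulator.
recorded = {
    "MessageId": "3f2b8c1e-9d4a-4e6b-8a57-0c1d2e3f4a5b",
    "SentTimestamp": "2022-06-01T10:00:00Z",
    "ApproximateNumberOfMessages": 1,
}
emulated_ok = {
    "MessageId": "aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee",
    "SentTimestamp": "2022-06-02T12:30:00Z",
    "ApproximateNumberOfMessages": 1,
}
emulated_wrong_type = dict(emulated_ok, ApproximateNumberOfMessages="1")

print(snapshots_match(recorded, emulated_ok))
print(snapshots_match(recorded, emulated_wrong_type))
```

Comparing the serialized form instead of the Python objects is deliberate: an integer returned as a string counts as a mismatch, which is the kind of data type check described above.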
Benjie: That's super cool. What's your CI/CD process look like for building out LocalStack?
Waldemar: Yeah. It's essentially currently two mono repos, we have the Open Source version, the community version of LocalStack which is one large mono repo, which is basically the core platform, the core framework that does all the request parsing and has a lot of the core services. Then we have a second repository which is our pro extensions, which is a bunch of additional features, additional services that we support and we provide for our pro users.
So the CI process is basically we have quite an elaborate set of integration tests that we run on every PR build, we make sure that things in the pipeline are stable and we try to eliminate flaky tests as much as we can, so we have a pretty stable pipeline at this stage. It runs for about 45 minutes, and it has all sorts of parallelization built in.
We're actually building, as part of the CI process, a Docker image, so that's the core artifact, or the way we ship the software: it's shipped as a Docker image. We build that for AMD64 and also ARM64, so it's a multi arch Docker build. For example, MacBooks with the M1 processors can also use the latest ARM images.
Typically we just push out the latest tag, which is really the latest commit on master getting pushed out as the latest Docker image. Then we have semantic versioning for any versions that are--. We're still at zero versions, we're currently at 0.14.4 I believe. We actually have some plans in the pipeline to release a 1.0 version pretty soon.
Benjie: Yeah, and with the 1.0 the rules change a little bit and you really start to focus on compatibility and ensuring that customers know what to expect when they're upgrading, right?
Waldemar: Yeah, exactly. This is something that we've been putting a lot more emphasis on recently because we see that there are a lot of people who are... Just to give you an idea, the Docker image of LocalStack, we keep track of some of the pull stats, it's being pulled something around 200,000 times a day. So that's the peak usage of Docker pulls.
We just see a lot of dependencies, if things are breaking in LocalStack or if we introduce a breaking change, then immediately our issue tracker gets filled up with requests and so we really try and have more stability and predictability in the changes in the versions that we push out.
Benjie: I'm going to go all the way back and talk about the origins of the project. It sounds like a cool project, and we're going to dive a lot more into the technology and what the product does. But you mentioned you were on the data team at Atlassian and that's where the first lines of code were written and then it was created, so let's start there.
You said you wanted to be able to have developers, you wanted to be able to write code and build stuff while you were on a train and completely disconnected from the internet?
Waldemar: Yeah. So that's really also the story that we like to tell, of the origins of the project. The interesting thing is that we get the validation of this use case even today with a lot of customers who have this exact use case. They just want to be able to iterate quickly locally or even develop offline. But that was really the idea in the early days, we're basically using a set of services like Kinesis streams and Lambda functions and some S3 buckets and a bunch of big data jobs which just crunch some data.
It had a bit of the cost aspect to it as well, because for example if you want to run a very frequent CI build, then for example Kinesis shards are paid by the hour so whenever you trigger a build it actually used a shard for the Kinesis stream for maybe a couple of seconds but you pay for one shard hour. At least that was the pricing back in the day.
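The arithmetic behind that cost point can be made concrete. The sketch below assumes, as described, that each CI build uses its own shard for a few seconds but is billed for a full shard hour; the build count and duration are made-up numbers, and no actual price is implied.

```python
import math


def billed_shard_hours(builds: int, seconds_per_build: float, shards: int = 1) -> int:
    """Shard hours billed when every build's usage is rounded up to whole hours."""
    hours_per_build = math.ceil(seconds_per_build / 3600)
    return builds * shards * hours_per_build


def used_shard_hours(builds: int, seconds_per_build: float, shards: int = 1) -> float:
    """Shard hours actually consumed by the builds."""
    return builds * shards * seconds_per_build / 3600


# Hypothetical workload: 200 CI builds per day, each using one shard for 30 seconds.
billed = billed_shard_hours(builds=200, seconds_per_build=30)
used = used_shard_hours(builds=200, seconds_per_build=30)
print(f"billed: {billed} shard hours, used: {used:.2f} shard hours")
```

Under these assumptions the billed amount is 120 times the actual usage, which is the gap a local emulator removes for CI runs.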
So the cost was one aspect, but then also just the convenience of having a very reproducible local setup of the same thing before you push it out into CI and remote. We really made it work with the local dev setup where you just had to run make install, and it was just installing everything for you, and then make test, and then it could execute all the tests we were running specifically for these data pipelines that we were building back then, which all worked locally.
So that was the early days. It was a very simple version at the very beginning, so we also leveraged some existing tools in the Open Source space. There were a few emulators already popping up at the time and LocalStack essentially built some glue code and a framework around them to integrate these services.
One of the things that is bringing a lot of value is the integration between the services, so we already see out there some other projects that are building some isolated emulators for individual services. But what we bring to the table is also the integration where you can connect the SQS queue to a lambda function or you can put a file on S3 and that triggers some other SQS notification, so it's really the integration services that make it really powerful.
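To show why the wiring between services matters more than any single emulator, here is a toy in-memory model of the "put a file on S3, get a queue notification, trigger a function" chain. Everything in it (the `ToyCloud` class, its method names, the event shape) is hypothetical and far simpler than what LocalStack or AWS actually do.

```python
from collections import defaultdict, deque


class ToyCloud:
    """Hypothetical in-memory model of three connected services."""

    def __init__(self):
        self.buckets = defaultdict(dict)           # bucket -> {key: body}
        self.queues = defaultdict(deque)           # queue -> pending messages
        self.notifications = defaultdict(list)     # bucket -> queues to notify
        self.queue_consumers = defaultdict(list)   # queue -> subscribed functions

    def notify_queue_on_put(self, bucket: str, queue: str) -> None:
        self.notifications[bucket].append(queue)

    def subscribe_function(self, queue: str, fn) -> None:
        self.queue_consumers[queue].append(fn)

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self.buckets[bucket][key] = body
        # Storage alone is easy; the cross-service event is the integration.
        for queue in self.notifications[bucket]:
            self.send_message(
                queue, {"event": "ObjectCreated", "bucket": bucket, "key": key}
            )

    def send_message(self, queue: str, message: dict) -> None:
        consumers = self.queue_consumers[queue]
        if consumers:
            for fn in consumers:
                fn(message)
        else:
            self.queues[queue].append(message)


cloud = ToyCloud()
processed = []
cloud.notify_queue_on_put("uploads", "new-files")
cloud.subscribe_function("new-files", lambda message: processed.append(message["key"]))
cloud.put_object("uploads", "report.csv", b"a,b\n1,2\n")
print(processed)
```

Three isolated emulators would each pass their own tests here, yet the application behavior only appears once the events flow between them.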
Marc: Yeah, that's a good point and it's definitely worth chatting about. While all these individual services that AWS offers are powerful by themselves, they're rarely used isolated from each other. You have these backend connections and they're like putting the file in S3 triggers a lambda function, the one you just described or whatever, and having these individual services running in your dev environment doesn't do anything if you can't actually reproduce that behavior.
Waldemar: Yeah, absolutely. That's really also I guess where today I would say if you look at... I can't back this with actual data so as I'm thinking about the GitHub issues that are currently being raised, a lot of it is about either entire services being missing or integrations between services. So I think that's something where we put a lot of focus on, and definitely integrations are growing in the AWS ecosystem, right?
And it's actually quite amazing what you can already do today. One example is CloudFormation, which is one of these APIs from AWS where you have a declarative specification of your resources and then it basically creates the stack for you. One of the concepts there is what they call custom resources: basically, in your stack definition you can actually call a Lambda function that has some custom logic that creates the resource and then returns the result back to the CloudFormation engine.
That's one of the integrations we were able to build because we had the CloudFormation engine, we had Lambda support and just needed to connect them together. Now we have the full power of custom resources, which are heavily used in frameworks like the CDK, for example, the Cloud Development Kit. They've been making heavy use of custom resources and we basically get that essentially for free now, which is nice.
Benjie: Just to be clear, these custom resources, that's an AWS feature, we're not talking about custom resources, CRDs in Kubernetes here right now?
Waldemar: Yeah, exactly. I know that your audience is probably definitely focused on Kubernetes, yes. So that's an AWS concept and slightly different.
Benjie: But let's talk about Kubernetes. Does LocalStack work on Kubernetes?
Waldemar: Yeah. We have a Helm chart that's mostly community maintained, so you can actually deploy LocalStack on Kubernetes. There are a few limitations right now, for example our Lambda execution is pretty much based on Docker, so we assume that we have access to the Docker socket and we can spawn new containers.
We're in the process of introducing an abstraction layer that will abstract away the spinning up of containers, so that we can either use just the Kube APIs to spin up a pod or even use things like Podman and other container frameworks. So yeah, it definitely deploys in the very basic version but there are a few caveats.
The other interesting thing is that LocalStack can be used to spin up Kubernetes clusters. We have EKS as an emulation in LocalStack and for that we actually provide two modes. One is just the mocking mode, which basically just pretends that the resources get created, which is useful if you're developing a control plane for some infrastructure where you're provisioning some clusters.
The other is the actual emulation mode, which really spins up essentially a K3D cluster on your local machine, and then you can deploy your workloads, so that's also supported.
Benjie: That's super cool, so you can emulate EKS by spinning up K3D. Do you start to be able to play around with things like managed node groups and things like this so my SRE team can evaluate what it's going to be like to operate some of that equipment?
Waldemar: Yeah, so we're definitely venturing in that direction. There's still a few limitations. We've built a couple of nice integrations, for example from the EKS cluster you can spin up pods that pull images from the local ECR registry where you push your images to. We had to do a bit of networking magic to make K3D talk to the other Docker container which is hosting the image registry, so that integration already works nicely.
Node groups themselves are something that we are still evaluating and working on, and things like Ingress for example, it works a bit differently than real AWS. Anything that has a public endpoint we obviously need to do some tricks to make that work locally as well, with some wildcard domain names and so on. But yeah, definitely something we'd be super excited to explore more and get the feedback on.
Benjie: Okay, hold on, hold on. Everyone back up a second here, I want to make sure I understand this because this is cool and this is definitely cloud native. So if I want to use and emulate my EKS locally, okay, so I'm a developer and I'm using EKS and I'm developing something, I can actually use LocalStack itself to create an EKS Kubernetes cluster locally? Okay, so this is super meta here, but I can develop on EKS inside my LocalStack from my local machine and I've got basically the control plane and everything I want. Is that correct?
Waldemar: Yes, that is spot on. In fact we recently had a chat, we did a webinar with one of the core maintainers of the AWS Terraform module. That's a community project that Anton Babenko and a few others are maintaining. One of the use cases they had: they had a very complex Terraform module for EKS and they run it through their pipeline.
I'm not exaggerating, I think it takes at least 15 to 20 minutes, probably an hour sometimes to get this module tested. We're now thinking about getting basically the LocalStack EKS emulation into their CI pipeline, which will dramatically speed up the entire test cycle for the EKS module. But yes, it's spot on, what you said. You can run your EKS workloads with the control plane and all locally.
Benjie: Okay, this is so cool.
Marc: I have to say, LocalStack totally makes sense. I completely get the idea, great. I have a team of engineers and we use S3, we use SQS, we use some Amazon services and we have shared environments that we're devving on and we don't want to manage actual access keys and secrets, and also the cost. But also just sharing resources is generally hard inside dev environments because somebody wants to upgrade it and it breaks other ones.
But the ability to get in and actually emulate EKS in a functioning environment, for folks who are working, building Kubernetes tools, building tools for developers that rely on Kubernetes. It opens up a path that really has been really, really difficult to dev on before.
Benjie: Especially iteration time, I just can't say enough how exciting that is to me because iterating on Kubernetes and stuff like that, it just takes so long. I know you just used that example of the Terraform module, but I have to come over the top and say that's really super cool.
Waldemar: Yeah. We always feel like we're standing on the shoulders of giants here, so we're leveraging K3D, which is a fantastic project, and we do essentially a lot of the wiring around it and making sure that it's compatible with the EKS APIs. But also, frankly, there's just a lot of great tooling already in the Kubernetes ecosystem that we leverage, which is also fantastic.
Benjie: This brings up a bigger question around LocalStack which I want to understand. Each one of these mocks, and they're not exactly mocks, they're kind of like functional mocks or I don't know the right way to put them, but let's take a simple example. At Shipyard we use RDS and RDS talks to my S3, let's just call it, and I want to use my LocalStack version locally to dev on this stuff.
When I'm actually hitting the RDS API, is that just a PostgreSQL, well, let's say it's an RDS PostgreSQL, is that a PostgreSQL container? And for Kinesis or for Redshift or whatever, let's just get a little specific, how are you actually doing this for each one of these cool services? Maybe don't go through all 50 of them, but give me a few highlights here.
Waldemar: Yeah, definitely. So the majority of services are actually running in the main LocalStack container. We have one Docker container which does all the request processing. We have one, we call it the Gateway or the Edge Port, which is port 4566, and all the requests are going through that port and we do the parsing and dispatching of the request. Then it gets forwarded to the services, and the services are predominantly written in Python.
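One way a single port can serve many services is to read the target service out of each request before dispatching it. The sketch below is not LocalStack's routing code; it relies only on the fact that Signature Version 4 `Authorization` headers carry a credential scope of the form `key/date/region/service/aws4_request`. The `Gateway` class, the handler names, and the 501 fallback are hypothetical choices.

```python
import re

# Matches the SigV4 credential scope: access key / date / region / service / terminator.
CREDENTIAL_RE = re.compile(r"Credential=([^/]+)/(\d{8})/([^/]+)/([^/]+)/aws4_request")


def service_from_headers(headers: dict):
    """Extract the target service name from a SigV4 Authorization header, if present."""
    match = CREDENTIAL_RE.search(headers.get("Authorization", ""))
    return match.group(4) if match else None


class Gateway:
    """Hypothetical single entry point that forwards requests to service handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, service: str, handler) -> None:
        self._handlers[service] = handler

    def handle(self, headers: dict, body: str):
        service = service_from_headers(headers)
        handler = self._handlers.get(service)
        if handler is None:
            return 501, f"service {service!r} is not emulated"
        return 200, handler(body)


def auth_header(service: str) -> dict:
    """Build a sample header with a dummy key and signature for the demo."""
    scope = f"AKIDEXAMPLE/20220601/us-east-1/{service}/aws4_request"
    return {
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={scope}, "
            "SignedHeaders=host;x-amz-date, Signature=abc123"
        )
    }


gateway = Gateway()
gateway.register("sqs", lambda body: f"sqs got {body}")
gateway.register("s3", lambda body: f"s3 got {body}")

print(gateway.handle(auth_header("sqs"), "ping"))
print(gateway.handle(auth_header("glacier"), "ping"))
```

A real gateway would also have to handle unsigned requests and protocol parsing, which this sketch leaves out.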
There's one Python process running in the container and then we're using some external processes for different services. For example, for DynamoDB we use an external JAR file in a JVM, so we spin that up and basically the Python process talks to the JVM process. For things like what you mentioned, PostgreSQL, there are actually different ways we can do this.
Either we spin up a PostgreSQL container or we actually install some Debian packages into the container, via APT. If users just want to have one of the major versions that are supported in the package manager, we can use those. There are a few examples where we spin up external containers, one is the example of Airflow.
There's the managed Airflow services in AWS, it's one of the more exotic services but still a couple of our users are using it. That really spins up the Airflow Docker container and we do the wiring with the request forwarding, basically. So there's different ways as to how the services are deployed, and also they have different mechanisms for things like persistence, for example.
We can maybe talk about that separately, but persistence is something that is also supported across services, though there are different mechanisms as to how we do it.
Marc: So I have a question for you about all of that, what I do on a day to day basis, my company, we help folks ship on prem versions of their software and it's just going to run in someone else's environment. But a lot of vendors will build into these APIs and there's a whole different class of problems that aren't just dev environments and I'm wondering if LocalStack is potentially an on prem replacement for a single tenant, production grade use of some of those APIs or you would just recommend not going down that road?
Waldemar: Yeah. We definitely get requests for running LocalStack as a replacement for production workloads, not having to spin up resources in real AWS. We generally tend to not recommend using LocalStack for any production workloads, simply because it was not designed for these kinds of things, for performance reasons, for security reasons. So it's also, frankly, a bit of a liability question because we don't want to be responsible.
We're not in the business of hosting production applications of customers. We do know, though, that there are a few installations out there which are hosted in, for example, internal intranets. Like shared instances that are then used for all the CI work, for example, because you can speed up the CI builds even more if you just use a pre installed, pre deployed LocalStack version that's running somewhere in the infrastructure and is shared.
We're probably going to move more and more into the area where LocalStack becomes really stable as a long running container that you can have, essentially almost like a replacement for production workloads but currently we don't really recommend doing it, to be honest.
Marc: That's totally fine. So while it may work, it's not the supported workflow. I totally understand that, and hear that. I have a question though, so I'm running it in my dev environment, but my team is like 20 or 30 engineers and we have a platform team. Is the recommended path for the platform team, for orgs to manage a shared LocalStack installation and then all of the individual developers or development teams can use it? Or, is LocalStack really built so that it's actually in my namespace, every developer has their own installation of LocalStack?
Waldemar: Yeah, so I think that the preferred way to use it is literally everybody has their own local version that's running in their Docker environment. We had a few use cases where customers were essentially developing on remote, online IDEs, things like Gitpod or Cloud9 or these online IDEs. There we have essentially an easy integration where you can spin up the LocalStack Docker container in the context of this remote IDE, for example.
So that's also possible, but generally speaking to get the most benefit from using LocalStack, you should just literally run it on your local machine because then you can leverage things.
We have one feature that we call Lambda hot swapping or hot reloading, which basically means that you create a Lambda function and the regular process would be that you upload a ZIP file that contains the code of your Lambda, and then it gets put into an S3 bucket and it gets deployed and started, and so on.
But what we offer with this hot swapping is you can actually mount a local directory on your local host, on your local machine, into the Lambda container that's running in Docker, so any change you make to the handler files is immediately reflected in the next invocation of the Lambda.
That allows you to do even quicker iterations in your dev cycles, and you can leverage most of these benefits really the best if you just run it locally. It's pip install localstack, localstack start, and it basically starts up, so that's the preferred way.
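The effect of hot reloading, where an edit to the handler file shows up on the very next invocation with no redeploy, can be imitated in a few lines. This is a simplified stand-in for mounting a directory into a container: the `invoke` helper is hypothetical and simply re-reads the handler source from disk every time.

```python
import runpy
import tempfile
from pathlib import Path


def invoke(handler_path, event, handler_name: str = "handler"):
    """Load the handler file fresh from disk and call it, so edits apply immediately."""
    namespace = runpy.run_path(str(handler_path))
    return namespace[handler_name](event, None)  # (event, context) like a Lambda handler


with tempfile.TemporaryDirectory() as workdir:
    handler_file = Path(workdir) / "handler.py"

    # First version of the function code.
    handler_file.write_text(
        "def handler(event, context):\n    return 'v1: ' + event['name']\n"
    )
    first = invoke(handler_file, {"name": "demo"})

    # Edit the file in place; there is no packaging or upload step in between.
    handler_file.write_text(
        "def handler(event, context):\n    return 'v2: ' + event['name']\n"
    )
    second = invoke(handler_file, {"name": "demo"})

print(first)
print(second)
```

The regular flow would add a zip, upload, and deploy step between the two invocations, which is the time hot reloading saves.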
Benjie: Yeah, because this is the shill section of Kubelist today, I have to say that we have a few customers that use LocalStack and actually we do as well at Shipyard, and it's really cool because you get these self contained external services for every pull request and it's really, really, really cool. We have not done any of our own testing of our own EKS clusters, but I was just inspired by that and that might be something that we will be playing with later this evening or tomorrow.
But yeah, Marc, I can tell you as a LocalStack user at times or a peripheral user, if you will, my team uses it, it's really good at just giving you a look, a one to one for a dev, or a one to one even for environments. It's a really good use case for LocalStack. One thing I will ask a quick question on as a user of LocalStack, am I going to be able to pick and choose which particular services I want with my helm chart? Or at Shipyard obviously we use Docker Compose?
But one thing I have noticed is that it can be a little bigger than I want it to be if I'm just dealing with, say, SQS and S3 and PostgreSQL. I just wanted to ask, are there plans or are you already doing this and I don't know about it, how do you break these things up? Or is it all just one big image?
Waldemar: Yeah, that's a very, very good question. Currently the way to distribute LocalStack is essentially as one image that has most of the code baked into it. So most of the services, if they're written in Python, then it's just some pip packages available in the container. Then we also do things like lazy loading of some dependencies, so even though you can actually run your workloads offline there might be some connectivity required to lazy load some dependencies.
But ultimately it's still this monolith, currently. What we're now doing, and it's also something I'm very excited about in the upcoming release that we're now working towards, is opening up LocalStack as a platform where you can very easily plug in extensions. We envision that over the next couple of weeks and months even the core AWS services that we provide will essentially become just extensions and plugins in the LocalStack platform.
We're also going to demonstrate how you can build your custom image, for example, that has just the services that you need, like SQS and S3 as you mentioned, and you can build a custom Docker image, a stripped down version that only has what you need. So we're investing a lot in the whole service lifecycle: what are the different stages and phases of the service lifecycle as it gets loaded, as it starts up, as the health endpoint responds, as we are loading and storing its persistent state.
So those things are really helping us to componentize it much more and open it up as a platform that's highly configurable over time.
Benjie: I'm really excited to ask you about future plans for opening it up, but before we do that, we're recording this episode about a week or two before V1 goes live and we're probably not going to get it out for a few weeks after that. Maybe you just told us, but what's going on? Why's it going to V1? What's the big news? Tell us what we can look for and what the big differences are.
Waldemar: Yeah, so we're super excited and it will be very interesting once this episode goes live, to see whether we actually made our timeline for July 13th, which is the planned release date for Version 1.0 of LocalStack. Essentially what we go out with is a polished version that puts a lot of emphasis on new features on the one hand, and on the other on polishing existing features and also new documentation, and some rebranding of our website and the corporate branding that comes with it.
In terms of the new features that we're going to be providing, there are a couple of highlights that I can talk about. One is a new version of our cloud pods feature. In the Kubernetes context, cloud pods are not to be confused with Kubernetes pods. For us a cloud pod is basically a persistence mechanism, it's a persistent snapshot of your instance that you can take while LocalStack is running, and then you can store this state snapshot to a server and later on pull down that state and create the exact same copy of the state that you had before.
It's almost like operating with Git Objects where you do push and pull, you can now push and pull your cloud pods. This essentially also enables team collaboration where your team members can easily share the application state. That is one of the features that we're super excited to have go out with the polished version.
The second piece is the extensions framework that I just mentioned. We are basically demonstrating to the world out there that we are going to open up LocalStack as a platform and venturing out into new territory in addition to AWS, which we're now focusing on. We're going to have some demonstrations of additional API emulators that you can easily plug in.
For example, we're looking to emulate the Stripe API, a very simplified version of Stripe, and then you can develop your application to use AWS LocalStack and also the Stripe emulator to develop your apps locally. We really hope to create some traction in the community to create these extensions, and ultimately come up with some sort of registry of extensions that people can then exchange and can easily share with each other.
Benjie: Hey, everyone. Go check out LocalStack, it hopefully came out. V1, it hopefully came out July 13th. But also, this cloud pod thing just real quick, is that a memory snapshot or is that just a volume snapshot?
Waldemar: It is basically a memory snapshot. By default when you use LocalStack, it's an ephemeral environment. Benjie, I know you're a big proponent of ephemeral environments. Basically you spin up your instance and you tear it down, restart, and you get a fresh state, no persistence by default. Cloud pods is now really a mechanism to take a snapshot of the current running instance so that you can then later on inject it back into the instance and restart from the same state.
Sorry, just one point. This is also quite nice if you want to do things like pre-seeding some CI environments. Let's assume you have a CI build with some tests and you want to depend on some S3 buckets, some Cognito users and some lambda functions. You can just prefill one of these cloud pods and inject it into the CI environment and it's up and running within a second, ready and available.
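The snapshot, push, pull, and inject cycle described here can be pictured with a small Python sketch. It is not LocalStack's implementation or its cloud pods API: `Emulator`, `PodRegistry`, and the pod name `ci-seed` are hypothetical names invented for illustration, JSON stands in for whatever serialization the real feature uses, and a plain dict stands in for the remote server.

```python
import json
from typing import Dict


class Emulator:
    """Hypothetical stand-in for a running instance with in-memory state."""

    def __init__(self) -> None:
        # Fresh, empty state on every start: ephemeral by default.
        self.state: Dict[str, dict] = {"s3": {}, "lambda": {}}

    def create_bucket(self, name: str) -> None:
        self.state["s3"][name] = {"objects": {}}

    def snapshot(self) -> str:
        # Serialize the whole in-memory state into one portable blob.
        return json.dumps(self.state, sort_keys=True)

    def restore(self, blob: str) -> None:
        # Replace the current state with the snapshot contents.
        self.state = json.loads(blob)


class PodRegistry:
    """Hypothetical remote store; a dict stands in for the server."""

    def __init__(self) -> None:
        self._pods: Dict[str, str] = {}

    def push(self, name: str, blob: str) -> None:
        self._pods[name] = blob

    def pull(self, name: str) -> str:
        return self._pods[name]


registry = PodRegistry()

# Developer machine: build up state once, then push it as a named pod.
dev = Emulator()
dev.create_bucket("test-fixtures")
registry.push("ci-seed", dev.snapshot())

# CI job: start fresh, then inject the pod instead of re-creating resources.
ci = Emulator()
ci.restore(registry.pull("ci-seed"))
print(sorted(ci.state["s3"]))
```

Because the whole state travels as one blob, the restored instance holds the same resources as the original without replaying the calls that created them.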
Marc: A little while ago you were talking about a CI process, and you were saying, "I'm not exaggerating this, the CI process takes 30, 40, 50 minutes sometimes to complete testing at the end." I think it's worth pointing out that that's not unusual really anymore. It's a problem, it's slow, it massively impacts velocity for most engineering teams.
But what you're doing cuts that down tremendously so I can actually take what you just described with the cloud pods, I can have a snapshot and say, "I have a test that I need to run, a migration from Point A to B," and instead of having to go through the entire process to just spin up Point A I can be like, "Boom, here it is," one or two seconds and then actually test the part that I care about, which is that migration.
Waldemar: Absolutely. Yeah, totally spot on. It's both the ability to iterate quickly with the local APIs, but in addition also having a way to prepare these snapshots, these cloud pods. That's something which I think really distinguishes us from the real cloud providers because that's something that I would assume is very hard to achieve, taking an actual snapshot with all the entire account information and then moving that to a different account.
There's a few scripts for doing this, these AWS clone scripts for example that are just pulling all the resources and then recreating them. But then you have things like the identifiers are changing and so it's very hard to almost impossible to have a full, exact replica of an environment. We can easily do that because we just take a snapshot and you can spin it up again, and that's pretty powerful, we believe.
Marc: So we've spent all this time talking about AWS. Are there plans or anything in the product right now to emulate other cloud providers?
Waldemar: Yeah. That's a great point. We have been working on an Azure version of LocalStack. There's been an initial version with a beta program for a few selected users. Frankly, we still have quite some work to do on the AWS side, so we're currently, at least in the next quarter, Q3 of this year, focusing on AWS. But then we're now working on a lot of foundational pieces of technology that we can easily replicate and reuse for the next cloud provider that we're onboarding.
The whole HTTP handling framework, the request response parsing, the state management, persistence. We definitely anticipate venturing out into the next provider will be a smoother process. Also from what we've seen, we've spent a lot of time looking at the specifications of the APIs, and you can really see that AWS has grown quite organically over the years with the different services with vastly different types of specifications.
Azure looks a bit more homogenous, so the services are quite comparable and they're all based on an Open API spec so it seems like there is an opportunity for us to get that out much quicker than it would be for AWS.
Marc: Yeah, that makes sense. You definitely don't want to build LocalStack n number of times from the ground up, where n is the number of cloud providers you support. There's definitely some reusability hopefully there.
Benjie: Okay. So LocalStack is an Open Source project. I know that you guys have been heads down for a year making a company out of it. Talk to us just a little bit about how you monetize and how you're going to be sustainable.
Waldemar: Yeah, sure. Absolutely. We're basically a very much Open Source-first driven company. We have a very strong footprint in the Open Source, the community version is very powerful, has a lot of services, the core services that most users are using. We also contribute a lot to upstream projects that we depend on and we use in our ecosystem.
That's the Open Source part, and we get a lot of traction from the Open Source users. Our GitHub repository is quite popular with something like 40K stars. Then the upgrade path is usually that people are discovering maybe some API or some feature is missing in the community version, and then they can upgrade to pro which is our first commercial tier.
We then have actually two more tiers, one is called Team which focuses on collaboration features and the cloud pods which I mentioned before. Then we also have an enterprise feature or an enterprise tier, which is for more high touch customers, larger organizations who, for example, need customizations of things like an offline image of LocalStack or a special customization with integration with their auth system, SSO and so on.
But yeah, basically this is what the model currently looks like. We do get a lot of requests for the pro version and team and upwards, so it seems like there is definitely a demand for this. We're also trying to feed more and more of the innovation that we do in the pro version back into the community version, to the Open Source because we really see that over time we really want to focus on the platform aspect and opening up LocalStack really as the platform that enables integrating a lot of these different extensions that people can actually contribute as well.
Benjie: That makes sense, the pro, team and enterprise version focusing on collaboration features and some of the custom stuff you've built on top of it. Do you ever differentiate the tiers based on AWS emulations, emulated services? For example, would you ever have some services that AWS offers only available in your pro extensions or team version or would you always keep those in the Open Source version?
Waldemar: That's actually what's happening already today, that some of the services which we know are a bit more maintenance and we put quite some effort into maintaining them, things like Athena with this whole big data ecosystem. Also Cognito and a few others. Those services are actually exclusively only available in the pro version and above, and for the community version we have a set of, I think it's by now 30 plus services, which are really the core services that most serverless projects will use for individuals or smaller teams. Then once they need something that's a bit more specialized, they'll happily usually upgrade to pro and to the tiers above it.
Benjie: Sounds like some pretty smart PLG growth there for those. I learned a lot about product led growth these days, right? But that's really cool, to see it working and to see an Open Core project like LocalStack growing. You mentioned that you folks have 40,000 GitHub stars, that's pretty big. Talk to us about the community, tell us where it started and when you started to see it really take off. I mean, 40K stars is one of the bigger projects that I know about, not that stars are the best measurement necessarily.
But tell us about the community, tell us how it started, and then also what I'd love to hear is... I know you guys are growing as a company and it seems like maybe you've recruited some folks from the community and I think that's really interesting for our listeners to hear about how all that went down. So just talk a little bit about community, how it started and where you started to really see ramp and growth and what you've done to cultivate such a vibrant community.
Waldemar: Yeah, absolutely. Essentially the early days, I think it all started with some Hacker News posts that we put out there. We put out this new idea of a local emulator for AWS services and we put it on Hacker News and it just got some initial traction. Back in 2017 or so some folks from AWS actually came across this and I think Jeff Barr, the chief evangelist of AWS, he put out a tweet on Twitter and that basically overnight got us, I don't know, like 3K, 4K stars on the repository. That really was the point in time when we started to take off quite dramatically.
We got a lot of contributions from the community, people started creating issues, a lot of samples, reproducible cases, started contributing pull requests. We've actually been in touch with a few companies over time that have entire teams, developer experience teams whose main purpose is to provide a dev environment based on LocalStack to provide a very smooth experience.
So we've got a lot of nice contributors there and we're now also launching some community events and webinars where we actually talk about the usage, what are the best practices, how to use LocalStack. We also want to engage more and more with the community through our Discourse forum page. We have discuss.localstack.cloud, which we just recently launched, where people can make feature requests and it helps us prioritize which are the features we should be focusing on.
So yeah, we're definitely learning a lot from the community, that's our daily bread and butter. Also a fairly active Slack channel with quite a few active members in there. That's also very enjoyable about the type of work we do, it's from developers for developers. We understand their problems and pain points quite well and it's very nice to work with this kind of customers because you just know what the problems are and how you can help them.
Benjie: So tell us a little bit about how you've built out the team, though? I'm going to follow up on that one because I think it's really interesting. Tell us about the people that have joined you, and not anything in particular, but it seems like they came from the community itself. Is that correct, or am I misunderstanding?
Waldemar: Definitely partially from the community. We just realized that a few people who were really active in the repository, we started connecting with them and got a few of them on board in the early days of LocalStack. Now in the last, let's say, year or so where we've been a bit more focused on really growing a team of engineers, we've also worked our own networks and really got a few people on board just from our extended networks.
But yeah, oftentimes people from the community were just reaching out and interested in making contributions, talking about LocalStack, or even hosting some webinars, for example, like with Anton Babenko recently with the Terraform modules. So it's really nice, this way of interacting with folks and just also reaching out.
It's usually quite a nice conversation opener because a lot of people are aware of LocalStack, maybe have used it in the past and gained some experiences. Then we try to follow up, "Hey, have you used it recently? We actually have a team behind the company and come join us." So that's definitely quite helpful.
Benjie: It's really cool to talk to you about this because it seems like this is the ideal Open Source journey that you guys have been on and it seems like it's really exciting. This new pluggable stuff that you guys are working on is very exciting to me, I can see. Obviously I have a few biases here about running local environments in ephemeral environments, but it's super exciting. Let me ask you this question, if I wanted to start contributing, what's the best place to start if I want to be a contributor? Where should I go? You mentioned there's webinars, is there a monthly meeting or what's the best way to do this?
Waldemar: Yeah, so we have a newsletter that people can sign up to and we have the Slack channel where we also put out announcements about upcoming webinars and community events. I think we hosted the last one a couple of months ago, I believe it was in May. We're going to host one after the release.
We want to establish a cadence of every two months or so, getting together with community and just learning from them, having some presentations, use cases, learning how people are actually using LocalStack in different organizations, what the limits are.
Also from the contribution aspect, we've been doing a lot of refactoring of the codebase in the last six months or so where we've really prepared it for more scale, growing the team, and also enabling community contributions in an easier fashion. It's now much more standardized in terms of how you can plug in a new service provider, so it's a fairly nicely laid out, documented process how you can do that.
Especially with the extensions framework that we're now getting in place. I think this will be another boost to make contributions even easier and getting started with LocalStack from a dev perspective, really. So we definitely want to focus more on this experience there.
Benjie: Okay, so it's an Open Source project. Talk to us a little bit about the license that it uses and what you guys are doing, and are there any plans to change that? Are you happy with it? Was it a good choice to start with? Tell us a little bit about your license choice.
Waldemar: Yeah, sure. We're using an Apache 2.0 license, so fairly standard and quite popular in the community. It's a commercially friendly license, as opposed to some other license models like the GPL which are very restrictive in terms of how you can use the software. We deliberately chose a license that was very open and also commercially friendly for other people to use LocalStack in different settings, so there are no real restrictions in terms of how it can be used, redistributed and extended.
Then we also obviously depend on a few third party systems that are all integrated with LocalStack, and we also there obviously make sure that they have a license that complies with what we do. Typically BSD or Apache are the main licenses we look for, so we don't intend to change that as I think it's been working well.
Also, if you've been following some of the Open Source projects, if you look at Elasticsearch for example, or others, they had this issue that their software started being hosted by large cloud providers and then they had to change the licensing model. I don't think that's necessarily the case for us because if somebody came along and started a business around hosting LocalStack, they would basically compete with AWS and not with us so I don't think there's a big concern there.
So yeah, we just want to have a very open ecosystem and make sure that we maximize the adoption of LocalStack. That's really what we're after. Also as a company, we just want to make sure it gets out there and we establish this notion that local development is possible because there's still a lot of, let's say, opinion making to be made in the community also because there's a few people who are a bit skeptical, "This is impossible. What are the limitations?" And so on.
We demonstrate on a daily basis that it's possible and it's even superior in a lot of cases, rather than using the real cloud environment. That's really our mission and what we want to push forward.
Marc: Waldemar, I think you're right when you talk about Elastic, or maybe hitting a little bit closer to home, MinIO, which was emulating S3 APIs and changed its license. So a lot of folks start to look at how do we get some assurance that we're going to be able to use this long term? There are foundations that exist: the Apache Foundation, and this podcast and Benjie and I both work closely with the Kubernetes ecosystem and the CNCF.
Have you given any thoughts about making LocalStack an active Open Source project in one of these foundations so that larger organizations who might have been burned by other license changes in the past... not that they're not trusting you, but they don't have to completely just take your word for it.
They have the foundation that actually exists to ensure that the license is going to be consistent, that they're going to be able to continue to use it and it's ongoing community support if something happens.
Waldemar: Yeah, that's actually a very good point. We're definitely looking into a project like CNCF and see how we could fit into the landscape. It's very exciting to essentially observe what's happening there in the space and there's so many fantastic projects under the umbrella of CNCF, also Apache and others.
So far it's been mostly a prioritization problem that we haven't had too much time to look into it, but I think over time as the adoption of LocalStack grows and these requests will be more frequently coming, especially from larger players in the space, I think that's definitely something we'd be considering.
One thing I'm vaguely familiar with in CNCF, not all the parts of it, but it seems to me that a lot of it is based on essentially cloud agnostic, cross cloud stacks and currently we're so much tied to AWS that I think we also need to demonstrate this next iteration of opening up the platform for other cloud providers and I think that would be a great time to think about under which umbrella could we fit the best?
Marc: Yeah, that's great, I think that's a good, fair point and also being super pragmatic about not shutting the door on it but saying, "Look, if the demand is there in the future and it becomes a priority," you're willing to entertain that and have that conversation seriously and really consider it. I think that's a great answer.
Benjie: I just want to meet you at KubeCon so that's my priority. Just kidding. Well, Waldemar, you had mentioned there's some other cool features coming in V1 and we're not even going to release this until after V1 happens. So maybe give us a hint about a few more? Just give us a few more tidbits of exciting things coming in V1.
Waldemar: Yes, absolutely. One of the things we've been working on is a new IAM policy enforcement engine. Currently what's happening if you run LocalStack, the default configuration is that it's basically a Permit All system. So all the API calls are permitted, you're basically like a root user that can do all the API calls on the emulated APIs, so like an AWS root user.
What we're now introducing is an actual IAM enforcement layer where we actually check very detailed policies and you can define your IAM roles, IAM users. Every request is being made either as an assumed role or a user, an actual role, and we're very finely in control over enforcing these IAM policies then.
Because that's actually one of the feedbacks that we got from our users, was that, "Hey, these local iterations are great. Sometimes they're hitting some barrier once they actually start deploying to the real cloud because you then get faced with all these IAM issues where you don't have access to your lambda functions, no access to your S3 bucket or other things."
That's something that we're really excited to also provide, an emulation layer. In addition we actually also take it from the other side and we're working on a mechanism to allow you to record IAM policies based on the API calls you are making. So that is a bit like observing the types of requests the user is making and then looking at the requests and then coming up with policies that are an exact matchup for your use case.
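Both directions described here, enforcing a policy on each request and recording a policy from observed requests, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not LocalStack's engine: `is_allowed` and `record_policy` are hypothetical helpers, matching uses case-sensitive shell-style wildcards via `fnmatchcase`, and real IAM evaluation also involves conditions, principals, and several policy types that are ignored here. The ARNs and account ID are made up.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Set, Tuple

ApiCall = Tuple[str, str]  # (action, resource ARN)


def is_allowed(policy: Dict, action: str, resource: str) -> bool:
    """Simplified check: an explicit Deny wins, otherwise a matching Allow permits."""
    allowed = False
    for stmt in policy["Statement"]:
        matches = any(fnmatchcase(action, a) for a in stmt["Action"]) and any(
            fnmatchcase(resource, r) for r in stmt["Resource"]
        )
        if not matches:
            continue
        if stmt["Effect"] == "Deny":
            return False
        allowed = True
    # No matching statement means the request is implicitly denied.
    return allowed


def record_policy(calls: List[ApiCall]) -> Dict:
    """Derive a policy that allows exactly the observed calls, grouped by resource."""
    actions_by_resource: Dict[str, Set[str]] = {}
    for action, resource in calls:
        actions_by_resource.setdefault(resource, set()).add(action)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": [resource]}
            for resource, actions in sorted(actions_by_resource.items())
        ],
    }


# Calls observed while the application ran against the emulator (made-up data).
observed = [
    ("s3:GetObject", "arn:aws:s3:::my-bucket/report.csv"),
    ("s3:PutObject", "arn:aws:s3:::my-bucket/report.csv"),
    ("lambda:InvokeFunction", "arn:aws:lambda:us-east-1:000000000000:function:my-fn"),
]
policy = record_policy(observed)

get_ok = is_allowed(policy, "s3:GetObject", "arn:aws:s3:::my-bucket/report.csv")
delete_ok = is_allowed(policy, "s3:DeleteObject", "arn:aws:s3:::my-bucket/report.csv")
print(get_ok, delete_ok)
```

A call that was never observed, such as the delete above, is rejected, which is the kind of failure that otherwise only surfaces after deploying to the real cloud.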
We're actually venturing into some of these security simulation aspects even, so you can actually use these and iterate quite quickly with the IAM enforcement engine. The other part that's a bit more of a technicality but we're also introducing what we call multi account. Previously most of the requests were handled under one AWS account, just some synthetic ID.
Now you can actually really create new accounts in your LocalStack instance and do cross account requests, so that's another quite exciting feature that makes it even more realistic and increases the parity with AWS even further because you now have these multi account features, basically.
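One way to picture the move from a single synthetic account to multiple accounts is a store that namespaces resources by account ID. The sketch below is an assumption-laden illustration, not LocalStack's internals: `QueueStore` is a hypothetical class, the account IDs are made up, and cross-account permissions are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set


class QueueStore:
    """Hypothetical per-account store: each account ID gets its own namespace."""

    def __init__(self) -> None:
        self._queues: Dict[str, Set[str]] = defaultdict(set)

    def create_queue(self, account_id: str, name: str) -> None:
        self._queues[account_id].add(name)

    def list_queues(self, account_id: str) -> List[str]:
        return sorted(self._queues[account_id])


store = QueueStore()
# The same queue name can exist independently in two accounts.
store.create_queue("111111111111", "orders")
store.create_queue("222222222222", "orders")
store.create_queue("222222222222", "audit")

first = store.list_queues("111111111111")
second = store.list_queues("222222222222")
print(first, second)
```

With a single hard-coded account, both `orders` queues would have collided; keying state by account ID keeps them separate.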
Benjie: Just that IAM figurer outer seems like a product in and of itself.
Marc: Figurer outer?
Benjie: Actually that is the codename of something that we're working on at Shipyard, is the figurer outerer so I shouldn't be telling everyone. But that's fine, yeah, I love figurer outerers, that by itself sounds like an amazing product, standalone. That's really exciting. And all that stuff is going to be available in the community edition?
Waldemar: So partially community, partially in the pro version. IAM as you mentioned is almost like a separate product that we're going to release for the pro version. A lot of the other things that we talked about though are available for the community version. We also introduced a new file system hierarchy, so basically previously when you started up LocalStack you had to configure a bunch of mount points and other configuration flags.
This is now much simplified, we basically manage a volume for you where all the persistence state gets stored, so just making it much more like a seamless experience in terms of just getting started without any hassle. Then also what I mentioned before, this parity testing framework where we really roll it out to all the different services that we have to really make sure that we have the highest parity that we can achieve.
Also, we're going to publish the actual metrics of that so you will basically have a website that you can then see, "This API, with this service, with these API methods has this and this coverage in our tests." It's a much more detailed overview for our users in terms of what's supported and what we're maybe still working on.
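The per-service coverage numbers described here could be computed along these lines. This is a minimal sketch under stated assumptions: `coverage_report` is a hypothetical helper, "covered" is taken to mean that at least one parity test passes for an operation, and the input data is made up rather than taken from LocalStack's published metrics.

```python
from typing import Dict, List, Set


def coverage_report(
    operations: Dict[str, List[str]], passing: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Percent of each service's operations with at least one passing parity test."""
    report = {}
    for service, ops in operations.items():
        covered = passing.get(service, set()) & set(ops)
        report[service] = round(100 * len(covered) / len(ops), 1)
    return report


# Made-up inputs for illustration; these are not real LocalStack coverage numbers.
operations = {
    "sqs": ["CreateQueue", "SendMessage", "ReceiveMessage", "DeleteQueue"],
    "s3": ["CreateBucket", "PutObject"],
}
passing = {
    "sqs": {"CreateQueue", "SendMessage", "ReceiveMessage"},
    "s3": {"CreateBucket", "PutObject"},
}
report = coverage_report(operations, passing)
print(report)
```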
Benjie: Wow, that's exciting. I haven't dived into LocalStack personally in a little while and there's a few things I'm going to be looking at very soon. All right, well, this has been great, Waldemar. Really appreciate the time, and really cool project. Really looking forward to seeing where this goes, and I'm also looking forward to having you back in a year and telling us about all the other clouds and all the other services, and all this other emulation, and just speeding up our CI pipelines because that is a real problem.
As Marc said, yeah, 45 minutes is not an exaggeration. I know multiple companies that have three, five hour CI problems. So if you're having that problem, you should check out LocalStack and in general it's just a pretty cool Open Source project. So thank you so much for coming on, and thank you for contributing to the community and leading it with such a pretty cool project.
Waldemar: Yeah, it's been great. Thanks so much for having me, and I'm also looking forward to reconnecting in a year from now and seeing what we've achieved in the meantime. Check it out, LocalStack, yeah, looking forward to getting your contributions. Thanks for having us here.