
Ep. #52, Beyond the Hyperscalers with Hugo Santos of Namespace
On episode 52 of The Kubelist Podcast, Marc Campbell and Benjie De Groot speak with Hugo Santos. They discuss the evolution of Namespace Labs, why traditional cloud infrastructure isn't always optimized for developer workflows, and what it takes to build a vertically integrated platform for builds, testing, and developer productivity. Hugo also shares lessons learned from nearly a decade at Google and years of startup building.
Hugo Santos is the Founder and CEO of Namespace Labs, a developer infrastructure company focused on accelerating software builds, testing, and developer workflows. Before founding Namespace, Hugo spent nearly nine years at Google, where he became a Principal Engineer working on large-scale infrastructure systems. Originally from Portugal, Hugo has spent his career building high-performance distributed systems, from networking research and startups to hyperscale cloud infrastructure.
transcript
Benjie De Groot: All right. Welcome back to another episode of Kubelist. This week we have Hugo Santos from Namespace Labs. Super excited to have him on. Let's dive in. So, Hugo, welcome.
Hugo Santos: Thanks for having me. It's great to be here.
Benjie: Well, we have been having a lot of folks in your space of late on the podcast and we're really excited to dive in. Before we talk too much about Namespace and what it does, we'd love to hear a little background on you.
Obviously you're an engineer, but where did you grow up? How did you get into computer stuff, development, all the things, and then kind of take us through the early parts of your career and then obviously we're going to dive into Namespace.
Hugo: Yeah, well and again, thanks for the opportunity. Yeah, I'm an engineer. Sometimes folks get a little bit surprised by that. I still write a lot of the code that we ship at Namespace. I started very early on.
I remember buying magazines that had small snippets of C before I even ever had access to a computer. And that was really kind of intriguing. They would explain Markov chains and kind of different types of text generation. And I thought that was really interesting.
And my mom, she went to work at the university near where I'm from. I'm originally from Portugal. And there I had access to a computer and that was kind of the start of it all. As soon as I could leave school I would just run to the university and just spend as much time there as I could. And it started with DOS and just kind of hacking through, kind of learning about the systems.
And then our upbringing was a little bit tough but eventually my mom realized how important technology and computers were for me and we managed to get a computer at home. And that was the end of my life as she saw it. Haha. I was 12 and I would just devour as much computer time as possible.
Benjie: What kind of computer? What was your first computer?
Hugo: Yeah, I think my first computer was actually a Pentium 586, and just kind of fairly standard, you know, just the cheapest we could find. And no internet.
Marc Campbell: It didn't exist, right?
Hugo: It did exist. I'm old but not that old. So Internet-- Haha.
Marc: Well, I mean it was not very accessible back then. I actually remember the era and I don't know, maybe we had Pentiums at different eras but like I was on like connecting to BBS's at the time, not to the Internet proper.
Benjie: I mean I had my 386SX, I didn't have the DX, I couldn't do any graphics stuff, but I did have like this old 14400 modem that would connect. I guess you guys didn't have America Online in Portugal. Was there a Portugal Online or like Prodigy or any of this stuff?
Hugo: There was a telco that was servicing the local market actually. I think in many ways Europe was ahead of the States in terms of bandwidth and services. I just didn't have it at the beginning.
And I remember actually saving floppy disks back at the university where I had access to more stuff and just bringing them back home to try out different things. And then the next Christmas and I probably was like 13, I got a US Robotics 33.6, which was a modem and I was so excited.
And then I get this modem out of the box and it has like an RJ11 plug on one end which kind of makes complete sense. But where I lived, our apartment didn't have any RJ11s. We had like an old school 3 prong plug for phones.
It was really kind of old school and I was so frustrated and I went to bed and next day I woke up and my mom found me tearing the plug out of the socket and I just started dialing, trying to dial because I didn't know, but kind of trying different combinations of the wires with RJ11. And eventually I got a dial tone and I was so excited.
Benjie: Oh my God.
Hugo: And so like December 25th of some year, first time into the Internet and just the famous sounds coming out and I was so, so happy. And that was kind of the revolution.
Benjie: So to be clear, you hot wired your first modem.
Hugo: That's right, that's right. And it was kind of hanging out of the wall. My mom wasn't super happy about it. Eventually we got an electrician to come in and install like a RJ11 socket into the wall. But yeah, those were the first days.
Benjie: For the youngins out there, RJ11 is the equivalent of like a telephone plug. Kind of like an ethernet cable. It's like a smaller little ethernet connector. But we're definitely waxing about our childhoods right now. So you've now wired in or hot wired in your modem.
Hugo: Yeah.
Benjie: You have a Pentium, by the way. The only time I've ever heard a Pentium referred to as a 586 was in the movie Hackers. Do you remember that? They're like, "oh, I have the new 586."
Hugo: Yeah.
Benjie: I don't think it was ever referred to it other than that. And you're finding snippets of Markov chain C codes?
Marc: In C. In C.
Hugo: Yeah, that was what got me into computers.
Benjie: Where did you-- Were you walking down the street in Portugal and they're like, "oh, here's a Markov chain."
Hugo: You know, it was just like I found them in like regular magazine shops. Again, this is a pre Internet world. Like you would go out and--
Marc: Yeah, I remember. Like it wasn't Markov chains in C, but I remember typing code from a magazine into a computer early days too. Like that was how we did it.
Benjie: I mean I was on Commodore 64 was my first computer and I was like copying over some BASIC that someone had handwritten me to make a triangle.
Marc: Same thing.
Hugo: Yeah. It's equivalent. Equivalent. Like I'm sure there was some BASIC in there as well. It's just the thing that stuck with me was, was the fact that it was Markov chains and it was written in C. Like I still remember that page. I'm sure there were a lot, many other things over there.
But I started then very quickly kind of getting into coding. And because it's Europe, the first language that I used was actually Pascal. And only then after did I switch over to C. So I got kind of imprinted with C to get started. But that C only came back a little bit later.
Benjie: All right, so this is the foundation now. Let's fast forward a few years. You end up going to computer science school? Did you end up dropping out? What'd you do?
Hugo: Oh yeah, like well before that. So I got into coding very quickly. So I got Internet. Then I found Linux, I installed Linux. And a great thing about Linux was that you'd had access to the source code and I was into KDE and I started learning like C. Actually I already knew a little bit of C and then I started seeing these keywords like template and what the hell are these things?
So I started trying different things and I built my first piece of software for KDE. Because to install packages back then you had to do, you know, ./configure, make, make install. And I was just so tired of doing that.
So I actually built a GUI that would embed a terminal to do that so that you could kind of one shot package installations. Then I got really fascinated with macOS and I ran for it feels like years but probably was like 3 months emulated macOS 8 on my computer and because I was really fascinated with Macs but I couldn't afford one.
And then I found another operating system called BeOS and that was my daily driver when I was 14 and I built a clone of a photo organizer that we had back in Windows called ACDSee. I built a BeOS version of that and I sold it when I was 15. So that was kind of my first foray into commercial software.
So I was already kind of very much, you know into building things. And then I went to college, I went into computer science and I didn't like it for various reasons because I was, I really enjoyed building and I wasn't really a very social person. And probably the fact that I had a computer early on didn't help with that.
Now in hindsight actually university is a great place to build your social skills. But I was much more focused on building things and being forced into all these social activities. I was very unhappy about it.
And then I found some folks, like some of the professors took me in and I started participating on a bunch of projects, writing code. I got really into networking, so IP routing, multicast, Mobile IP. When I was like 19 or something, I was already writing a lot of software around that.
And yeah, those were the first few years. So in Portugal now, it's a little bit different, but back then. So Portugal has a system similar to the French. Like you have a licentiate and that. I'm probably not pronouncing that correctly, but you can think of it like as bachelor's and master's combined. So five years. And I stayed until the end, but I didn't graduate.
I had a fallout with my professor with whom I was kind of finishing up, like we have like a final project at the end of the year and I had a huge fallout because I felt like I was contributing so much for some of their research projects and they just weren't paying me enough. They were paying me too little and I had a huge fallout. So I decided to leave.
Benjie: So, Hugo, hold on, let's summarize this a little bit. Where did you go to school?
Hugo: In Portugal, in a small town called Aveiro, which has a strong computer science and networking school. And I was very much into networking, so I wanted to go there.
Benjie: I mean, all of these things are kind of leading up to Namespace. It's pretty clear. So you end up leaving school and then what was your first job?
Hugo: Yeah, so the story to leave school was actually in my last year. So in the fifth year, I found a website from a company that, back then, was called Arastra. And they had like a quiz and I solved the quiz and I emailed them, said, I solved your quiz. And they said, oh, actually this is an interview we're going through. Do you want to come through our interview process?
And I went through their interview process. I was 21 or 22 and they made me a job offer and I was going to the U.S. but that was the first year where there was an H-1B lottery and I failed the lottery basically. Like, I don't know, 50% of folks got through and I didn't.
That company then became a company called Arista Networks, which is kind of big company in the networking space. Like all of the hyperscalers use them, et cetera. I was going to be, I don't know, probably employee 20 or something. And that was very frustrating.
And then so that combined with the fallout with my kind of end of year professor, I left and I went to work with a few folks that I knew in Germany. I joined NEC's research lab, so NEC is a Japanese company, and they have a networking research lab in Germany. And I just wanted to get out, you know, it was a huge, huge kind of emotional moment not being able to join the other company.
So I went to Germany for a regular job in a research lab, which then I realized that it's not for me. Like, I'm a builder. I need to be able to ship things to people. And when you're working a research lab, it's fantastic in many different ways, but you're kind of building things for five years from now, or that was the case back then.
Benjie: All right, so you end up joining this research lab in Germany and then let's get to your professional career. What ends up happening?
Hugo: Yeah, so this research lab was a regular job. I was a software engineer. That was my first official job.
Benjie: So your first job in Germany as a research engineer at a lab?
Hugo: Yeah.
Benjie: The timeline and how quickly things are getting into people's hands are kind of not what you're looking for. But you are building.
Hugo: That's right.
Benjie: So now what? So after that German lab, is that when you end up going to Google or what ends up happening?
Hugo: No. So I was there for three years and yeah, I kind of realized that that wasn't the right environment for me. And I started on my little bit of a side project, working on something I was passionate about. It was kind of a combination of compute and networking.
And I was browsing TechCrunch and I found a few folks that were in Finland that they had just been pitching about something similar. And I cold emailed them saying, hey, I'm thinking about doing something in the same vein. And they said, well, let's meet up. We had a Skype call. So Skype was the thing back then. And we kind of vibed quite a bit.
And then I went to visit them in Finland and we said, let's build a company together. And next time that I flew to Finland, I was moving in. So that was like July 2010, I think.
So that was my first company. We were doing edge computing before edge computing was really a thing.
We were building kind of an application platform that allowed you to build more smartphone type of applications, although that's a huge stretch, using web technologies that would then be streamed to feature phones, to BlackBerry's, Nokias and other phones like that. So that was my first company.
Benjie: Yeah, what was that company called?
Hugo: Blaast, with two A's.
Benjie: Okay, so you do Blaast with your friends from Finland. Does that end up exiting or does that end up--?
Hugo: I did exit. Haha.
Benjie: Oh, you exited. Okay, so.
Hugo: Yeah, as in, I left because it was painful in many different ways. It was just really hard to build a business.
Benjie: Yeah, but you learned a lot from it. And it probably went into Namespace, which we want to get to pretty soon. So then did you go to Google or what? When did you end up at Google?
Hugo: Okay, the short version is I left the company, tried to do something else, ran out of money, had friends at Google saying come to Google. Went to Google, moved to Switzerland, was there nine years, became a principal engineer, built a lot of Google's infrastructure. You know, a lot of the ways of YouTube and photos and search and other applications are built.
Benjie: What year did you join Google? Because I know this is a big part.
Hugo: 2013. May 2013 is when I joined Google and then I left 2021.
Benjie: So you're at Google, you're building out crazy infrastructure stuff. And obviously the growth curve from 2013 to 2021 or 22 is nuts, I would guess. And you're working out of the Switzerland office, so that's interesting as well.
Hugo: Yeah, Switzerland wasn't really on my radar. I moved there to join Google. I'm a very kind of destination based person. Yeah, for sure, Google was growing a lot. But you're numbed down internally because you're kind of riding the wave and you don't really feel it too much.
I mean apart from, you know, crazy "requests per second" increases year over year, it's just a norm. Like you just keep designing and designing and designing and shipping, but it's Google, so you don't really feel it.
And then later on, in search, like it was incredible, like small experiments that you would run and you would just reach like 100 million people and with the blink of an eye. It's just such a fantastic distribution that then leaving is-- You become jealous of it, because it's very hard to reach so many people.
And so then, end of 2021, I was done with Google for many reasons and I wanted to go back to building and I started Namespace.
Benjie: So now we're at Namespace, which is exciting to talk about. Maybe we'll go back and talk about one or two of the Google scale problems you worked on. But let's dive in. What is Namespace?
Hugo: Yeah, Namespace. We build high performance and efficient developer infrastructure for humans and for agents. Anything that is related with code, whether it's your builds, your tests, any sort of verification. We are a kind of specialized cloud provider. We build software together with hardware that we deploy to build high performance services.
We've been lucky enough to have a lot of iconic companies kind of building on us. Folks like Ramp and Vanta and Verkada and Framer and many other companies. And that's been a super interesting journey.
It was not what we set ourselves to do when we started Namespace. We actually built what Namespace became, we built it for ourselves because after leaving Google and being so spoiled with fantastic infrastructure, we suffered a lot of pain points stepping into the industry and having to build or having to struggle through building infrastructure on our own. And then we realized that a lot of folks had the same issues. Engineers crave performance and so we went all in on it.
Marc: So you initially started building out some of the tech behind Namespace as the developer tooling that you were going to use to build the idea, but then you were like, "now this thing has legs." What was that idea? I'm actually just curious, what were you initially setting off to build?
Hugo: Yeah, we built something called Foundation. We actually still use it today to build Namespace. It's kind of inspired by something that we built back at Google. It's a service development platform that combines build, test and deploy. You build services and it kind of manages the whole kind of ecosystem of different tools for you.
It's kind of a combination of Bazel with Terraform and test containers. It's a little bit of an amorphous thing. And we adopted Kubernetes very early on and we wanted to build kind of end to end tests. Like that's one of the things that Foundation enables is to build end to end tests really really easily.
And we were just kind of staring at the screen seeing these Kubernetes clusters starting from scratch when we were using GCP even though GCP when we got started was already probably the best provider out there and we were just suffering with that. So it was a bit of a weekend project for myself.
I know how these components work so I knew that it doesn't actually have to take multiple minutes to start the Kubernetes cluster. So we built something to start Kubernetes clusters really quickly, which back then was still 30 seconds until the API server was ready. Nowadays, actually Namespace still has that capability and we do it in 10 seconds.
But that was the starting point. We built Namespace for ourselves so that we could fan out end to end isolated Kubernetes clusters or Kubernetes based testing for Foundation. And as I was showing Foundation to potential leads, folks were asking about that like, "wait a minute, how did you guys just start a Kubernetes cluster so quickly?"
And that was a little bit of a light bulb went up and we realized that that was a much more interesting problem to solve than this fully packaged, "change how you're building software and it will be better," which is very painful. So we went down the other path altogether to something very frictionless that brought value, day one. And then it was kind of very natural journey.
Benjie: So Hugo, let's get into a little bit on how you got it down to 30 seconds. So you're using GKE, the hosted Kubernetes service from GCP. This is 2022. 2021?
Hugo: Yeah, 2022.
Benjie: Okay, so you're in 2022 and you're like hey, this is taking three or four minutes to get an API control plane, get some nodes up and running. So then you're like let's bootstrap our own Kubernetes clusters. How did you get them down to 30 seconds? Are you using K3s? Are you using vanilla Kubernetes? What were some of the technical things you did there?
Hugo: Yeah, so back then it was 30 seconds, now it's 10 seconds. I don't want to overstate our numbers. So what we did was we wanted full isolation. So we wanted confidence that when we were running a particular set of tests and those tests could even touch on some kind of more exotic features.
So we wanted full isolation and from there we knew that we wanted to have our own virtual machines. And then we just started by, or actually it was me, I started to iterate with some of the options to run virtual machines out there like KubeVirt and others. And they just start like a regular Linux distribution. And then you see, you know, spinning it, starting and then it goes detecting hardware and just a typical systemd type of initialization.
And I'm looking at it, I'm thinking, well I know what is the hardware, you don't need to scan it. I set up the hardware for the virtual machine and it was a lot of these things. It's kind of well intended software that is made for general purpose environments. But when you control everything, you can be much more deliberate.
So one of the first things that we did was just rip out systemd and we put in our own. We kind of built our own initialization system, fully declarative as well. Like it gets its own configuration injected and then it started a K3s based control plane. But it did that very quickly.
And then we also kind of realized, and I'm sure like especially nowadays a lot of folks have realized that, but back then it was still fairly new, that after you get things to start fast, you're spending time actually getting the software to get into the machine. And we are already running kind of many tests in parallel that don't fit into a single machine.
So we started building kind of systems to get the software prepared in a way that we would also start very quickly. So if you do anything related with Docker, you do like docker run, docker pull, you'll get not just the layers downloaded, but you also need to unpack them. So you'll spend a lot of time on that.
So we started kind of iterating on these components to just get a lot more things ready ahead of time. In a way that was done also just in time. So very much like a tracing compiler on a JavaScript engine. It sees it once and then it kind of learns, "well you want to do this. So I'm going to run an optimization pass so that next time that you do it you don't have to go and pull things from scratch."
And so we built these compute primitives which originally were driven by Kubernetes, but we realized they were generic. So very quickly, like in the first month or something, we generalized them and we built like a generic compute platform.
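Editor's note: the "sees it once and then it kind of learns" behavior Hugo describes can be illustrated with a very small cache. This is only a sketch under stated assumptions, not Namespace's implementation: the class name `JustInTimePreparer`, the `pull_and_unpack` callback and the image reference below are all hypothetical, and the real system prepares mountable disk images rather than returning strings.

```python
from typing import Callable, Dict


class JustInTimePreparer:
    """Remembers what was requested once so later requests skip the slow path."""

    def __init__(self, pull_and_unpack: Callable[[str], str]) -> None:
        # Slow path (assumed): download the image layers and unpack them.
        self._pull_and_unpack = pull_and_unpack
        # Image reference -> identifier of an already prepared disk image.
        self._prepared: Dict[str, str] = {}
        # Counts how often the slow path actually ran.
        self.slow_path_runs = 0

    def disk_for(self, image_ref: str) -> str:
        prepared = self._prepared.get(image_ref)
        if prepared is not None:
            return prepared  # fast path: nothing to pull or unpack
        self.slow_path_runs += 1
        prepared = self._pull_and_unpack(image_ref)
        self._prepared[image_ref] = prepared
        return prepared


if __name__ == "__main__":
    # Hypothetical slow path that just fabricates a disk name.
    preparer = JustInTimePreparer(lambda ref: "disk-for-" + ref)
    for _ in range(3):
        preparer.disk_for("registry.example/app:v1")
    print(preparer.slow_path_runs)
```

Only the first request for a given image pays for the pull and unpack; every later request is a dictionary lookup. A real optimizer would also have to decide when a prepared image is stale, which this sketch ignores.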
Marc: Cool. That totally makes sense by the way, like the idea of taking this thing that's general purpose built with a bunch of abstractions and it's designed to kind of work on various hardware. And you're saying I'm always going to run it on this hardware. I'm going to remove this discovery layer, I'm going to move this configuration, I'm going to pre configure it, I'm going to bake this image in.
We've done stuff like that too. But like the intended behavior of these clusters was for dev and test, right? Or were you also accepting folks running it in prod as production grade infrastructure?
Hugo: Yeah, that's right.
So we very deliberately, and still to this day, have focused on development and testing, with a twist that we kind of realize that development and testing is really about anything around code. And with what agents do with code, the lines between development and testing and production get blurred.
But yeah, originally we focused on development and testing and we actually had a lot of folks asking us can we use this for production? And even though we built our systems completely production ready, like to be run 24/7, we have on call, like that's been there from day one--
We wanted to not do production so that we would remain product focused. So this wasn't so much about whether we knew how to build it or not or whether the infrastructure would support it, was more what other features should we build in.
Because when you start doing production then you want to. We had folks asking us about, you know, Cilium and all these other capabilities that it wasn't super interesting for the majority of the use cases that we had. But things like, well, before I can run a container, I need to build it, so how do I build containers at scale?
Like that was an interesting problem. So that was kind of immediately the next problem that we went after.
Marc: Yeah, I mean, and I think that also kind of helps you eliminate a category of problems. Right? So like production grade Kubernetes has, you know, I don't know, like this is a question like, you know, when you're building for dev and for test, you don't have to think about as much high availability and resilience and fault and you can kind of take some functionality of Kubernetes and not optimize for that.
And I'm curious if you were able to like, if you can think of anything that you actually were able to say, "oh, we kind of ignored this part of Kubernetes and we were able to get this huge performance win out of it."
Hugo: Yeah, well, not Kubernetes in specific, but generally early on we dabbled with suspend and resume and kind of live migration of virtual machines. And that kind of brings in non trivial costs because you need to do memory tracking, you need to think about like what kind of pauses you're including.
And because we could focus on development and testing, which are much more ephemeral. Like you would run these for some time, but not for days, months and years. We said, well, we're not going to do live migration. So the first version of Namespace you would get a, well still called an instance. So an instance is virtual machine plus policies, plus some prepackaged software that you provide, but you get assigned to a machine and you run there, but then you get all the performance out of it.
And I think still today we, I was just checking some benchmarks recently and I think we still beat folks on CPU and IOPS because it's kind of non compromising in that way. But if I would be building, if I would be on GKE team, I would want my nodes to be as stiff as possible and if I have a physical machine failure or if I want to do an underlying kind of hypervisor update, I want to move things over.
Benjie: Did you guys do any forking of QEMU or did you do any-- Like you've talked kind of high level about this stuff. Let's get into some technical stuff. What were some tangible things that you, you guys built early on to, to speed this up?
Hugo: Yeah. So the hypervisor that we started with was Firecracker. Well, today a fan favorite, back then it was still fairly obscure. I think we might even have started before 1.0. And Firecracker itself was already very good. So a lot of the things that we did was around it.
So Firecracker is very simple. Like you boot a virtual machine from a file and, you know, the networking that you have available is just on a tap device. So we built a lot of the infrastructure around it like this just in time optimizer that I was referring to that kind of goes and prepares disk images so that they're mountable into a virtual machine kind of almost instantaneously that was running outside.
And so a lot of our development was done around storage and networking, and compute stayed fairly bare bones for quite a little bit of time. So it was Firecracker--
Benjie: What did you have to build on the networking side? What did you guys build there?
Hugo: So one of the things that we do is you can use Namespace, like our building blocks, to do previews, an area that you'll be familiar with. And one of the things that we did was-- I was really annoyed that a lot of the solutions out of the box would kind of interrupt your connection at the startup.
So you boot up a new container somewhere and it's still starting and you try to open the URL and for a period of time it will fail because the thing that you're trying to access is not there yet. So that was something that annoyed me a lot. And we built our own Ingress layer that does buffering and it does kind of just in time routing as well, to any of the regions that we're running.
So you'd get a URL immediately. So that was also kind of a very early on feature and you could have both kind of HTTP or gRPC or TCP kind of behind it and it would kind of understand if the thing was still booting, and if it was not ready it would kind of hold the connection.
So we focus a lot on the user experience. We want it to work the way that it should.
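Editor's note: the "hold the connection until the thing is ready" idea can be sketched with an event per hostname. This is an illustration only, assuming a single process and made-up names (`HoldingRouter`, `mark_ready`, `route`, the `preview.example` host and the backend address); Namespace's ingress is written in Go and its internals are not described in this conversation.

```python
import asyncio
from typing import Dict


class HoldingRouter:
    """Parks incoming requests until the target instance reports ready."""

    def __init__(self) -> None:
        self._ready: Dict[str, asyncio.Event] = {}
        self._backends: Dict[str, str] = {}

    def _event(self, host: str) -> asyncio.Event:
        # One readiness event per hostname, created on first use.
        return self._ready.setdefault(host, asyncio.Event())

    def mark_ready(self, host: str, backend: str) -> None:
        # Called once the instance has booted; wakes every parked request.
        self._backends[host] = backend
        self._event(host).set()

    async def route(self, host: str, timeout: float = 30.0) -> str:
        # Instead of failing while the instance boots, wait (up to a limit).
        await asyncio.wait_for(self._event(host).wait(), timeout)
        return self._backends[host]


async def demo() -> str:
    router = HoldingRouter()

    async def boot_instance() -> None:
        await asyncio.sleep(0.05)  # stand-in for the instance starting up
        router.mark_ready("preview.example", "10.0.0.5:8080")

    boot = asyncio.ensure_future(boot_instance())
    # The request arrives before the instance is ready and is simply held.
    backend = await router.route("preview.example", timeout=1.0)
    await boot
    return backend


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The caller sees a slightly slower first response rather than an error, which is the user-experience point Hugo makes. The timeout is an assumption added here so that a request for an instance that never boots does not wait forever.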
Benjie: Yeah, like seamless. So on the technical side though, you were talking about, that you were proxying stuff essentially, was this NGINX, a fork, or did you write your own Go service?
Hugo: No, we built all of this ourselves.
Benjie: So these are like Go services that are like proxying traffic.
Hugo: Yeah, we are a big Go shop. So everything that we do today is in Go. So we have our own network fabric, including Ingress, but also Egress that is built in Go. It sees all the packets that come in.
Benjie: Is this sitting as like a sidecar in each one or is it a DaemonSet? How does that work?
Hugo: Yeah, so our architecture-- We care a lot about high availability and we're deployed across many regions across the world. So we lean on a cell based architecture where there's kind of full kind of replicas that are homogeneous from an architecture perspective that get deployed to every single region and they mesh so they can talk with each other.
So you can enter any of our regions and reach any compute that is running in any of our regions. And that's done kind of transparently. But the ingress layer, it is separated on a data plane, a control plane. So the data plane never goes down. It's just kind of proxying things.
And the control plane is kind of learning about which certificates we need to generate and which new compute is running, which new regions exist, et cetera. It's actually fully programmable. You can even inject things like "hey, I need authentication for this path" and it will send you out to a customer provided authentication system.
So that is kind of replicated everywhere. And then next to each instance we also adopted a model that was, I don't know how common it is nowadays, but back then it was also very uncommon where each instance is wrapped by a trusted process which we call the host.
And the host runs the guest and the guest cannot escape. It always has to go through the host. And each host is instantiated specifically for that workload. So that allows it to provide authenticated services for that particular run. And that's also how we achieve multi tenancy.
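Editor's note: the data plane and control plane split Hugo mentions for the ingress layer is a common pattern, and a toy version fits in a few lines. Everything here is a hypothetical sketch (the `DataPlane` and `ControlPlane` classes, their method names, the hostnames and addresses); it shows the general pattern, not Namespace's code.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional


class DataPlane:
    """Serves lookups from the last routing table it was handed."""

    def __init__(self) -> None:
        self._routes: Mapping[str, str] = MappingProxyType({})

    def install(self, routes: Mapping[str, str]) -> None:
        # Copy, then swap one reference, so lookups never see a half-built table.
        self._routes = MappingProxyType(dict(routes))

    def lookup(self, host: str) -> Optional[str]:
        return self._routes.get(host)


class ControlPlane:
    """Learns about instances and pushes complete tables to the data plane."""

    def __init__(self, data_plane: DataPlane) -> None:
        self._data_plane = data_plane
        self._desired: Dict[str, str] = {}

    def instance_started(self, host: str, backend: str) -> None:
        self._desired[host] = backend
        self._data_plane.install(self._desired)

    def instance_stopped(self, host: str) -> None:
        self._desired.pop(host, None)
        self._data_plane.install(self._desired)


if __name__ == "__main__":
    data_plane = DataPlane()
    control_plane = ControlPlane(data_plane)
    control_plane.instance_started("app.example", "10.0.0.7:443")
    del control_plane  # simulate the control plane going away
    print(data_plane.lookup("app.example"))
```

The design choice worth noticing is that the data plane holds no reference to the control plane. If the control plane is restarted or unavailable, lookups keep working from the last installed table; only new changes are delayed.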
Benjie: Okay, cool. So I think we kind of skipped over a big thing. What is Namespace? So we talked about kind of the origin story, we've talked about some of the tech underneath and I think we're going to get back to that shortly.
But what's a 30-second pitch for what Namespace did in 2022 and then what's a 30-second pitch of what you guys are doing now? And then we're going to kind of fill in the gaps.
Hugo: Yeah. So we built kind of an accelerator for anything that a developer would do. So whether it's you're running in your machine and doing builds that might be Go builds or Bazel builds we built transparent backends for all of the popular tool chains or most of the popular tool chains that are backed by our compute, which is kind of best in class for developer workflow.
We run our own hardware, we're vertically integrated and we realized how important the CPU and the storage and the networking and memory as well available to the physical nodes, how important it was from a performance perspective and we invested heavily into that.
So you can run builds, tests. That was kind of 2022. Sorry, that was 2023 because 2022 we were building foundation and then we pivoted in 2023 so that was kind of the original product and we were kind of specialized cloud provider and then we had application integration.
So you can run your GitHub Actions, your GitLab, your Buildkite on Namespace. Actually if you use Buildkite's product like their hosted agents, that's Namespace under the covers, that's provided by our infrastructure.
Benjie: So 2022, 2023 you build out these building blocks, these foundational-- Haha, Foundation is the name of your thing. And this is not open source, right? This is all closed source, correct?
Hugo: Foundation was open source. It still is. You can go to NamespaceLabs/foundation on GitHub and you'll see it, we still use it today. But Namespace itself is a fully managed service.
Benjie: But what were your first few customers like? What were they, very specifically? Are you like a white label thing for CI companies or are you directly getting customers?
Hugo: No, we are directly getting customers. The first customers were doing self hosted GitHub runners on AWS and GCP. Actually I think our first customer was actually on GCP and we had an integration with GitHub Actions, like you could run GitHub Actions on Namespace, and they saw like dramatic performance improvements, but not just because of the hardware.
It's also because of something fairly unique that we built called cache volumes, which is kind of a snapshotting system. Like you get a lot of incrementality on Namespace with basically no work.
Marc: Is that built on top of any cloud provider tech, or is that something that you built, cloud agnostic?
Hugo: That's built in house, running in our servers. There's no hyperscaler involved.
Marc: Okay, is it like super proprietary or do you want to talk about how you actually were able to get that performance?
Hugo: There's a magic sauce in it. But maybe just to kind of give you an idea of what it does: In Namespace, you can attach a volume, so a file system, to an instance. It could be a GitHub job, could be a Docker build, anything. And anything that you write into that volume, which is just an ext4, you can then use at zero cost in subsequent jobs.
So that job completes. Whatever you wrote, you can now run a thousand jobs that use the same content.
Marc: In the same pipeline, the same workflow execution? Or like, literally, next week?
Hugo: Yeah, next week if you want. What ends up happening is that it tends to be used immediately because teams are just running things all the time. And if they do like some builds now and then the jobs pass. And so next time around, the second PR that they send, they kind of build on the incremental work that was done on the first one.
And we configure toolchains, whether it's the Rust toolchain or the Go toolchain or Java, like a bunch of different toolchains, to use this cache volume. Or Git. Many of our customers, they have Git checkouts of many minutes. So we also kind of snapshot Git state into a cache volume.
So there's this happy path at Namespace where most of the things that you're doing are incremental. They're just building on state that you have from earlier runs. And the reason why we can do that is because we are vertically integrated, because we understand where data is in our fleet and we have control over where a customer workload runs.
Like, as in, in which rack and in which machine. And we use that to our advantage.
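Editor's note: the reuse pattern Hugo describes can be pictured with a toy model. The sketch below is not Namespace's implementation, which is not disclosed in the conversation. `Snapshot`, `CacheVolume`, and `run_job` are hypothetical names, and an in-memory dict stands in for the ext4 volume. It assumes each job starts from the latest committed snapshot and only rebuilds inputs whose content hash is not already present.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of a volume's contents at one point in time."""
    files: dict = field(default_factory=dict)


class CacheVolume:
    """Toy model: jobs fork the latest snapshot and may commit a new one."""

    def __init__(self):
        self.latest = Snapshot()

    def attach(self):
        # Each job gets a private, writable copy of the latest snapshot.
        # A real system would likely share blocks copy-on-write instead.
        return dict(self.latest.files)

    def commit(self, working_copy):
        # Publish this job's writes so that later jobs start from them.
        self.latest = Snapshot(dict(working_copy))


def run_job(volume, sources, commit=True):
    """Rebuild only sources whose (name, content hash) is not cached.

    Returns the list of source names that had to be rebuilt.
    """
    work = volume.attach()
    rebuilt = []
    for name, content in sources.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        key = f"{name}:{digest}"
        if key not in work:
            work[key] = f"object for {name}"  # stand-in for a compiled artifact
            rebuilt.append(name)
    if commit:
        volume.commit(work)
    return rebuilt
```

In this model the first job on an empty volume rebuilds everything, and a second job that changes one file rebuilds only that file, which is the "second PR builds on the first" behavior described above.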
Marc: At what point did you realize you're going to have to buy hardware, rack and stack and set this up and you weren't going to be able to run it on one of the cloud providers?
Hugo: Yeah, very good question. We actually started with AWS bare metal machines, but AWS has very few machines that have NVMe drives. So that was kind of the first thing that we realized, that we needed to have local storage. Like builds, they need IOPS.
Benjie: Like high performance local storage.
Hugo: Yes.
Benjie: And so the bare metal AWS instances do not have very good hard disks, typically.
Hugo: Well, there were instances with local NVMe, but very few. Very few. But that was our starting point. And that was myself and a couple other folks in the company. We're massive geeks. And we had Threadripper workstations and we just do a lot of tuning and we kind of understood the value of different parts of the system.
And kind of measuring, we knew that IOPS was extremely important. So we started Namespace, like when it transitioned from kind of side project to the main thing, on AWS bare metal. But then after we had a lot of IOPS, we kind of started to realize that CPUs were also very important. And that kind of came from our kind of Threadripper experiments.
We first went to Equinix. Equinix Bare Metal or Packet because they had an interesting startup program and that was kind of the main reason. And we kind of got in and managed to get a little bit of scale with them.
So this was still 2023, but, they had actually fantastic APIs, but their machines were just old and we just couldn't build a great product with them. And then we found Hetzner, as many people do. And we were at Hetzner for a little bit of time.
We hit all kinds of interesting challenges at a certain level of scale.
Marc: Hetzner, you're not talking like Hetzner cloud US, you're talking like Hetzner, like Bare Metal, Helsinki and like that data center that they have over there?
Hugo: Yeah.
Marc: Okay.
Benjie: What were the Hetzner-- Give us an example of a few of those bottlenecks at scale.
Hugo: I must preface that Hetzner is fantastic. They provide a great service, very cost effective. But I think we are not the right shape of customer for them because we started to realize that the network fabric--
So we had a problem. So first was IOPS, then compute, then networking. And then we kind of started to realize that the network fabric mattered a lot. And in Hetzner, if you go, you know, get off the shelf things, you can get one gigabit NIC, which is nothing, but you can type in their text box, like, please add a 10 gigabit NIC to my machine. It's kind of a hidden feature. And then we kind of realized 10 gigabit was not enough.
And we also started to see kind of east west types of challenges of how much bandwidth we were pulling between nodes. And so we wanted to build our own rack. And we tried to do that with them, but it's just not their thing. It was just too complicated. I think they're happy having customers having a few machines and that's it. And we just had a lot of machines.
And then we also had geographical expansion and back then there weren't really--like, Latitude wasn't the thing. A bunch of folks are on Latitude nowadays, which is great folks. I recommend them. So when we went to the U.S. that was our first bare metal deployment that we built.
So that was late 2023. And nowadays we're 99%. Everything is like literally cabinets that we own with hardware that we own, that we manage. Like the only thing that we don't manage is electricity and cooling.
Marc: That's cool. Yeah. I mean it's funny because at Replicated we've built a little bit of a tool that allows customers to spin up kind of customer representative Kubernetes environments. And I can tell you we started like that journey that you described. Like it's the happy path that everybody-- Equinix, and then we're like, ah, it's a little bit expensive. Oh, and also they're sunsetting the product. But we were happy with it.
AWS, bare metal, hard to get, you know, the infrastructure really expensive. Move over to Hetzner cloud in Europe and then you're like, yeah, that gigabit NIC is a problem. Oh, 10 gig. But then they dropped the firewall and happy that's actually where we still are running most of it. But then you end up with people complaining about latency from San Francisco to Helsinki or wherever, or you know, Germany or wherever, you're actually running the servers.
And then like, I think you're right that the only end result is what you've taken, what Cloudflare has taken, what everybody else has taken, which is racking and stacking servers and colos and data centers and being like, we're just going to do this ourselves and we can actually--
I've said this before, I think like the cloud providers have scared a lot of us into thinking that we can't run Postgres, we can't install Linux, we can't do this. We need them to do all these things. And it's not true. And you've been around long enough to know that you were installing Linux and you were ./configure and, you know, make, make install and things like this.
And like doing it again. It's probably freeing because you're like I need four NICs and I need whatever. I'm going to like get this thing that I actually need. I'm going to get this hardware the way I need, I'm going to co locate it with what I need. You know, I'm not going to pay bandwidth from server A to server B because guess what, they're in a rack together and it'd be silly to pay somebody Egress on this.
Hugo: Yeah, that's right. But I like to be nuanced.
Benjie: Hugo wait. I wanna pause for one second. Talk to me about how you got demand for this service, for like a 30 second version of it. Like, because a lot of people are listening to this and they're like okay, this is amazing about this infrastructure, but there's the other side to it. Like why did you need this? Like from a business standpoint.
So you guys kind of were like, "hey, replace GitHub Runners," at the early version of it and then how did that expand, how big a customer base did you get when that became a problem? And you started to need to build your own servers. I mean throughout this journey you obviously went to different, you went to different things.
Hugo: Yeah, and earlier you were asking about like the difference between 2023 and now.
One thing that is not different is kind of the demand for performance has not gone down. And that was the starting point.
Like every single company that ended up working with us, they just, they're fast growing, they're tech companies where they live and breathe by building software and their teams are constrained, they were back then constrained by CI. Today they're also constrained by CI. But that was 100% kind of the driving factor.
So just to put things into perspective, like when we started there was another company which we only learned about later on called BuildJet because the options that you would have were either you're on GitHub and on Azure servers, which nowadays are a little bit better, but they traditionally struggle. And it's not just GitHub, like generally the hyperscalers, they're set up for a different type of workload.
That's kind of the fundamental problem is that as a hyperscaler you tend to optimize for density or you used to optimize for density. So the way that you plan your hardware is completely different and for anything around building and testing, you want as much peak performance as possible. So we kind of shifted the story very strongly into performance.
And folks, it was like, anyone that would kind of get to know about us and tried the product, we actually had, and our numbers are still extremely good, but we had kind of north of 80% conversion. Like if you actually try the product you just stay and stick. Because engineers love performance and we brought them a lot of performance.
It was a repeated pattern that we had folks coming in with 30-minute CI runs that they brought down to 4 minutes, 5 minutes on Namespace. And this is not like, you know, you see some numbers out there like 40 times faster because it's like a fully cached Docker build or you're doing, you know, emulated Linux ARM64 to non emulated. That's not what I'm talking about.
Like it's real numbers consistently. And we saw that kind of over and over. So the first customers that we got were through word of mouth, like people that love the product. The first few folks we knew within our network and then they were kind enough to start bringing us to others because they saw that there was kind of unmatched performance in the market and they didn't have to self manage. Right?
That was also kind of a big thing that a lot of folks were trying to do for performance reasons. But they still didn't meet the type of performance that they saw at Namespace.
Benjie: So I mean obviously the caching builds and you know, not emulating is a huge increase in performance. So what you're saying is, is you just had like super high end CPUs and memory. So this is like Bazel builds and stuff like that. A lot of people listen to us, probably are only really familiar with Docker.
Hugo: Sure.
Benjie: Give us one or two things that your hardware and software sped up in a traditional build system with like a Maven or a Bazel or a Gradle or whatever.
Hugo: By the way, you don't need Bazel to make use of Namespace. Like if you're a Go shop, like remember we built Namespace for ourselves and we're a big Go shop. Most of our tests at Namespace they place the Go mod cache and the Go build cache, so the kind of the intermediate Go objects, in the cache volume, so this snapshotting system that we have.
And that means that a cached run, like if you do a few changes in some files and you kind of go test everything, where you have to kind of build and test everything, it's just a few seconds and we have a very large code base.
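Editor's note: one way to picture the setup Hugo describes is to point Go's two caches at a directory on the persistent volume. `GOMODCACHE` and `GOCACHE` are standard Go environment variables; the `/cache` mount point and the helper names below are assumptions for illustration, not Namespace's actual configuration.

```python
import os
import subprocess

CACHE_MOUNT = "/cache"  # hypothetical mount point of the cache volume


def go_cache_env(cache_mount=CACHE_MOUNT, base_env=None):
    """Return an environment that keeps Go's caches on the cache volume."""
    env = dict(os.environ if base_env is None else base_env)
    env["GOMODCACHE"] = f"{cache_mount}/go/mod"   # downloaded module sources
    env["GOCACHE"] = f"{cache_mount}/go/build"    # compiled intermediate objects
    return env


def run_go_tests(cache_mount=CACHE_MOUNT):
    """Run the full test suite with both caches redirected; return the exit code."""
    result = subprocess.run(
        ["go", "test", "./..."],
        env=go_cache_env(cache_mount),
        check=False,
    )
    return result.returncode
```

With both caches surviving between runs, a change to a few files only recompiles the affected packages, which is consistent with the few-second cached runs mentioned above.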
Benjie: So this caching of the artifact thing that you guys have, the snapshotting system really is probably the core differentiator for speed. And then you guys have just been building out the hardware around that to get it faster and faster. Is that kind of the way to understand the evolution of Namespace?
Hugo: It's kind of two dimensions. Like you want to compress time by reusing, so cache volumes get reusable, so you don't have to go and do work that is a waste and people just have to wait. But then when you have a cache miss you want to have very high performance. So we use a combination of both.
That's why cache volumes together with hardware. Like from early on we would only deploy or we still to this day we only deploy the highest performance AMD CPUs in the market. We measure and we know that you get the best single core performance. So those are for those cache misses.
And we also deploy Apple M4s and M5s for both macOS and for Linux ARM64. So also there we have extremely good performance. So it's just kind of part of our culture.
Benjie: Do you have like Mac minis sitting in a rack or something or how do you guys do the Mac stuff?
Hugo: That's a little bit more special sauce, but yeah. So there are definitely Mac minis in kind of real data centers, many of them. But there are no M5 Mac minis. But we have a fairly large fleet of M5 as well.
Benjie: I like the vagueness of that answer. Okay, so you're doing something interesting there and that's totally okay. We'll find out what that is one day.
The rack and stack systems that you are building, those AMD systems, I have been following you guys so I know that I think you're on version three of those I believe. Talk about the specifics of the hardware and the networking and the hard drive stuff, just real quick, what the third generation machine looks like for you guys sitting in your data centers.
Hugo: It's a good question. So the third generation is less about this individual machine, it's more about the rack. Because you have a certain amount of network capacity within a rack and then you have kind of capacity across racks. So kind of east, west. And then you have capacity in and out of a cluster. So north, south.
And we focus a lot on performance. And within a rack your switches, I mean depending on the switches that you use, but you'll have like terabits of switching capacity but you don't have terabits of you know, east west bandwidth.
So you need to take those things into account when you're building high performance systems. So it's more kind of the composition. Like there's a certain amount of storage capacity that goes into the rack, there's a certain amount of worker capacity that goes into the rack as well. And then control plane stuff which is not as bandwidth intensive lives kind of separately.
The first rack that we had, like version one, we didn't really think about any of this. It was just place things where they fit and then you need to start thinking about your power budget, your thermal budget, your networking budget and how all of these kind of different pieces work together.
And then that just comes through experimentation. Like there's different designs. Like folks, like traditionally hyperscalers, they focus a lot more on disaggregation but we don't. Like, we like local capacity a lot. So that goes to then the individual server design.
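Editor's note: the gap between in-rack switching capacity and east-west bandwidth is often summarized as an oversubscription ratio. The arithmetic below is a generic sketch; the server counts and link speeds are made-up numbers, not Namespace's rack design.

```python
def oversubscription(servers, nic_gbps, uplinks, uplink_gbps):
    """Ratio of total server-facing bandwidth to total rack uplink bandwidth."""
    downlink = servers * nic_gbps
    uplink = uplinks * uplink_gbps
    return downlink / uplink


def east_west_share_gbps(servers, uplinks, uplink_gbps):
    """Uplink bandwidth per server if every server talks across racks at once."""
    return (uplinks * uplink_gbps) / servers


# Hypothetical rack: 32 servers with 25 Gbit/s NICs and 4 x 100 Gbit/s uplinks.
ratio = oversubscription(servers=32, nic_gbps=25, uplinks=4, uplink_gbps=100)
share = east_west_share_gbps(servers=32, uplinks=4, uplink_gbps=100)
```

In this made-up rack the servers can collectively offer twice what the uplinks carry, so traffic that stays inside the rack is cheap while cross-rack traffic has to be budgeted.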
Benjie: Right. So what is that individual server design as it stands today?
Hugo: Well that's a hard learned lesson and there's a lot of folks that are trying to get very quickly into the business and I think it's good that you go on your own journey of learning. Haha. But at the end of the day it's just finding the ratios that work for you. Like one CPU thread, how much memory do you have, how much network capacity do you have, how much IOPS do you have? And just to find the right balance and the balance is important for the unit economics.
We think a lot about this. We want to be cost effective so that we can forward that efficiency back to our customers. That was actually something that we didn't do at the beginning of Namespace and it became more of a thing towards the end of 2024 where our customers were larger and larger and many of them kind of growing with us.
Like we have customers that started on five figures and landed on seven figures in terms of growth. So they appreciate our focus on efficiency and for us to be able to be kind of cost efficient you need to find the ratios or else you have underutilization on your data center. Like you're paying for power that you don't use.
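Editor's note: the "find the ratios" point can be made concrete with a small calculation. Given a server's capacity and what one job thread needs, the scarcest resource caps how many threads fit, and everything else sits partly idle. All numbers and names below are hypothetical.

```python
def binding_resource(capacity, per_thread_demand):
    """Find the resource that runs out first and how used the others are.

    Returns (name of the limiting resource, utilization of each resource
    when the limiting one is fully used).
    """
    # How many job threads each resource could support on its own.
    fits = {name: capacity[name] / per_thread_demand[name] for name in capacity}
    limit = min(fits, key=fits.get)
    utilization = {name: fits[limit] / fits[name] for name in fits}
    return limit, utilization


# Hypothetical server and workload profile.
capacity = {"threads": 64, "memory_gib": 256, "iops": 800_000}
demand = {"threads": 1, "memory_gib": 8, "iops": 5_000}

limit, utilization = binding_resource(capacity, demand)
```

Here memory runs out at 32 job threads, leaving half of the CPU threads and most of the IOPS unused. That is the underutilization Hugo describes: hardware and power paid for but not sold.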
Benjie: So capacity planning is a huge thing for you?
Hugo: Yeah, capacity planning is a huge thing. Yeah.
Benjie: Okay. In regards to how many data centers are you coloed in, if it's appropriate to share, and maybe give us a crazy number. Something that's super interesting that you're like, I can't believe we're doing this much of this a day.
Hugo: So in terms of geographies, like, we first focused on only one because we wanted to make use of kind of the efficiencies of being able to reach higher volume within a particular jurisdiction. And nowadays we serve three different markets in the US.
So we have a presence, you know, very close to Amazon, but not Amazon, as kind of US-East-1. It's actually interesting, like Amazon had this outage some time ago and many of our customers told us like we could continue to build and test and ship product, but we couldn't serve it, which was an interesting kind of data point.
So we have something east coast, we have something a little bit more central in the States. And then we started having, for a new product that we build out now called Devboxes, which serves the agent use case and latency matters a little bit more there. So we also have a presence in the west coast because we want to be close to our customers and then we have a presence in Europe.
But we try to build-- We don't follow the kind of the Cloudflare model of like mini box because our thing is all about density and performance within this particular space.
Benjie: At a typical colo, how many racks are you filling? If that's appropriate to answer.
Hugo: Yeah, that's a state secret. It's not enough for a whole data center, but we start to be kind of big enough that-- Well, let me put it like this. In one of the buildings that we're in, in Virginia, we just contracted out the last energy that they had available. So we're kind of big enough.
Benjie: So growth is good. Growth is good. We're running low on time here, but let's kind of talk a little bit about Devboxes and kind of the new stuff that you're doing with these primitives you have.
Hugo: Yeah.
Benjie: I take it the Devbox is kind of like a sandbox type setup for folks, is that right?
Hugo: Yeah, like we've always had sandboxes, like from day one. Like an instance is the sandbox, like what a lot of folks are doing with sandboxes was something we did from very early on. What the Devbox is, it brings a few services out of the box, things that you would care about. Like you want to be able to capture the screen, you want to be able to run an iOS simulator, you want to be able to connect an IDE.
So it's a piece of compute that is either ephemeral or persistent, like you can decide, that is kind of modeled for developers and for agents. And the ways that folks are using it is we have a lot of customers that are not running agents on their machines anymore. They run them on Devboxes and there you have both the agent and what the agent is doing is running on a Devbox.
It makes use of our high performance and it also makes use of our security capabilities so that you can run without constraints. We have folks that are using Devboxes to scale out, so to run many shards of tests in particular, like they're spinning up a Devbox ephemerally and then just shipping the intent of running those tests on demand into them.
And then they make use of the fact that the Devbox is kind of warm. So it's kind of for agent sessions as well, but also for developer sessions. They're just kind of much, much faster to iterate on. And now we have folks that are using Devboxes and especially on the mobile side, which is kind of a big segment for us where they have agents that use tools on Devboxes to kind of run mobile applications.
So that's also kind of a thing. So we kind of covered the full spectrum. So it might be, you might be a developer, like I use Devboxes to run agents, like me as a human, but then I also spawn Devboxes on demand to run many of our tests in parallel.
And what we're seeing is that I think directionally CI and this development loop with agents will collapse and become the same thing. And that's what we're kind of marching towards.
Benjie: That's really interesting. I want to know some number that was small in 2022 and some number that's giant in 2026 that you're comfortable sharing.
It sounds like you guys are scaling. It sounds like if you're going from, "hey, a piece of software that does some cool caching stuff and snapshotting," to "we're racking and stacking our own servers across three data centers, buying out all the energy capacity of this Virginia one, causing Amazon's servers to overheat." Or maybe that wasn't you.
Give us something that makes you excited that you're comfortable sharing, from a numbers thing, just a quick one.
Hugo: This will feel like a cop out, but it's not. I think the thing that I'm humbled by is just the breadth of our customers. We started with a few startups that were crazy enough to take a bet on us, to now having customers like Ramp or Vanta or Verkada or Framer and feeling like we're an extension of their team in many ways.
That's something that we also did very deliberately early on that we knew that it's not just about the technology and the product, but it's also about the user experience that your customer gets. So we invest a lot in being kind of a trusted customer.
I think one crazy number is on the revenue side. We have customers that started early five figures and nowadays are seven figures in the space of like 18 months.
So that's something that really gives us a little bit of pause and just shows the scale at which some companies operate and the growth that the overall space is seeing. Like, there's a tremendous amount of demand for anything related with code and compute. And it's something that we've been doing for some time. So we've kind of become experts at it. And we're humbled by the scale that we're seeing.
Benjie: So the coding agent stuff has been a big tailwind for you guys, where a lot of build systems, like scaling out your CI, scaling out your CD, scaling out your build systems, whatever, you guys are getting the benefit of that from, clearly from a growth perspective. Or is it just more people trusting you guys for their normal workflows? Or is it a combination of the two?
Hugo: I think it's a combination of both. There's definitely a huge tailwind with agents. Like there's a lot more software that needs to be built and tested. But at the same time, if you maintain the same sort of kind of exponential curve that you see in usage with cost, it just makes people very nervous.
So what we try to do is to make sure that that cost goes sublinear so that it doesn't kind of blow up in the same, the same sense. And I think that's where we've seen a lot of love from our customers, that we try to remain very, very efficient. But we also have a lot of new folks coming in.
One of the interesting stories is that we run our own IP space because we do our own networking. We really do kind of do everything. And we have a customer that one day saw one of our IP addresses and was running a whois on it and kind of saw our name, like, "who are these Namespace guys?"
So they went to kind of research and came over and became a customer because of that. So we love the craft and we go very deep all the way from the hardware to the software layers. As for the things that we run that are kind of available out there, we're big ClickHouse and Postgres fans, so we run them.
Obviously, we also still love Kubernetes. It doesn't work in all of our use cases, but we still love Kubernetes and I think those are the three main things that come to mind as software that we run at scale that we haven't built ourselves.
Benjie: Well, thank you for coming on. I think we need to wrap up here. The thing that I am going to take away from this is that you went from hot wiring your modem into your wall to now--
Marc: Hot wiring data centers. Haha.
Benjie: To now racking and stacking data centers. So I think you have a consistent thing going on in your life and it seems like it's working really well. So I encourage you to keep doing it and we look forward to hearing how you guys persist. Thank you for coming. Thanks for sharing all these stories. It was great.
Hugo: Thank you so much for having me. It was a lot of fun. I love talking about this stuff.
