Ep. #35, Database Software with Karthik Ranganathan of Yugabyte
In episode 35 of EnterpriseReady, Grant speaks with Karthik Ranganathan of Yugabyte. They discuss Karthik’s enterprise journey, the fruits of maintaining a customer-driven ethos in business, and building modern databases.
Karthik Ranganathan is the Founder & CTO of Yugabyte, the company behind the open source YugabyteDB, a high-performance distributed SQL database for global, internet-scale applications. He previously held technical positions at Nutanix, Facebook, and Microsoft.
transcript
Grant Miller: All right, Karthik, thanks so much for joining.
Karthik Ranganathan: Thank you for having me.
Grant: Cool. Let's dive right in. Tell us a bit about your background, maybe kind of how you got into enterprise software.
Karthik: Absolutely. Right before Yugabyte, I'll go in reverse chronological.
Grant: Sure, that sounds good.
Karthik: All right. So, right before Yugabyte, I was at Nutanix.
For the folks who don't know, Nutanix is a distributed storage company.
So, they started out building effectively the equivalent of Elastic Block Store, EBS, in Amazon, but for the private cloud.
And then subsequently went into building the EBS plus EC2 combo, which effectively makes you create a public cloud-like deployment in the private data centers really quickly.
Grant: That was hardware too, right? So, kind of combination?
Karthik: Initially, it was actually all software bundled with hardware.
So, the whole thing was sold together, but pretty soon, it switched to a software-only mode.
So, effectively, Nutanix is a software company.
Grant: So, it started as both, and then now it's more of just software?
Karthik: That's right. It's distributed data.
So, I was a part of the engineering team, worked on some of the core data problems such as deduplication of data and erasure coding and a number of these core data technologies for storage.
I also was fortunate enough to be involved with the sales and the marketing side at Nutanix.
Nutanix is an enterprise company. It was quite small when I joined, but it was really big when I left.
I was there from about 2013 through 2016. And it IPO'd shortly after, like after I left at least.
So, it was great learning from the enterprise building side.
Anyways, we'll get I guess more to that a little later.
Before that, I was at Facebook for about six years.
So, this was from 2007 to about 2013, and was fortunate enough, again, in Facebook to have seen growth from about 50 million users, give or take, to about a billion or two.
I remember thinking back then, how much more is this thing going to grow?
So apparently, there's a lot of people.
Grant: Yeah, a lot of room for growth. Yeah.
Karthik: Yep. And there, I worked on a lot of databases actually.
Worked on Apache Cassandra, but before it was open sourced or called Cassandra.
In fact, I didn't know back then it would get that big, but I had the fortune of giving it its name.
Grant: That's cool.
Karthik: Yeah. And after that, worked on Apache HBase.
I'm still an HBase committer and worked on it for really massive scale use cases at Facebook, billions of ops, hundreds of petabytes of data, all online and so on.
And before that, I was at Microsoft.
Again, world class engineering team, worked on the networking stack and Windows Vista.
So, it's distributed software, but of a different kind.
Grant: Okay, cool. So, you've had a pretty distinguished career in storage and databases and sort of very technical problems here.
Karthik: That's right. Yeah, fortunate I would say. Yes.
Grant: That's great. Okay. So, let's go back.
Let's start with Facebook, because I think it's interesting.
When we think about a lot of enterprise software companies, they often have come out of these projects and things that helped some of the "web scale" companies really grow.
For you, it sounds like both Cassandra and HBase were projects that you were pretty involved in.
I mean, Cassandra, one. Did you think that eventually, DataStax and these other companies would sort of form around it?
Or it was just like a thing that you were working on, you had ...
Karthik: No idea. No idea. I mean, the problem was interesting.
What we did was novel.
I mean, we kind of take open source and enterprise companies and all of that for granted today, but this was back in 2007. Open source was not a thing. I mean, open source was known, but not too many people used open source software back then.
And specifically, not too many people used it for infrastructure.
People would be like, "I wouldn't take something somebody did for free in their spare time and put it to power my most critical thing. No way."
And the only open source thing that was used by people was Linux, and that most often with the backing of Red Hat.
The other thing that kind of had gained a lot of popularity and usage was the database, the open source RDBMS databases, MySQL, and Postgres.
But you would see, that also was a clean split.
The traditional enterprises and anything that was deemed mission critical would always go to Oracle and SQL Server. And you would see that the New Age web apps where people couldn't actually afford to pay Oracle or SQL Server the amount of money to run these things, because they were deemed less mission critical, at least back then, would go to Postgres or MySQL.
So, the open source revolution was kind of brewing at that time.
MySQL and Postgres couldn't scale. In fact, Facebook's own tier of critical data was on MySQL at that time.
So, Facebook was a pioneer in that sense, as were a lot of the top tech companies, the big giants that we see today.
The traditional enterprises, they were just getting started with MySQL.
Now, in the middle of all these forces, we wanted to--
We stored our Facebook user data in a sharded MySQL tier, and we had cutting edge engineers running it and building it.
And so, we had the technical firepower to do it.
But however, when we wanted to do inbox search, that's how we got started to build Apache Cassandra, or what would become Apache Cassandra.
The problem there was--
Grant: So, searching over the messages that you're getting?
Karthik: Searching the messages. Yes.
Grant: Yeah, got it.
Karthik: You search so rarely, if you think about it.
You read all your messages, and then you keep around a lot of messages, and then maybe you search something.
The probability of you searching versus the amount of reads you get.
It is skewed heavily, heavily towards reads.
And search indexes are much bigger than the data, because if I sent you a message with 10 words, you would have 10 entries in the index.
Whereas there's just one entry for your message. So, it kind of explodes.
And you need the index for you, you need the index for me, so it kind of really, really explodes.
We didn't want to put that in MySQL.
It was both not cost effective and operationally a nightmare, because we'd just be scaling.
That's all we'd be doing. So, we said, "We need to think through this problem slightly differently, and look to the literature."
And found that Bigtable from Google was solving a lot of these types of use cases.
At the same time, there was a Dynamo Paper out of Amazon, which is what they were using to scale their cart service, because that was also like a NoSQL access paradigm.
So, we hybridize those two and built what would become Apache Cassandra for this problem.
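The index blow-up Karthik describes can be illustrated with a small sketch. This is not Facebook's or Cassandra's actual implementation; the function `index_message`, the owner names, and the sample message are all hypothetical, and the sketch assumes one index entry per distinct word per participant.

```python
from collections import defaultdict


def index_message(index, owner, message_id, text):
    """Add one posting per distinct word to the owner's inverted index."""
    for word in set(text.lower().split()):
        index[(owner, word)].append(message_id)


# Hypothetical 10-word message, stored once as a single row.
messages = {"m1": "lunch at noon near the office on main street tomorrow"}

# Each participant gets their own index entries for the same message.
index = defaultdict(list)
for owner in ("sender", "recipient"):
    index_message(index, owner, "m1", messages["m1"])

print(len(messages), "message row ->", len(index), "index entries")
```

Under these assumptions, one stored message turns into 20 index entries (10 words for each of two participants), which is the write-heavy, rarely-read shape described above.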
Grant: Oh, interesting.
So, you read these white papers that came out of Google and Amazon and sort of said, "These are interesting solutions. Let's learn."
Karthik: We need a combination of all. Yeah.
For our problem at hand. NoSQL was anything back then.
It just didn't have to be SQL as long as it served your use case.
And in the famous CAP theorem, the consistency, availability, partition tolerance, we said, "Search indexes don't need to be consistent, but they sure as hell need to be available."
And the reason is because if somebody couldn't find their message, because it was eventually consistent, and that message was dropped, they would complain.
We'd go back to their messages and reindex this, problem solved, we're okay.
So, that's how Apache Cassandra was born.
But then we said after that, the next evolution was we saw the growth of transactional data itself explode.
I wouldn't call this transactional. This is kind of derived data, so it's secondary.
Like a search index on the messages is secondary, but the message itself is absolutely core.
Now, back then in Facebook, again, this was around 2009 to 2010 when Facebook had the vision of simplifying messaging for everybody, so they said like, "There's no use in actually making you send Facebook messages to some people, SMS like communication or chat to somebody else."
And phones had just come out and SMS was becoming popular.
And so, you had to figure out, "Oh, that person is of the previous generation, so maybe he wouldn't do SMS so often."
But these people, like our younger generation, they definitely do SMS and these guys in the middle, they want to chat and each going to a different system, and chat was stored only in memory.
And so, it was all fragmented.
The vision was just send a message and let the person receiving the message figure out how they want to receive it.
You don't have to figure out what they want, and what they would do based on what you think is right.
Just give them the choice. But what did that mean for the infrastructure?
It meant that we were getting billions of messages a day, like about 10, 20 billion messages sent through chat.
We just get hundreds of millions of messages through the Facebook messaging service.
If you combine the two, you now have a lot of messages flowing and have to get stored as opposed to in memory.
So, you really needed a scalable database, and you couldn't lose these messages, because imagine using SMS or something and it keeps dropping your messages.
You'd be like, "Oh, this service sucks, I'm going to switch."
So, we wanted to make sure we absolutely gave guarantees on consistency and as high in availability and correctness and so on.
So, the CAP theorem actually shifted from an available and partition tolerant to a consistent and partition tolerant and do whatever you can to make it available.
At that point, we ended up looking at, again, a lot of research and architectures and we picked Apache HBase.
And back then, it was a system that was not meant for OLTP.
It was like a project out of a few different companies.
They were using it mostly for analytical workloads that were real time, and it hadn't gained mainstream adoption.
So, we were some of the core committers that went in and started putting, like for example, sync support and sharding.
There was a lot of people and help from the community.
Definitely have to acknowledge that.
But you could see that by 2010 or so, the open source community and infrastructure specifically was starting to grow.
But anyways, we got HBase into production, I'd say, in 2010, towards the end of 2010, 2011 for Facebook messages.
Right off the bat, it was 20 billion messages coming in.
It stored the actual search index, and it stored the message data, and it was like hundreds of terabytes to multiple petabytes of data that we stored.
Our target was to be capacity bound, just add more machines, because you couldn't store anymore.
So, we had to pack in efficiency and make sure latencies were-- A lot of good stuff.
But then, subsequent use cases started becoming attractive for this type of a database, now that we had built one.
And so, the second use case was Facebook time series and alert data.
So, it's called ODS, operational data store internally.
And so, that started off the bat with 150 billion messages a day.
These are smaller values, but nevertheless, it was also a great use case.
Pretty soon, by 2013, when I switched over to Nutanix, we were running eight to 10 massive use cases on top of HBase.
Grant: Okay. So, you're at Facebook, you get all this experience scaling systems and running these robust databases and data storage systems.
And then what made you decide to go to Nutanix?
Sort of someone got in touch and you decided it was an interesting opportunity to be a little bit more involved in strategy and thought leadership?
Or what happened there?
Karthik: Things were going phenomenal at Facebook.
I loved that company. I still love that company.
The reason to switch was more like me and my current co-founder and CEO at Yugabyte, Kannan, we had this vision of building a database company at some point.
Grant: Oh, interesting.
Karthik: And Dheeraj, the CEO of Facebook, he reached out and he knew Kannan from before; they had worked at Oracle together.
Grant: The CEO of Nutanix.
Karthik: CEO of Nutanix. Right. CEO of Nutanix, Dheeraj. Yeah.
He reached out and spoke to us and said like, "Hey, why not give an enterprise startup, I suppose--"
I mean, Facebook is a great infrastructure company. It's a great product company.
It builds great infrastructure, but it's different building an enterprise company from how you--
The dynamics are different, right?
Of course, we didn't know that back then, but the point was experience that, experience building infrastructure for the enterprise as opposed to our customers in Facebook or other Facebook developers whose customers were end users.
So, this is a different dynamic altogether, but that is also a very interesting set of learnings from a place like Nutanix, like what it takes to build and scale an enterprise company and what are the asks, what are the objections.
Grant: So what were you working on day to day at Nutanix?
If I guess, first, it sounds like you're co-founder at Yugabyte, you work together at Facebook, or you both came together at Nutanix?
Karthik: No. My co-founder Kannan and myself, and another co-founder, there are three of us, three co-founders, all of us worked together at Facebook on the same product.
We were HBase developers.
So, Kannan and I were the third and the fourth people to work on Cassandra.
This was in the really early days before open sourcing, like I said, early, early days.
We were putting build systems in place, so it's really, really early days.
And then we worked on HBase, so we actually did analysis of distributed data stores and ended up picking HBase, championing that and then taking it all the way to production at massive scale.
And then we went together to Nutanix and worked on distributed storage at Nutanix and also learning the ropes of building an enterprise company.
Grant: Okay. So, I think it's actually really interesting, because you basically went from this hyper scale consumer company to an enterprise software/hardware company to learn that side of the business.
Karthik: That's right.
Grant: And so in your role as technical staff, what did that entail? What were you doing?
Karthik: No, we were primarily technical.
We were building a lot of features into the core database.
I think what is interesting is-- And, of course, we got to travel with the sales teams to talk to a number of customers or getting the customer exposure, work with the marketing team to figure out what it entails to do enterprise marketing.
So, we were--.
Grant: Product marketing.
Karthik: Product marketing. Yep. The whole shebang.
And we were fortunate enough to see all that, but I think even on the core product side, there's a number of learnings that at least I would think are pretty interesting contrast between a hyperscale consumer company and a fast growing enterprise company.
In a hyperscale consumer company, you could think of it as having 10 to 20 workloads, but at ultra massive scale.
So, each of these workloads would have billions of operations and petabytes of data and crazy demand.
In an enterprise company, I mean, it's a similar domain in the sense it's still distributed data, but the type of problems are so varied even within a single customer of yours, an enterprise customer.
Specifically across customers, it's not 10 or 20 workloads at billions of operations. It's about 10,000 workloads at tens of millions of operations. So now, the trick is some of the stuff you would do at a hyperscale consumer company is the last 2%, because that actually has a material impact on the ROI of running infrastructure. But in these slightly smaller companies, you still want to architect it in such a way because there could be breakout customers that end up growing that big.
I mean, after all, Facebook was small at some point and then became bigger.
Every company is. So, you still have to get that part.
But there's the additional part of tuning the infrastructure for a wide array of workloads.
So now, you'd have one person that wants to use a really, really small amount of data, but read a lot.
The other person would have really large amount of data, but read very little.
There would be somebody in the middle. There would be people doing updates mostly.
So, you'd see all of this crazy type of use cases that come in.
There'd be people, very security conscious, like extremely security conscious.
There'd be people that want integrations into five dozen ecosystems.
So, all of this results in technical work, but it is a slightly different kind of technical work, and that was very fascinating for me.
The level of depth and the type of technical strength you need, like the engineering strength you need doesn't change in a sense, but it changes in its application.
Grant: Yeah, it sounds like you were focused primarily on the core features of the application.
Did you do any of the enterprise features around administration, or access control, or single sign on and those kind of reporting? Are those projects --?
Karthik: Yes. Actually, I was involved with some of that too.
So, some of the things that we were doing at Nutanix, for example, was to help people figure out the alerting story, like monitoring, alerting, and uptime, because Nutanix, again, is the storage layer for mission critical applications, in some sense.
We're doing storage layer as a database, but that's a storage layer as a file system.
It integrates underneath your virtual machine, and it actually stores data, so it is very critical.
So, that means you need to figure out how to monitor it, manage it, and you can imagine, it's not just monitor and manage it, you also need to track how many assets you have, how many disks have you created, how many VMs have you created, so that causes a sprawl.
So, there is a configuration management side of the house, and then there is actual runtime and alerting, and I was involved in both of those, and in the product that stitched all of this together.
And obviously, it had to work inside the data center. Some of Nutanix's customers are, for example, very, very sensitive about their data.
These could be government agencies that are not actually connected to the Internet, so the software had to work in a way that's completely air-gapped.
Nothing comes out, nothing goes in unless they ...
Grant: Sneakernet it in there.
Karthik: Yep, exactly.
Grant: Cool. And did you spend time sort of across that customer base, like understanding use cases and collecting feedback from customers?
I guess you probably worked close with the product team as well to do that.
Karthik: That's right. Worked closely with the product team.
And in cases, actually spoke to customers also, both on their evaluation phase of what they're looking for right up front and also in the post deployment phase of what it takes for them to keep growing their workloads and keep running whatever it is they want to run and make them successful.
Grant: Great. Okay. And then, at some point, you decided that you had either enough experience and you were just itching to get out to get--.
Karthik: Faster databases. Yeah.
Grant: So, talk about what led up to the founding. What was the moment? And then talk through what Yugabyte is doing and where you're going.
Karthik: Yeah, I'd say two things.
The first thing was at Facebook, we were fortunate enough to see the growth of the cloud.
It was a private cloud. It was Facebook's own cloud, but we had gone from a couple of data centers where really one data center was used a lot and the other data center was a read replica and a DR target.
It was that type of a setup.
Grant: And when you say it was a cloud, you mean there was basically programmable compute where you had an API that could give you access to new machines?
Karthik: Yeah. I also mean the deployment paradigm. It's a bunch of things.
Maybe I'll just go through the short genesis of what I mean by a cloud.
So, initially, it was just two data centers, one primary for all the writes and consistency and the other as a failover target.
That was the old days of everything. It's like 2007, 2008 or so.
Grant: Active-passive.
Karthik: Active-passive, two data centers.
And if you look at traditional enterprises, most enterprises, or a lot of enterprises still follow this paradigm.
Now, it went from there to, "Hey, we just need to utilize all this hardware and data centers better for better ROI and for better expansion."
So, that means go from a pure active-passive to active-active, and it need not be active-active for just one application, it could be active for one, active for another in different days.
Just try to disperse it. So, went through that to expansion, to multiple geographic regions, to figuring out, "Hey, if you had just one data center in one region, and all of that is connected to the same power outlet, and somebody, in an honest mistake, trips on that power outlet, your entire data center is out."
You don't want to take on this type of unnecessary risk. I mean, you can also think of natural events like floods.
This has actually happened. I think Katrina, or one of these floods actually took out some data centers for people.
So, there's a variety of reasons why you wanted geo dispersed data.
So, we started pretty soon getting into the organization of multiple nearby data centers, and spread across faraway data centers.
This is known commonly in the cloud parlance as regions and zones, but this was not there 10 years ago, for sure, and maybe not even five years ago.
It wasn't that common. We didn't really have a name for it, because it didn't matter.
In Facebook, you don't have to name. It's like nearby and far away, it's good enough.
But organizing ourselves that way and really seeing the velocity of app development, like we were talking about microservices.
So, essentially a lot of the transformation inside the company was to make sure they could be independent, smaller units that could move really fast and they're not dependent on each other.
So, that you could build a lot of these micro applications like people you may like, or here are some suggestions, or so many applications that go into something like a Facebook app overall.
And then from there, it went to being able to deploy with consistency.
So, we built what is the equivalent of Kubernetes today. It's called Tupperware inside Facebook: airtight deployment, and management, and ability to specify how many replicas to run and some level of scheduling, orchestration and fault tolerance.
And then we went into the data side to figure out how to really geo disperse data to deal with scale, and failures, and rolling upgrades.
And so, seeing that whole journey, some of it directly, some of it indirectly, of course. Yeah.
Grant: It was like the Tupperware. A little more indirectly, you were like a user of Tupperware?
Karthik: As a database, we hadn't quite used Tupperware, because it wasn't quite necessary inside Facebook.
We were like a managed service for the database inside Facebook, and so our users didn't really care where it ran as long as it just worked.
Grant: I kind of assumed it was somewhat Borg inspired, kind of like these white papers as well.
Karthik: Yeah. I don't know the genesis, whether it was Borg or it just grew organic.
I'm not sure of that, but I did know a lot of application deployment started happening through Tupperware for the right reasons.
You don't want mistakes, like somebody was using a machine for development and suddenly that machine went into production, and that now you have hacked code all over the place and getting exposed to users, and you have 10,000 machines, and you can never find this one.
Users hitting that machine are seeing weird things, but nobody knows.
So, you want an airtight deployment, for example.
So, that's how it really started.
But after that, people say like, "Hey, if I want 10 instances of this running, I really want 10 instances of this running or the machine goes down. I don't care. Just make sure it keeps running all the time."
Saw a lot of that organically build up.
And, of course, not directly involved with how it got built, or why, or what the reason was, but I did get to see the building of it and the reason and the high level design paradigms around it.
It's pretty remarkable, the one to one application to the enterprise at large, because all companies are now--
The digital transformation is really about moving all your data to the cloud, building a lot of applications with agility and serving your users really, really fast.
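The "I want 10 instances, just keep them running" requirement mentioned above is commonly met with a reconciliation step that compares desired state to observed state. The sketch below is only an illustration of that idea, not how Tupperware or Kubernetes actually schedules work; `reconcile` and the machine names are hypothetical, and it only handles replacing lost instances, not scaling down.

```python
def reconcile(desired, running_on, healthy):
    """Return (keep, start_on): instances to keep and machines to start new ones on."""
    healthy_set = set(healthy)
    # Instances on machines that went down no longer count.
    keep = [m for m in running_on if m in healthy_set]
    missing = max(desired - len(keep), 0)
    # Place replacements on healthy machines not already running an instance.
    spare = [m for m in healthy if m not in keep]
    return keep, spare[:missing]


# Machine "b" has gone down; the operator still wants three instances.
keep, start_on = reconcile(
    desired=3,
    running_on=["a", "b", "c"],
    healthy=["a", "c", "d", "e"],
)
print(keep, start_on)
```

Here the instances on "a" and "c" are kept and one replacement is started on "d", without the operator having to care which machine failed.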
Grant: Yeah, it's cool. Okay. So, then back to kind of how you came about.
Karthik: At the time, we figured out at Facebook that HBase, for example, in itself is not going to lend itself to a nearby data center or multi-zone and faraway data center type deployment, so we had figured that out.
And we were already working on other solutions at Facebook to figure out what to do.
There is this solution called ZippyDB that's used quite often.
The MySQL tier itself, if you think about it, is a very sophisticated, replicated, fault tolerant, geo distributed layer.
So, we had done a bunch of that whether as a coherent product or as a bunch of enhancements on top.
Because like I said, larger companies like the hyperscale consumer companies have a fewer set of use cases, and they have to optimize for it, so it is possible to change the APIs.
So, TAO was born inside Facebook, The Associations and Objects server.
So, that it's easier for end users to map their data model in a flexible way on top of this layer, and that layer would do the cache plus database, plus geo distribution story.
So you don't have to deal with all of this yourself in the app. Imagine you have to invalidate the cache, or you know somebody did a write and I cannot read from this replica, let me go there. So, this stuff gets really hard very soon, and it doesn't scale right.
At Nutanix, we saw that the enterprises--
I think the big aha for us there was the enterprises are always going to be where the hyperscalers are pretty soon.
It's amazing to see this.
It's because as the rules of the game going from, say, only brick and mortar to digital, or all of that get changed, the people that were born with that thesis often are ahead.
I mean, it's not anything different, right?
And you don't want to, as a large enterprise, take the risk of not knowing whether this thing will pan out and jump into the new fad, until you find out it fizzles out, and then you're like, "Why did I waste all that time?"
It's the right thing. It's not anything wrong.
But having seen that, and we were seeing that-- At Facebook, we thought that, "Oh, yeah, this is the world. Everybody does it this way. Everybody scales. Everybody has geographic distribution. Everybody needs low latency."
So, we just thought that's just normal.
But when we went out and actually dealt with the enterprises, we saw that they were trying to go through this journey, but it was offset by about a few years.
That was a very interesting and telling thing. Specifically, we saw Kubernetes come out and we could liken it to our own Tupperware growth.
And so we knew that, "Hey, in this stage, one of the things that's going to come next is databases."
And databases are close to our heart, so we all love it as a company.
I've worked on three or four databases, and I'm not anything unique in my company.
Many people have, so we all do distributed databases. We just love that technology.
So, putting those two together, we said we'd start it and we started early 2016.
Grant: Okay.
Was the impetus like, "Oh, look, the world is really moving towards how we were doing deployment and containerization and all these kind of things. That's going to disrupt."
You kind of identify that as the platform shift? Or what was the--
Karthik: Yeah. We had figured out every one of the platform shifts that were happening or paradigm shifts inside Facebook was pretty much happening, and pretty much roughly in the same order too.
So, we figured one of the last frontiers at Facebook was data, because data is always hard.
None of these apply one to one.
I wouldn't say the way Facebook built, for example, Tupperware just applies to the world, because you have to kind of figure out that many use cases and many enterprises angle to it.
So, it seemed like the right time, because as more and more people--
It seemed pretty clear that the shift to the public cloud was just starting.
And in 2016, I remember talking to a lot of enterprise customers who were unclear if public cloud for OLTP was going to be on their roadmap or not.
They were deciding. In 2017, it became a lift and shift.
In 2018, it became I need to figure out ROI.
And from 2019, it's like, "Let me really figure out how to geo distribute data, how to get scale failure, all of this for mission critical."
So, that shift is playing out exactly the same way. It's always a bet when you start a company.
You figure this should be a trend that you really feel is right based on your intuition, your experiences, et cetera, et cetera.
And so, that's the bet that we took. Although what was not clear is in what form this would apply to the enterprise.
Should we build a completely new API? Those were just learnings after talking to people.
Yeah, I think that's generally a good way to go about an enterprise company anyway.
We talked to a ton of users and customers and what they're using now, what they'll be looking for.
And we found that the number one question was, "Why the hell are you building another database? There's so many databases."
It didn't bode very well for us starting a database company.
But I think after enough talking, we found that the objection is not to building the database, it is to coming up with a new API.
So, people are saying like, "Don't build another good thing that I'd have to really, really struggle hard to learn."
Why? We're at fault too. Cassandra or HBase, or you have Mongo, which is a new paradigm, or there's Redis.
All of these databases that come out, like DynamoDB.
Every database is a new paradigm, you got to figure it out.
At first, when you just look at what the database offers, you can only get so far with its architecture and then you really have to go into the data model language to figure out, "Okay, I know what I'm giving up on the architecture side, but what am I giving up in terms of features?"
You never know, until you really learn the language, and it takes you a couple of years to really learn the language and it reaches the limit and you're like, "Uh-oh, okay, I didn't know I would be giving up that if I did this."
I can tell you from my own experience having been in the early days of Cassandra and HBase, any number of people have come up and said, "Hey, could you just put in that secondary index, please? That's super useful."
I mean, of course, we didn't need it, we didn't think about it, but it was not that easy to do.
You needed to change the core.
And we saw that there was really no database addressing those needs like of hyperscale, hyper-performance, and with transaction, and indexes, which are really useful.
Grant: Just for the listeners, kind of talk through the core value of Yugabyte and how you describe it.
Karthik: Yeah, in a couple of words, Yugabyte is a high performance distributed SQL database.
So, it offers all the features that you would expect of SQL.
So, there's no new API, and it's multi API.
So, what we said was people are going to different API's for different reasons.
They actually have their strength in their ecosystem.
So, the two APIs we have are the YSQL API, which is Postgres wire compatible, and offers all features of PostgreSQL.
The other API we offer is YCQL, Yugabyte Cloud Query Language, and this has its roots in Apache Cassandra, but it is still semi-relational in the sense you cannot do joins, you cannot do a few things, but you still get geographic distribution, transactions, document support, indexes, you get all of that stuff.
The difference is, if you really are thinking of scale, you probably don't want to be doing joins. I'm talking upwards of tens or hundreds of terabytes.
But if you're thinking of not so much scale, but I want my feature set, and I don't know how my app will evolve, and I want to do really complex things, and I want my relational integrity, we do the YSQL with Postgres compatibility all on this common core.
To you, it just appears as a couple of tables, you interact with it using the appropriate drivers, but internally, the database manages the whole thing seamlessly.
And vision wise, where are we going with this, our aim is to build the default database for the cloud, because we think it's really hard if you have to patch together three or four databases to build your app each time.
You slow down on the app, and you keep learning more about databases and how to deal with them.
So, that's really what would end up happening and is ending up happening.
So, the vision here is, so what do you pick if you really want to build an app?
You probably would start with Postgres or MySQL.
You don't need anything else, except it doesn't scale, and then the rest of the problems come in, and then you have to do fault tolerance.
And then you have to do caching for speed, and you have to do a bunch of stuff.
So, we said like, "Suppose we built a database that had every single feature of Postgres, nothing taken out, so everything."
So, at this point, your objection of, "Hey, I cannot do XYZ with this database," goes away.
And now, it's like, "Can I get performance and scale out of XYZ operation?"
Which is a far easier operation.
So, Yugabyte is really everything relational with all the NoSQL underpinnings, so the scale, geo distribution, performance aspects built in.
Grant: And that wire compliance is pretty key, right?
Karthik: Yes.
Grant: Because ultimately, that's kind of a migration strategy.
Karthik: Absolutely.
So, migration, new app, either ways, there is an unbelievably rich ecosystem around Postgres.
So, it's not just wire compatibility, it's also the feature set. You could say I can understand what you're saying, but I may not be able to execute what you're saying.
It's slightly different. So, in order to bridge that gap, actually, I think we did something novel.
But anyways, let the viewers be the judge of that.
We actually took the entirety of the Postgres code base, and we had worked closely with the folks building RocksDB at Facebook, because it was basically a lot of HBase stuff.
We wanted to do the storage layer under MySQL, similar to how HBase was doing it.
Log-structured merge-tree storage, because it gives you a lot of win on SSDs. So, we took RocksDB as a starting point.
We took Postgres, the entirety of the code base, and we took Apache Kudu, another project that was inspired by HBase but is doing the next generation of analytic processing, for the Raft layer for the distribution of data. And we actually did a tight integration by really going into these code bases, changing a lot of stuff, and tightly coupling them together to make a new database from scratch. So, this database pretty much can support everything Postgres supports on top.
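To make the log-structured merge-tree idea mentioned above concrete, here is a minimal Python sketch. It is a toy illustration only, not RocksDB's or YugabyteDB's actual design: the class name `TinyLSM`, the `memtable_limit` parameter, and the in-memory "runs" standing in for on-disk files are all assumptions made for the example.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge tree: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # most recent writes, held in memory
        self.runs = []       # flushed runs, oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def put(self, key, value):
        # Writes never modify old runs; they only touch the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Emit the memtable as one sorted, immutable run (a sequential write on real storage).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search runs newest-first so the latest write wins.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite older ones
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # triggers a flush
db.put("a", 3)
db.put("c", 4)   # triggers a second flush
db.compact()
```

The point of the structure is that updates are batched in memory and written out sequentially, with older versions cleaned up later by compaction, which is the general property that tends to suit SSDs.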
So, we're actually the only horizontally scalable database that supports pretty advanced Postgres features, such as stored procedures, or triggers, or partial indexes, extensions.
We want to do foreign data wrappers, so you can interact with external tables.
So, the works. There should really be no objection, because we've seen this over.
You take a shortcut, and there should really be no objection in terms of feature set.
Similarly, we had worked on Cassandra and HBase, and there was a ton of database code that we knew intimately well sitting around, but it was written in Java.
We kind of felt like memory sizes were increasing, and people wanted high performance in the cloud, so we said, "You know what? Yep, it's the gap between the cup and the lip, right?"
So, we said, "Let's rewrite this in C++ because that's what it's going to be."
So, we rewrote the whole thing by assembling all of this in C++ from scratch.
I mean, our knowledge around the paradigms and what would happen and running it at scale and getting the P99, that helped tremendously, obviously, as a team.
We put the whole thing together in C++, soup to nuts, all the way from the bottom to the top.
So, that was the second major thing that we had decided on.
And then the third thing was the fact that it had to be operationally easy and work at massive scale as well as at a small scale.
So, it should just work.
And multi-zone deployments are going to be the default was our bet, so we should just make sure that, that is easy to do and it just works really well.
And go from there to other more advanced multi-region deployment.
So, we support a ton of multi-region features like read replicas, async replication, bidirectional async replication, and so on and so forth.
Even just geo stretched clusters that you can have some nodes in the US, some nodes in Europe, some nodes in Asia, and you can get geo consistency, but read from the nearest data center.
So, we support a lot of those.
Grant: Wow, it's cool.
Karthik: So, those are the three pillars. High performance, all of SQL, and scale, and geo distribution.
Grant: Great. Okay. And you open sourced it?
Karthik: We open sourced it. Yes. Again, that's an interesting journey.
2016, when we started the company, it was not clear that open source as a business model would be viable.
It was unclear. There was a lot of people saying it's not, and we're just new to building businesses.
It's not like, we had built a lot of businesses before, so we said--
Grant: Nutanix wasn't open source, right?
Karthik: No, it's not. No.
There's a bunch of other people saying the future of the cloud is Database-as-a-Service.
If you're building it as a service, you definitely should not open source.
And so, Snowflake hadn't open sourced, but there's a bunch of other guys who had open sourced.
So, it was a confusing time. We knew that we always wanted to open source, because Apache Cassandra, Apache HBase, what do you expect?
So, we said like, "Let's fight our own biases here. Let's hold the horses a little bit and really understand what's going on."
So, we took about a year or so building the database, but not doing anything.
We're just building it in stealth and saying, "We'll figure out what this company should look like."
We know how we want it to look like, but we should really see, just make sure we don't make a silly decision.
2017, it became pretty clear to us that open source is the viable path. It's really going to be the path to go forward.
Grant: That just based on the success of open source infrastructure? Or what was the ...
Karthik: Yeah. A couple of things, having spoken to a lot of customers, I think a few things came out pretty clearly.
First thing was a lot of customers-- I'm going to say this objectively.
Hopefully, no one takes offense, but a lot of people don't like Oracle for its closed behavior.
They don't know why they're paying Oracle or how much they should be paying, et cetera, et cetera.
Grant: I don't think that, that's fair. Yeah. That's not very controversial.
It's okay to say that people that maybe who don't like Oracle--.
Karthik: I mean, there may be Oracle people who like Oracle. I don't know. I'm just saying what I heard.
Grant: I mean, the cloud CEO, Thomas ...
Karthik: Kurian. Yeah. He's from Oracle. Yeah.
Grant: He just commented how he didn't like Oracle Cloud.
Karthik: Okay. So then maybe I'm okay saying that.
Grant: Yeah, you can say it.
Karthik: Okay.
Grant: He worked there and ran it for a long time.
Karthik: That's it. I'm way better than-- Yeah, that's okay.
Grant: You're just a casual observer, so I think you're okay.
Karthik: All right. Yeah. We did hear that, and one of the reasons for that is the lack of transparency.
And a lot of people associate open source with transparency, which it is. Unbelievable amount of transparency.
Grant: 100%.
Karthik: Yeah. You know the architecture, you can go work on it, you can go fix the feature, you know exactly why you're paying or why you're not paying.
All of that is fine. So, that's number one.
The second thing about open source is it accelerates your maturity cycle as a piece of software, with a lot of people using it, but it also accelerates your feedback from the customers.
Otherwise, you would have to get 100 different customers to have used it in 100 different ways to mature your product.
Instead, you just have 100 different users using it on day one and saying, "Hey, here are some of the issues," and some of them will help fix it.
So, that's the other great part about open source.
The third big learning was that there were a set of people saying like, "Database-as-a-Service is the future, and that doesn't need open source, or don't do it."
I mean, these are all legitimate points.
For that point, what we realized and what we're still realizing is that as a new database, or even philosophically, many companies do not want to give up control on their data, on their machines.
You should probably be seeing this a lot given the set of tools you're building, because people want both the ability and the flexibility to move from one cloud to the other for whatever business reason, or stay on-premise, or keep straddled or move back to the on-premises, because it's expensive for any number of reasons.
The thought is if you give up all the control and put your-- Analytics did well as a service.
Snowflake is doing well. All of these analytic companies are doing well.
And that's because analytics is bursty.
You just need to spin up 1,000 machines, you get a 1,000x speedup, and then you shut down all the 1,000 machines, and you can go home.
I don't need to own anything, I just paid for 1,000 machines of compute.
But your other option would be to have 1,000 machines for those crunch times, and what are you going to do?
So, analytics has a natural affinity for the cloud.
Different people using 1,000 machines at different times is better off on the cloud provider side and not on your side.
But OLTP is different. OLTP is the lifeblood of a company.
If your users cannot log in, if you cannot see those orders, if the orders cannot get placed, you cannot check out, all of these are killer problems, and they're nowhere close to the scale of analytics and they run all the time.
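The contrast between bursty analytics and always-on OLTP can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the 1,000 machines come from the conversation, while the one-hour daily burst and the 10-machine OLTP fleet are hypothetical figures chosen only for illustration.

```python
HOURS_PER_DAY = 24


def machine_hours_per_day(machines: int, busy_hours: float) -> float:
    """Machine-hours consumed per day by a fleet that is busy for `busy_hours`."""
    return machines * busy_hours


def utilization_if_owned(busy_hours: float) -> float:
    """Fraction of the day an owned, always-on fleet does useful work."""
    return busy_hours / HOURS_PER_DAY


# Analytics: 1,000 machines, busy for an assumed 1 hour a day.
analytics_used = machine_hours_per_day(1000, 1)
analytics_owned = machine_hours_per_day(1000, HOURS_PER_DAY)
analytics_util = utilization_if_owned(1)

# OLTP: an assumed 10 machines, busy around the clock.
oltp_used = machine_hours_per_day(10, HOURS_PER_DAY)
oltp_util = utilization_if_owned(HOURS_PER_DAY)

print(f"analytics: {analytics_used:.0f} of {analytics_owned:.0f} machine-hours used ({analytics_util:.1%})")
print(f"OLTP: {oltp_used:.0f} machine-hours used ({oltp_util:.0%})")
```

Under these assumptions, an owned analytics fleet would sit idle for roughly 23 of every 24 hours, while the OLTP fleet is fully used whether it is rented or owned, so elasticity alone gives OLTP much less reason to hand over control.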
Grant: And define OLTP real quick, just for--
Karthik: The OLTP term is online transaction processing. That's the acronym.
But it's really anything that an end user will interact with, or is transacting.
So, the time to serve this, in an analytic style use case, you're trying to figure out your usage patterns.
You're trying to figure out, say, "How many people visited my site from North America between 15 and 40 years of age from these states?"
That's a great analytic query. It's okay if it takes even an hour to get you the answers.
Grant: It's not really in line.
Karthik: It's not in line. Yes.
But if you're trying to log into an app, you're trying to browse and place an order--
Grant: It can't take an hour.
Karthik: No, you're placing that stock trade out. One hour later, everyone is out.
So, OLTP is growing really fast. This is a part of every.
At least, I never used to think about it until I spoke to the people.
For example, Amazon's retail experience, we kind of take it for granted, and it wasn't there four years ago.
I mean, at least I never used it as much four years ago.
Now, the number of people actually going out and shopping physically in stores has dropped quite a bit.
For Christmas shopping, like this holiday shopping, a lot of people do it online.
The online sales have skyrocketed.
Every year, the Black Friday and Cyber Monday sales have been growing higher and higher.
And this year is no different.
It's the new record for how much.
So, it's clear that, that pattern is going up.
Now, what do all the traditional organizations do?
They all have to digitize.
And every app they build for the user to place an order to figure out, "Where is my shipment? I didn't receive it. I want to return my shipment. Now, what do I do with that?"
Or, "I want a coupon. Give me recommendations. What did people buy?"
All of this stuff has to go online, and all of this is OLTP.
Grant: Yep, that makes sense. Okay, great.
Karthik: So, OLTP, that space, people still want to hold on to the machines and their data, and for the right reasons.
Because it is always running. It's not something that's bursty.
And so, it has probably nothing to do with the cloud.
In the sense, yes, cloud can decrease my number of operations, somebody else can take care of it, but I may want to move it to a different cloud at a different point for whatever reason.
I may want to have one app in one cloud, and another in another cloud, because TensorFlow and Google is awesome, but Amazon and Alexa is awesome, and I want one app running processing Amazon Alexa data, and I want another app running and doing TensorFlow machine learning, right?
It's perfectly legitimate use case.
And now, if you have your databases only as a service in these clouds, it's going to be difficult, you need to hire a set of people to deal with this and another set of people like developers, I mean, to deal with that, right?
Which is not exactly easy. So, having a neutral database that just works anywhere is super useful.
Plus, the larger companies often acquire smaller companies, which would have made different decisions on different clouds.
So, you end up inheriting other clouds anyway.
So, there's a number of reasons why this is becoming an interesting paradigm for you to go by.
Grant: Again, on the open source side, right?
So, you basically sort of decided, for transparency and for portability, let's open source it.
When you first open sourced it, you didn't open source everything, right?
Karthik: No. Again, traditional wisdom at that point said, if you open source everything, no one will pay you.
So, you will just end up having software out there.
And little did we know that this space was also rapidly evolving, the open source space. It's all retrospectively obvious, but wasn't then.
So, we said, "We're going to keep some of the security features." That was a traditional wisdom.
So, things like security, backups, all of this stuff would be closed, so they're going to be enterprise features.
The open source folks are going to get a fully functional, really good database.
If they wanted to figure out how to secure the database, well, it's on them.
They can do that themselves. But if they wanted to run it in production, they would have to come talk to us.
This was twofold. I mean, one part of it is the revenue aspect.
The second part of it is that way, you have leverage against a larger cloud provider taking your software and just running it and running you out of business like Amazon famously is in the news for doing that, or allegedly doing that, whatever.
Grant: You're very careful. You don't want to step on any toes.
Karthik: No, I'm not. I actually think there's nothing wrong with it.
But anyways, my thesis is actually on the other side, hence the open source fully, right?
So, I'm not saying there's anything wrong.
I'm saying that's the big discussion going on, on both sides.
Anyways, our thought though, after functioning as this open core, I guess is what it's called.
It's open source part and the closed core part that you monetize on.
What we realized was a couple of things.
Firstly, MySQL and Postgres were natural successors to Oracle and SQL Server.
We want to be a natural successor to them.
And they were successors because the internet happened, and people wanted a different database at a different price point with different set of features.
Now, the cloud happened and we want to be that database.
We think we actually have a shot at it, but people actually came back and told us like, "Hey, neither MySQL nor Postgres held anything back. And if you don't let us do that, how do you think the adoption is going to be at that level?"
And it's a very fair point.
So, the first step was user feedback. The second thing was this whole Amazon taking stuff and if they're running it, if you're not or whatever. A bunch of companies have tried doing other things before us. That's because the cloud came after those companies were born. And so, they had to think about it differently. So, they had to go into the cloud as the cloud was racing. And so, it kind of became a race, a two-man race.
Elastic, for example, held back a bunch of core features.
Amazon ended up re-implementing it anyway right in the open and said, "Hey, I'll do one better. I'll even open source it."
You got all the features, and now you have them in the open source on my version of Elastic.
Similarly, Mongo was AGPL, which is supposed to deter all the cloud providers, but it really didn't stop Azure and AWS from putting out a Mongo-compatible service.
They both have it. So, I think the bottom line is, if there's success, there will be multiple players.
I mean, I kind of liken open source to the sun.
If you have solar technology, everyone is going to make power out of it.
It doesn't mean you can't make money, but you cannot stop everybody else from making money.
So, it's just one of those things that lifts all boats.
The difference that we realized is that the money now has shifted from just the database core features being held back and just support to actually making it really turnkey and easy to run in the cloud, right?
Famously, Aurora is an example. The fastest growing AWS service.
Like the fastest across all services. Yeah.
And they offer a cloud version, like a cloud-ready version of MySQL and Postgres, the two most popular open source databases.
So, Atlas, and MongoDB, no different. MongoDB, massively popular database.
Atlas is skyrocketing. I think they did about 170 million in revenue in just two or three years, and they're growing really fast.
So, it's pretty clear that if you have a valuable tool that's widely adopted, and you make it very easy for people to run, there is money to be made.
Grant: So, don't hold back the features, just make the enterprise part about making it really, really easy to run and scale.
Karthik: Exactly. So, to tie it back, I guess we all knew this all along, like this is how Red Hat worked with Linux.
Don't make it hard to run. I run Linux.
I don't pay Red Hat, because I don't do anything critical with it.
I just like to compile my software, do something.
But then if you put Linux on top of a production server that you're running a mission critical workload on, you probably want to pay Red Hat.
Because, A, they make it really easy to secure, to manage, to figure out what's wrong, to patch it, to give you that level of support, to ask you about what are the other features you need.
So, there is a place, especially in enterprise when you're doing mission critical stuff, there is a place where you can make money out of it.
Now, the Red Hat model is moving to the cloud effectively.
Grant: There's also just something about how enterprise buyers perceive open source where I think they understand that part of the give and take is if you want open source to exist, and you don't want it to all be like Oracle, you have to pay open source companies.
Karthik: That's right. Otherwise, there's no more open source.
It's the two sides. If you make money as an open source company, there's actually more open source.
Grant: Yeah, it's like reciprocation.
You want the same to be around, so you pay the company that's building it, because it's critical for you.
And it's like, "Sure, you could have a team internally trying to do all that stuff."
But it's just a lot harder, so we centralize it into one company.
People pay them to do it really well. I guess the model now, right, is sort of managed Yugabyte, and then for the enterprise, you distribute that same management layer to enterprises to run privately.
Those are the two different options.
Karthik: On the commercial side. Yes.
The one that Yugabyte as a company manages is called the Yugabyte Cloud.
It's still in beta. And the far more popular commercial offering, which we have a lot of happy customers using, is called Yugabyte Platform.
It's the one that goes into the customer's account.
People are using it for doing billions and billions of operations per day.
Yeah, the scale is phenomenal, and we're pretty happy with it.
Grant: That's great. Can you talk about just, in general, some of your customer logos, like who's working with you?
Karthik: Oh, yeah, absolutely.
I'll talk about some of the high scale customers and interesting stories.
We have, for example, this customer, Plume. Plume is a Wi-Fi device company.
They get quality of Wi-Fi from people's homes.
For example, if you're using some of these big ISPs, like Comcast, for example, you're already using Plume, because they try to figure out, "Are your devices at home secure? Are they functioning okay? Is the communication between them fine? What is the quality of Wi-Fi at home?"
Because, slowly, our homes are becoming mini data centers.
We got so many devices going all over the place. They're growing rapidly.
They have a large base of customers. They work with multiple of these Wi-Fi providers.
They sell a Plume device that you could stick into your home to do it yourself, to see the statistics yourself.
So, Plume uses Yugabyte. They had looked at a number of databases to run at scale.
I mean, they developed originally on Mongo and they looked at Cassandra and looked at a bunch of other stuff, but they wanted real scale and with transactional consistency, because they're a B2B company.
They cannot say they lost some data. It doesn't work that way.
They wanted multi-zone deployment, high availability.
But what is the coolest thing about Plume, and they spoke at our first user conference last year, around September, is that as of then, they were doing about 27 billion operations per day on about a 35 terabyte dataset.
And they were projecting that it would go up to 75 billion operations per day this year. I'm just saying like we used to be pretty proud at Facebook of few operations per day.
And now, many companies are crossing.
So, we have about five or six companies in our paid customers that are north of a billion ops per day.
Apparently, that barrier is not that hard to beat anymore.
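For a sense of scale, the per-day figures quoted above can be converted into average operations per second. This is plain arithmetic on the numbers from the conversation. It assumes load is spread evenly across the day, which real traffic is not, so peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_ops_per_second(ops_per_day: float) -> float:
    """Average throughput implied by a daily operation count, assuming even load."""
    return ops_per_day / SECONDS_PER_DAY


for label, ops_per_day in [
    ("1 billion/day", 1e9),
    ("27 billion/day", 27e9),
    ("75 billion/day", 75e9),
]:
    rate = average_ops_per_second(ops_per_day)
    print(f"{label}: about {rate:,.0f} ops/sec on average")
```

At 27 billion operations per day, the average works out to 312,500 operations per second, and the billion-per-day mark is a little under 12,000 per second.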
Grant: Yeah, it's funny. It reminds me of when people started introducing these projects outside of the hyperscale companies.
People would be like, "You're not going to run at the scale of Facebook. You're not going to run at the scale of YouTube. You're not going to run at the scale of Google in general."
And it's like, "Well ..."
Karthik: "I'll tell you why."
Grant: Yeah. You probably will never actually catch those ...
Karthik: Now.
Grant: Those now, but you'll be where they were five years ago in five years.
Karthik: Which is formidable. Yeah.
Grant: Yeah, and it's a lot, right?
Karthik: It's a lot. Yeah.
Grant: I think that's partially why these technologies follow.
A lot of the groundbreaking tools come out of these groups.
Karthik: That's right. Yeah.
Another interesting example is this customer, Narvar.
We were actually on a Google webinar, because they're using Google too, and it was like Yugabyte, Narvar and Google Cloud just talking about the thing.
So, I learned a lot myself from there.
But anyways, Narvar does over, again, a billion ops per day on Yugabyte.
Their customers are retailers, and they help the retailers' customers get a good post-purchase experience.
So, after you place your order, they do shipment tracking, return tracking.
So, we talked about what happens to your order after it's placed.
We know everything up to then, but after that, it just shows up on your doorstep, but there's actually a lot of stuff that happens.
And if you return it, there's a lot of stuff that happens.
So, Narvar takes care of that, and their customers are the who's who of the retail world.
They have, for example, Macy's, and Home Depot, and Gap, and Neiman Marcus, and all of these big brands who are using Narvar to do the post purchase experience.
Now, the funny thing is, the retail industry absolutely goes crazy and on fire the last few months of the year, because Black Friday, Cyber Monday, Christmas shopping, all of this stuff.
And it's just getting bigger and bigger every year.
And I remember from my Nutanix days, because we had a bunch of retail customers.
The typical thing in retail, and it's often a joke that I think Ram, the CTO of Narvar, was sharing with me, is that in the retail space there's six months of lockdown, because you don't want to touch your cluster when there's peak going on.
There's about three months of peak, and you want a couple of months to prepare.
So, it's six months of lockdown and six months preparing for the lockdown.
So, that's the cycle. I think in our case, with Yugabyte, what Narvar did was they painlessly used our platform to expand the cluster to double or triple its size and just let it run, easily absorb all of the growth and the spikes, and then shrink it back.
So, it was really painless to do. And so, that's something that was really cool.
In Plume's case, they were at billions of operations per day.
They were able to survive a zone failure of Amazon. With Yugabyte, the outage didn't impact them.
So, there's a lot of value these type of customers are seeing.
I mean, we have a number of others. We have a lot of FinTech customers, in the financial--
Grant: Yes. Are you kind of going to market in these different verticals like to retail and to FinTech and to telco?
Karthik: No. We're a horizontal database, so we apply in multiple verticals.
So, our path has been more looking at users and customers that want to do digital transformation and move to the cloud.
They're interested in microservices, they're interested in geographic distribution and scale, so we just use a horizontal message.
But of course, we have to echo the message in every vertical.
But in every vertical, there is a subset of thinkers that are currently already reasonably transformed.
Some of them undergoing transformation, and some of them thinking about it.
So, this is kind of true of every vertical.
I mean, the average, maybe it's slightly larger in one vertical versus another, but we don't focus vertical, we just go horizontal.
Grant: And then do you kind of go through who's adopting on the open source side?
Karthik: Yeah, that is a big part.
Open source and inbound is a big part of people finding us.
Our technical content, like for example, our blogs, our docs, podcasts such as these are often very informative for people seeking this type of information.
It's mostly just genuine stuff.
It's just things that people want to know and giving it in a way that's easy for people to consume, and that has a big pull.
So, that's one. There's also the second order network effect where our customer's friends, partners, users, they end up interacting with the service one way or the other, and they get intrigued, and they ask.
The recommendation or the discovery happens that way, so that is definitely another channel.
And we have some partners. They help spread the word also.
Grant: Yeah, because you probably get OEM into some products as well, right?
Karthik: Yeah, we're not OEM yet. I mean, we're OEM'd in a sense.
Some of these customers, Plume for example, OEM us with their product, and a bunch of others do too.
We have this other super interesting company called CipherTrace, these guys help figure out fraud in the cryptocurrency space.
So, Bitcoin, Ethereum, et cetera, and they help law enforcement and other agencies catch money laundering in this space.
Grant: That doesn't happen in the crypto world.
Karthik: Well, I guess it should be happening a lot, because they're doing really well.
But for them, they just have to go through the entire set of crypto transactions and mine data really quickly.
For example, they would OEM us for their partners possibly, like we've been talking about.
So, there's a number of these type of places where you would want both a cloud service and an OEM.
You have this parallel in motion going on. So, that's another place where--
Because if you've OEM'd your product, which depends on a database, the whole stack better be very easy to monitor, manage.
And you guys know this really well, yeah?
Grant: Yes, of course.
Karthik: The Yugabyte platform product therefore helps a lot.
It gives you REST APIs to do most of the things.
It's well alerted. It's very easy for them to become supporters of the database.
They don't have to depend on us, and it's very easy for them to deal with their customers and keep them happy.
Grant: Yeah. The entire database and all the features of the database are actually open sourced.
Basically, the thing that you're selling is the operational management and that part of--
Karthik: The turnkey nature of it.
Grant: Yeah.
Karthik: Yeah, exactly.
Grant: And this is something we're seeing more and more sort of particularly in the Database-as-a-Service spaces.
In the Kubernetes ecosystem, there's these operators that are being used to sort of like codify a lot of the operational tasks.
So that things, instead of being like manual operations, which you'd have with like a Database-as-a-Service where there's somebody in the backend that's managing that service, are all kind of done in code.
So, it sounds like that's what you've implemented as well.
Karthik: Yep, that's right.
Grant: And then it's like you can buy that thing to ship alongside of Yugabyte, and that will sort of take care of the manual operations for you.
Karthik: Exactly. And we do use Kubernetes internally in the platform as well.
You can deploy the database on Kubernetes and manage it yourself, and we have a separate operator for that, but you could use the platform, and the platform actually makes it agnostic, whether you deploy the database in your stack on bare metal, a VM, or Kubernetes.
It's the same to you.
And it doesn't matter which cloud you go to, it's pretty deeply integrated with that particular cloud service, which is actually not something Kubernetes would do.
For example, I want turnkey security by integrating with Amazon's KMS, because I'm on Amazon, and I want encryption and my key is rotated every six months, and I want nightly backups.
All of that is a couple of REST APIs away.
Grant: Got it. Okay, cool.
Karthik: Kubernetes does codify, and it probably will get there.
At that point, we'll also keep adapting. Adapting and adopting Kubernetes.
But there's still, I think, a vast majority of the deployments are still outside for stateful, especially given the networking challenges around Kubernetes.
Grant: Yeah, that seems to be the final frontier there in terms of what's happening.
But again, we think it's getting more and more solved all the time.
Karthik: It is. It definitely is. Yes.
Grant: Okay. And then team structure.
So, you've raised some money from Lightspeed and--
Karthik: Dell Capital.
Grant: Yeah. So primarily, engineering focus still, or really moving more into go-to-market? What does it look like?
Karthik: No, we have a go-to-market team.
I mean, a large part of the company is still engineering focused.
We do have a marketing team, and we have a sales team. So, we do both of those things.
This one falls between engineering and marketing, actually, but we have a community team.
They're on the open source side of things too, because--
Grant: DevRel kind of--
Karthik: DevRel and developer success and making sure we're answering questions and getting people successful on the product.
So, that's a huge part of our focus also as an open source company.
We want people to contribute, because especially if somebody adopts the database, and they have these couple of small or big features that they want, and they think they can get it done, it is, A, more exciting for them to do so.
But it's also more timely for them to have the control.
With a cloud, the problem is somebody runs the service, you want something done or something doesn't work well, you open a ticket, and you don't know what happens.
In an open source company, that's not the case.
You can actually control the destiny by getting some people to work on it and just changing exactly, and it will go through.
Those parts also we do. So, we do have all of these. We're still small, but we have all of these teams.
Grant: Sure. And I'm guessing fundraising just--
We know folks were pretty excited because of your background from Facebook and Nutanix, so it was pretty easy to get those first rounds done.
Karthik: Yeah, the first round was really easy.
The second round, it was actually very exciting to see our customers slowly turn into our champions.
So, that's starting to happen.
And now, we have a lot more vocal champions and a lot more people that really love both us as a company, our product, our support and the way we interact with them through the community.
So, it's trending the right way. All our community numbers are actually up about 10x over the last year.
Grant: Oh, wow.
Karthik: Yeah. We opened our Slack channel less than a year ago, because it takes a while to build a database.
Grant: Yeah, sure.
Karthik: That's what it is, right?
So, after that, we said, "Hey, let's really focus on building a community around it."
I think we're almost hitting 800 people now.
Grant: Do you have any thoughts around-- You worked with, obviously, the Apache Foundation.
In the cloud world, a lot of things are happening with the CNCF.
Karthik: We've worked closely with them too. Yes, we're well aware of them.
Grant: But the database isn't owned by a foundation, it's owned by your company, which is a model that HashiCorp and many others, including Replicated, have taken.
Have you considered donating to a foundation or going that direction?
Or how have you thought about that?
Karthik: We had thought about it, but I think there's pros and cons.
I'll speak to the cons, because that's why we haven't done it yet.
Grant: Sure. Yeah.
Karthik: So, firstly, a foundation requires a certain operating procedure to be established on day one. At the stage at which we are, with the speed at which we are iterating and the speed at which customers give us feedback and we address that feedback in the open source, it's all real time.
So, putting a process in place would, we feel, slow it down dramatically, specifically in the early years.
So, the thought is, right now, putting it in a foundation would mean we would all get busy with figuring out how to put it in a foundation and how--
I mean, we're still functioning as a fully open company, but there's still a procedure to how you want to plan your roadmap, and you have to have a community call.
And so, all of these things slowly will start adding.
It will definitely add structure, but it will also add overhead.
At this stage, we're still at an early enough stage where we need to iterate really fast.
So, that's one of the big things.
The second thing is actually looking at what happens if we go into the foundation, what we give up, what we don't and tying it to the business model, which also has to evolve, and we have to understand our business model.
I mean, we know our directional business model, but we have to really understand the details of how it would work and what we can and cannot do. It's just raw work.
Grant: Yeah, it's an interesting consideration.
What do you want to put in the foundation? What do you not?
I don't think there's a clear answer yet, particularly, as everything continues to kind of grow.
It's a really kind of intriguing world to think about having an open source company, potentially even a foundation at some point, maybe not.
I mean, HashiCorp has done it really well without--
Karthik: Really well. Really well. Actually, Elastic and HashiCorp were our inspiration.
So, they're not a part of a foundation.
They're phenomenally useful and successful open source projects.
They've truly done good. So, I think that's been part of our motivation too.
Grant: When you think about the future, is continuing to build out more wire compatibility part of it?
Do you want to have more than just sort of Cassandra and Postgres?
Are there other APIs you want to be able to work with?
Karthik: Yeah. I think as a longer term vision, yes, I think that does come up.
Because if you think about just building an app on one database, you need all of these APIs.
But in the shorter term-- That's definitely there.
So, people do ask for things like search, for example.
Because I want to put my data in, I don't really want to figure out how--
Just deal with search for me, please.
If you can do it transactionally across my data and my search index, even better. Many times, if you put the search index outside and people search it, they find the record only to go to the database and discover it has been either modified or deleted.
It leads to confusion. That's still a ways away for us.
Right now, we feel we still have to earn our place as almost like right next to Postgres and MySQL as a default.
People have to be able to get to a point where they realize that Yugabyte offers everything that a Postgres does, if not more, and it works just as well, and it has really good performance.
But if you go deploy it in the cloud, if you do it on Kubernetes or what have you, it actually can be fault tolerant, scalable, and geo-distributed.
So, those are the three aspects. I mean, it seems like a simple thing, but we need to get everybody to realize this.
And then we can go about doing a lot more.
Grant: Okay, great. I mean, that sounds like a lot of sort of continuously doing community building and focusing on ...
Karthik: And even core features. There's still core features.
One thing I've learned is Postgres has an incredible amount of features. I didn't know.
I mean, I learned SQL and U-SQL, but I didn't know Postgres had that many features.
Grant: Oh, funny.
Karthik: So, it's an unbelievable amount of features. You could do crazy level stuff with Postgres.
Grant: That's funny.
Karthik: It's almost like a database that you can extend to do stuff.
For example, some of the eye opening requests are--
Postgres has this thing called extensions, where you can write third-party plugins, code kind of thing.
There is an extension where you could do your language push down operations in JavaScript.
They have a V8 extension that provides a JavaScript runtime, so you could actually write your procedures in JavaScript and push them into Postgres.
And when you insert, it will process it this way.
I'm like, "Oh my god, okay. I didn't even know you could do these things."
Grant: Yeah. It's probably some of those things that had you known when you started a company might be like, "This is going to be harder."
We all kind of come into--
Karthik: SQL, I mean, yeah, it's tough.
I know SQL is tough, but there's also the ecosystem around, which makes it interesting in the distributed world.
Like for example, we just recently announced our Change Data Capture where you can take the changes out of the database and do stuff with it.
It's in beta. But the way Postgres does it wouldn't work for Yugabyte, because there's not one node, there are multiple nodes.
So, you need to figure out how to do this thing across nodes.
So, there are still interesting challenges all over. This is all hardcore tech, right?
Grant: Yeah, it's cool. Well, Karthik, thanks so much.
I really appreciate all your time. It's really interesting learning.
This is a super fascinating space.
And I think it's evolving in such a unique way, the whole cloud native space, everything is happening.
I think we'll see this as a really important area over the coming years in order for everything to become truly automated and reliable.
So, thank you for all your work.
Karthik: Thank you for having me, Grant. Great insightful questions.
I really enjoyed my time here. Thank you.