Ep. #87, Multiple Databases with Tanmai Gopal of Hasura

Guests: Tanmai Gopal

In episode 87 of JAMstack Radio, Brian speaks with Tanmai Gopal of Hasura. They explore the variables that come with running multiple databases including new security risks, authorization considerations, and problems around caching.


About the Guests

Tanmai Gopal is the Co-Founder and CEO of Hasura, an open-source engine that gives developers instant realtime GraphQL APIs and event-sourcing on Postgres. He has been active in the cloud-native, Postgres, and GraphQL communities, is a frequent conference speaker, and is the author of 3factor, an architectural pattern for building scalable and resilient backends with GraphQL, event-sourcing and serverless.

Transcript

Brian Douglas: Welcome to another installment of JAMstack Radio.

On the line we've got Tanmai Gopal.

Tanmai, welcome. How are you doing?

Tanmai Gopal: Hey. Hey Brian. Hey everybody. How are you?

Brian: Excellent. And you are co-founder, CEO of Hasura.

So do you want to tell us why you're here and what you're up to at Hasura?

Tanmai: Awesome. Yeah. I'm one of the co-founders at Hasura.

I started Hasura just about three years ago now. I'm the CEO and head of product as well.

We started Hasura to solve what we saw as one of the critical problems in making application development faster, which is a data access problem.

And we leverage a lot of GraphQL for doing that.

And so if you think about what Hasura is, Hasura is a solution that gives you an instant GraphQL API across your data sources and across pieces of logic that you have.

And so Hasura helps you convert that data and logic that you have in your domain to a GraphQL API that then developers can use, internal, external developers can use to build stuff with, which is the purpose of all of the work that we are doing.

So that's where Hasura is.

Brian: Yeah, excellent.

And so you were on a previous podcast, actually episode 35 in probably 2018, couple years ago.

We talked a lot about the GraphQL stuff, but you've done a lot since 2018.

So I'd love to know what's new in Hasura since then?

Tanmai: Yeah, a tremendous amount of work underneath.

What we've been doing, I think mainly is gearing up for adding support for multiple databases.

Back when we chatted last, we used to have support for Postgres as a primary data store.

Now we've added support for SQL Server, MySQL, BigQuery, distributed versions of Postgres so that you can do geo-distributed workloads or sharded workloads, like Citus Hyperscale and Yugabyte, and Cockroach is coming in soon as well.

So that's been the main piece of work underneath, which is to make sure that we can handle multiple kinds of databases.

And the other piece of work has been also launching our cloud solution, which we've creatively called Hasura Cloud.

And so Hasura Cloud is a managed service offering which adds certain things that are unique to being able to manage Hasura as a service for our users, so for example, things like caching, auto scaling, incident monitoring because Hasura is connecting to a lot of critical systems.

So those have been the two main areas of work.

Something we launched just recently at our annual conference, HasuraCon, just a few weeks ago, is the ability to do cross-database joins, which is just amazing.

It's an exciting problem, of course, from a technical point of view that you can join data in two different databases.

Postgres-Postgres, or Postgres-SQL Server whatever combination you have, you can join across them and fetch data simultaneously.

And that's, I think, been an amazing piece of work especially from a performance point of view, where you're able to fetch data from these multiple sources because it's a hard problem to solve, and especially if you think about that federation approach, you think about, you can have one API that talks to two APIs that themselves talk to a database each.

And then the Hasura approach is a little bit different, which says we'll federate on the databases directly.

So you have one API that's talking to two databases simultaneously.
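To make the idea concrete, here is a minimal Python sketch of one API layer joining rows that live in two separate databases. It is an illustration only, not Hasura's implementation: the two in-memory SQLite databases stand in for, say, a Postgres and a SQL Server instance, and the table names and the `fetch_users_with_messages` helper are hypothetical.

```python
import sqlite3

# Two independent databases; in-memory SQLite stands in for two real servers.
users_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "lin")])

messages_db = sqlite3.connect(":memory:")
messages_db.execute("CREATE TABLE messages (user_id INTEGER, body TEXT)")
messages_db.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [(1, "hello"), (1, "again"), (2, "hi")],
)


def fetch_users_with_messages(user_ids):
    """Join users (database A) with their messages (database B) in the API layer."""
    if not user_ids:
        return []
    marks = ",".join("?" for _ in user_ids)
    users = users_db.execute(
        f"SELECT id, name FROM users WHERE id IN ({marks}) ORDER BY id", user_ids
    ).fetchall()
    # One batched query against the second source instead of one query per user.
    rows = messages_db.execute(
        f"SELECT user_id, body FROM messages WHERE user_id IN ({marks}) ORDER BY rowid",
        user_ids,
    ).fetchall()
    by_user = {}
    for user_id, body in rows:
        by_user.setdefault(user_id, []).append(body)
    return [
        {"id": uid, "name": name, "messages": by_user.get(uid, [])}
        for uid, name in users
    ]
```

Calling `fetch_users_with_messages([1, 2])` returns each user with a nested list of messages, which is roughly the shape a GraphQL response for such a relationship would take. Batching the second lookup is the design choice that matters here: it keeps the number of round trips at two no matter how many users are requested.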

And apart from the tech problems, which are very interesting to solve there, because they're fundamentally different kinds of databases, so how do you perform joins?

The very interesting thing is that, especially since the last time we chatted and we've been seeing this becoming more apparent is the future is not going to be a single database, even for a single application.

The future for data workloads is evolving rapidly, and what's happening is that people really like having different sources of their data and logic, which are optimized for certain kinds of workloads.

Let's say you're building some messaging or chat functionality or application.

You have parts of your application that would come from a transactional store.

But when you think about the messages or events that you're storing, that needs to go into a different database, because the kinds of queries, the volume of data, the amount of ingestion that you need to have is different.

So it's the same application and it's semantically or conceptually the same API, but you do need two different data workloads to scale that and optimize your workload.

And this is just increasing, because the database industry is also just doing amazing work and there's all kinds of stuff happening.

But even from a JAMstack point of view, when we think about it, you have this CMS ecosystem, and you have like this transaction data, you have time series data, you have so many different places that you're getting information from that you need to be able to join in a sense to make sense of it as a unified, semantic API or data graph or whatever you want to call it, that I think that's been really, really interesting to both see that change in the ecosystem, and also work and build the technology that makes those kinds of things possible.

Brian: Yeah, that's awesome.

And that's interesting that you mention the future is multiple databases and not just one, because I know, I've been at companies where I built on the monolith, then we had the one database that talked to all the different flavors of the application.

And if you make a mistake and you got to roll back, you're making mistakes across multiple instances.

So if a user is touching this data and touching that data, and you corrupted the user, now you've got corruption through and through.

But I find it really intriguing though, because having a separate database for a different use case or a different flavor of the product is intriguing because one, you can sandbox, you can iterate without worrying about taking down the entire system.

But two, it really leans into the JAMstack mantra and being able to--

If you don't want to have everything deeply coupled together, you can also decouple your database from what you're working on.

Tanmai: Exactly, exactly right.

And the "what you're working on" here is almost like "what data you're working on."

One of the reasons why people really like JAMstack is that they're just able to conceptualize these layers that they'll have in their application, and they're able to choose for each layer, the best of breed software or technology or stack that they want to use.

Like, I want to use this particular stack because this is my application.

It's a static application that I can update periodically, but it's largely static or it's mostly dynamic, or it's catering heavily to mobile workloads and it needs to be performant, lightweight.

And we see this massive fragmentation in the JavaScript framework ecosystem, because all of these frameworks are amazing because they're all geared towards slightly different sweet spots.

And because you've decoupled your API layer, you're able to focus on choosing the best framework that you want for your users that will give you the best experience.

Very similarly to that, when we think about the backend portions of it, now you have the API and the data layer, and you want to make the same choice for your data layer. You want to choose the best place to keep certain parts of your data, whether it's images or video or your fast moving events, transactional things, analytical things, the data is just exploding.

And the kinds of solutions that you have are also exploding, very similar to what's happening in the JavaScript framework ecosystem.

So it's really interesting to see that and help be the middle layer that allows the best choices to happen on either end of the spectrum, the data layer and the JavaScript layer, but then provide that sanity by having a unified API in the middle.

Brian: Yeah. And what's interesting about having those choices, as you mentioned, is that not only were you originally working with Postgres and making those interactions really seamless, but you've also added MySQL as well as BigQuery.

I'm curious now, have you seen the adoption curve pick up now that you have options for other folks and what their decisions and choices are?

Tanmai: Yep. Yep. Yep. So we've announced previews for some of the new databases that we've been working on.

The feature set is not at parity with where Postgres is, but even with the initial feature sets that we have, which is say read-only on these new databases that we're supporting before we start adding writes and events and stuff like that, that adoption has been phenomenal, especially for production workloads.

It's just been amazing. BigQuery was a very interesting database for us to add because it's our first analytical database.

We were thinking about transactional databases so far, now we're thinking about analytical databases where you're just storing an order of magnitude, two orders of magnitude, larger amounts of data.

And it was very interesting for us to see that because typically when you're thinking about, let's do JAMstack, the kinds of databases that people usually think about are Postgres and MySQL and Mongo, that's roughly where folks' minds are at.

And it was just very interesting to see that pick up in the OLAP space because there's just so much data that people want to analyze and use and build dashboards off.

Data is increasingly just becoming a product, and that data is often in these analytical stores.

And so making that accessible securely, performantly, safely, is really valuable.

So it's just been tremendous to see that over the last few months.

Brian: Yeah. Yeah. And even the last couple years.

You mentioned Cockroach as well as another flavor of database, very familiar to a lot of folks, but slightly different in the approach and how you interact with it.

I've seen a lot more database startups start with their flavor coming from open source projects, actually.

It's really where this has all been centralized from.

And I'm liking the pattern of adopting these other solutions because if I know how to use Hasura but I don't want to sort of build my own interactions and databases and being able to have my own SQL pancake, I love the fact that I can leverage these tools and then it's familiar for the entire team across the board.

Tanmai: Exactly.

Brian: So I like where you're going and I totally get it as well.

Some other things that have come up when it comes to some of these solutions are things like caching and scaling and security.

Do you want to speak on some of those updates and changes?

Tanmai: Yeah, of course.

I think caching's been one of the most exciting things we've been working on.

It's not an easy problem. They say there are two hard problems in computer science: caching, naming things, and off-by-one problems.

So I think when we think about an application, even a JAMstack application, when we think about the API in the middle, it's a means to an end.

It's this necessary thing that needs to exist so that people can build applications.

But at the core of it, what you really want is access to that domain's data and logic.

So when people build these APIs, and especially as applications start scaling, and scaling can happen in many different kinds of ways. It doesn't have to mean just "web scale."

But whether it's scaling by users, by functionality, by just workload.

One of the problems that is a huge time sink for people to solve for is the caching problem, because it requires just a tremendous amount of, in our opinion, unnecessary expertise to build, because what happens is that, especially for dynamic data, caching for static data and what CDNs have done over the last decade or two is amazing.

And how that ecosystem itself is evolving is amazing, but it works largely for static content or for static data.

Caching for dynamic data or transactional data is a hard problem.

And it's a painful problem.

If you think about it from the point of view of a developer, they just want it to work.

They don't care, they just want it to work.

And Hasura is at that right layer where we understand the kind of APIs, the data access, how data's laid out, what the different security rules for different pieces of data are, like you're accessing an event, but you can only access it if you have bought a ticket to the event.

You're accessing a document, but you can only access the document if it's public or if it's in your organization or whatever. There's rules.

Let's say for example, on this event platform, you have 100,000 users that just suddenly came on to access this event.

Because of their ticket, they need access to the event.

So you need this caching information to cache this dynamic data that describes the event, but this caching information, this caching policy needs to be aware of how people are accessing that data and who has access to this.

And this is just painful to set up because you have to build it. You have to write it by hand.

Operationally, you have to think about scaling it, deploying caches, cache invalidation.

There's a whole set of things there.

And we've done really good, interesting foundational work on automating that piece entirely, so that as a developer, you don't have to think about how stuff gets cached or what is cached, or worry about caching at all.

You have an API, if you want it to be cached, it just gets cached.

And even if it's dynamic data, even if it's private data, it just works.
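One way to picture authorization-aware caching, as a hedged Python sketch rather than Hasura's actual design: the cache key is built from the query plus only those session attributes that the access rule depends on, so every ticket holder for the same event shares one cached entry. The `AuthAwareCache` class, the `has_ticket` attribute, and the `load_event` loader are all hypothetical.

```python
class AuthAwareCache:
    """Cache keyed by the query plus the session attributes its access rule reads."""

    def __init__(self, loader, rule_attributes):
        self._loader = loader                    # function(query, session) -> data
        self._rule_attributes = rule_attributes  # names the access rule depends on
        self._store = {}
        self.loads = 0                           # how often the data source was hit

    def get(self, query, session):
        # Attributes the rule never reads (such as user_id) stay out of the key.
        key = (query, tuple(session.get(a) for a in self._rule_attributes))
        if key not in self._store:
            self.loads += 1
            self._store[key] = self._loader(query, session)
        return self._store[key]


def load_event(query, session):
    # Hypothetical rule: only ticket holders may see the event details.
    if not session.get("has_ticket"):
        return None
    return {"event": query, "details": "visible to ticket holders only"}


cache = AuthAwareCache(load_event, rule_attributes=["has_ticket"])
```

With this key, a burst of ticket holders requesting the same event results in a single load from the data source, while a session without a ticket gets its own entry and never sees the private payload. Cache invalidation, which the conversation also names as part of the problem, is deliberately left out of the sketch.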

And I think because of solving that caching piece, this now becomes a really important part of making this whole API process self-serve.

Because the reason why these APIs are hard to maintain, deploy, manage, operate, and there's a whole cycle of people talking to each other to deploy an API and scale an API, one of those big problems there is just thinking about how will we do caching?

And if you automate that, then developers are just free to build stuff again.

They're like, "Hey, here's my data. Here's what I want it to be. Here's how it's laid out. Here's the access control rules for it. And here's me, here's my application. And I'm building this application."

And the API just works, and you don't have to care whether you have 10 users on your application, you have a million users on your application, whether it's dynamic data or static data.

So I think that's really, really powerful for our users.

Brian: That is awesome too, as well.

And having multiple databases, in my thought, multiple points of failure, multiple opportunities for security risk, how are we combating against things like that?

Tanmai: That's a really good question, because again, when we think about security and again at this layer, which is just so critical because all your data has to be accessed through it.

There's a variety of different problems around security.

So for example, the first aspect of security is just application authorization kind of security, which is which users get access to what data?

This is a part of your authorization business logic.

Like I was talking about, you get access to an event, if certain conditions are met, you get to read an article.

This is your fifth article. And you're a user that's not logged in.

And your sixth article, you can't read.

That's one part of the security piece, which is something that we've had as a part of the Hasura product, this authorization policy engine, which handles that.
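The article example can be written down as a tiny rule. This is a minimal Python sketch of that kind of authorization check, not Hasura's policy engine or its rule syntax: the `can_read_article` function, the `logged_in` session key, and the limit constant are hypothetical names chosen for illustration.

```python
FREE_ARTICLE_LIMIT = 5  # from the "fifth article" example in the conversation


def can_read_article(session, articles_already_read):
    """Logged-in users may always read; anonymous users get a limited number."""
    if session.get("logged_in"):
        return True
    return articles_already_read < FREE_ARTICLE_LIMIT
```

An anonymous reader who has already read four articles may open a fifth, and is refused the sixth. The point of keeping such rules in one policy layer is that every query path applies the same check.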

But then especially, and to your point, as you start thinking about multiple data sources, it's not just the authz logic that we want to make sure we've done well, but you start thinking about other aspects of API security.

It becomes easier, especially with GraphQL to do a deliberate or an accidental denial of service, which is what if you ran this query to fetch data where it's fetching it from multiple sources, one of the sources is slow, or the nature of the query just makes it so complex that the underlying database just freaks out.

If it can't handle that load, it just crumbles and dies.

That's that other aspect to API security, which is not linked to your application's logic, but it's security that you have to worry about, especially if you start moving towards production and you start scaling.

And then we've done some amazing work, again, getting inspiration from the database ecosystem, but really bringing that to the API layer.

One of the most exciting pieces of work there-- So there's typical stuff, there's rate limiting and allow listing and things like that.

But I think one of the most exciting things there has been what we call operation timeouts.

So it's like having a cost wallet, so that your API consumers, depending on their session, their role, who the API consumer is, whether it's a logged-in user, not logged-in user, enterprise, tenant, whoever, that particular session has almost a quota of compute and time that they can run on an API. So you tell them that, "You can do whatever you want. You can access whatever API you want, but you have a quota of computing time that you spend and you can't cross that."
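The "cost wallet" idea can be sketched as a small bookkeeping class. This is an assumption-laden Python illustration, not Hasura's API: `ComputeWallet`, its methods, and the role names are hypothetical, and a real system would also reset quotas per time window.

```python
class ComputeWallet:
    """Tracks how much execution time each role has spent against its quota."""

    def __init__(self, quota_seconds_by_role):
        self._quota = dict(quota_seconds_by_role)
        self._spent = {role: 0.0 for role in self._quota}

    def remaining(self, role):
        return self._quota[role] - self._spent[role]

    def charge(self, role, seconds):
        """Record usage; return False once the role has exhausted its quota."""
        if seconds > self.remaining(role):
            self._spent[role] = self._quota[role]
            return False
        self._spent[role] += seconds
        return True
```

For example, with `ComputeWallet({"anonymous": 1.0, "enterprise": 60.0})`, an anonymous session can spend up to one second of execution time in total, after which `charge` returns `False` and the caller is expected to refuse or cancel further work.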

What Hasura does and what we've built out there is this ability to say that we measure that quota.

And if that quota's crossed, we're able to kill execution, both in the API layer and in the data layer.

And this is really nice because normally when people think about this, I don't want to have an API call that runs longer than a second.

So the way that you think about it is that your CDN, your API gateway or your API server will just time out, and will just say, "Well, we're killing the connection because we've crossed a second or we've crossed 30 seconds."

So this API call can't run.

The problem with that is that the poor database is still processing this query because nobody told the database that you have to stop processing this long running query.

And it ran by mistake.

Please stop running it because if the database keeps running it, it's going to just kill all of the other consumers that need to access data.

And it's a massive quality of service risk.

And so what Hasura can do is maintain that quota on that API execution time, not just for users and handle that inside the API layer, but also pass the right signals to those upstream data sources and databases and tell them to kill their execution as well.

So if you set a 500 millisecond per request quota, then if it crosses 500 milliseconds, Hasura will kill that execution everywhere.
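The difference between dropping a connection and stopping the work in the database can be shown with a small Python sketch. It uses SQLite's `Connection.interrupt()` as a stand-in for whatever cancellation signal a given database supports; on Postgres, comparable mechanisms include `statement_timeout` and `pg_cancel_backend`, though the episode does not say which signals Hasura sends. The `run_with_timeout` helper and `SLOW_QUERY` are hypothetical.

```python
import sqlite3
import threading


def run_with_timeout(conn, sql, timeout_seconds):
    """Run a query; if it exceeds the timeout, stop it inside the database too."""
    timed_out = threading.Event()

    def cancel():
        timed_out.set()
        # Asks SQLite itself to abort the running statement, so the database
        # stops working instead of carrying on after the caller has given up.
        conn.interrupt()

    timer = threading.Timer(timeout_seconds, cancel)
    timer.start()
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        if timed_out.is_set():
            raise TimeoutError(f"query exceeded {timeout_seconds}s") from None
        raise  # a genuine SQL error, not a timeout
    finally:
        timer.cancel()


# check_same_thread=False because the timer calls interrupt() from another thread.
conn = sqlite3.connect(":memory:", check_same_thread=False)

# A deliberately expensive query: count up to a very large number.
SLOW_QUERY = """
WITH RECURSIVE counter(x) AS (
    SELECT 1 UNION ALL SELECT x + 1 FROM counter WHERE x < 1000000000000
)
SELECT count(*) FROM counter
"""
```

Calling `run_with_timeout(conn, SLOW_QUERY, 0.5)` raises `TimeoutError` after about half a second, and the same connection can serve the next query right away because the long-running statement was actually stopped rather than abandoned.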

And this now becomes another massively important building block in thinking about making those APIs self-serve.

Now you can really start thinking about it as saying, "I want to have this amazing JAMstack application. I want to have my developers use the best stack possible, and just go forth and build stuff. And I want to provide all of the infrastructure that makes this completely self-serve."

One part of that problem was, can we let the front end development team build an application, deploy an application and own that entire life cycle without affecting the backend?

And that was amazing with JAMstack, because now you're not coupling that API layer to the UI layer.

You're able to own that independently.

But another part of building this application is to say, "Can I make my API self-serve so that no matter what the developers do, they're running a small experiment, they're iterating quickly, whatever they do, my infrastructure, my critical structure, my critical data is protected and will keep working."

So now you can really just go and tell your developers, "Go do whatever, that's fine. Nothing bad will happen."

That's just amazing power. It's just the logical next step for people building JAMstack applications. Am I making sense?

Brian: Yeah. No, that makes a lot of sense too, as well.

I'm just having flashbacks of my earlier--

While I was doing full-time engineering at Netlify doing JAMstack integrations, having the confidence of saying, "I'm going to make this button work in the way it's going to work, but I'm going to talk to this endpoint and I'm not going to have to worry about all this stuff that happens at the backend that's owned by the infrastructure team, by the backend team."

I can just move on with my life, but also prototype different features and sandbox them and have them behind feature flags without having data bleeding into production and people now sniffing it out and saying, "Oh, it looks like this feature is going to be shipped, because when I looked at the database call, randomly, I saw this little endpoint or whatever."

Now we were able to protect against that too as well.

Tanmai: And this is the reason why it's always easy to start building a new application and then as you're iterating on stuff, things just start slowing down, and especially for applications that become more and more mission critical.

It's like when we think about enterprise and we're like, "Oh, enterprise is so slow."

There's a reason they're slow because they have to be slow.

There's so many challenges and there's so much operational risk to being able to iterate quickly, that it's just hard for them to say iterate quickly, because if they make one mistake, it's either downtime or a data security risk for millions of customers, for super critical information and you don't want that to happen.

So this becomes a really important building block for them to say, "Risk free experiments, go do whatever you need to do. Experiment, sandbox, prototype, prototype in production, and nothing bad will happen," which is awesome.

Brian: Yeah.

And another thing that's awesome is the evolution of the JAMstack as you've touched on.

So I wanted to actually pick your brain as we're winding down the conversation: where do you think the JAMstack is heading?

Because now we're seeing a lot of maturity in the space.

We're seeing companies announce VC rounds that are associated with the JAMstack.

We see a lot of companies too, as well that are now pointing fingers at the JAMstack and combating against it and why it's not working and why their solution is better.

Where do you see the JAMstack going in the next five to 10 years?

Tanmai: Yeah. That's a very good question.

I have some interesting, maybe occasionally counter-intuitive thoughts, but I think the JAMstack concept and what it envisions for team productivity and the nature of the application, I think that'll stay, and I think that's probably the way applications are going to be built this entire decade, at least.

And the reason why that's happening and why that will be the case and other approaches will lose out is because fundamentally, the focus that JAMstack has is a focus on team productivity.

And it solves not even a technical challenge, but really a process and a mindset challenge that says, "Here's a team. How do we make it independent?"

It's the whole microservices versus monolith conversation.

And the microservices ecosystem, which has been taking off over the last 15, 20 years, has really been accelerating nonstop.

And you just see so much innovation in the microservices ecosystem.

There's so much tooling there, so many events there, so much cloud vendor stuff there.

A lot of engineers look at microservices and are like, "Nah, nah, it's just the wrong way to do it. You've over microserviced it. There should have been a monolith, let's return to the monolith."

And everybody keeps saying return to the monolith, but it's just impossible to do for most people at scale.

Very few companies can pull that off. It's just very hard to do because the process and team challenge around the monolith, that's really the problem.

It's really hard to hold somebody accountable from a business point of view and product point of view to say, "I want to build quickly, I want to ship quickly and I don't want risk. I don't want operational risk. And I don't want security risk. I want that to happen."

That becomes hard with the monolith, because the project is really large and like you said, I might have a data workload that doesn't work.

And one mistake on this part of the database bleeds into another part of the application or for other features that are being worked on, and that really hurts productivity.

It's that similar evolution, it's that similar analog, but focused on the front end ecosystem, which says, "Here we have a front end team, let's just make their entire life cycle independent and let's get them to be productive."

And so I think that works really well.

Now, when we see these more full stack frameworks or a return to Rails, or a lot of people who talk about why isn't there Rails for JavaScript?

There should be Rails for JavaScript. We need to have Rails for JavaScript.

And we want to return back to a full stack world where the server is rendering, templating everything, and then sending that to the front end.

I think the evolution of the JAMstack to move applications towards the edge where the edge is a combination of a little bit of what's happening on the cloud, "server side" and the application device itself, I think that's the future of the JAMstack, but I don't think we'll make a return to full stack development because that productivity is really geared towards individual productivity, not team productivity because as applications become more modern and richer, the team is actually the front-end team.

The front-end team needs to work really quickly. They need to iterate fast.

They're listening to business requests 24/7. They're adding value to users.

So that's the most important team that you need to empower.

And for that setup, this JAMstack setup works better compared to other approaches.

But that said, I think the JAMstack ecosystem is also evolving to move a little more towards this edge style, rather than just be entirely on the application device itself.

So I think that's the evolution that's happening, but I think the full stack thing, the Rails for JavaScript, doesn't exist because the ecosystem has moved on.

Maybe we just have a different number of developers today and a different kind of application stack that we need to make productive.

And that's why we don't have Rails for JavaScript. It's not a technology problem.

It's not like it's impossible to build the Rails for JavaScript. It's not an ecosystem problem.

The Node.js ecosystem is arguably larger, much larger than the Rails ecosystem.

It's not that. It's just that the ecosystem has moved on.

Needs have moved on. Teams have moved on. So that's the way I think about it.

Brian: Yeah. The ending point about teams have moved on, it's actually completely like yeah, of course.

That makes a lot of sense.

Because I think when we had Rails or we had Django and everything else, when all the full stack monolithic frameworks really took off, we were all building very similar applications.

It was essentially a Reddit clone or Twitter clone. And that's what we were learning on and our To Do app.

And I think we don't need full, end to end login functionality.

There are now these one-off tools that we're leveraging for this situation, or these plugins that we're leveraging, built in JavaScript or in Electron, that don't need the Twitter treatment or the authentication through--

And I say authentication because I did the Rails guides and that's how I learned how to code, going through that, you build a Twitter clone.

And it shows you how to do blogs and up vote and stuff like that, and like, and reply.

It's an amazing way to learn how to build an app.

But I don't know-- Like if you look at TikTok, would I really need that same sort of understanding for the idea of TikTok and how the algorithm works?

And though TikTok does have some flavors and resemblance to Twitter, it's a different playground.

It's a different place that people are trying to leverage and everybody's trying to innovate and move forward.

And as you're trying to innovate and move forward, you do want to have what we were talking about before, being able to have your own database for this one innovation, but not mess up the existing thing that's paying the bills and creating funding.

So yeah, I'm right, right there with you. I honestly really enjoyed this conversation.

Unfortunately we got to wind this down, but the best part about this is we get to then transition into picks.

So these are JAM picks, things that we're jamming on, could be music, food, technology related.

And why don't I go first? Because I've got a pick I want to talk about.

Speaking of decentralization and stuff like that, I have a newsletter subscribe.bdougielive, and in that newsletter I've always added content.

I've created videos, blog posts, et cetera.

And I've noticed that my list is long. I just do a lot of stuff.

So whether it's a talk or this podcast gets published, I throw it into my newsletter.

And what's been challenging is trying to figure out, I don't need to put everything in there, but I need to know what to put in there.

So instead, I've started using this tool called Polywork.

I think it's still in beta as of the recording of this podcast, and it gives me that same sort of thing, being able to list the stuff I work on, things I want to talk about, but I don't necessarily want to tweet every single time I write a blog post or tweet every time a video is uploaded to YouTube, because I feel like on Twitter I've already built an audience and a brand over there and it doesn't necessarily need to be me promoting the things I've created.

So I've always needed a place to talk about what I've created and Polywork has become that place now.

It was a thing that I wanted to actually embed into my own blog, in my own website and I just never got around to it.

They've already jumped the gun, gave me a place where I can just upload a post, talk about the behind the scenes on what my approach was and stuff like that.

And it's become a great place, so much that I think more and more people are starting to figure out it exists.

So I'm really intrigued. As I talk about, we don't need another Twitter, Polywork looks very familiar to Twitter, but different.

I'm looking forward to seeing what features they ship.

I think what it really is, is probably going to be a replacement for LinkedIn for me in the future where I can connect with other DevRel folks, other engineers, and follow closely some other side projects as opposed to the piece that they had last night and stuff like that.

So I'm definitely looking forward to seeing the evolution of that product. And if anybody needs an invite, just hit me up.

I think I have a couple invites left or if not, you can probably find an invite from someone else that you follow on Twitter as well.

Tanmai: That's awesome.

I'm going to check out Polywork, looks amazing. So super interesting.

My picks on a bunch of the stuff that we were talking about and some of the stuff that we're doing, of course do follow me on Twitter @tanmaigo or Hasura @hasurahq.

So there's a great number of deep dives that we're doing into some of the topics that we were talking about today.

Otherwise, my picks on the reading front, I've been lately reading a lot of interesting science fiction fantasy by an author called N.K. Jemisin.

She's just amazing, amazing work, very different in the characters, the peoples, the settings to what we saw in books that we grew up with.

So it's really interesting.

And then on the flip side, I've been rereading a lot of P.G. Wodehouse, which is very dated now, but still super funny.

So that's been what I've been up to.

Brian: Excellent. It's always nice to get a read in, especially during the summertime where I've been lucky enough that GitHub is now-- we're taking off Fridays for the summer.

So I honestly forgot that Friday was off this week.

I was already prepared with what I was going to be working on this week, and then realized I only had four days to get everything accomplished.

But I'm looking forward to that because stuff slows down a little bit as people take vacations and take their weeks off.

So having the Friday off makes it so much easier for me to plan around some of my leisure time.

So I've got a couple books in the hopper on my Audible.

Actually random extra pick, I've actually taken up gardening very recently.

So I've recently moved into a new place and we inherited a lemon tree, an apricot tree and a plum tree.

Tanmai: That's awesome.

Brian: We happened to move in the week the apricots were blooming, and then we've spent a couple weeks there to actually get a harvest.

And I honestly never ate any apricots to be quite honest. And now I'm a fan.

So I've made jam, we definitely froze quite a few, and going to be making some other stuff as well in the future.

So I'm just learning all about apricots at the moment until the plums start coming in.

Tanmai: That's super fun.

Brian: Excellent. Well, this was a super fun conversation.

Tanmai, thank you so much for catching us up with Hasura.

Folks, keep spreading the JAM.