Ep. #85, Data Replication at Speed with Jaxon Repp of HarperDB
Brian Douglas: Welcome to another installment of JAMstack Radio. On the line we've got Jaxon Repp from HarperDB.
What's up, Jaxon?
Jaxon Repp: Not much, how are you doing?
Brian: I am doing fabulous.
Like you mentioned off air, you've got 70-degree weather in Denver.
I've got 73 degrees out here on the West Coast in Oakland, and I'm ready to go outside, to be quite honest.
Jaxon: We just opened our pool this past weekend. And so, naturally, it rained all day yesterday.
But I have high hopes for this weekend.
Brian: Yeah. Perfect timing with the rain, but at least where we live, it's going to be a nice couple of days, for sure.
I didn't actually bring you on to talk about the weather.
It's nice that we can talk about that, but I want to talk about HarperDB.
You're VP of product at Harper, so would you like to do your job and explain what Harper is?
Jaxon: Well, HarperDB is a distributed database, meaning it runs as multiple nodes, what we call a cluster, and each node can hold an independent set of data.
So you can move data into specific regions or specific zones of access, near the users who primarily read from or write to any given table, and move that data around on what we refer to as a comprehensive data plane.
Brian: Okay, excellent. And we just skipped right over introducing yourself.
But do you want to tell us how you got into the data world and working on product at HarperDB?
Jaxon: I was working for an IoT platform where we were doing lots of reads and writes of sensor data, very high-speed data.
And I needed a database that had, sort of, atomic transactions.
Traditionally, with drivers that connect to a database, you could forget to close your connection.
Our platform wasn't necessarily as elegant as it needed to be to handle that.
And I found the HTTP API of HarperDB super useful.
I found an error in their documentation, so I wrote to them, just "Hello, HarperDB," and mentioned it.
And then I started up a dialogue with them, and eventually it turned into a consulting contract and then a full-time job.
Brian: Yeah. Excellent. Yeah.
These are the serendipitous stories of folks landing their next job.
I had a very similar experience with my former employer, Netlify.
I was a customer early on; at that point I think they had about 8,000 customers.
So it wasn't small, but they'd only been around for a couple of years, and that's how it has gone with all of my work.
Now I run this podcast, and I'm really interested in the idea of atomic databases too, if I got that right.
But I wanted to touch back real quick on what you just mentioned about data planes.
Can you explain more? Is that a scaling technique for Harper?
Jaxon: We consider that most systems ultimately exist to either record or read data.
However, in a distributed system, say an IoT system or a massive global network, the various nodes might have different responsibilities.
So there might be very write-heavy nodes in certain regions, or, say, next to a sensor, with very few reads because the data is just going in.
Up in a cloud node, however, where you're monitoring all of your sensors, you have your read load but very few writes; say, administrative thresholds might be loaded in there.
Traditionally, that's the split between transactional and analytical databases.
And we style ourselves as a hybrid between those two.
We have exceptionally good read and write performance and unmatched flexibility in terms of how you deploy these nodes across your network.
So the data plane really is a comprehensive solution for all of your data needs with one single installable application.
Brian: Yeah. So, who's the ideal end user for Harper?
You mentioned IoT and reading sensors quite a bit, but do you have use cases of companies that are leveraging Harper as well?
Jaxon: We do.
It started out primarily as a developer-focused product, where the HTTP API makes things very, very easy, because everybody's already making HTTP calls inside their applications anyway.
So rather than adding a driver, opening a connection, executing some transactions, then forgetting to close your connection and ultimately overloading your database server, you get atomic transactions: as soon as the HTTP request is done, your transaction is closed.
That connection is closed, and you can count on the data being recorded as soon as that transaction, that HTTP call, is over.
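To make the one-request-per-transaction idea concrete, here is a minimal Python sketch that builds (but does not send) a single insert request. It assumes the HarperDB operations API takes a JSON body with an operation field, schema and table names, and a records array, authenticated with HTTP Basic auth; the URL, port, credentials, and the dev.sensor table are placeholders for illustration, so check the current HarperDB docs before relying on any of them.

```python
import base64
import json
import urllib.request


def build_insert_request(url, username, password, schema, table, records):
    """Build one POST that carries a whole insert; the request is the transaction.

    The body shape (operation/schema/table/records) is an assumption modeled on
    HarperDB's operations API; verify field names against the current docs.
    """
    body = json.dumps({
        "operation": "insert",
        "schema": schema,
        "table": table,
        "records": records,
    }).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder endpoint and credentials; nothing is sent here.
req = build_insert_request(
    "http://localhost:9925", "admin", "password",
    "dev", "sensor", [{"id": 1, "temp_c": 21.5}],
)
```

Sending it would be a single call such as urllib.request.urlopen(req) inside a with block; there is no long-lived driver connection for the application to remember to close.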
Ultimately, what we're finding is that the distributed nature of the product is extremely appealing to large enterprises with massive user bases who need ACID compliance. They need to record that data. They need to know that it's written, and ultimately they don't want to move all of their data, full replica sets, around the globe.
They may want just partial replica sets.
So, I'm only recording data in South America from South American users.
There might be an aggregate or some distillation of that table that I ship up to the cloud, but I don't ship all of my data everywhere.
So it's a much more efficient use of space, resources, and ultimately delivers functionality to what we refer to as the edge.
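A minimal sketch of the partial-replica pattern described here: raw rows stay on the regional node where they were written, and only a per-region aggregate is shipped upstream. The record fields (region, user, amount) and the choice of count, total, and average as the distillation are hypothetical; this is plain Python, not a HarperDB API.

```python
from collections import defaultdict
from statistics import mean


def distill(records):
    """Collapse raw per-user rows into one summary row per region."""
    by_region = defaultdict(list)
    for r in records:
        by_region[r["region"]].append(r["amount"])
    return [
        {"region": region, "count": len(vals), "total": sum(vals), "avg": mean(vals)}
        for region, vals in sorted(by_region.items())
    ]


# Raw rows stay on the regional node; only `summary` would be replicated upstream.
raw = [
    {"region": "sa-east", "user": "u1", "amount": 10.0},
    {"region": "sa-east", "user": "u2", "amount": 30.0},
    {"region": "us-west", "user": "u3", "amount": 5.0},
]
summary = distill(raw)
```

Note that per-user fields never appear in the summary, which is what makes the upstream copy smaller than a full replica.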
Brian: Yeah. When you talk about localized data storage, and I don't know if that's a proper term, so feel free to correct me,
I think of what's happening with GDPR and how you need to keep a record of which edge networks and which data centers hold data.
I know there are a lot of questions and concerns about where you store data and how you make that equitable and also transparent for end users.
Is that something you opt into, or is it given to you for free with Harper if you're going to be storing on the edge?
Jaxon: Well, that's more of a data storage policy.
Brian: Got it.
Jaxon: We're more about efficiency of storage.
So, we allow you to record anything you want, obviously, into the database.
But ultimately, you don't need to record stuff that's never going to make it into the executive dashboard, which is driven by the cloud node.
That matters if what you're really looking for is to examine high-frequency data for anomalies or run machine learning algorithms against it, and what you ultimately ship upstream is a distillation of that data, the result of the analysis.
So really, you record all you want, and depending on what you're using that data for, you can push it up to the cloud. Or an edge node that needs thresholds for its analysis can subscribe to those thresholds, which are administered in the cloud and then pulled down to the edge nodes.
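The edge-node behavior described above, recording everything locally and forwarding only what the cloud-administered thresholds flag, can be sketched in a few lines. The sensor names, threshold values, and the idea that thresholds arrive as a (low, high) pair per sensor are illustrative assumptions, not HarperDB's subscription mechanism.

```python
def split_readings(readings, thresholds):
    """Edge-side filter: keep every reading locally, forward only out-of-range ones.

    thresholds maps sensor name -> (low, high). In the scenario described, these
    are administered in the cloud and pulled down to the edge node.
    """
    local_store = list(readings)        # the edge records everything
    upstream = []
    for r in readings:
        low, high = thresholds[r["sensor"]]
        if not low <= r["value"] <= high:
            upstream.append(r)          # only anomalies leave the edge
    return local_store, upstream


thresholds = {"temp": (0.0, 80.0), "pressure": (90.0, 110.0)}  # pulled from the cloud
readings = [
    {"sensor": "temp", "value": 21.0},
    {"sensor": "temp", "value": 95.5},
    {"sensor": "pressure", "value": 101.3},
    {"sensor": "pressure", "value": 85.0},
]
local, anomalies = split_readings(readings, thresholds)
```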
Brian: Okay, excellent. So would you--
I asked about end users and use cases, but as far as developers go, are you primarily targeting DBAs, or could developers on a smaller project, maybe not quite enterprise, approach leveraging this in their use cases?
Jaxon: That's a really good question.
And one of the key aspects of our data model is that everything is indexed by default.
The way we store data, with an underlying key-value store and lots of different relations between those keys and values, takes building indexes off your plate. If you've ever had a database with a billion records in it and decided to make a query slightly faster by adding a new index, it's sort of like, okay, let's let this run over the weekend and hope it's done and doesn't affect our prime-time performance.
Ultimately, we decided that to keep things easier for developers and to sort of collapse the stack in terms of functionality, we'd build a database that was already indexed, no matter what you were going to run, and that had an HTTP API on the front of it, so that you didn't have to worry about third party drivers or massive libraries to get stuff in and out of your database.
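As a toy illustration of "indexed by default", the sketch below updates an index for every attribute, including nested ones addressed by dotted paths, on every insert, so there is never a separate index-building step. This is our own simplified in-memory model, not HarperDB's storage engine; the class and method names are hypothetical.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (dotted_path, value) for every leaf attribute, nested ones included."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


class AutoIndexedTable:
    """Toy table that indexes every attribute on write; there is no CREATE INDEX step.

    Simplifications: leaf values must be hashable, and updates/deletes are not
    handled (re-inserting an id would leave stale index entries).
    """

    def __init__(self, hash_attribute="id"):
        self.hash_attribute = hash_attribute
        self.rows = {}
        self.index = defaultdict(lambda: defaultdict(set))  # path -> value -> ids

    def insert(self, doc):
        key = doc[self.hash_attribute]
        self.rows[key] = doc
        for path, value in flatten(doc):
            self.index[path][value].add(key)

    def find(self, path, value):
        """Index lookup instead of a full scan, for any attribute path."""
        ids = self.index.get(path, {}).get(value, set())
        return [self.rows[k] for k in sorted(ids)]


t = AutoIndexedTable()
t.insert({"id": 1, "name": "Harper", "owner": {"city": "Denver"}})
t.insert({"id": 2, "name": "Penny", "owner": {"city": "Oakland"}})
```

The trade-off is the one implied in the conversation: every write pays a little extra to maintain indexes, in exchange for never scheduling a weekend-long index build.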
Brian: Okay, excellent.
And going through the documentation, I saw a term I was curious to dig into, which is SQL on JSON. How would you consume HarperDB data?
Jaxon: You would make a simple fetch call or HTTP call, basically any request, against our API.
One of our operations is titled SQL, and the body of that operation can contain any normal SQL statement.
That's despite the fact that we allow you to store JSON documents as records,
so deeply nested documents, akin to MongoDB.
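A minimal sketch of what the SQL operation's request body might look like. The {"operation": "sql", "sql": ...} shape is our reading of HarperDB's documented operations and should be treated as an assumption; the dev.dog table is made up. Each body would be POSTed to the same API endpoint with fetch or any HTTP client.

```python
import json


def sql_operation(statement):
    """JSON body for a single 'sql' operation posted to the HTTP API.

    The {"operation": "sql", "sql": ...} shape is an assumption based on
    HarperDB's documented operations; the table names below are made up.
    """
    return json.dumps({"operation": "sql", "sql": statement})


# The same endpoint accepts any statement; only the body changes.
bodies = [
    sql_operation("SELECT id, name FROM dev.dog WHERE age > 5"),
    sql_operation("UPDATE dev.dog SET age = 6 WHERE id = 1"),
]
```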
One of the things you inevitably run into with MongoDB comes once you need to do something with the data you've stored. It's very good at storing it; it's super flexible, and you don't have to define a schema.
But once you try to join two tables, you end up having to do that in code, because it's not native, right?
So you pull the data out, then join it with the results of another query.
And it takes much longer than it needs to.
So we allow you to use SQL against JSON-formatted data.
Now, you might just be putting in individual flat JSON rows, and it would work just like a relational database.
However, we also allow you to query on deeply nested fields and join on those as well.
So SQL on JSON is ultimately, let's call it, a signal that we give you the language you're comfortable with, plus the data flexibility that makes NoSQL databases so incredibly appealing.
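To pin down what "joining on a deeply nested field" computes, here is the application-side version Jaxon describes having to write against a document store: an inner hash join where the join key on one side lives inside a nested object. With SQL on JSON the database would do this for you; the sketch only shows the semantics, and the dogs/owners data and helper names are hypothetical.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'owner.id' into nested dicts; None if absent."""
    for part in dotted.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def hash_join(left, right, left_path, right_path):
    """Inner join of two document lists where left[left_path] == right[right_path].

    Rows missing the join attribute never match, mirroring SQL's NULL behavior.
    """
    buckets = {}
    for right_doc in right:
        key = get_path(right_doc, right_path)
        if key is not None:
            buckets.setdefault(key, []).append(right_doc)
    return [
        (left_doc, right_doc)
        for left_doc in left
        for right_doc in buckets.get(get_path(left_doc, left_path), [])
    ]


dogs = [
    {"id": 1, "name": "Harper", "owner": {"id": 10}},
    {"id": 2, "name": "Penny", "owner": {"id": 11}},
    {"id": 3, "name": "Stray"},  # no owner, so it drops out of an inner join
]
owners = [{"id": 10, "name": "Ada"}, {"id": 11, "name": "Grace"}]
pairs = hash_join(dogs, owners, "owner.id", "id")
```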
Brian: I like that.
I understand the limitations when you have giant JSON blobs and documents.
And I know the appeal of being able to use something like SQL and say, "Oh, I basically know how to get this data, the same way I'd get it out of a relational database."
So is this built into the SDK, or into the entire underlying technology of Harper?
Jaxon: This is built into the entire underlying technology.
We use a library called AlaSQL.
We've done a podcast with those folks and talked about how their library is exceptionally good at taking the standard SQL we're all familiar with and parsing it into a format much closer to what is effectively our NoSQL query engine.
So we just break SQL queries down into something HarperDB's API understands natively.
And then we pull that data back.
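AlaSQL's real parser handles far more than this, but a regex-based toy shows the idea of breaking one narrow SQL shape down into a NoSQL-style lookup. The output field names mirror HarperDB's search_by_value operation as we understand it (schema, table, search_attribute, search_value, get_attributes); treat them as assumptions to confirm in the docs, and note that this is not how HarperDB or AlaSQL actually parse queries.

```python
import re

# Only one query shape is recognized: SELECT cols FROM schema.table WHERE attr = 'value'
_SIMPLE_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<schema>\w+)\.(?P<table>\w+)"
    r"\s+WHERE\s+(?P<attr>[\w.]+)\s*=\s*'(?P<value>[^']*)'\s*;?\s*$",
    re.IGNORECASE,
)


def to_value_search(sql):
    """Rewrite one narrow SQL shape into a NoSQL-style lookup dict (toy example)."""
    m = _SIMPLE_SELECT.match(sql)
    if m is None:
        raise ValueError("only SELECT ... FROM schema.table WHERE attr = 'value' is handled")
    return {
        "operation": "search_by_value",
        "schema": m["schema"],
        "table": m["table"],
        "search_attribute": m["attr"],
        "search_value": m["value"],
        "get_attributes": [c.strip() for c in m["cols"].split(",")],
    }


query = to_value_search("SELECT id, name FROM dev.dog WHERE breed = 'lab'")
```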
Brian: Okay. That's fascinating stuff.
I don't have a use case for massive reads of my own data, but I know there are a lot of side projects that do.
I got to chat with the maintainers over at Home Assistant.
They have lots of IoT devices, and you can manage your own data.
That's literally what Home Assistant is all about: managing your data and being able to create your own cloud in your bedroom, your kitchen, wherever you want.
So I can see the use case, but I'm curious: how can folks get started trying out HarperDB?
Is this something you need to adopt when you start a new project, or can you integrate it into your project today?
Jaxon: You could integrate it super easily.
We have a cloud-hosted SaaS product with a free tier.
So you can always go to Harperdb.io and sign up for free.
There's a button in the upper right-hand corner; once you've signed up and logged in, you can spin up a free HarperDB server.
At that point, you can go through the docs, which are also on the HarperDB site, and figure out which operations you want to use.
One of the great things about having a standard HTTP API is that while you'd want to have a server side component to that for authorization and managing access to the data, while you're prototyping your application, you can just make those HTTP calls directly from your front end code.
So, ultimately, you can skip building an API for the initial phase. And, not to reveal too much, but an upcoming release of HarperDB is going to have something we call custom functions, which lets you build that API code, all of your custom methods, right in, with access to core HarperDB functions as well.
It's built on top of Fastify, much like our regular operational HTTP API.
So it's super fast, super scalable, and lets you collapse that stack even further.
So now you won't have to run a separate API server either.
Brian: Yeah. And I like that trend I've been seeing.
I don't know if it's a trend, but I'm enjoying seeing more done on the edge. It's stuff I've been playing around with: being able to do small manipulations of the data at the time it's either written or read.
That's something I like to hear.
And I would love to kick that around and test it out, for sure.
You mentioned Fastify, but I also saw things like Postman referenced in the documentation.
What about integrations? Are you able to integrate different features, or even open source features, inside of SQL?
I'm more of a Postgres person, so I don't know all the open source stuff that's out there for SQL.
Jaxon: Yeah. So we have a developer marketplace, which is full of SDKs for various languages for interacting with HarperDB.
The way one calls an HTTP API or parses the data that comes back is different for every language.
And so, we have a bunch of SDKs for different languages. We also have ODBC and JDBC drivers.
We have an Excel driver, so we have myriad ways to access data in HarperDB.
And ultimately, we want to be as flexible as possible.
So while our core technology is not open source, we are headed in a direction where we're going to reduce HarperDB to just the core functionality of accepting data, writing it, and retrieving it very, very quickly. Then functionality like our HTTP API, our custom functions server, or even our clustering engine
will eventually be moved out into their own modules, and those will be open source. We know what works for clustering in the use cases we've experienced and for the customers we're doing implementations for,
but somebody might decide that they want to write a lightning-fast ZeroMQ clustering engine to move data between nodes.
And we're going to afford them the opportunity to just create a plugin or a module, drop it into a folder, and then use that.
Or, please, they can issue a pull request against our open source modules. That would be great.
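Jaxon is describing a direction rather than a released API, so the following is only a hypothetical sketch of the architecture: a core that just stores data and delegates replication to whatever transport module is plugged in. The ReplicationTransport protocol, the registry dict, and all class names are invented for illustration; a ZeroMQ-backed module would simply be another class with the same publish method.

```python
from typing import Protocol


class ReplicationTransport(Protocol):
    """Hypothetical plugin contract for moving committed records between nodes."""

    def publish(self, table: str, record: dict) -> None: ...


class InMemoryTransport:
    """Stand-in plugin; a ZeroMQ-based module could expose the same method."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, dict]] = []

    def publish(self, table: str, record: dict) -> None:
        self.sent.append((table, record))


TRANSPORTS = {"memory": InMemoryTransport}  # hypothetical plugin registry


class Node:
    """Core engine: store the record, then hand it to whichever transport is loaded."""

    def __init__(self, transport: ReplicationTransport) -> None:
        self.tables: dict[str, list[dict]] = {}
        self.transport = transport

    def insert(self, table: str, record: dict) -> None:
        self.tables.setdefault(table, []).append(record)
        self.transport.publish(table, record)


node = Node(TRANSPORTS["memory"]())
node.insert("dog", {"id": 1, "name": "Harper"})
```

Using a structural Protocol means a plugin does not have to import or subclass anything from the core; it only has to provide a matching publish method.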
Brian: Yeah. And I saw there are developer examples as well.
There's a Python machine learning example.
There's also one using React Hooks with HarperDB.
And then there are some more concrete examples, not hardcore, but things like a car performance monitoring app, which would be a really fun project to work on.
Actually, I do have a newer car, and I'm super impressed with the Android tablets they've been shoving into all the new Fords and Nissans.
It'd be a fun little project to see about hacking that without voiding your warranty, but that's a whole other conversation.
But yeah, I was just calling out Harperdb.io/developers/developerexamples.
It looks like there are quite a few examples folks can kick around, for sure.
Jaxon: Yes, indeed.
Brian: One thing I should mention, and I know this is going to get me a Twitter reply or an email: I compared Postgres and SQL as if they were different things, but technically they're not.
Postgres being a flavor of SQL.
My question for you though, with Harper, is what's the flavor of SQL that you're leveraging?
Jaxon: Well, we are technically the best of both worlds, a traditional relational database and a NoSQL database.
NoSQL meaning the unstructured-data, JSON-document, MongoDB sort of store. But because we index every field, including the fields in those JSON documents, we can ultimately perform as well as most traditional relational database platforms.
Brian: Yeah. And you mentioned that earlier too.
I completely forgot, but that's pretty novel.
Having the best of both worlds.
I've never actually heard of folks combining the two, I guess, models together.
It sounds pretty novel, but maybe I'm just not as educated about other competitors.
Jaxon: Well, Postgres can store JSON data types.
So that can be a column type. Microsoft SQL Server can also store JSON in a field, but you have to define those field types.
With us, if you insert a completely flat JSON document, it effectively becomes a flat row, just like in a relational database.
If one of those fields holds JSON, it simply stores that JSON in the individual rows that have it.
And I think one of the nice parts about the flexibility of this model is when you start developing an application, you think you know what your columns are supposed to be.
You think you know the data you're going to store and then adding a new column or changing a data type if you've decided that there's a better way to go about it somewhere down the line is really, really hard in databases that aren't built to be as flexible as ours.
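A toy model of the schema flexibility being described: the table's attribute list grows the first time a new key shows up, so adding a column or a nested JSON field needs no migration, and older rows simply read back None for attributes they never had. This is an illustrative in-memory sketch with hypothetical names, not HarperDB's implementation.

```python
class DynamicTable:
    """Toy table whose attribute list grows as new keys arrive (no migrations)."""

    def __init__(self):
        self.attributes = []   # known columns, in first-seen order
        self.rows = []

    def insert(self, record):
        for key in record:
            if key not in self.attributes:
                self.attributes.append(key)   # implicit "ADD COLUMN"
        self.rows.append(record)

    def select(self, *columns):
        """Missing attributes come back as None, like NULL in a relational table."""
        cols = columns or tuple(self.attributes)
        return [tuple(row.get(c) for c in cols) for row in self.rows]


t = DynamicTable()
t.insert({"id": 1, "name": "Harper"})                                # flat row
t.insert({"id": 2, "name": "Penny", "address": {"city": "Denver"}})  # new nested field
```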
Brian: Yeah. Story of my life.
I'm actually currently working on a project where I created the schema for the database using Postgres.
And I haven't added anything new to it because I'm not sure I made the right decision.
I don't want to add new features and then figure out, oh, you know what?
I've got to go create a whole other migration and change up the entire schema of my database.
So I've been very slow in building features there because I want to make sure I get it right.
I'm the only one working on the project. There aren't a lot of other users and contributors.
And I'm terrified of making the wrong decision and missing out on users because my data is completely broken.
So, it's nice to hear the flexibility is a thing with Harper.
Jaxon: Ultimately, we want to create something that's simple, right?
Our motto is simplicity without sacrifice.
So we try to give you, as a developer, everything we wished we had in a database and always found lacking, or that was just part of your boilerplate whenever you started a new project.
Okay. I have to do this. Then I have to do this and I have to do this.
What if they were all just one product, and your job was just to save data or ask for data?
Brian: Yeah. I love that.
And I love the fact that you're solving that problem.
That way I can just come in and leverage all that knowledge from day one, as opposed to trying to figure it all out up front, like I mentioned, because I'm terrified of making a mistake. This is not my-- I'm not a DBA.
Databases are not where I feel comfortable.
So I do a lot of reading and a lot of testing, and then I hopefully can move on from that.
Jaxon: Well, we're getting a lot of traction lately, not only with independent developers who are super passionate on our feature suggestion board, but also with massive international clients with huge footprints and massive user bases, who really appreciate that there are fewer moving parts in the system.
Ultimately, it's a better solution for a lot of cases. Unless you're highly specialized, you can do almost anything with HarperDB.
Brian: Yeah. That is awesome to hear too.
It sounds like you're getting the best of both worlds as well.
And I saw that one of the organizations on the list of those leveraging Harper is the U.S. Army.
You mentioned it in passing, but you're able to use Harper for things that require security clearance as well?
Have you all set up, and I'm fishing for the terms here, I've heard them, things like SOC 2 and data compliance?
Jaxon: We meet the requirements for the project aspect.
A lot of it is implementation when you're looking at those specifications.
We don't ultimately have control over any of the hardware.
And to be honest, what we prototype on to demonstrate that our capabilities meet their requirements is nowhere near what the final thing is.
And I doubt we'll ever learn.
Brian: Yeah. I guess there's only one way to learn, and that's to join the army.
Jaxon: Ultimately, it comes down to ACID compliance and fault-tolerant networking.
So if a node goes down, or can't publish data to or subscribe to data from another node, is it going to be able to store that data and hold onto it?
And then, as soon as the network connection is restored, will it be able to, in a seamless manner, reconnect and continue to do its job?
And we live up to that standard.
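The store-and-forward behavior described here (hold data while the link is down, then reconnect and catch up seamlessly) can be sketched as an outbox queue. This is a generic pattern shown in plain Python, not HarperDB's clustering code; the OutboxPublisher and FakeLink names are hypothetical, and the queue lives in memory purely for brevity.

```python
from collections import deque


class OutboxPublisher:
    """Store-and-forward: queue records locally, deliver in order when the link is up."""

    def __init__(self, send):
        self.send = send          # callable; raises ConnectionError while offline
        self.pending = deque()

    def publish(self, record):
        self.pending.append(record)   # hold locally first (in memory here)
        self.flush()

    def flush(self):
        """Try to drain the queue; stop at the first failure and keep the rest."""
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                return False
            self.pending.popleft()    # drop only after a successful send
        return True


class FakeLink:
    """Hypothetical stand-in for the network path to a peer node."""

    def __init__(self):
        self.up = True
        self.received = []

    def send(self, record):
        if not self.up:
            raise ConnectionError("link down")
        self.received.append(record)


link = FakeLink()
pub = OutboxPublisher(link.send)
pub.publish({"seq": 1})
link.up = False                      # node loses connectivity
pub.publish({"seq": 2})
pub.publish({"seq": 3})
backlog_while_down = len(pub.pending)
link.up = True                       # connection restored
pub.flush()
```

Because a record is removed only after a successful send, delivery is at-least-once: if a send succeeds but the acknowledgment is lost, the peer may see a duplicate, so receivers typically deduplicate on something like the seq field. A real node would also persist the queue to disk so it survives a crash.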
Brian: And I assume, since you mentioned the Harper cloud and the hosted version, that you're also able to take this and host it in your own cloud infrastructure, no problem?
Jaxon: Absolutely. We are an NPM package.
So you can simply NPM install HarperDB.
Brian: Well, I can't wait to actually kick this around and try it out.
I'm a big fan of randomly trying out stuff for little side projects and seeing if it sticks.
Jaxon: I will tell you that it will. It will change your development flow.
I hope I'm not being too grandiose, but it will change your life as a developer. As the guy who has been handed a napkin and told that it's the spec for a new feature or an entire product, I can tell you it's nice to have a tool that doesn't get in your way,
and that instead allows you to be flexible as your development process and your product evolve.
And ultimately, if I'm one thing for this product, it's the representative of the developer who has had to fight for requirements,
and who has been told, despite being handed none, that my architecture was wrong
and that it was now going to cost lots of time and money to fix.
HarperDB doesn't put you in a spot like that.
Brian: Yeah. Excellent.
Well, Jaxon, I super appreciate you coming through and talking about Harper.
Folks, definitely try it out. NPM install this thing.
Try it out for your next side project or your next work project, or integrate it into your current work projects.
It'd be super awesome to hear the feedback. You can find us on Twitter.
With that, Jaxon, I want to move into a section I call picks.
These are jam picks, things we're jamming on.
It could be movies, food, music, technology. It's really across the board.
And if you don't mind, I'll go first and then I'll give you a chance to fish up some picks that you're jamming on.
So I've got two. One that I'm super enjoying, and I've been enjoying this for the past, I think eight or nine years, which is Bob's Burgers.
It is my go-to for like, if I don't know what to watch or my wife's like, "Hey, do you want to watch something?"
It's like, I don't know what to watch. I'll just catch up.
I caught up on like eight episodes of Bob's Burgers this last season.
It's also my programming buddy.
If I want to hack away at something but also want to take a couple of breaks,
like if I'm going to try Harper out, I'll throw on an old episode of Bob's Burgers, npm install, see what it looks like, read the docs, laugh a little bit at what's happening on the TV, and then go back to my laptop.
I was very much the do-my-homework-with-the-TV-on type of kid.
So whenever I do my homework, or some research and development, I always throw on a show like Bob's Burgers.
Have you caught that show before?
Brian: Yeah. It's hilarious. And the first season was a slow burn.
I wasn't sure where they were going with it, but once you're invested in the characters, you're ready for the next one.
My next pick is really just a reminder, folks, that I live stream on Twitch every Tuesday and Friday.
I've actually been leveraging Twitch as a place where I also do research and development.
I've been splitting my streams into segments, almost like a podcast, where I'll introduce the thing we're going to work on.
Then I'll go through issues on any open source projects I need to maintain or work on.
I just end up going through issues and chatting with the chat at the same time.
And I find having an audience is a much better way to triage open source issues, because after I've shown them how the issues work,
when I go to write some code, I throw out ideas, people open the issues for me, and then I go work on them.
I've actually gotten three new PRs merged in the last week just from using this technique.
So, if anybody wants to find me live on Tuesday and Friday on twitch.tv/bdougieYO, come through, grab an issue.
I'll open the issue. You open issues, we all have fun. It's like Oprah. And hope everybody enjoys it.
Jaxon: You get an issue, you get an issue. Everybody gets HarperDB.
Brian: I need to actually record that for the next time I start issue triaging.
Actually, I have a card called Issues with Brian, and it has a jingle as well.
I've only played it a couple of times, but it's basically, "Issues with Brian."
That's as much singing as you're getting out of me today, but it's a blast.
Everybody comes through. So, do you have any picks, Jaxon?
Jaxon: I would say that my downtime this entire past year has been dedicated to English aristocracy since the Wars of the Roses.
It's incredibly dry, but I look at the world today and I see all of the infighting, and it seems increasingly brutal and disconnected.
And then you look at what they were doing back then, and you feel better about the world, because there was a lot more poking each other with swords.
The most digestible, easiest way into it is a set of series produced by Starz, The White Queen, The White Princess, and The Spanish Princess, which ultimately cover most of the famous names you know in English aristocracy and show how they would marry not for love but out of a sense of duty.
And then you look at something as recent as Meghan Markle and Prince Harry, and she was an employee.
And ultimately, it was a much worse place to work a long time ago.
So, it gives you faith that the world is moving forward, ultimately.
Also, it's beautifully shot and wonderfully scored.
And there's lots of, as I said, there's lots of slicing off of heads, which is nice.
Brian: That's amazing. Yeah.
I've definitely been into the documentaries on Netflix and stuff like that.
But I also love the long series.
I've caught a couple this year, but I quit in the Michael Jordan documentary. Yeah.
But I'm going to check this out, because I think it's the same thing with code.
When you talk about architecture, you can be stuck in a decision based on what was provided to you.
When you really zoom out, having things like HarperDB gives us a lot of stuff for free.
It's a lot better than rolling your own infrastructure to do the same thing, combining relational databases with document models.
So yeah, I don't know.
I'm not sure I managed to shoehorn that correlation into this whole conversation, but.
Jaxon: It felt strange, but I liked it.
Brian: Excellent. Well, folks, hopefully you liked this conversation and listeners keep spreading the jam.