Open Source Ready
40 MIN

Ep. #31, Developer-First Data Engineering with dltHub

about the episode

In episode 31 of Open Source Ready, Brian and John sit down with Matthaus Krzykowski, Thierry Jean, and Elvis Kahoro to explore how dlt and dltHub are changing the way developers build data pipelines. The conversation dives into DuckDB, LLM-driven workflows, and the growing shift toward developer-first data engineering. They also discuss open source adoption, AI orchestration, and what it means to be a “10x engineer” in 2026.

Matthaus Krzykowski is the CEO and co-founder of dltHub, where he focuses on building developer-first data infrastructure tools that integrate open source, AI workflows, and modern data platforms.

Thierry Jean is a senior AI engineer at dltHub, working at the intersection of data engineering, Python tooling, and LLM-driven development workflows.

Elvis Kahoro is a developer advocate at dltHub who helps developers adopt modern data tooling and brings real-world pipeline experience to the open source community.

transcript

Brian Douglas: Welcome to another installment of Open Source Ready. On the line is John McBride, my co-host. John, how are you doing?

John McBride: Hey Brian. I'm doing good. Exciting things are happening. How are you?

Brian: I'm doing well. I feel refreshed. I just got off a Southwest flight and was delayed for six hours with three kids. So I would not wish that on anybody. But we're all alive and well and we're all back to work in school and we're here to run the next mile of 2026. So here we are.

We've got actually quite a few guests here from the dltHub team. So I want to welcome all of you, but I'll shout out to Matthaus first. Why don't you introduce yourself, dltHub, and we can go around the horn to introduce each one of you.

Matthaus Krzykowski: Hi Brian, hi John. I'm Matthaus. I'm the CEO, co-founder of dltHub. We've been doing dltHub since 2020 I think.

Brian: Wow. And Thierry you've joined the team, not shortly after, but you joined within the last couple years. Right?

Thierry Jean: Yeah. So my name is Thierry. I'm based on the east coast, in Montreal. I'm a senior AI engineer. I lead a lot of the new front line of data engineering and AI, working with LLMs and how they connect directly to Python code and connectors as code.

Brian: Cool. And Elvis, you're the newest team member.

Elvis Kahoro: I think I'm the newest, yeah. But I'm working at dlt as a developer advocate, and so my job is to get dlt into the hands of as many people as possible, which is really exciting because I was a dlt user before joining, and so it's fun to evangelize a product that you use yourself.

Brian: Yeah. So let's just jump in. Like, what is dlt? Like, why do I care about it?

Thierry: So dlt is a Python library. It's Apache 2.0 license. It's free. Before joining dlt, actually I was a user of dlt, and it's the best tool in Python, if you ask me, to move data, whether it's from a REST API source, the file system, a database, and to move it to where you want to do more data operations. So your warehouse, your data lake, your lakehouse or whatever is trendy next year.

John: What does dlt stand for? I'm familiar with dbt. What do all these acronyms stand for? As somebody who's maybe data science curious, what would you tell them?

Thierry: So dbt stands for Data Build Tool. They're a transform layer tool. So they transform data that's already in your warehouse. And dlt is a data load tool. So it's loading the data where you want to do further transforms. If you search on the web, you'll probably hit distributed ledger technology also. But we're gaining SEO points as we grow.

Brian: Yeah. So I guess we should be very clear, is this something you guys developed at the dltHub company?

Matthaus: That's correct. So it's something we started developing in early 2021, and then we opened up this to the community a little bit more than two years ago. So we were bootstrapping the first one and a half years and then opened up to the community two and a half years ago.

Brian: Okay. And so what's the adoption look like? Tell me what the GitHub star count is. I don't have the repo in front of me.

Matthaus: I don't think we actually focus too much on the stars. But we just crossed 4 million downloads a month. 16 months ago, we had around one thousand organizations using dlt in production. Now we crossed 7,500 over Christmas.

And around 14 months ago, we started to get indexed by the LLMs. Most of our growth this year-- since around May, our web traffic on our docs started to plateau, and all our adoption and growth is from LLMs hitting our docs and our library.

Brian: Nice. Perfect. Yeah. Because funny enough, I ended up using Claude to build out a pipeline for me. So I basically built a mermaid chart to figure out what tools I should use to build a pipeline. Most of the listeners know John and me from OpenSauced days. John built all the infrastructure and joined me at the Linux Foundation to work on that.

We have a mutual connection, Matt, at the Linux Foundation, who's building LFX Insights and I believe is also using dltHub. So I had the idea: why don't I build a pipeline from GitHub using Python and DuckDB? And honestly-- and this is our connection--

Recently we did a workshop together at Small Data Conf where I kind of showed off a small little piece of how I built this pipeline from GitHub to build data insights and charts and graphs. And I will say it was pretty straightforward for me to get this off the ground and running.

And John was saying he's data science curious. I am not even close to curious. I just wanted to get the job done, do the thing, and get out of the way. And I think that's what really helped me grok what this thing is.

So I'm leveraging dlt, the open source project, a ton, but also exploring how to leverage the Hub and make this more of a repackageable experience. But John, I covered all that context and failed to actually mention it to you-- or, no, I think I mentioned this before jumping on the call a couple weeks ago.

John: Yeah, because I think there was the connection with DuckDB, and that's something I'm very curious about with dlt. How is it integrated with a lot of these things like DuckDB or the various lakehouse platforms?

Matthaus: So John, I think without DuckDB, dlt would not exist. We were literally six weeks into going full throttle, and we were one of the first 90 people at DuckCon number two, and we saw the light.

Because essentially what we ran into-- and this was the initial idea for dlt-- we got early access to GPT-3 in, I think, April or March 2020, and we were suddenly building data pipelines for the Python AI crowd. And what we ran into were the fights with the traditional SQL database, relational database data engineers who were gatekeeping anything the Python kids were doing.

And it was like that everywhere already five years ago. And then the idea was: what if we do all these things in Python, easy to understand, which all these Python people can do, but where they don't need to learn data engineering? And DuckDB was a eureka, essentially, because DuckDB gives people all the freedom on their local machine to do what they want. Right?

They're not gatekept in their tool choice like they are in the cloud tooling. And so if you look at our docs, everything is DuckDB first, and the code is such that whenever it runs locally, you can push it the same way to the cloud. Brian, you literally, intuitively hit one of our earliest big insights into how to make this more useful to people.

John: Brian's very good at doing that.

Brian: Yeah, literally yesterday I had a goal. We have this open source tool at Continue where we're running these workflows or pipelines if you will, for agents. And I just wanted to use a bunch of GitHub data to find out how many folks are using Continue in projects with more than 1,000 stars. Like very straightforward. I could search this at GitHub.com search but I've got access to these pipelines. Let me build a pipeline to basically just answer that question.

But like, I was able to leverage the MCP to like get me started and like, I already got the framework of what I need to accomplish. And again, I'm not even data science curious. I just wanted to do the thing and get out of the way. So plugging into Continue, like my agent and have the MCP like scaffold this thing out, problem solved.

So like all I'm doing is, in DuckDB querying the data every day, and then answering the question like how many folks have adopted Continue in the public? Which is like, I think a lot of these, like the worlds of this data science is you just want to answer the question.

But when I was at GitHub, specifically whenever I need a data science problem, you got to open a ticket, you got to like scope out the problem. You have a couple weeks of meetings to clarify the scope of the work and basically have them kill everything you asked for. And then you get this hodgepodge of stuff you didn't want, but you got to have something to move forward with.

I feel empowered and emboldened to go answer and solve my own problems at this point without the need of waiting for either contractors or data teams to unblock me.

So hats off to you guys for one, getting the connection to DuckDB because DuckDB itself is an amazing tool to be able to query data in a way that's natural but also not awful like SQL but now you have this thing on top of it to help you get to production ready data science.

Thierry: Looking forward also--

DuckDB is a key piece of everything that people are building regarding data engineering and AI because having data that's local or in this DuckDB instance really limits the blast radius of what an LLM can create as a problem, in terms of security but also messing with your data.

So it allows you to copy stuff and operate at scale and efficiently.

Brian: Yeah so we talked about DuckDB and also dlt but can we distinguish dltHub now at this point? Because I'm intrigued by the company angle on the side of this but also what is the Hub to you guys today and what is it to your customers, users, et cetera?

Matthaus: So I think the clarity between open source dlt and commercial dltHub is that dlt is an open source Python library, and it handles the extract and load part. Right? So extracting and loading data from messy sources into clean datasets. And as you said, it's interoperable, it's modular, and it's built from the ground up to work with LLMs.

And the step for dltHub is that we're extending it into transformations, so you're manipulating the data, and into storage and runtime. And the vision for it-- traditionally, until recently, and it's a little bit like the experience you had with dlt, Brian-- is that if you wanted to do everything right, essentially from the data engineering to the analysis, and imagine you're in a larger organization, then you needed infra and security people.

This was all too difficult, and our vision for dltHub is to make it so simple that a single person can do everything end to end, from your local dev environment up to the cloud. So this is what we're building.

Brian: Perfect. So at this point you're building it. What stage are you guys in? We talked about the adoption for dlt, but I'm curious what use cases folks are going to dltHub for. Take the opportunity to speak to the audience: okay, I might be listening, I've never heard of dlt or dltHub. What are the use cases folks are looking for? I shared my GitHub example, but I'd be curious to hear what's working.

Thierry: Yeah, I can give you a sense. Personally I run a pipeline that ingests housing market data while I'm looking at houses. I think it's very simple to get started, and it has a lot of configuration and knobs to tune performance. So dlt is production ready. It's used in many enterprises in their critical infrastructure.

So if you're someone who ingests data from a REST API that's public on the web or internal, if you're doing database replication, or you need to move data from an operational system, like your payment system or maybe your CRM, to your data warehouse, those are a lot of the cases dlt is used for and finds success in.

But I think one thing that's special about dlt is that it's open source, it's Apache 2.0 licensed, and anyone can audit the code. It's adopted in highly sensitive industries like healthcare and finance because they want to be able to inspect all of this code and own the pipeline, own the connectors, and not have outside interference change their things or be sensitive to changes in what a vendor is offering.

So that's definitely something that we've seen. But then typically they come to us and they ask for more and that's where we find inspiration for our paid products or the dltHub platform.

Elvis: Yeah, one of the use cases I was experiencing at my former company is that me and our forward deployed engineers were responsible for taking customers from zero to one. We'd essentially get a bunch of these schemas, because they can't actually give us their actual data, and we'd have to, for example, do a bunch of data generation. And in one of these pipelines you might be moving hundreds of gigabytes from your local machine or some cluster into a data warehouse.

The nice thing for us is that with dlt we were able to switch out the actual end destination with just one line of code. And so this made it so that we could actually scale up these onboarding pipelines for customers across different tech stacks depending on what that customer wanted.

And the second thing that was super helpful is that there was a lot of ergonomics around schema evolution. And so if we did give a customer back a dataset and say, "Does this look like your actual data?" And they'd give us a modification to a column or some schema, it'd be really easy for us to either regenerate it completely and then not pay some cloud platform a bunch of money to move, again, hundreds of gigabytes of data.

Or even just change the schema in a way that there was zero copy in whatever file system we moved it to.

Matthaus: And Brian, I'll give you the founder product-market fit answer, not the developer answer. So what's special is that the number of Python developers is rising super quickly, and there are managers of these Python developers now. And these Python developers know that if they get the choice of graphical-user-interface, old connector tooling, SaaS tooling, it's not something they really appreciate. From the developer's perspective, it's not something you want.

They want to use Cursor or Claude or Continue. And you can't make that work with the old connector, GUI, and SaaS world. At the same time, as Thierry suggested, we have a lot of technical leaders, and they need to make sure sensitive things are cared for, that data quality is catered to, that whatever auditing happens, happens in the correct way. So they want control, and therefore they want code.

So this is the tension where we operate. And surprisingly to a lot of people we rarely compete with graphical user interface tools. We literally mostly compete with internal Python scripts.

So that's piece number one. And the second thing is what Thierry described around the sensitive industries, finance, health, and what Elvis described around people needing a lot of customization and developer control. What a lot of our database partners like Snowflake are surprised about is how big our customers are. Large corporations in finance and health and manufacturing don't usually run on a startup's open source tooling very quickly.

And we have all these badges from Snowflake in all these industries and they're very surprised that we, as the partner, are able to serve customers like this.

Brian: Would you tack that onto the fact that you're open source with dlt first, making it approachable, or is this like more the connection with your data partnership? You don't have to share your secret sauce. But like, I'm curious, how are you capturing this as a market that folks are unaware of? At least Snowflake is, that you can do this.

Matthaus: So you know, the first one and a half years we were actually closed. And what's interesting about this is that we're interoperable and modular, not only with the Python world, with Pandas and notebooks, but also with the traditional modern data stack. Before we launched, we spent four months optimizing our helper for Airflow.

This is a tool the Python script crowd doesn't even know of. But once you move to the enterprise, this is what's often there-- or our dbt helpers. We knew that when it's showtime, we need to be interoperable with people's tool stacks. And we are. We spent a lot of time to make it happen.

The second thing, and this is the power of the community: whenever somebody adds something interesting in DuckDB, often they start talking about it, and chances are they talk about us. And so it gives us a feedback loop from our user base and from our community, and we get to hear how we work not only as dlt by itself, but as part of people's tooling.

And whenever something interesting happens-- the boring semantic layer in our world a couple weeks ago, for example-- then somebody will start doing demos around it and then start talking to us and giving feedback. It's a beautiful feedback loop, and it's something we've done on purpose.

It has sometimes made our development architecture slower, because we feel we are stewards of a lot of these different tools. But I think it's worth the gain. And if you ask me, in all simplicity, when did we figure that out? I think in the first couple of weeks.

I talked to Sebastian Ramirez from FastAPI, and he told me, Matthaus, half of a company is your docs. If you want that, start early, just be helpful to other people. Don't spam anything with your tools or whatever agenda you may have. Just try to be generally useful.

And one of my co-founders, Adrian, for the first year he just lived in comments on LinkedIn, on Reddit, on Stack Overflow, where he was just genuinely being helpful.

And that's the advice I keep on giving to other open source founders: Just try to be a genuinely good person in your domain and good things will happen.

John: Yeah, it's really good advice. I always feel a little sad, you know, probably being chronically online at this point. But it seems like some of the advice is the opposite in the AI or startup ecosystem. Like if you can be really kind of edgy, then maybe being edgy is an edge. But I do like that.

I think that, especially in the open source ecosystem, communities respond really well, at least in my experience, to really kind and generous people in open source. Right?

Thierry: Yeah.

One of the biggest perks of my job, I would say, is being able to contribute to other projects.

Like, I spend time giving back with feedback, with issues, with PRs, for tools that I like to use, but also tools that we want to integrate with. I spent time chatting with the folks at Continue, contributed features to Marimo, which is a notebook environment, and we really like Ibis and SQLGlot. I think it's mutually beneficial to have these tools work nicely together, because users will have a better experience, and at this software layer, in the end, it's really the best user experience that will win and gain the bigger traction.

Matthaus: And I think, John--

I think the LLMs are actually beneficial for open source. These things go so fast. You need to be super focused, right? And you need to rely on other people left and right to develop good software and listen to users because the speed of co-development has dramatically increased in the last couple of months.

Brian: Yeah, one note you mentioned, Matthaus, about just making it work with other tools and making it basically approachable. John, we were having a conversation about this earlier this week, this approach of adoption, open source, and getting into the enterprise.

And that's one thing that attracted me to Continue: through the open source angle, enterprises happened to deploy it at scale because they saw an opportunity to basically ship an AI coding agent that was going to work with their tools behind their firewall.

And it's pretty fascinating how quickly folks can get adoption. The whole sales cycle and everything like that, that all becomes interesting. But I think when people love what you're doing, they're willing to work with you to figure out, okay, we're willing to pay for this part of the control plane or this part of the data. I guess what I'm seeing, on the Continue side, is you get really great R&D and feedback from not just open source but also some of the largest players in the space.

Matthaus: 100%.

Thierry: I think also--

The enterprises that really get open source, you can tell because they see that it's mutually beneficial. They get to shape a product that they like and I think it's well worth the investment.

Brian: Yeah. Elvis, you got a response?

Elvis: Yeah. I also would add that--

As hard as it is to build an open source business, I think it ends up being a forcing function for really deeply understanding value alignment and making sure that you actually are providing value to users.

And so I actually think it prevents people from having an anti-pattern of business models that are based on vendor lock-in, or on making it difficult for people to leave your platform, versus making developers love it. And so that's another reason that I really like open source devtools.

Brian: Yeah, I had a final question, more about the other tools in the space. I think I sort of got this question from reading your docs and your landing page, but I'm just curious about the connectors in the space. You mentioned the EL part and the ELT part. We talked about the open source stuff and about leveraging the agents and the coding tools.

But what about Airbyte and Fivetran? Do they still exist? Do you work with them? Are you competing with them?

Matthaus: So we rarely compete with them, is the answer. It's purely about the developer experience, right? If a developer is picking dlt, they already use code. So they're looking for a code solution, right?

So maybe there are some-- in larger organizations, they buy every tool in the box. Whereas we're part of this bottom-up option where developers are asking for it. So often the organization may have bought Fivetran, but then suddenly the developer does what you just did, Brian, when you recently developed a dlt pipeline, and you don't even tell your bosses who bought Fivetran for the organization.

And that's usually how things start for us. And then the developers start pushing for more dlt in the company. You know, Fivetran makes 90% of its profits with around 25 sources. And one of our angel investors is Julien from Hugging Face.

If you look at Hugging Face, there are literally hundreds of thousands of data sets. And our vision from the get-go was that this is how the world should look. It should not be a world where only a few GUI connectors are supported. Can we make it a world where these data sets are free and everyone gets to interact with them?

Anyway, our vision is that if we succeed, we can make the market much bigger. We're trying to make the pie much bigger for everyone. So we don't feel we compete with these GUI connectors. We often recommend them: if it's a business team, they have a lot of money, and they want some of the core connectors, we tell them go to Fivetran. Good company.

If we talk to the technical decision maker-- you have a bunch of Python people, you use an AI code editor-- we are a natural choice. So we're lucky that if you actually start talking to people about the choices, especially given what happened in the AI world in the last years, we have a very clear differentiation, which is now much clearer to the bosses.

A year ago we had a much harder time explaining this code solution to the management. But now management understands and wants productivity gains from AI code editors, so this part of the story is much easier.

Brian: Yeah, that's excellent. I'm super excited about what you guys are working on. I'm super excited that you keep working on it, because it's helping me solve my day-to-day problems. We're lacking data science teams at Continue, so you're looking at him, at this point-- the data science infrastructure lead of Continue, thanks to your product.

Before we switch to Reads, Elvis, you had a thought?

Elvis: Yeah.

I think another metaphor that I've been thinking of recently that I found helpful is the dlt team is in a way almost a platform team for the data community. Because at big organizations you have a platform team that enables these developer pathways so you can go from zero to one super quickly.

And I feel like it's super nice to have a team dedicated to thinking about data movement from first principles and then embedding best practices into an SDK that an AI just has to fill in the blanks for, versus the AI trying to come up with: how do I materialize this quickly? Or, is this using Arrow? And so for me it's been really fun to see folks who have no data experience just get up and running, and then their eyes light up because in a notebook they've built this production pipeline.

Brian: Yeah, actually that framing of the platform team for the data community is perfect. Put that on the front of the box, because this is exactly what we're all struggling with as engineers: you have fewer people to solve even bigger problems. So you go from zero to tech debt, from seed funded or pre-seed funded, angel funded, to now you have a scaled company, but you have fewer people.

In more and more situations you're seeing someone pick up stuff they didn't have the skill set for, like myself, because do you go hire for it, or do you just roll up your sleeves and get excited about the problem? And for having the problem solved in a way where you're literally my platform team to source this data from-- perfect framing. Appreciate you jumping in and adding that.

Speaking of jumping in, we're going to jump out of this conversation and move into Reads. So the question folks, for everyone here, are you ready to read?

Cool. So I'll go first, John, because I have one read, which is Gas Town. So Steve Yegge. He's been on a tear of doing a bunch of podcasts, writing some blog posts. He, I think today, works at the company Amp which is the AI coding agent.

Previously he worked specifically on the Cody project, which was Sourcegraph's AI IDE thing. He's been around for quite a bit, but he put out this blog post around Gas Town, and I think Steve is in a fever dream when it comes to AI coding. Previously he launched Beads, which is a very opinionated framework for AI coding that takes over the decision making-- it just knows what you want to do.

And what Gas Town does is basically, you've got different personas, or these agent orchestrators, or these overseers, as he was calling them-- the overseer knows what to do. And I say this as a read, and yeah, if you have the show notes, you can open it up and give it a skim. It's a long skim, to be quite honest. But I will say it feels like a fever dream reading it.

Steve's got a lot of ideas and they're all in that one blog post. So get a large cup of coffee and pace yourself as you read it. But John you had some pretty good feedback on this and I'd be curious to get your take on this read.

John: Yeah, it is interesting. I mean, I think there's a lot of this kind of orchestration work happening and I've seen some takes where people are being like, well, the new engineering is orchestrating. And you know, people who were really good at StarCraft back in high school are now very good at orchestrating agents around.

But I think one of my main feedbacks on this was that it's just wildly expensive to do anything like this, especially with a frontier model, because you're not only doing the inference for the coding, but then all the inference for the orchestrating, and then all the inference for the critiquing and the different personas-- just this huge flywheel that ends up being a lot of tokens spent.

So I'm so curious how far down the billions of parameters you could go, how small you could go to get similar, maybe not like cutting edge results, but like how small you could go on that. I'm sure somebody's going to do something similar and call it, I don't know, Ant Town or something with tiny models.

But yeah, I'm curious Elvis, Matthaus and Thierry, if you've read this or seen anything about this, you know, these Orchestrator engines that people are building or any crazy hot takes.

Thierry: Yeah. I've been toying a lot with LLMs, and we're actually building real things with LLMs internally, as an offering but also for our own developer tooling. And I always like to start with what an LLM is. Right? It's a machine that receives text in and outputs text.

And where is the brain, where are the smarts coming from? Either it's the text it's been trained on, or it's the text that's in the input. And we've all experienced it talking to ChatGPT or in a different tool: if you have a long conversation, it will derail, it will start to have a mind of its own or get off topic. And I think that's where the orchestration helps.

All of these personas or orchestrated agents have a very clear task; they don't get derailed as much. And I think it's something that will get better: understanding when do I need to change the conversation, when do I need to recenter it and give it a clear instruction.

And instead of having a chat window where every two or three messages you input, hey, get back to your task, you can have an automated system that does it and figures out when you need to remind it what it's supposed to be doing.
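Thierry's automated recentering can be sketched in plain Python. Everything here is a hypothetical illustration, not dlt code: a helper that injects a task reminder into the message history every few user turns, before the messages go to whatever LLM client you use.

```python
# Hypothetical sketch: periodically re-anchor a chat agent on its task.
TASK = "Summarize each document in one sentence."
REMINDER_EVERY = 3  # inject a reminder after every 3 user turns

def build_messages(history, task=TASK, every=REMINDER_EVERY):
    """Return the message list to send to the model, inserting a
    system reminder whenever `every` user turns have accumulated."""
    messages = [{"role": "system", "content": task}]
    user_turns = 0
    for msg in history:
        messages.append(msg)
        if msg["role"] == "user":
            user_turns += 1
            if user_turns % every == 0:
                messages.append(
                    {"role": "system",
                     "content": f"Reminder: stay on task. {task}"}
                )
    return messages
```

An orchestrator would call `build_messages` before every model request, so the reminder arrives automatically instead of a human typing "get back to your task" into the chat.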

John: Right. Yeah, I think a lot of people are sort of banking on these various systems, be it Kanban or Linear or GitHub Actions even or just the pull request in code to be that kind of like kickoff point and then the anchoring for these various agents.

I think one of the things that's been quite disappointing, versus what maybe was promised with LLMs, is that they're not actually all that autonomous, as much as people were saying they'd just go and start doing all this stuff. You still really have to tee it up to get going.

Elvis: Yeah, I also think it definitely poses a lot of user experience questions around how we create user interfaces for developers to actually intercept agents at different steps of this continuous workflow.

I think the thing I'm most excited about is we've added a new definition of 10x engineer. Historically, 10x engineers were staff, principal type people who shaped the architecture of a business. But we now have a second definition, which is: I literally have 10x the output of a typical software engineer. That's something I'm very excited about as well.

Brian: Yeah, it's real in the sense of, like, I wouldn't consider myself a 10x engineer, but I built an entire pipeline for my data stack and I'm able to answer questions without really asking any questions internally of like, hey, how can I do this? Or can I get your permission?

So it's interesting, in the last couple of weeks, we've seen a lot of these tweets from, really around Claude Code specifically, but folks basically saying, like, this is what I do now because of Claude Code, or even Boris, who created Claude Code, is now saying 100% of the code for Claude Code is now generated by Claude Code.

And maybe it's scary, maybe it's not, but we're in a world where now-- And I know the Claude Code team, in the grand scheme of things, is not that big when you think about the impact they're making on the industry.

And that's a testament to what we're actually doing. If you need to go scale and auto-scale a bunch of servers because Claude Code usage in Japan goes off the charts, that's the team figuring it out. That's the team you're looking at on Twitter every day. Those are the folks managing that.

So perhaps they've hired more people since the last time I hung out with them. But basically what I'm getting at is--

We're definitely living in a world where the 10x engineer is less of a meme and more of a reality.

Thierry: Yeah.

I think some people in large enterprises are seeing more and more that writing code is becoming a luxury for them. If we look a few decades back, we were running computers by punching holes in cards. So maybe writing these magic words in a file is not something we'll be doing a lot of in the coming decades.

But I think we're at an interesting point where we'll see if it's real and we'll be part of this transition and we can bring a lot of the knowledge in steering these machines and making them useful.

John: Yeah. Something that I guess listeners will know that I'm sort of notoriously skeptical, on and off. You know, I've had waves of being all in, AI pilled, all the way to thinking it's dramatically overhyped.

But consistently, I've been using this stuff even since the early OpenSauced days when we built a lot of early RAG technologies. And it's funny, we're talking about these big Python data pipelines and stuff because I had to hand roll that crap way back in the day for things.

And the explosion of stuff that's happening in the space is very exciting, and I think there's going to be so much more software just out there. We can't put all this back in Pandora's box. It's just out there and people are using it, people are making more and more stuff, people are pouring gas on the whole town and it's lighting on fire.

Brian: Yeah. Matt, did you have something to add?

Matthaus: I think this is, for me, my fourth AI wave. I was building agents with Wit.ai in 2016, so I know what a happy path is. And I sometimes wonder whether I'm excited enough or whether I'm too skeptical, because I just keep seeing, "But, but, but."

And for me, orchestrators-- in the dltHub community, I think 60% of all people start with GitHub Actions. And, you know, while there are, for example, data engineering-grade tools like Dagster out there, or, if you're more AI-inclined, Temporal.

And the way people pick some of these traditional data engineering orchestrators, honestly, is they're making it up as they go. There are new kids in town, they're YOLOing it, they don't even know what these tools are or how they should be choosing, so they just test things out.

And then I get excited, because I think, wow, it's the kids out there. It's not about, I don't know, some scientific comparison of which orchestrator tool is better than another or what you should be doing. It's just people. It's the wild west out there, the kids are doing funky things, and we should just appreciate it and see who builds something that's interesting and fun.

And then we all get to use it. So often, when people speculate, like Steve did here, I'm in awe that you can write something like that, because I have too many "but" questions in my head to ever write a post like this.

John: Yeah, exactly.

Brian: Cool. John, you got some reads you want to share.

John: Yeah, I've got a couple reads that were quite impactful for me over the last week or two. The first was this great read, "21 Lessons from 14 Years at Google." And I feel like it codified, really put into words, a lot of the things I've experienced or felt in my time at startups, at AWS, at VMware. Yeah, this is a gold mine of advice.

So listeners, go check it out. But two really stuck out to me. The first one: the best engineers are obsessed with solving user problems. I think that still stands today in the age of AI and coding agents and all this stuff. There are a lot of fun, sexy, interesting technologies, a lot of things to go build. But being obsessed with user problems is still a great differentiator of great products and great platforms.

dltHub, you know, you're solving real user problems, which then is impactful for the business, impactful for all kinds of things. And the other one was number eight: at scale, even your bugs have users. And oh boy, have I felt that before.

You wouldn't think versioning software would be as hard as it is, but once AI can solve that problem for me, I'll be very happy.

Thierry: Yeah, I'm skimming the article, and just looking at the titles, I'm smiling. Some of them are hard-earned lessons in my case. And on number eight, as you mentioned: I also maintain a library called Hamilton, and once we shipped a notebook-related feature that was very experimental. I had the PR open, and someone merged it while it was still in a draft state.

John: Oh no.

Thierry: And I wanted to change something, and I heard we couldn't, because an enterprise user had started using it as part of their test suite or something. It's like, oh well, this is with us forever until version 2.0.

John: Yeah, it's a classic enterprise story. Honestly, it's like, "No, we rely on this bug that you shipped three months ago. You can't change it now." Oh goodness.

Okay, I'll move on to the next read because we're running a little short on time, but this was something from PC Gamer, and I've mentioned this a few times on the podcast. The short of the post is basically: Linux is good now, and I could not be happier. This made it onto Hacker News.

And Linux gaming has really taken off in the last year or two, to the point where it's gotten good enough that you can basically daily drive it for most games. For the longest time, software developers could obviously daily drive it for productivity stuff, and with a lot of stuff moving to the web, you could daily drive it generally.

Gaming was really like the last thing I think. So I'm calling it, maybe I called it before on the podcast: 2026 is the year of the Linux Desktop. Brian what do you think?

Brian: Yeah, it's on my list of things to do. I've got a second PC that I've run Linux on multiple times, but I caught the Windows 11 bug. Well, I wanted to try it out, so it's running Windows 11 now. But I might circle back, especially if I could play, you know, some AAA titles. That'd be great.

John: Yeah, we should really try to find some people at Valve. You know, they're pouring a lot of resources into the open source ecosystem to enable this to happen. There are a lot of these compatibility-layer things with Proton and the ARM translation layers, and some really interesting deep technologies. But yeah, dlt team, do you game? Do you game on Linux? What do you think about this?

Thierry: Yes, I've had my Linux desktop since last year, Linux on my laptop, and I get to play Balatro or Slay the Spire whenever I fly to San Francisco. So it's been good.

Elvis: I used to game on Stadia.

John: Oh wow.

Elvis: Yeah, I was so excited about Stadia and I tried to get every one of my friends on it because to me it was just so innovative as a technology being able to stream these games. So it's very unfortunate that they shut it down.

But I do think there's an exciting future, especially for folks with low compute resources, to be able to game and things. That's the last time I gamed, though, when Stadia died.

John: Oh, bummer.

Matthaus: John, I have two kids, so my gaming this Christmas season was Lego and Uno.

John: Oh, nice.

Brian: Yeah, keeping it analog. Perfect. Well, hey, thanks so much for coming on, having the conversation, and educating us about dlt and dltHub. It sounds like, for our audience, this is in your wheelhouse. So try it out.

You can reach out to the team. You all have a Discord, it's pretty accessible. So check it out and stay ready.