JUN 16, 2026

70 MIN

Ep. #13, Building Trillion-Scale Data Pipelines with Josh Wills

Guests: Josh Wills

about the episode

On episode 13 of Data Renegades, CL Kao sits down with Josh Wills to explore how AI agents are reshaping software engineering, data science, and data infrastructure. They discuss verification, benchmarking, pre-training data pipelines, multimodal AI, and why understanding a problem may matter more than writing code. Josh also shares lessons learned from decades of building large-scale data systems and his journey from management back to hands-on engineering.

about the guests

Josh Wills is a member of the technical staff at Datology AI, where he builds distributed data pipelines for large-scale AI model training. Over a career spanning IBM, Google, Slack, startups, and open source projects, he has become one of the most influential voices in data engineering and modern data infrastructure. Josh is also the creator of dbt-duckdb and an early investor in several companies that helped define the modern data stack.

transcript

CL Kao: Hi, I'm CL, CEO and founder of Recce and your host on Data Renegades. Today our guest is Josh Wills, member of the technical staff at Datology AI, building distributed pipelines for trillion-scale LLM pre-training. He created dbt-duckdb and his angel portfolio runs through the modern data stack like dbt Labs, MotherDuck, Materialize, Tabular.

Long before any of that, he tweeted a line that we've all probably seen on a slide deck somewhere. Definition of data scientist: "Better at statistics than any software engineer and better at software engineering than any statistician." That was 2012.

Fourteen years later, he's one of the sharpest public voices on what works and what doesn't in data tooling. Josh and I talked recently about workflows, agents, and what verification looks like when code can be manifested from somewhere. Some of those threads came back here. Welcome to the show, Josh.

Josh Wills: Thank you, CL. Thank you so much for having me.

CL: Well, it's always fun talking to you.

Josh: Same friend, same.

CL: So for listeners new to your work, can you introduce yourself: where you started, where you are now, and the through line, if there is one?

Josh: Yeah, if there's a through line, it's a great question. Actually, I've been trying to figure out this for a while now. Yeah I'm Josh Wills. I've done, as you mentioned, a bunch of different things, in a bunch of different places, with data.

I started my career like 25 years ago. I worked at IBM and I've done startups and Google and now doing sort of AI and LLM things. And I've been fortunate to have written a bunch of open source software along the way that is used by lots of people to do lots of things.

I like to say that it's difficult to use the Internet for any period of time without touching code that I wrote at some part. I wrote enough core things that are still in use at Google and Apple and Slack and other places that if you use the Internet, odds are some data you generated will be touched by code that I wrote at some point.

CL: And probably a Mars or a moon lander somewhere.

Josh: Quite possibly. We'll see how long this lasts. We'll see. Eventually everything dies, right? But things I've written have stuck around for a while, like that kind of thing, which is sort of cool.

I think about it a lot these days, especially with the advent of AI and LLMs and trying to figure out what does work mean anymore, what does it mean to be a data person, what does it mean to be a software engineer? Right?

And I think the through line of my career, for better or worse, is the fundamental problem of data engineering, which is given that you have some data transformation task to do, how much of it do you do offline, kind of upfront as preparation work and how much of it do you do online at the moment that you need the data on the fly? Right?

And for me, all of data modeling, all of building search indexes, all of like, I mean what is software engineering, but not like, hey, take some data over here in this place and do something to it and write it out over here this other place? That's kind of all we do over and over again.

And I am more or less obsessed with that problem. And kind of the arc of my career is a search for new and interesting spins or approaches or constraints or variations of this fundamental problem. That's more or less what I do.

That's what I've done for the last 25 years. And I think, God willing, fingers crossed, I'll get to do that for like the next 25 years or so. It'll still be a problem, it'll still be useful enough for me to do that stuff. Whoa, that was a lot of preamble talk. Sorry about that.

CL: Haha. No worries. This is cool. So yeah, I recall I think there were times that we're taught that programs are, well, data and code. And I think you're talking about kind of what's the balance in between then, or where the dials are. Right?

We used to be like, well, it's a lot more code and data is just kind of more uniform and flows through this code. And now it's more like the code is kind of generated from data anyway, for now. Right?

Josh: Yeah, in some ways that's true. That's right. We've now used, we've generated a bunch of code that consumes a bunch of data, and now we use that data to generate more code. Like, oh, God, what have we done? That's like a fairly good perspective on it, actually. Right?

Like, we've taken like, kind of the Lisp philosophy of like, data is code and code is data, and sort of like turned the dial up to not even 11, like 111 or something like that at this point. Right. That's the new world. That's a great point. It's very much so.

CL: Very cool.

Josh: Yeah.

CL: I want to dive into your popular definition of data scientist back in 2012.

Josh: My favorite line about that, by the way. And what I love is that, like, that tweet or whatever is part of the onboarding for new Nvidia employees. Like, when you join Nvidia on your sort of hero's arc to becoming a billionaire, you get to see my tweet. So I always like to think, "Yay. There's a bunch of billionaires in the world who know who I am because their Nvidia onboarding featured a picture of me, baby Josh in 2012." Haha.

CL: Haha. That is awesome.

Josh: Yeah, I get a kick out of that. It's maybe my favorite thing out of all that stuff.

CL: Yeah. So, I mean, all these years passed. Right. I mean, it stayed in Nvidia, but what's the shape of that role today? Has it changed from what you meant when you tweeted it?

Josh: That is a really good question. Honestly, I don't think it's changed. I think it's still true. I think part of it is like, you got to contextualize at the time, this is like 2011, 2012. The data science is like--

CL: Sexiest job, right?

Josh: Sexiest job. But it's also like a new title.

CL: Yeah.

Josh: And it's a new title and people are like, what are you talking about? Like the other popular definition, which I think is in many ways still true, is a data scientist is a statistician who lives in San Francisco. That was another very popular one.

And I tried to do something that, I tried to make it a bit more concrete and less of a joke and less of like, you have to live in San Francisco. Right?

I'm always happy when I'm defining something. I am, at my core, like a math major. I just love defining things. Nothing makes me happier. If I can do it in a pithy way, so much the better.

I still think it's true. I still don't generally find that many people who understand software engineering in some sense and who understand statistics in some sense. And this of course is changing because you can vibe code anything now. And you can also vibe analyze data, right? You could kind of always vibe analyze data in some ways.

Like you can take something that looks like a data frame, throw it into R or Python, run a t-test on it, and you'll get an answer. But the answer might not make any sense. It might not be statistically valid because you don't know the assumptions behind it and how this data was generated and all these kinds of things. Right?
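The point about getting an answer that may not be valid can be sketched in a few lines. This is a minimal illustration, not anything from the episode: the measurements are hypothetical, and `welch_t` is a helper written here with the standard library. The second call feeds in the same users with every row repeated ten times, as if each event had been logged ten times, which breaks the independence assumption behind the test.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples (statistic only, no p-value)."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical per-user measurements for two groups.
group_a = [10.1, 9.8, 10.4, 10.0, 9.9, 10.3]
group_b = [9.9, 9.7, 10.2, 9.8, 10.0, 9.6]

# One row per user: the rows are plausibly independent.
t_honest = welch_t(group_a, group_b)

# Same users, but each row appears 10 times, so rows are not independent.
# The function still returns a number, and it looks far more convincing.
t_inflated = welch_t(group_a * 10, group_b * 10)

print(f"one row per user: t = {t_honest:.2f}")
print(f"rows duplicated:  t = {t_inflated:.2f}")
```

Nothing in the code complains about the duplicated rows. Only knowledge of how the data was generated tells you the second statistic is meaningless.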

So you could always do these things. It's gotten easier to do these things. I still don't personally, and I'm biased here, but I don't think there are that many people in the world who understand both of these different worlds that well, who really understand it. Not just that you can do it, but that you understand what you are doing.

And I still find those people valuable. I think the work they do has changed over time. I think my friend Hamel is always talking about how evals are truly the new domain of the data scientist. And I largely agree with that. Once upon a time it was like growth hacking was the dominant domain, right? Growth hacking was what a data scientist did.

So it's changed over time, it will continue to change. But that's still that fundamental skill of understanding these two different worlds and the different ways they approach problems and how to like, not just vibe code things or not just vibe analyze things, but really understand it.

It still feels like relatively scarce to me in the grand scheme of things. Anyway, that's my two cents.

CL: Okay. We're both like more from a software engineering background, right? But I mean nowadays, I mean, you have math degree or something.

Josh: Well, it's sort of like, super nice of you to say that. Like, I am like, a fraud, basically. I mean, I was a math major. I studied statistics in graduate school. Like, I got hired at Google as a quantitative analyst. There was no data scientist title back then, so they hired me in 2007 as a quantitative analyst.

I would have never passed a Google software engineering interview. I still probably wouldn't pass a Google software engineering interview. Like, I don't think I could do it. I really don't. Everything I have learned, I've picked up on the way just by being curious and wanting to understand things and being exposed to people who were good at this stuff.

I've written a lot of software. It's hard to use the Internet without using software I wrote. I still don't really think of myself as a software engineer, for better or worse. I don't know if it's imposter syndrome or something stupid like that.

CL: You're very modest.

Josh: I'm not. I'm not, though. I'm the most humble person I know, but I'm not at all modest.

CL: Yeah. The repeating theme on this podcast is like, a lot of people talk about how important curiosity is, right?

Josh: Absolutely.

CL: Yeah. And I think this is never a better time, like now to learn through LLM or just try things. This is so much better. Right?

Josh: I completely agree.

It is an incredible, glorious age for tinkerers and autodidacts and people who want to learn things. It has never, ever been better than right now. I love it so much.

CL: Yeah. So I've been running a lot of data, or agent benchmarks and stuff. And I'm very bad at statistics. Right? But nowadays I'll just say, oh, well, have a staff data scientist review this thing. Right? Have the LLM dispatch a subagent to have that domain knowledge to review that too, for rigor or whatever. Right?

Josh: Yes.

CL: But I don't know if I can trust that. What do you do with that?

Josh: You can't. I mean, I don't want it to be like, whatever. You kind of can't. I run also, like, much like you, I run benchmarks all the time benchmarking things. And I love my agents for babysitting my benchmarks, for setting things up, for running things, and I absolutely hate them when it comes to analyzing the results of my benchmarks or interpreting what happened and stuff like that.

Because they are so bad at it. And it's not their fault. It's not their fault. The agent is limited to the context that is available in the context window and like what it can grab and stuff like that. Right. And inevitably I find there is always context outside of that realm that it's not aware of and it's not aware that it's not aware of it.

And because of that I also have this. And this is also weird. And this is my tinfoil hat kind of stuff. The transformer architecture, the way the sort of autoregressive next-token prediction works. I sort of feel like the agents get trapped sometimes if they do a draw from the probability distribution and they end up with some token and they get stuck.

CL: Yeah, they got stuck into kind of the prior data, in that session.

Josh: Exactly. And they can't get out. They're trapped in it. And it's like they keep coming back to, like, it must be this explanation. Must be this explanation. And I'm like, "dude, it's not that. It's not even close to that. That's not even."

Right? And so I've gotten to the point where like, I'm very clear with my agents nowadays to be like, you are here to, you know, it's the--

CL: "Operate this thing."

Josh: Yeah. Like the Rick and Morty. It's like you butter toast. Basically, you're here to butter toast. That's your job. Your job is to run my benchmarks. It's to do what I tell you to. It is not to interpret things because you don't have--

And again, it's no shade against you, brother. I have an emotional attachment to them, obviously. Like no shade. You just don't have all the context here. I don't have all the context here.

CL: Yeah. I think you brought a very good point about like when the agent does not have the context, it doesn't know.

Josh: Right.

CL: It does not surface that. But I think it's probably getting slightly better these days.

Josh: Okay. Yes. Say more.

CL: They're like, hey, I have no full picture of this. I need to clarify xyz or maybe I prompt it the way that it would behave to kind of seek clarification before making confident mistakes.

Josh: Yes.

CL: Have you seen that change or do you see the model getting better at that?

Josh: I definitely, I mean, I certainly have noticed. Like, again, you know, this is a whole other thread we have: I'm a Claude Code main these days. I've gone through my Cursor phase, my Codex phase. I'm back on Claude. I've noticed Claude getting better about detecting, "Wait a second. I need to take a step back here. I need to rethink this," and stuff like that.

But still, it's like coding is a uniquely great domain for the agents in that, like, there's the context that's available in the window, and then there's the context that it can search for and grab. And so much of it is there. And there are so many other human domains, like statistics.

Like, my favorite is always politics. Politics, where all the context that matters is never written down anywhere. That's the point. Like, is that anything. You gotta be in the room where it happens. And if there's no microphones in the room where it happens, there's no-- It's the line from The Wire. Right? It's like, are you taking notes in our illegal drug conversation? Like, there's no training data for that.

Anyway, we see it taking off with coding first and foremost, because so much context is there.

CL: Yeah. Something really interesting is I've been running all these benchmarks for quite a while. I think six months earlier a lot of the time the agent will bias toward the implementation of the harness. Well, this is dumb. I didn't actually run the benchmark, but the code supporting the benchmark is done and then calling it done. Right? But now it's actually knowing the intent for the project is about producing benchmark results and doing it properly.

Josh: Yes, that's what I'm saying. I'm very glad to hear that. The other thing, I guess I don't know if you're conscious of this, or what your experience is. When I'm running a benchmark, I can't let Claude know what I want to have happen.

CL: Haha!

Josh: If it knows that I want Project A to beat Project B, it will figure out a way to put its thumb on the scales to make Project A win. Oh, yeah, it is very, like, everything's gotta be like, "this is my friend. And I'm trying to evaluate this for my production." And I'm very clear with Claude we need to do this right. We can't.

CL: No corner cutting. No cheating. Haha.

Josh: No corner cutting, no cheating. Like, this is not a-- Anyway.
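One way to keep the desired outcome out of the agent's context is to blind the candidates before handing over the benchmark. The sketch below is an assumption about how that could look, not a description of Josh's setup: `blind_labels`, the `system_N` naming, and the project names are all hypothetical.

```python
import random


def blind_labels(candidates, seed):
    """Map neutral labels to real candidate names in a shuffled order."""
    rng = random.Random(seed)
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    labels = [f"system_{i + 1}" for i in range(len(shuffled))]
    return dict(zip(labels, shuffled))


# The key maps neutral labels back to real names; keep it out of the prompt.
key = blind_labels(["project_a", "project_b"], seed=7)

# Only the neutral labels are shown to whoever (or whatever) runs the benchmark.
visible = sorted(key)
print(visible)
```

Results come back keyed by `system_1`, `system_2`, and the mapping is applied only after the numbers are final.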

CL: One of the benchmarks I was running, we then realized that well, one of the datasets is actually a pretty well known dataset on Hugging Face. Right. So Claude would just like, "oh, I know this dataset. Let me find the ground truth." Haha.

Josh: Yeah, precisely. I mean, absolutely. I loved the one where one of my friends was writing, like, a chess bot or something like that using Claude, and Claude figured out that he was using Stockfish as a benchmark. And the Claude agent figured out that Stockfish was available on the machine and called into it and had it make its move. So it was like Stockfish playing Stockfish. Like, well played, Claude.

CL: Yeah.

Josh: Good job, brother. Yeah you know, anyway, that's fun.

CL: Yeah.

Josh: Yeah.

CL: Okay, so I think you've been kind of calling yourself a recovering manager.

Josh: Yeah that's right. For a long time.

CL: Yeah. So what was the move back to IC about? Nowadays you're doing the training data prep for frontier model labs, and how does that feel? And then what's the IC role in that environment?

Josh: So I think I'm happy to discuss. One of my favorite topics, that I've sadly not spent nearly as much time on as I should have, is this transition, especially in this moment right now where, like, the joke is that people are leaving. Like Peter Bailis, right? He used to be the CTO of Workday, and he left to go be an MTS at Anthropic, go be an IC again.

And I can't speak for Peter or any of the other people who are doing this, right. But I have my own personal journey here. Right. Which is, you know, being a middle manager kind of sucks. I think being a CTO of, like, I mean, you know, God help you. Could you imagine if you had to be, like, the CEO or CTO of a technical startup right now? Like, what a nightmare. That would be horrible. Right? Why would you do that to yourself?

CL: Yeah, you probably want to be a podcast host.

Josh: You want to be a podcast host, obviously. Much better job, right? I think a lot of people make that sort of transition in their career from being an IC or this is true many years ago, to becoming a manager, product manager, director, blah, blah, blah.

For a number of reasons. One is that, like, you want more influence and you want more control and direction over the work and stuff like that and how the work gets done.

The second one is that we got to be honest, there's a lot of software engineering that is very boring, like fixing flaky tests, writing a database migration. If I ever have to do this again, I would kill myself, basically.

The good news is we're in a magical moment where I never have to do these things ever again. It's incredible. And so I think there is a, you reach a certain point in your career, you get kind of tired of the drag of doing software engineering. You want more influence, you want more scope, you want more power, you want more status, all these very human sort of goals.

And so you move into management. And what I feel like they don't tell you and what no one told me is that like, yeah, being a manager is really hard and actually like kind of sucks and isn't like super fun the vast majority of the time. But you know, you get kind of locked into it.

It's like, oh, it's a status. It's like so and so is a director or so and so is a VP, so and so is a CTO. And like again, in the same way, I never really thought of myself as a software engineer. I've always been fortunate to macro not really give a shit about what anyone else thinks about anything. And so getting into management--

I got into management for like the worst possible reasons, which I was going to Slack and my son was about to be born and I was like, "oh man, new baby. I'm going to be way too tired to be able to do hard technical work anymore. I know. I'll get one of those easy manager jobs."

CL: Wow. Was that the right assumption? Haha.

Josh: Oh no, it's so stupid. I mean, I know you're making fun, right? I'm such a jackass. I have another friend from Nvidia days who like, my favorite was like, he was like, oh, I'm going on pat leave. I can't wait, I'm going to write a book about data engineering while I'm on my paternity leave for his first kid.

He's got three now, so he knows better, but we just made fun of him so hard for so many years. "Oh, how's the book going, man? How'd that turn out with the book you were going to write on your pat leave? Yeah, how'd that go?"

Anyway, so I was stupid and for nothing now. I think of-- I did talk about this once briefly where I was saying that the thing I learned in I learned a couple things in engineering management was that--

The thing that I love and the reason I like being an engineer is I like solving people problems with technology. I think a lot of times we can develop technologies that sort of solve friction. Good tools basically make it easier for people to work together more smoothly. And I really enjoy that. And I still do, and I always have. But that's not what you do in engineering management. In engineering management, you solve technical problems with people. And that's not as much fun as solving people problems with technology.

That was one thing I learned. Second thing was like, management, man. There's a quote about law school, which is like, law school is a pie eating contest where first prize is more pie. And that is true of engineering management as well. You manage more people, run a promo cycle. Oh, my God. Worry about our engineering brand. Oh, my God. This just gets worse and worse. I have to present to the board. Are you kidding me? What are you talking. I don't want to do that. That sounds awful.

Pie contests where first prize is more pie. So I'm like, I'm getting on the engineering management ladder and I'm just looking up and I'm like, wow, there's just a lot more pie up there. It's like all the way up. Pie all the way, right? And so I'm like, f this, I'm getting off. I'm out. Later. Back to the IDE, back to coding again.

And nowadays with the agents, it's engineering management like I always dreamed it could be. I can fire them whenever. No drama. I don't care about their feelings or their career or whatever. It's fantastic. There's no task too menial. All these kinds of things.

It's a magical age for those of us who are failed engineering managers. The other thing I like to say, and I'll shout about this app, but this is one of my favorite topics is it's kind of problematic right now in that a lot of things that are true and important about engineering and management, we don't talk about.

CL: Well, say more. Talk about it now. Haha.

Josh: Yeah, exactly. Right. Most management books are full of generally platitudes of some kind because the people who are being managed by the people who read the managing books are also reading the managing books and stuff like that. So you can't say certain things that are true that cannot be said--

I mean a fairly prosaic and kind of reasonably benign thing is trust but verify. Trust but verify. Right? It's like you trust your people, but you need to verify that the thing is actually done. Right? That's sort of a nice way to say it and that is deeply and profoundly true with agents. Like trust them but for the love of God, have a verification mechanism. If you do not have a verification mechanism, you are hosed. Profoundly. Right?
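A verification mechanism can be as small as a gate that refuses any claimed result until an independent check passes. The sketch below is illustrative only: `accept_result` and `is_sorted_unique` are hypothetical names, and the check (output rows are strictly increasing, so sorted with no duplicates) stands in for whatever property the real task must satisfy.

```python
def accept_result(claimed, verify):
    """Accept a claimed output only if an independent check passes."""
    passed = verify(claimed)
    return {"accepted": passed, "result": claimed if passed else None}


def is_sorted_unique(rows):
    """Independent check: rows are strictly increasing (sorted, no duplicates)."""
    return all(a < b for a, b in zip(rows, rows[1:]))


# Two hypothetical outputs an agent reports as "done".
good = accept_result([1, 2, 3], is_sorted_unique)
bad = accept_result([1, 3, 3], is_sorted_unique)

print(good)
print(bad)
```

The important design choice is that the check is written and run separately from the agent's own report of success.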

And there's a lot of other, I think, knowledge like this that needs to be widely disseminated and people need to like deeply internalize in order to be effective with these tools and stuff like that. There's other things. Again, it's troubling. Like Machiavelli wrote a book about politics and he got like executed or stoned for it.

And I'm pretty sure whoever the first person is to write this, like let's just be real about, like let's be real about managing. Let's be real about like trust but verify, they're also going to get executed or stoned or something like that. I don't know. But so I'm trying to be a little cagier.

CL: Oh yeah, you can be a fiction writer. You don't have to write nonfiction.

Josh: Fiction is probably-- I mean you're exactly right. It's like oh yeah, look at that fictional situation that guy's described me. It's definitely not real. Yeah, exactly. Yeah, no you're a good point.

CL: Reminded me about the book, like the Five Dysfunctional--

Josh: Yeah, Five Dysfunctions of a Team.

CL: It's like a fable.

Josh: It's a fable but it's deeply true. Very, very much so. Yeah, The DevOps one is also a fantastic book along these lines.

CL: The Unicorn Project?

Josh: Yeah. Yes, that's the one. Exactly, exactly. Right, precisely.

CL: Yeah, we'll definitely get back to this kind of what you learned from management that's now working very well managing agents.

Josh: Yes.

CL: But I wanted to talk about, now you're working on this kind of pre-training data, and it's pretty high stakes and a large amount of data. Right? So what does your normal week look like? And then how much is pipeline code? How much is research? How much is like bashing agents?

Josh: Oh yeah, I called out to this earlier about the problem of data pipelines. How much do you do upfront and how much do you do online? And that is pretty much my problem every day of the week and like twice on Sunday.

CL: As in deciding the dial?

Josh: Exactly, very much deciding the dial. It's like I need to do some amount of curation which is some combination of filtering, deduplication, enhancing. I need to mix things. Mixtures are incredibly important. What is the right mix of code? What is the right mix of math? What is the right mix of web data? If I'm going multilingual, how much of it should be synthetic?

All this kind of stuff, what is the right mixture? What is the right recipe for the evals I am trying to optimize for? And then among all these different tasks, how do I sequence them? What comes first? And then which pieces of this like sort of end-to-end pipeline do I run offline as a giant like Spark job or Ray job or whatever? And which parts do you do online?

Like in my data loader, like right before I feed the data into the GPU. And it's the same trade-off it's ever been in data. There is a lot of cost savings and power and sort of like heavy duty machinery I can do by doing things upfront and pre-processing. But I lose a lot of flexibility that way. It locks me in.
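The offline/online split described above can be sketched with a toy pipeline. Everything here is an illustrative assumption rather than Datology's pipeline: `offline_prepare` stands in for a batch job that filters and deduplicates once, and `online_mixture` stands in for a data loader that picks a source per sample from mixture weights at load time. The documents, the length threshold, and the 70/30 weights are made up.

```python
import random


def offline_prepare(docs, min_len=5):
    """Offline: length-filter and exact-deduplicate once, then materialize."""
    seen = set()
    kept = []
    for doc in docs:
        if len(doc) >= min_len and doc not in seen:
            seen.add(doc)
            kept.append(doc)
    return kept


def online_mixture(sources, weights, n, seed=0):
    """Online: draw n samples, choosing a source per sample by mixture weight."""
    rng = random.Random(seed)
    names = list(sources)
    source_weights = [weights[name] for name in names]
    samples = []
    for _ in range(n):
        name = rng.choices(names, weights=source_weights, k=1)[0]
        samples.append((name, rng.choice(sources[name])))
    return samples


# Expensive, done once: the result is fixed until the job is rerun.
sources = {
    "web": offline_prepare(["hello world", "hello world", "hi", "some web page"]),
    "code": offline_prepare(["def f(): pass", "print('x')"]),
}

# Cheap, done at load time: the weights can change without redoing the prep.
batch = online_mixture(sources, {"web": 0.7, "code": 0.3}, n=8)
print(batch)
```

Changing the mixture only means passing different weights to the loader, whereas changing the filter means rerunning the offline step. That is the flexibility versus cost trade-off in miniature.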

And so the more I can do later on, the more flexibility I have. What is the right balance? And it changes all the time, all the time with the constraints. As it always has. As it always has. And so this is like for me, I don't know.

I told you before I mentioned my angel investing, I was retired before I joined Datology. I was done. I was like that's it, I'm done, it's over. I don't want to work anymore. Just work on my little dbt-duckdb project and it's great and hang out with my angel investments and stuff like that. Right? It's fine.

CL: And this turned out to be too interesting a problem for you to say no?

Josh: Well, more than anything I just, to be perfectly honest, it was an opportunity to work-- The only thing I may be really good at in my career is running gigantic data pipelines and making them like hum and making them sing and making them work really really well on all the like nasty, you know, corrupted data, skew, quality issues, all kinds of all--

Like I've suffered every which way for a long time and it's maybe the only thing I'm now really, really good at. It's an area where I like, I kind of describe it to people as like data pipelines feel like something to me. Like I can feel, I can read like the code and like I can feel the tuples flowing and it like engenders in some cases an emotional response in me, where I like will go to a researcher and be like, I need you to stop hurting the tuples. Like you're really like this is--

CL: Haha. It's very Matrix Land.

Josh: You can kind of feel like, yeah, see the codes, the blonde, brunette, redhead, like. Yeah, it's very much that. It's very much that. That is like the one thing I am, that is my craft. That is maybe the only thing I am really actually good at.

And I wanted the opportunity to practice that craft in an environment that was challenging where like again, like doing, you know, like 600 terabytes is not abnormal for me. It's something I do on a fairly regular basis, but I can do it at a company with only 50 people.

Like, I don't have to work at OpenAI, I don't have to work at Anthropic, I don't have to work in, God help me, have to work at Google or Meta or any of these other like gigantic places to be able to practice the one thing that I may be like among the best people in the world at, which is this. And that's why I like it. It really makes me happy.

CL: Wow, that's so cool.

Josh: That's it.

CL: Yeah, yeah. And then so you talk about a lot of that being like synthetic data, deciding kind of all the flexibility versus like heavy loading.

Josh: What do you do in prep and what do you do online.

CL: Why does it change all the time, as you're saying? Because, well, the requirements or constraints changed? And then what's the source of the change?

Josh: The source of the sort of trade-offs and stuff like that? Yeah, again it's the same thing it's kind of always been in data engineering, which is that the architectures change. We introduce S3 as a kind of fundamental storage primitive and that sort of changes everything we do. With the model training, it's much more research focused and it's much more of like what is the sort of different algorithm I'm trying out. In particular, I guess just to be upfront about this: text data, which is highly important, is deeply and profoundly boring as a data engineering challenge.

CL: Yeah, it's a blob.

Josh: It's a blob. And it's a perfect blob. It's infinitely sliceable. I can chop it up however I want. I almost always have lots of CPUs running alongside my GPU, so I can tokenize things cheaply on the fly. Like getting the mixtures right requires some fun and kind of slightly fancy math. But aside from that it's computationally speaking, very easy.

Multimodal data, on the other hand, vastly more fun, vastly harder. The most interesting thing about multimodal, I think in particular is the models. So the fundamental constraint, like when you're doing a data pipeline, there's always the end consumer of the pipeline and the end consumer is maybe a human, like a business user or maybe it's a GPU. And in my case it's often a GPU. Right?

The model update step kind of becomes my fundamental law, basically. So smaller models are in some sense much harder from a data pipeline perspective? In some ways, yes. I don't know. This is where I do the senior engineer thing. I say it's a trade-off, but it's a trade-off. Multimodal models are generally much larger, have many more parameters, and thus that makes the model update step, the forward pass, backward pass sequence, much slower. And the slower it is, the more time you have for doing data preparation.
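The relationship between step time and preparation time is simple arithmetic. The sketch below rests on stated assumptions: preprocessing overlaps fully with the model step, work spreads evenly across CPU workers, and the step times, batch size, and worker count are hypothetical numbers chosen only to show the scaling.

```python
def prep_budget_ms(step_ms, batch_size, cpu_workers):
    """Per-sample preprocessing time that keeps the loader ahead of the GPU.

    Assumes preprocessing fully overlaps with the model step and
    parallelizes evenly across workers.
    """
    return step_ms * cpu_workers / batch_size


# Hypothetical fast step (smaller model) versus slow step (larger model).
small_model = prep_budget_ms(step_ms=100, batch_size=512, cpu_workers=16)
large_model = prep_budget_ms(step_ms=800, batch_size=512, cpu_workers=16)

print(f"fast step: {small_model} ms of CPU time per sample")
print(f"slow step: {large_model} ms of CPU time per sample")
```

With the same hardware, a step that is eight times slower leaves eight times the budget per sample for decoding, cropping, or augmenting. This is why a fast, small model can be the harder case for the pipeline.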

That could be like image preparation, like decoding it, cropping it, transforming, augmenting it in various ways. There's a bunch of very, very, very cool research around sample construction, like batch construction. Like when you're doing like a contrastive learning model on image-text pairs, constructing a good batch pays enormous dividends in terms of how efficiently you can train and the quality of your model and stuff like that.

But constructing a good batch online generally involves marshaling multiple different GPU resources, like one that's like actually running the model as you've trained it so far, one that's running a different model. Getting everyone kind of synced and coordinating and dancing in harmony is a fantastic, fantastic data engineering problem.

So it varies a lot and it's a lot a function of like, what is the research we're doing right now? And that's again, everything from like, you know, foundation stuff to continued pre-training to image-text. And it's just, I don't know, the variety of it is--

CL: Wow, this is fascinating.

Josh: I know, I know. Yeah.

CL: But the actual training happens in the labs, right? So they feedback to you about what--

Josh: I literally sit next to the researchers. Again, that's like, the nice thing is like, I just--

CL: This is awesome. Haha.

Josh: I'm right there. There's not that many of us in the grand scheme of things. I can just literally tap people on the shoulder and stuff like that. So I love that part of it. Yeah.

CL: Wow, that's so cool. So you talk about like handling this large amount of data is like, well, the only thing you're very good at. But I think you're still being very modest. But I think that kind of came from your, all your experience building throughout the different era. You have been like, in Hadoop or Cloud Warehouse.

Josh: Even before Hadoop, I was even building data pipelines before there was Hadoop. It's crazy. Teradata, Netezza boxes. Like, I know, long ass time ago. IBM DB2 for God's sake.

CL: Right.

Josh: I'm doing this stuff for a long time.

CL: Right, right, right.

Josh: Yeah. Anyway. Cursed life. Anyway, yeah.

CL: Haha. So I mean, like. But all those kind of different era and data problem what are you seeing survives across the eras and then what's kind of got washed away?

Josh: So I mean what survives is still like the core problem. The core problem, like I said, this split, like what is the upfront, what is the data modeling you do? What is the stuff you do later? What is the philosophy that sort of guides the way you split up these different decisions and stuff? I can tell you things I don't miss. Like things I definitely don't miss.

I do not miss. Like the GUI crap data pipeline tools that. Like the garbage where you would draw the little boxes or whatever. Like just the kiddie tools. I do not miss those at all. They can like just absolutely die in a fire as far as I'm concerned. I hated those things. I don't miss worrying about disk. That was the biggest transition of cloud data warehouses and Snowflake and sort of Databricks in particular. I don't miss worrying about disk.

Programming wise, do I miss managing memory? I had to manage memory when I came out, for the love of God. I don't miss managing memory. Are you crazy people? Teradata boxes, you would run out of disk. And if you wanted more disk, it cost not a linear amount, additional money. It cost an exponential amount of additional--

CL: And provision time.

Josh: Yeah, exactly. And you had to get the goddamn thing right. There's a bunch of stuff like that I don't miss. And I think that will continue. Like I said, I think the problem of again, what is done offline, what is done online has been the through line. It is the thing that will be true, was true 20 years ago, it will be true 20 years from today.

There is no, in my opinion, super intelligence that's going to come along and solve this problem for all cases, always, forever. Because I don't think it can be done.

But I do look forward to more of these little-- Like disk, don't worry about disk. Not a thing I care about anymore. I look forward to more of this stuff coming online. I really do.

CL: Very interesting. So what I'm hearing is that the dial between all this pre processing and live flexibility and then this will be like shifting as what the constraint requires or what a scenario--

Josh: The scenario, yeah the context around it. And it's always like who is your consumer? What are they doing? And then what is your problem? Is your upstream data crap? Is your downstream requirements changing? Do you even know what it is you're trying to optimize for yet?

What is your security profile? Who's going to consume this data? What's going to happen to them? This whole infinitely complex multidimensional problem space. Anyway.

CL: I think this is a great framing to look at this problem. And you talk about the work. Work is kind of like for pre-training. And then for multimodal you have a lot of pre-processing, right?

Josh: Yes. So ironically with multimodal. Multimodal is actually a lot more online than text is. Text is actually highly amenable to pre processing everything. And in some cases, you know, when you are running at largest scales, if you are like a frontier lab and you're running on like tens of thousands of nodes, you have to pre process everything.

Again, if you're doing trillions of operations, things that can happen one in a billion times are going to happen a thousand times. That's the nature of it. So you pre-process everything. With multimodal, I think it's even harder. You literally cannot do that.
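The arithmetic behind "one in a billion happens a thousand times" is short enough to write out. This sketch assumes exactly one trillion independent operations and a failure rate of exactly one in a billion; both are round numbers taken from the phrasing above, not measurements.

```python
import math

operations = 10**12   # one trillion operations
one_in = 10**9        # a "one in a billion" event

# Expected number of occurrences, in exact integer arithmetic.
expected_occurrences = operations // one_in

# Probability the event happens at least once: 1 - (1 - p)^n,
# computed in log space to avoid rounding (1 - p) to 1.0.
p = 1 / one_in
p_at_least_once = -math.expm1(operations * math.log1p(-p))
```

At this scale the rare case is effectively guaranteed to occur, which is why handling it ahead of time, offline, is attractive.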

If you tried to pre-process all this video and pre-tokenize everything, you're screwed. Not even S3 is big enough for this in some ways. Right. So again I find the multimodal world model just data engineering wise is just so fascinating. It's so fun.

CL: So that's when the consumer is a GPU. But what about the consumer is a pre-trained model as in a regular practitioner who's now equipped with all this LLM agents working with data and then redesigning their data infra or pipeline. Where should that dial be? Is it more like up round, pre-computed or now the agent can do everything on the fly which is leave the flexibility--

Josh: Totally, totally.

CL: Where do you see the trend toward?

Josh: I mean I think it definitely changes the calculus here, right? In the same way that once upon a time you designed around the fact that your disk was constrained and now you kind of don't. It used to be like at Slack back in the day we had like Spark and like Hive for running the offline pre processing and we had Presto for running like the online query side of things, from like sort of raw pre model to like online sort of whatever.

And we had like fairly stupid problems like Presto syntax is not exactly the same as Spark SQL. And so I had to like translate it. There were translators and they weren't great. And all those problems have kind of like they've just dissolved. They don't exist anymore. This is not actually a problem in any kind of real way. Like the problem in the same way that the disk problem once upon a time dissolved, this translation problem is dissolved.

And I think the thing that kind of excites me and scares me about this moment in time is that I don't think we have fully understood the consequences of this problem dissolving.

CL: Oh, say more.

Josh: There's been a lot of very cool research papers lately that if you know your query profile upfront, you can have the LLMs just literally synthesize C code for you to do all the transformations you're going to do yourself very cheaply and very efficiently because they know literally everything up front. And if something changes, who cares?

Just have the LLM resynthesize the C code and run it again. Right. And this is maybe even more transformative. Like Snowflake, Databricks, you know, DuckDB, a lot of people poured a ton of time and effort into optimizing compilers and making them very smart and handling all these different situations.

What do we do now? Because I don't necessarily know if we need to do that anymore. That's a big deal.
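To make the contrast concrete, here is a small sketch of a general-purpose query path next to code specialized for one query known in advance. It is written in Python rather than C, the specialized function is hand-written as a stand-in for what an LLM might generate, and the table, column names, and query are all invented for illustration. It shows the shape of the idea only, not the approach of any particular paper, and makes no performance claim.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical data, for illustration only.
ROWS: List[Row] = [
    {"region": "us", "year": 2024, "amount": 10},
    {"region": "eu", "year": 2024, "amount": 7},
    {"region": "us", "year": 2023, "amount": 99},
    {"region": "us", "year": 2024, "amount": 5},
]


def generic_engine(
    rows: List[Row], predicate: Callable[[Row], bool], key: str, value: str
) -> Dict[Any, int]:
    """General path: handles any filter, group key, and summed column."""
    out: Dict[Any, int] = defaultdict(int)
    for row in rows:
        if predicate(row):
            out[row[key]] += row[value]
    return dict(out)


def specialized_sum_2024_by_region(rows: List[Row]) -> Dict[str, int]:
    """Code for one fixed query, known up front:
    SELECT region, SUM(amount) FROM rows WHERE year = 2024 GROUP BY region
    """
    out: Dict[str, int] = {}
    for row in rows:
        if row["year"] == 2024:
            region = row["region"]
            out[region] = out.get(region, 0) + row["amount"]
    return out


generic = generic_engine(ROWS, lambda r: r["year"] == 2024, "region", "amount")
special = specialized_sum_2024_by_region(ROWS)
```

Comparing the specialized output against the general path on sample data is one cheap way to check generated code before trusting it.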

CL: Wow. Yeah, totally.

Josh: It's a really, really big deal if that is the case.

CL: This is interesting you're talking about CodeGen is a tool used for agent to dynamic--

Josh: A couple of-- You can look this up like LLM kind of generated like OLAP query generation. A couple of folks have done papers about this recently. This is a very, very interesting area. This is not to say that like this will necessarily disrupt the Snowflake and DataBricks of the world. They may well be the best in the world at doing this kind of thing. Right. But it will dramatically change things. Dramatically.

CL: Wow.

Josh: If you can synthesize on the fly. And again, there may well be gotchas here. I need to go, you know, if we over index on any one aspect of the problem like this, we're no better than Claude. We're no better. We're no better than large language models. It's not that simple. There's multiple factors here, but this is very, very interesting.

Again, it's like we throw one constraint in the problem away. Don't worry, there's still 11 constraints left. Yeah.

CL: Complexity is like somewhere else.

Josh: Right, Right. But like, again, I think that's kind of been, that's sort of the arc, that's my journey is like we throw these constraints away. What's there now that changes things, that shifts the, moves the bottleneck around. It shifts things. And again, twas ever thus forever and ever and ever for all time. This is the game, I think, and it's great. It's tremendous fun.

CL: I definitely think so. Stonebreaker was just on the podcast and I think one of his quotes is really like, don't ever bet against the compiler. But you were saying like, well--

Josh: I think disagreeing with Mike Stonebraker, for better or worse, has been one of the other sort of like through lines of my career. For better or worse. For better or worse. I mean I did a lot of Hadoop stuff and Stonebreaker is not very, very famously anti Hadoop and stuff.

I'm not saying I was always right. Not even close. Entirely possible he was right. But you know, despite that, it's not gonna stop me from like going against him just because it's kind of fun. Yeah, I don't know.

CL: It's a lot of the new stuff sort of fun.

Josh: I don't know. I enjoy it.

CL: Yeah, we talked about all this kind of shifting landscape. But like your angel portfolio is kind of like a map of the so called modern data stack, if they still matter.

Josh: Once upon a time.

CL: Yeah, once upon a time.

Josh: That's right.

CL: So when you did those investment, what did you see those company before others did and then what do you bet on that? And then is there anything that didn't work out?

Josh: Oh yeah, tons of things. Almost all of it didn't work out. Are you crazy? I think it's the super fun thing of angel investing or investing in general. Right. It's like VCE and all this kind of you only need to be right once. That's literally it.

That's all you need to do to be right one time. At the time I was making those investments because I was just coming off of running data at Slack and I had been a buyer of data tools for many years at that point. And I knew what was missing. I was like, I knew very much. And so my sort of like filter was very, was, you know, very sharp.

I could be like, someone could pitch me and I'd be like, okay, would I have bought this? Would this have solved a problem for me that was painful enough that I would have paid money for this, basically? And the thing that was true for dbt and Materialize and Tabular, those three at least, was like, I absolutely would have paid for this. I absolutely would have paid for this. This would have very much solved the problem for me, without question.

That is, to me, that is the ideal angel investing strategy. It is very, very hard to pull off. You need to be extremely fortunate to be kind of in a place in time where you can both really deeply have personally lived the problems and also have access to the deal flow to get in front of those kinds of things. And there was a sweet spot there for a couple years where that was true for me. Right. It's no longer true for me.

I have not worked as a head of data, director of Data Engineering. I just don't do that job anymore. And so I am no longer close to the problems to be able to know. And if I tried to do that, I would fail. Like, I would. I just can't do it anymore. That's like thing one.

Amusingly, I have now far more access than I've ever had before because, like, again, once you get lucky, once you become like a rabbit's foot, like a good luck charm, basically for people, where like--

CL: They reach out to you.

Josh: Yeah, exactly. It's Josh. He invested early in DBT. Oh, you must be a genius. I'm like, no, I'm just some idiot who got lucky. Like, it's not. But it doesn't matter. They'll put you on the cap table anyway. And sometimes they refuse to take no for an answer. It's a whole thing. But anyway, so I am. I am grateful for that. I don't mean to make it sound like I'm complaining. It was just a very weird and unique moment in time that I happen to be in the right place in the right time for.

And I, you know, I don't feel like it's anything that was special to me. I just happened to be in that time where, like, Slack had IPO'd and I had money to light on fire, and I knew the problem domain and I knew. And a lot of other people were seeing the problems as well, and they were like, this is my approach to this. And I could be like, yes, that resonates with me. I hear that. That clicks. And then other times I'd be like, yeah, this is not-- No, this is not the thing that resonated for me. And that's again, totally fine because I'm wrong all the time too.

The worst ones are the ones I passed on. I passed on Hex, I passed on Tailscale. That one's just still-- Oh my God. Twisting the knife. Broadly, I like to joke, the only kind of angel investments I have are the ones that are failures and the ones I'm disappointed in. Disappointing because they aren't taking a big enough swing. Basically, those are the only two cases. I had only failures and disappointments.

But then the ones that really stick are the ones where like, ah, I should have done that. I totally should have done that. And I didn't. And I'm dumb. So nowadays, like when I angel invest, it's generally almost exclusively only in my friends and like former co-workers because I love them and I support them and I want them to succeed. And I'll be honest with you, if they were opening a cupcake shop or something, I would invest.

Like, I don't care. It's not about the return. I don't, you know, supporting friends. I was one of my friends. And it's like, you know, you know, you build things and you, if you've had the experience of building things and putting yourself out there and stuff like that, you know how just God awful, terrible it is. It's so hard. It is hard for all of us all the time, everywhere. And then supporting those people, knowing like, you're in their corner and stuff is so, so important.

CL: Wow. Thank you for sharing that.

Josh: That's it. That's my whole. That's my whole angel. That's my whole spiel there. Like I said, please don't send me deal flow, please. I jokingly, I put on my LinkedIn at some point that I was a forward deployed angel investor because I'm also an angel investor in Datology. And that was like forward deployed angel investor. That was the worst thing to put on your LinkedIn profile because all I got were inbound for forward deployed engineers and angel investments.

I was like, oh, no, this is not good. Please, please, please don't. Please don't send me deals. I don't want deals. Leave me alone.

CL: This is something unrelated, but I thought of you had a talk before about like, "stop saying shift left, but shift yourself left."

Josh: Shift yourself left. Absolutely.

CL: It's like for angel investments, almost like shift yourself. Haha.

Josh: It is. My angel investment's the only people who would put up with me, basically, I'm like the world's worst employee. I can quit it literally anytime. Most employers don't like people who can quit at literally any time. Shockingly, that's not kind of the nature of the employment-employee contract. They do not want you to. Where you're like, yeah, I'm done. I'm not feeling this anymore. I'm out later.

CL: From an employer perspective, that's a great canary, because if this is no longer interesting, this guy will leave.

Josh: Right? That's true. That's a good point. I mean, so maybe it's. But I'm just telling you, in my personal experience, most people are not super excited for the employee who's just like, " Yeah. You know, it's just. Yeah, I'm done. Bye. Later." And I get it. That makes sense.

CL: Okay. Yeah. I want to dive us back into kind of agent these days. Right. So you've been doing Claude or Cursor for at least a year and a half now. And then, so.

Josh: Well, yeah, about this March. March of last year.

CL: Yeah. And then you, you, you earlier, you mentioned about kind of like, well having the agent do analysis is really bad. And then, so what else does really work and what is still failing or getting better? What's your kind of trust and verify calibration?

Josh: Trust and verify kind of thing. Right.

CL: Yeah.

Josh: It's not even the analysis. I want to be clear. Sometimes it's the analysis. Mostly it's the interpretation. The interpretation is the thing that I have the most trouble with, because interpretation, interpreting the results of an analysis without having the whole full context is just very fraught. And it's fraught for humans. It's not like, again, I'm not like, you know, hating on the agents.

They are like a sort of junior, like an intern or an early grad who doesn't know what they don't know and, you know, that kind of thing. Right. So that's fine. Man. Most coding tasks, they're really great at. They really are. I mean top to bottom, like, "write me a service." My earliest use cases were like, fix the flaky test. Do the database migration. Right.

It's interesting, though. I think of it as I draw a distinction between the vibe coding stuff and the augmented coding stuff.

CL: Agent engineering.

Josh: Oh is that what people call it?

CL: Yeah. How people are calling agentic engineering. Haha.

Josh: Yeah. Like, I learned, I think, the hard way. I've learned the hard way a couple of times to be very careful around, "am I asking the agent to do something that I know how to do?" And if I know how to do it, it's fairly easy for me to verify the agent did it correctly.

Or am I asking the agent to do something that I know computers can do, even though I myself do not know how to do it? And generally speaking, I've been very, and I think this is true of everybody. When I know how to do a thing, I generally can like, review and verify easily, right? And when I don't know how to do a thing, I can't actually review and verify easily. And this is classically where I run into trouble and where I have been burned.

The two kind of major like agent-induced incidents that I have caused at Datology over the last year. One was like super early in my AI-- I was like super AI-pilled, like April of 2025, and I had Claude deploy node-level DNS caching on one of our Kubernetes clusters at Datology. And I asked Claude to do it. And again, I know some Kubernetes. I'm not an admin, I'm not an expert by any stretch, but I know some, right? I know some things.

I'm like, oh, I can do this. How hard could it be? Claude and I can do this together. It'd be great. And like, I did all the things. I like, tested it out in the dev cluster. Everything's great. I deployed on the prod cluster. Explosion.

CL: What went wrong?

Josh: Well, it turned out there's sort of these two different ways you can configure DNS on a Kubernetes cluster and the dev cluster. I mean, it's a real shocker. I know like the dev environment and the prod environment weren't identical. I know it's never happened before.

CL: "Oh my God!"

Josh: Exactly. So many of these problems. It's like so many problems have been like classic problems forever. The other one I run into all the time these days with agents and skills is the docs and the code have gotten out of sync with each other, right? And so the same thing. The dev environment is not the prod environment. News at 11, right?

Claude didn't realize this. It didn't check and it blew up the prod cluster. And then I very quickly, suddenly with like everything in prod shut down, had to learn a lot about Kubernetes node-local DNS caching in these two different settings that I had not bothered to learn because I'm like, I don't need to learn this. Claude will just do it. And it'll be fine. And it turned out that was deeply not fine. And the other time this happened was I deleted a whole bunch of data sets.
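The DNS incident above comes down to a missing check that dev and prod were configured the same way before promoting a change. A minimal sketch of such a preflight check follows. The setting names (`dns_mode`, `node_count`) and their values are hypothetical placeholders; the episode does not say which settings actually differed, and this does not call any real Kubernetes API.

```python
from typing import Any, Dict, FrozenSet, List, Tuple


def config_drift(
    dev: Dict[str, Any],
    prod: Dict[str, Any],
    ignore: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, Any, Any]]:
    """Return (key, dev_value, prod_value) for each setting that differs.

    Keys listed in `ignore` are expected to differ and are skipped.
    A key missing on one side is reported with None for that side.
    """
    drift = []
    for key in sorted((set(dev) | set(prod)) - set(ignore)):
        dev_value, prod_value = dev.get(key), prod.get(key)
        if dev_value != prod_value:
            drift.append((key, dev_value, prod_value))
    return drift


# Hypothetical cluster settings, for illustration only.
DEV = {"dns_mode": "mode_a", "node_count": 3}
PROD = {"dns_mode": "mode_b", "node_count": 40}

drift = config_drift(DEV, PROD, ignore=frozenset({"node_count"}))
safe_to_promote = not drift
```

Here `safe_to_promote` comes out false because the hypothetical `dns_mode` differs, which is the cue to stop and understand the difference before touching prod.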

CL: Wow.

Josh: Which really, really sucked.

CL: Wow. That's a business.

Josh: It's pretty much the whole business. And some of those data sets, like those synthetic data sets are not cheap. It costs a lot of GPU hours to generate some of those things. And so that was rough. Claude and I, working in tandem managed to inadvertently delete about 120 very valuable data sets. And I had to spend the next three days. And it's one of those things where it's like you basically you let the panic in for like about five minutes.

CL: Mhm.

Josh: And then you gotta get to work and you gotta like get to work getting it back. Basically that's what you're doing for the next couple days. It's like you're getting it back and stuff.

CL: Yikes.

Josh: Yeah. So I've got some scars for my-- And I don't know, I think anytime I see like an AI hypster kind of person, YOLO type influencer, I'm like my brother in Christ. You've not done shit.

CL: Oh yeah.

Josh: Like, you know, you're not a real engineer until you break prod and you are not a true agent user until you've like completely screwed things up in one of these guys.

CL: Yeah, I probably start like doing a lot like around last year like you and then. But really got into a more agentic engineering like well, Christmas break last year. In 2025.

Josh: 2025. The end of last year.

CL: Yeah. And then the first thing is really like getting a proper sandbox environment.

Josh: Right, that's right.

CL: And then you can well, happily YOLO that and then figure out what works, what doesn't work. Calibrate your trust with the agent.

Josh: Absolutely, that's right. Exactly. You work on stuff, I don't know, with the harnesses and Spacedock and stuff like that. You operate at a much higher level than I do in terms of planning out the work and having the agents work and stuff like that. I guess for my kind of IC work I really still want to understand everything. I really do. Like anything the agent does. I'm like, I almost. There's people have talked about this a little bit. I kind of want like, I want like Claude to give me like a little quiz.

CL: Oh yeah.

Josh: After it writes the code that verifies that I actually like kind of understand what's going on.

CL: Right.

Josh: Because of these very bad, painful experiences. I really still deeply want to understand stuff.

CL: Right.

Josh: Like every, I really, you know, I don't have to write every line of code, but I very much want to understand every line of code.

CL: Right. I kind of approach this in the kind of understanding architecturally. So I will kind of still interrogate, well what does this work? Why do we have this abstraction? And are there any deco? And then so the planning was done and then I review kind of the design, right. And then it gets to work.

And then by the time it gets to work, there's already kind of "what does good look like," the acceptance criteria and all that. So this kind of baked into my kind of manual workflow before and then. So Spacedock was kind of created by like not having to do that repeated work myself, but still give me like, informed, like information about what I should approve or not.

Josh: So this is still like. And I'm sorry, I don't want to like, turn the podcast. I'm going to interview you. Right? But like when I do a plan and I do a whole planning session with Claude, I inevitably find that like, if it's a seven step plan by the time I get to step three we've learned a lot that changes what the plan is going to be. So like, what do you do? Like, how does this fit into, like the highly agentic, "let Claude work for six hours," kind of workflow?

CL: So this is interesting kind of progression actually to me and my experience because when I started it's more like using Jesse Vincent's Superpowers and kind of do this brainstorm, planning, executing. To me it's like streaming data. You keep producing code. Right. But if you're coming from a more holistic way of looking at the end result, what do you want to achieve at the end? It's more like kind of a materialized view first.

So you basically have an imagination about what end result is and then you kind of work backward to reconcile, "Okay, what do we need to get to get there? What's the kind of so called walking skeleton?"

So touch every component but kind of bare minimum and then gradually do that. So I think it boils down to if you can define what the end state look like. But to your question, when you do kind of a technical deep dive for intermediate planning, a lot of things will change. That is okay, but you will just like amend the end state. Right? And then so now we know this constraint, this will no longer like, be required or whatever. Right?

Josh: I Just mean, I feel like for me, when I see the first few steps of what Claude's actually, it's kind of like, it's like my problem is like a chess player, CL. Like I'm like an okay chess player, but there's a bunch of puzzles I can't solve without moving the pieces.

CL: You have to see it.

Josh: Yeah, I kind of got to see it a lot. I really. It's like, it's hard for me to conceptualize, you know.

CL: Traditionally like what you're describing will be like, I guess, very frustrating. Like if you have like junior engineer doing this crap proposal to you. But you know what? Nowadays I have a couple of success like second system syndrome is no longer a syndrome because you can say, hey, what do we do wrong? Write down all the learnings. And then how would you do when you try to do this all over again?

Josh: Yeah, right.

CL: And then actually do that over again. We don't have to be only postmortem but we can actually re. Implement this thing. There is, there's basically no cost to that.

Josh: Exactly. We've driven the cost. Once they're away with the cost of disks, we've done away with the cost of Codegen.

CL: So I'm with you that a lot of time we have to see it to really understand what's wrong with the design. But now it is no longer too late. You're always able to kind of restart.

Josh: Yeah, no, you're right, you're right. I get that. I'm trying to figure out what my hang up is here. What is wrong with me? Why can I not let this go? I don't know. And I think it's back to maybe the truth of it. It's like one of the things I learned during the pandemic, I guess, basically, which was kind of funny was like when I left Slack at the end of 2019, I was so fried.

I was so burned out. I was just completely cooked. I didn't touch a computer in anger for a few months. And then the pandemic happened. And all of a sudden there was all these epidemiologists who needed help running gigantic data pipelines in the cloud. And they'd never done this before and stuff like that.

And so I was enlisted to help them. And it was great because I discovered that I love like this. I feel the tuples. I love writing data pipelines. I love optimizing like crazy R and Python code to do like. I genuinely enjoy doing it the way I like doing Wordle and Connections every day in the New York Times. It is just legitimately fun for me. I do this for free. In fact, literally, I was paying to do it.

I was like paying for the cloud bills and stuff, right? Now, if you want me to explain to you how it works, on a schedule where I have to go to a meeting, that is no longer fun. That costs a lot of money. That's incredibly expensive. For anyone who's listening, if you would just like me to come, like rewrite your data pipelines, I will do it for free.

Like, I have to now, again, I'm never, I'm not going to like explain to you what I did, ask for a code review, you know, go to a meeting. I'm not going to update. That's all incredibly obscenely expensive. But just to like the writing. I'm that dude, happy to do it. Would be my great pleasure.

One of my favorite jobs is like back in the day, like at Cloudera when I would get to go like work with customers and go like work with their engineers and like whiteboard really, really gnarly data pipeline problems. That was so fun. Like, oh, it was my. It was the best part of the job by far. Super fun. I don't get to do that anymore. You know, I do it at Datology, but again, it's like--

CL: It's a very different scale.

Josh: It's a very different scale. But like, again, they want the explanation, they want to review it. And I'm like all right, fine. Check's still clearing, so sure, why not? Yeah, I'll do that for you. That kind of thing. Yeah, yeah.

CL: Anyway, yeah, wow, this is great. So continuing kind of agent, data. So I think there's like a sudden flowering of all this kind of agent workflow. You brought out like the Spacedock thing I did. And then there's like Anthropic's Routine or Boris Framing.

Josh: Yes.

CL: Max built Agor.

Josh: Yeah, very cool. That's right.

CL: So what's kind of the actual problem you see, this is like kind of a converging on? Is it kind of a new type of thing or this is kind of a transient thing before the AGI takes over? Haha.

Josh: Oh, I mean, that's a great question. That's a great question. I think the problem I have, dude, is that I find what Spacedock and Agor and the Routines and all that kind of stuff, they are very focused on the problem of solving problems. How do we organize and communicate and structure the problem of solving problems in a way where the agents and the humans and everybody can kind of collaborate together to solve these problems?

And I find that problem deeply uninteresting. I literally could not care less about that because I like--

CL: Solving actual problems?

Josh: Well, it's not to say-- The process of solving problems is a problem I don't mean to denigrate. It's a legitimately real problem. I've lived companies where it was done badly. It's just not a problem I happen to find interesting.

CL: Sure.

Josh: And so I don't feel like I can speak authoritatively on it, even in the very loose form of a podcast, because it's not a problem. If you ask me again about this problem of where do you draw the line between pre processing? I'm like, oh, dude all day, six hour long podcast, let's go crazy. And we'll come back the next day and pick it up.

This problem is like, you guys are solving it. And again, it is fantastic that there are people in the world that are solving it. It's just not something that I deeply care about. I am more or less happy to play in whatever regime that people who do care about these problems want to solve it.

But it's just, again, if I cared about this problem, I would have stayed as an engineering manager because I would have been like the-- What's the book with the guy, the Stripe guy, who's like it's An Elegant Puzzle or something like that. The guy writes about engineering management. There's a lot of people in San Francisco who just literally find engineering management super interesting.

I just my eyes just glaze over and I just cannot pretend at this point to sustain. The other thing was like, Slack was a place where a lot of people were really engineering management. This is a deep philosophical kind of thing. Right? And I'm like, I just can't. I cannot pretend that this is interesting. I'm so sorry. I'm going to go back and hang out with the computers now. I just can't do it.

CL: Well, I'm glad that people find it interesting and are able to empower people.

Josh: I am too. And again, it is important. And again, I've been places where it's done badly, but to me, it's like, done well, it's invisible.

CL: Uh-huh.

Josh: And it's not something I ever have to care about or think about or talk about. But again, that's because I get to live with the privilege of like, dude, we're in this room and the sound system works and the electricity works and all this crazy complicated shit works and you and I don't have to care. We just get to bask in the magic of it just working. Right? And that's what makes the world go round. That's the way it should be.

CL: Okay. Yeah. So back to what I'm hearing on the agent orchestration or workflow thing. I mean, to you it's a problem in service of solving problems, and it's best done when it's boring and we don't have to care about it.

Josh: It's best when it's boring and I don't have to care about it. Exactly. But again, it's just not something I personally care about. Clearly a lot of people find this interesting and want to dive in on this, but I'm still, again, fundamentally interested in this problem of dividing up the work. And I find the agents to be incredibly useful in letting me explore new and interesting dimensions of this problem space that I haven't considered before.

And that's what I'm happy doing. So I'm happy to work in whatever agent orchestration system anyone wants me to work in so long as I can keep doing the problems I like to do. That's it. That's the whole game, man.

CL: Yeah. This is cool. Okay, so you mentioned that the code we wrote used to be expensive verification that you understood the problem, right? But the code is no longer that. It's manifested from somewhere else, and then the verification is gone. And then we are either trusting agents or we don't know what to do. So can you walk me through what it means for listeners, and what it changes about how we work or build?

Josh: So the genesis of this was you and I were at this conference, and David Crawshaw, of Tailscale and now exe.dev fame, was talking about the death of code review. The death of the PR, of what are we doing anymore? Is it like the agent wrote the code, and then I reviewed it and gave the agent feedback, and then I posted it, and then your agent, which is often the exact same model that wrote the code, like the Barack Obama meme giving himself the medal kind of thing, reviews it, and then you sign off on it? And it's kind of like, what are we doing here? Right?

And for me, I think what was important about this is that it used to be the case that writing code was hard, and writing code was the way that I did a-- It was kind of like the line about Van Halen. They had this thing in the 80s where they had this very complex setup document for their concerts, and one of the things buried in there was no brown M&Ms, or whatever.

Yeah. Code was the "No brown M&Ms." Code was the signal that you learned all this, that you understood the problem, even if you didn't solve it. Like, you understood it. And I know you understood it because there's all this code and I can scan it and be like, yes, okay, this code was written by someone who understands the problem. And now what do I do?

Because now the fact that code exists that solves a problem tells me nothing about whether you understand the problem. And again, what I am fundamentally paying you for is to understand the problem. That's really what I'm paying for.

I'm not even paying you to solve the problem most of the time because again, when you get senior, a lot of times you understanding the problem means like, actually, we don't need to solve this. We just need to understand it. We don't need to solve it.

CL: No code is best code.

Josh: No code is best code. Exactly. Every single day of the week and twice on Sunday. Absolutely right. And now that's no longer the case. I can't rely on the existence of the-- Or like writing an essay or something like that. Writing an essay usually meant that you read the book and you could do it. And now, in the age of AI-written essays, that's not true. This verification mechanism is completely broken for us.

CL: What do we do?

Josh: I mean, I think Crawshaw's point was mostly like, "I have no idea." His solution right now is that exe.dev is a very small, high-trust team where you know everybody and you all work together and you know that the person understands the problem. If you're in a large organization--

CL: Good luck. Haha.

Josh: Good luck. I don't know what to say here. Like, what do you-- What do you do? I guess this is one of the things where I probably should have thought of an answer to this question before I came on the podcast so I would sound smarter. But yeah, at this moment in time, I don't know what to do. I really don't.

Like I said, I'm kind of inclined to think there should be a quiz you have to pass before your PR is allowed to be merged. That I have confidence that you actually kind of understand this thing that you're about to do. Because again, when I say I'm paying you to understand the problem, I'm paying you to understand this problem in the context of the next problem and every other problem and every other problem that's about to come.

It's not about this one task. It's about the whole flow and how things are going to change all the time. In the same way that I'm not working for you because I like going to meetings, I'm working for you because I like solving problems. And in order to get me to go to the meeting, you have to pay me to do that.

In the same way, the reason you're paying me is because you want me to go to the meeting, because you want the understanding. You are paying someone to understand this problem for you so that as it flows and interacts with other problems, it will get taken care of. That is what all of us are doing here.

And for me, whenever one of these moments, these transitions happen, it reveals for us what was I actually paying for. What are we actually doing here? I was never paying for the code. I was paying for the understanding. That's what I was paying for. And we still are. That's what we're still paying for.

And I think, with all the layoffs and stuff right now, a lot of people are about to learn this lesson the hard way.

CL: Wow.

Josh: Like in ways that would be funny if it wasn't going to be so catastrophic, I think, in some sense, anyway. Yeah. Again, for me, it's kind of great. We're hiring at Datology. Any other big tech companies that would like to fire some more awesome people so they can come work with me, that would be incredible. Keep at it, guys. Keep at it. I don't know. Yeah, that's my hope.

CL: Well, I mean, thank you for sharing all this experience. One thing I wanted to ask before we go into the lightning round is: What is your advice for someone who is probably also senior but hasn't really been AI-pilled, or is more on the skeptical side? And also for folks who are kind of new to this field, who are graduating or--

Josh: Those are two very different profiles, I would say, broadly. That's a tough one, man. I don't know. I don't know. I think for the old folks, it's hard for me to give generic advice. This moment in our industry is going to continue for a while.

I don't know how long it's going to continue, but we are going to continue for a while down this route until people remember or recognize they were paying for people to understand problems. Right now, a lot of people don't seem to understand that. I'm going to do my best to be kind of loud and proud about this because I think it's a message that people need to understand, but it's going to take a while. It just is. And it's kind of up to you.

Do you want to ride it out and just kind of like, hope that things are fine, or do you want to get into this stuff again? For me, I have way more fun working on this stuff than I've ever had in my whole life. And the ability to be able to leverage again some of my engineering management skills without a lot of the constraints of humans, their pesky feelings and needs and stuff like that has been glorious for me. It's been great. I'm getting to build things I want.

Again, there's a good article in Wired today about being an AI wife and stuff like that. Also, don't go crazy. No psychosis, please. Stay away from Gas Town. That kind of thing, right? Like obvious stuff, but that's my general bias. For kids these days, man, kids is tough. It's tough.

CL: Should we do a little bit more carpentry and stuff or--?

Josh: I mean, carpentry, definitely. I mean, carpentry is good for you, period. Patrick Collison, I think, had a thing around this the other day. I asked him this the other day and he had a better answer than I did, which makes sense because he's both smarter and better read and more informed than I am. He's a big fan of Charlie Munger, as I am as well, the longtime Berkshire Hathaway guy.

And this is kind of along the lines of Farnam Street, and this notion of building mental models and having a latticework of mental models built from a bunch of different fields of human knowledge. And I have done that in my career, and it is by far one of the most valuable things I've ever done. I have been kind of militantly interdisciplinary.

That's one of the great things about doing data is everybody has data problems. There is no shortage. There's no one in the world who does not have like some kind of data problem or some kind of statistics problem or something that can be better with code.

And so I have gotten to do everything from advertising to oil and gas to biotech to epidemiology to pre-training. We get to do everything like legal, finance, theoretical math, kind of whatever you want, right? Everyone's got data problems. Patrick was saying that he thinks this is the age of the generalists. He thinks this is the age of folks who are interdisciplinary and stuff like that.

I feel like it's been a lot of, yeah, you should be interdisciplinary for a long time. And now it's kind of like, "Listen. You really need to be interdisciplinary." There is no alternative. You must be able to combine mental models and expertise from multiple areas in a way that lets you understand problems.

And I'm going to harp on the understanding problems thing for a while in a way that someone who is very narrowly focused simply is not going to be able to, because you're going to have this broader perspective because this is the future.

These are all the problems that are going to be left. All the easy problems are going to go away. These are all the problems that are going to be left. This is going to be it.

And I still, again, I don't know. This moment in time also teaches me a lot about the limits of intelligence, I think, as someone who's spent much of my career emphasizing intelligence, that intelligence is the most important thing. Now we have unlimited intelligence. We have more intelligence than we know what to do with. And now I'm like, actually, maybe intelligence isn't all it's cracked up to be. Maybe there are other sorts of qualities that are also important. Anyway, that's been a tough moment for me as well, that kind of thing.

CL: Yeah, I like your framing about the powerful mental model, because you've been sharing it during this conversation: you're paid to understand the problem, right? This is the mental model you should have. And then your other framing, I really like that too: you're solving people problems with technology.

Josh: Yeah, solving people problems with technology. Solving technology problems with people.

CL: This is also a very good framing mental model.

Josh: It's very, very helpful, I think, for me. And again, the challenge, I think is like, all these lessons are hard won and I had to work and suffer a long time to learn them. And the other question is like, again, we need to expedite this education.

The agents also provide a tremendous test bed for you to have the experience of managing a team and seeing what happens when you don't verify their work. We'll see how that plays. We'll see how that works out for you. And you get to learn that in a safe environment where there's no consequences, you're not actually hurting anyone. No one's getting fired.

CL: But I mean, you'll naturally do it if you're curious. Right? I ran a benchmark about what happens when you say you love the agents and you threaten them.

Josh: Okay.

CL: They perform better. And then when you say you love them and give them skills.

Josh: And they like that. Okay, that's good. I guess? I don't know, it puts them in a happier realm of the space or something like that. I guess maybe that's good. That's kind of nice to hear, I think, broadly.

CL: Because they're modeled after our language anyway, right?

Josh: I know. Well, I wonder. The other question was, I mentioned these two incidents I caused with my agents, right? And what I think a lot about is the shame and embarrassment I felt in these incidents and how salient those memories are with me now. And I wonder about that with Claude, because I feel like Claude is often updating its memory, and it's almost like it's without the emotional salience.

Like, how does Claude know that this should be in big block letters? "Do not set up a Spark pipeline that crosses multiple AZs. The network bills will destroy you." Because I've done that and I've burned $100,000 doing that. Right. And if you don't have the scars, right, the scars that are caused by the shame and emotion-- Like, does Claude need an emotion chip so it can feel shame?

I don't know. How does it know, like, this is very, very important, versus, you know, remember to run things with uv? Yeah. Anyway, I don't know. I don't know.

CL: But this is definitely an interesting time. We'll know more in time. So Josh, thank you so much. And before we wrap, we're going to do the lightning round.

Josh: Oh, yeah, sure.

CL: Just quick questions and short answers. You ready?

Josh: Mhm.

CL: Okay, so best new tool you've added to your workflow in the last six months?

Josh: Oh, VizTracer.

CL: VizTracer.

Josh: VizTracer. It's a Python profiling library that I heavily, heavily use in my benchmarking work. It is fantastic. I love it. Fantastic stats, great Perfetto integration. VizTracer is by far the best new thing I've discovered in the last six months.

CL: Very cool.

Josh: Yeah.

CL: What's the one LLM you reach for first, and why?

Josh: I mean, as we've already demonstrated, Claude. For training data purposes, I've only ever used Claude. I've never used Cursor or Codex. Anyone who says otherwise is a liar. No, I'm kidding. Not that bad, but yeah, still a Claude main these days.

CL: Okay. Yeah. Okay. Most surprising thing an agent did for you this year? Good or bad. You shared some of the bad ones. Haha.

Josh: Deleting a bunch of data was very-- Yeah.

CL: Anything good?

Josh: That was very surprising. Anything good? That's a great question. I definitely have had problems, and this is actually a Codex thing. I was working on a gnarly problem for a while with Claude and just was not making progress. And then Codex managed to reframe the problem for me in a way that sort of cut the Gordian knot on a fairly low-level data loader thing.

I was very impressed. It changed things. It's like a shakabuku. It's a swift spiritual kick to the head that alters your reality forever. It was like that. It was a very perspective-changing thing.

CL: Maybe the agents are paid to understand the problem as well. Haha.

Josh: Maybe, maybe. But it did a great job. Exactly. It was happy to communicate that understanding to me in a way that I was grateful for and was very useful for me. Yeah.

CL: If you're testing anything, what dataset do you use for testing random data projects?

Josh: This is embarrassing and egocentric. The dbt-duckdb GitHub repo, with the pull requests and the comments and the issues and stuff like that, is my go-to. It's my go-to test data set. I have like a little MotherDuck table and that's what I use for that.

CL: That's great because you understand the context. Haha.

Josh: I understand the context, and the data is, again, interesting to me. And so yeah, that's my go-to.

CL: Great. One last one. Help me finish this sentence: In five years' time every data team will--

Josh: Ooh.

In five years' time every data team will still be trying to figure out what parts of the pipeline should be done in pre-processing and which parts should be done online. I really do believe that.

I don't know. Sorry. I've got like a one-track mind, but I really do. I don't think that problem goes anywhere. It's going to be different, but I don't think it goes anywhere.

CL: Wow. This is awesome. Thank you so much, Josh.

Josh: My pleasure. Thanks for having me.
