Data Renegades
70 MIN

Ep. #5, The Identity Crisis of BI with Benn Stancil

about the episode

On episode 5 of Data Renegades, CL Kao and Dori Wilson sit down with Benn Stancil to explore how data tools evolve, and sometimes lose their identity. Benn shares lessons from building Mode, the risks of drifting from an opinionated product vision, and why most companies struggle to turn data into meaningful decisions. The conversation also dives into the future of BI in a world increasingly shaped by AI and unstructured data.

Benn Stancil is the co-founder and former CTO of Mode Analytics, a SQL-first analytics platform acquired by ThoughtSpot. He writes extensively about data, analytics, and decision-making, and is the creator of ADE-bench, a benchmark for evaluating real-world analytics agents. Benn is also known for his thoughtful critiques of the modern data stack and BI tooling.

transcript

Dori Wilson: Hi, I'm Dori Wilson, Head of Data and Growth at Recce. Welcome to another episode of Data Renegades.

CL Kao: And I'm CL, CEO of Recce. Today our guest is Benn Stancil. Benn started Mode, one of the first code-first and self-serve analytic BI tools. He's been a prolific blogger writing about the data ecosystem, or "the modern data stack," as we called it at the time, as well as AI.

He's probably also the most anticipated speaker at many data conferences, being entertaining and thoughtful at the same time. So glad to have you join us, Benn. Welcome to the podcast.

Benn Stancil: Thanks for having me.

CL: Can you take us back to the beginning? What problem first pulled you into the data space?

Benn: Yeah, so I started my career in Washington D.C. After school, I ended up at a think tank in D.C. which was like a very D.C. job, sort of like a "when in Rome" kind of thing. But it was doing like economic policy research and, structurally, it's very similar to a classic data job in Silicon Valley. You have a bunch of numbers, you are trying to make a recommendation about what somebody should do.

In our case, those numbers were like numbers published by government bureaus and stuff like that. And the things we were recommending were like what the Fed should do. And so if you're like a data team that's frustrated with like, the PMs don't listen to your advice. The Fed really does not listen to your advice when you're some 23-year-old analyst in a think tank.

And so the structure of that job was very interesting. It was like, okay, take some numbers, think about stuff, make recommendations. It was kind of academically fun, but it's sort of frustrating when you're 12 steps removed from anybody caring what you say.

Dori: So I have a master's in Economics and worked at the SF Fed as an economics research associate. So I can tell you, doing research on the inside as well, your research does get surfaced, right? But still you're further away than you'd like from seeing it in action. I co-authored a white paper on the impacts of a trade war with China in 2018. So, fun to see how relevant that became in later years.

Benn: I have a book, it's called Currency Wars. This was one of the things that we wrote when I was there. So, yeah, I'm familiar with that. But yeah, nobody pays any attention to it. And you were the person we were trying to get to pay attention to it so that we might be like a single citation in your report of 50 citations, which then nobody else reads.

So it was like there are 10 levels between us and any decision actually getting made, which ultimately probably gets made by some political thing anyway. The structure was kind of fun.

I had some friends who ended up in San Francisco. I thought that sounded like a fun thing. I had no real long-term ambitions to be there or to be in startups. It was not a world that I was particularly familiar with, but I ended up as a data analyst, data scientist type of person.

I mean, I was like data analyst in a true sense at a company called Yammer, which was like a B2B SaaS business kind of in the early era of that. And so I was doing product analytics type of stuff where it's like, all right, how many people are using this button? Should we change these features? All that kind of stuff. Which I thought was very interesting.

It was like kind of a fun job. And then that company was acquired by Microsoft not long after I joined. And so then me and two other folks ended up like basically taking an internal tool that we had inside of Yammer and using that as inspiration for a company we started in 2013. So it was like kind of modeled off of this internal tool, plus a bunch of other internal tools that people had built at other various big tech companies like Facebook and stuff like that.

CL: Right, so I think you meant Mode, I'm assuming. Right? And then taking the internal tool you guys built at Microsoft for product analytics and turning that into a standalone product.

So what was the actual spark? BI has been there for a while. Why did you think that it was something you should invest a decade into?

Benn: Yeah, well, naivete on a bunch of fronts, I would say. One is we weren't planning on doing it for a decade. I don't know how long we thought it would be, but it wasn't 10 years. I don't know that this is a bad thing. I think, like, in hindsight it actually might be the right way to do it. Even though it sounds dumb.

We did no research prior to starting this thing. So we built this internal product. It was basically like a SQL query editor in a browser that had charts on top. That was functionally what it did.

Like you could write queries in your browser, you could take those results, you could make a graph of them, and then you could share that graph with somebody via URL, and they could click on the URL and get a chart and query. It worked really well as a way to quickly share the kind of analysis that we were doing, which was essentially just like doing pivots on top of tables in a database.

And this was very much what was happening inside of a lot of like, data teams at Facebook and LinkedIn and Airbnb and places like that. And they had built very similar sorts of tools. And so, like, that was our market research: well, after the acquisition, people inside of Microsoft seemed to like the thing that we made. And we went to LinkedIn and they had something that looked like it, and we went to Uber or Pinterest and they had stuff that looked like it. And we're like, well, there must be a market.

I think that, like, we were like a little bit more thoughtful in that we believed that SQL was going to continue to be a big thing. I mean, this was 2012, 2013, and this was a little bit of the era of "SQL is dead. We have NoSQL and Hadoop. And why would we ever talk about this sort of stuff? Everybody does stuff in Python now."

And so if there was like any kind of real trend that we attached ourselves to, it was Redshift coming out. Like, we did have this conversation before we started it, that the tool that we built internally was running on top of Vertica, which was like a big analytical database that HP bought. It was very expensive, we paid a ton of money for it. That was kind of state of the art at the time.

And when Redshift came out, it was cheap, and so it was like, "oh, the thing that we do as a data team that writes SQL on top of a database is going to become a lot more accessible. There should be tools like Mode that sit on top of that."

But we did no research into what else existed in the BI space. I'm not even sure we were that aware that the BI space was a thing.

And part of that was like we didn't really imagine ourselves as being that similar to MicroStrategy or Tableau or whatever because we were very like SQL-oriented as opposed to draggy, droppy stuff. The degree to which all that collides I think was a surprise to us and kind of continues to be somewhat of a surprise, I would say, to everybody who attempts to do this.

But yeah, like we saw ourselves as more of a, "oh we are a technical workbench for data teams. That's not a BI tool. What even is BI?"

And so we actually resisted the BI branding for a very long time. But like leading up to starting the thing, it was kind of like, seems like this is an idea that might work. Let's try it. Like, we did not do the long run of like, let's do tons of market sizing and research and competitive things. Like we had to do it for fundraising but we never really cared about it ourselves.

Which I think is, one, a bit of a mistake. But two, a lot of times that's how startups actually work: you just have an idea and you say, we're going to go do it and see what happens. As opposed to like, you can overthink this stuff and like triangulate too much. And we weren't really doing a lot of triangulation. We were just like, we like this product. Seems like other people might. Let's try that.

Dori: Yeah, you're solving a problem you had building what you knew. I'm curious, why do you think that would be a mistake to not do the market research?

Benn: My belief is that if you are launching an early product, there's two ways you can kind of approach it. You can go talk to a bunch of customers and try to essentially average their needs into something. And that to me is this kind of like triangulation approach where it's like, all right, we've talked to these people and these people and these people, and they want this, and they want this, and they want this. And we can't build that one thing because that's the giant, like, Frankenstein behemoth of a product. But if we kind of find a spot that solves all their needs in the right way, we'll do this.

And you end up building, like, kind of this weird mashup of stuff. And like, I am sure there are successful companies that have done this, but like, that to me is where you go if you talk to a bunch of people without a particularly firm idea. You end up like, there's no opinion in your product anymore. It's attempting to find a position that is like centering yourself in what is the cluster of approximate ideas that people have.

To me, the better approach, and it's what we did without knowing it, is--

If you talk to people, great. Find the one that you think gets it right and build exactly what they want. And your job is not to identify how to average all these things out. Your job is to figure out which one of these 50 people you talk to is the one that had the idea that you can sell to everybody else.

It's finding the very opinionated thing. And sometimes the very opinionated thing doesn't mean it sells to one person. It's like this very opinionated thing actually has, like, a much broader appeal, but everybody else doesn't think of that, or they don't see or they don't know that.

And so that's basically what we did. We just never talked to anybody. We talked to ourselves, but we built a very opinionated thing that we wanted.

Dori: Yeah.

Benn: And I think the more likely to be successful approach is to build the opinionated thing. I don't think you build anything with opinion if it's this, like, blended average.

And so it's not that doing research is bad, but I think the danger of the research is you end up sanding down all of the interesting edges and the things that make it appealing by trying to make it a mass-appeal product. And most early stage stuff, I think doesn't work if it has this kind of mass appeal. It works because it's got a sort of character that some people really, really like.

And honestly, if some people really like it, there can be lots of ways to draw some gravity from other people that wouldn't have necessarily liked it initially.

Dori: Yeah, start with the niche.

CL: What I'm hearing is: keep those opinionated bits, don't just sand them off through extensive market research, because, you know, you might actually also end up in analysis paralysis, not doing anything. Right? So stay opinionated, do the thing you believe in, and then dogfood a whole bunch.

Benn: Yeah. And like, what do I know? But the market research, I think, all else equal, would be better if you don't let it, again, turn you into just like, I have mashed everything up into this kind of average.

Dori: Like a group-think, product by committee.

Benn: Yeah. And you do this as you develop the product too, where you start to talk to a bunch of people. You build an early thing. It's a thing that some people like. Some people come to you and are like, I like half of it, but can you build this feature? And so you build this feature.

And everybody starts to basically expand around the edges, but, like, different parts of the edges. Somebody's like, oh, I really like it for this thing. Somebody's like, oh, I really want it for more, like, enterprise stuff. Some people really like the charts. They want more charts. Some people really like some other integration thing.

And that can have the same effect where you're like, well, we need to build one feature for this customer. We'll build that, and so on. And everybody's aware of that possible product bloat in a "there are too many buttons" way.

But I think the bigger problem is there's an identity bloat where you no longer know what you are anymore.

It's like, oh, you're a clothing store that sells T-shirts. And then it's like, well, maybe pants, now socks. All of a sudden you're like, we're a department store. And the thing that made us was that we sold these really cool T-shirts. You don't really have that identity anymore, and you're not really sure what you're doing.

Dori: Yeah.

Benn: That's effectively the same thing as the average. It's hard to do that with customers because you have a lot of incentives to try to sell to them. It's also hard to do that when you talk to a bunch of people and you're trying to be like, how do we make the biggest market? You inevitably will be like, it's a department store, as opposed to, it's really cool T-shirts.

Dori: Yeah. How were you able to keep your focus? Did you have to fire customers? Or is it just going back to, you were super opinionated?

Benn: Oh, we didn't. We did a bad job of it. Like we messed it up. We did okay, I would say. We didn't drift a ton. Mode never had any pivots or anything. Like the product it was on the day it was acquired was not that different than the product we launched.

The biggest fundamental question I think that we never really answered was, was Mode a BI tool? The answer is yes, it was. But there was always this divide of like should we be for everybody or should we be like this very technical thing that basically says, "you must be this tall to ride this ride?"

And I don't think we ever quite made a decision on that. And there's a very, to me, a very strong market dynamic for why that is, which is basically like the market for a "you must be this tall to ride this ride" tool is not that big.

Dori: Yeah.

Benn: And so everybody who buys that is going to be like, this is great but like we could sell 10 times as much if you could make it useful for everybody else.

And so it's really hard to maintain that discipline to not end up drifting towards a BI thing. I think this is what happens with every product that tries to serve technical audiences in this way, is they drift towards becoming a BI thing.

We basically did that slow, again, not "pivot" but like slow expansion into more and more draggy, droppy self-serve stuff. The thing I think that is tough about that is like once your initial brand is that like we were always like--

People did not see us as a proper BI tool. They saw us as like a technical thing. And you see this with other tools too, with the people who started in the same place we did and moved in that direction. Like, this is very much Hex's story.

Dori: I was just thinking of them.

Benn: Yeah. And so they, to me, have followed a very similar arc. They started as a technical thing. Their orientation was towards notebooks as opposed to like kind of a SQL IDE, but like same same, but different. And you can see there's this tug towards more self-serve stuff, more classic BI stuff, and it's the same dynamic.

I think it's like they want to sell bigger contracts. The way you sell bigger contracts is sort of wall-to-wall deployments. The way you get wall-to-wall deployments is value to everybody. The way you add value to everybody is they can like at least have a sense that they can use the tool in some creative way, as opposed to just like consuming dashboards someone else created for them.

Dori: The casual salesperson can use them.

Benn: Right. And the weird thing is, in this world, people don't really do that. Like Looker was the big BI tool, or big sort of Silicon Valley BI tool, that we competed with. One of the ways we sold against them a lot of times was people would be like, "we bought Looker. We spent all this time spinning up a bunch of stuff."

And then you say, "Great. Do your salespeople actually go and do a bunch of self serve stuff?"

And they're like, "no. They look at dashboards"

And you're like, "well then why'd you do that?"

And I think that's basically true of all BI is that people aren't doing a whole lot of exploration on their own. They're mostly looking at static dashboards.

Dori: Or they download it into Excel.

Benn: Yeah, yeah. The export to Excel is really the thing they do. But people want those things. They sell. There is like a really strong sell to like you can give that demo and be like, look at all these things you can do without having technical ability. And it's like, ah, yeah, I want that.

And so we built it too. Even though we knew it like wasn't going to be used that much. Whether or not like AI and chatbots and all that stuff changes, I don't know. But, at least up until that point, certainly that was kind of the arc that all of these things seem to follow.

Dori: Yeah, that's interesting. So CL originally had written in the intro that Mode was an analytics data platform.

Benn: Mhm.

Dori: I had him change it to BI Tool.

Benn: Who knows?

Dori: So that tells you where I see it.

CL: The identity.

Dori: Yeah, yeah, yeah. So it's interesting, that conflict that you had. There's like a few like follow ups here that I want to get into. So, first, to take it back to the company a bit. What led you to the acquisition?

Benn: A couple things, really. So one, we were in this like long, I would say gradual transition into being like, oh, we got to build a BI tool. That looked like, okay, we have a thing that's pretty good for analysts. We could do a good job of landing with data teams.

And typically the sales motion that we went through was, we land with data teams. The contracts we would land initially aren't huge. They were like not tiny. It wasn't like put in a credit card and pay a hundred dollars a month. Like we were probably landing between $10k and $30k deals on average, which is not a bad size deal, but it's not huge and certainly not as big as like you could get if you were sort of a proper BI tool.

But the average contracts that Mode ultimately had were much bigger than that because the motion was essentially we would land with a data team, they would start to share stuff out with everybody else. A lot of like, the gravity of the organization would actually move towards Mode, away from the BI tool, because the more valuable stuff would get answered in Mode.

Like, there'd be a handful of very important core dashboards in a BI tool. And then everything else would be like these ad hoc questions from executives. And the data team would answer those using Mode and they'd share the results using those things.

And so it was like, "yeah, we use the BI tool for a handful of dashboards and everything else is just shared back and forth in these ad hoc things in Mode."

And so the contracts could get pretty big given some time. But it was very hard to land them that way because people would be like, "This looks like a technical tool. How many seats do we really need?"

And we're like, "we think you need a bunch."

And they're like, "well, we'll use them when we need them."

Dori: Yeah.

Benn: And so it would start small, and over the course of most of Mode's life, basically half of our revenue came from expansion. So we were about 50/50 new business and expansion. That all was great. And like, we were like, oh, actually the thing we gotta do is eventually become like a proper BI tool. We're gonna do that. We built some stuff. We built some like visualization stuff that I think was actually quite good, and I would argue probably still today is like second really only to Tableau in the stuff that you can do with it.

And so we felt pretty good about being on that path, but it was gonna take some time, because BI is this enormous feature set and there's like a long sort of transition there. And it also took time for us to grow into customers. When the market turned in 2022, 2023, there were a lot of dynamics in customers where they no longer were willing to buy two things.

So if they were buying a BI tool and Mode, which typically they were kind of happy to do, like "data team uses this. We use a BI tool and it eventually works its way out." Now, they were like, "we're going to use one. Budgets are tight. We got to use one thing."

And that put a lot of pressure on us where, like, sometimes we'd land really big deals because, like, okay, Mode's the choice we have, but you'd also lose a lot of deals because it's like, well, we only want one thing, and we want the BI tool, because that's what the executive wants. And so it just became a lot harder to have those initial lands.

We had a path basically to get through that. It was like, yeah, this still could work out. It's just going to be a harder road. And when we were basically figuring all this stuff out, ThoughtSpot, which was the company that bought us, was running into the opposite problem, where they were like, very BI tool. It's like, very classic, for business users only. Data teams did not particularly like them.

They were running into the issue of data teams getting in the way of deals. And so they were starting to think about building a data suite. And we were like, well, we're building a bunch of BI stuff. And so it was essentially, we were having this, like, all right, we got to figure out how to navigate what is now a sort of more difficult environment, where our sales motion didn't quite work the way it used to.

They were running into the same problem, and it was like, we need a BI tool, and y'all have one, and we need an analytics tool, and we are one. And, you know, so, like, basically it materialized through that. So it was somewhat just a timing thing where they seem like good partners and a good sort of product to join with, and somewhat--

I mean, a market dynamic does in some ways force these things. Not in like, "oh, my God, you gotta sell it." But, in a way where the conditions that we were selling into changed, such that there was more risk to continue to follow the motions that we were following.

Dori: Yeah, that makes sense. Looking back, is there something you would do differently if you could do it again?

Benn: I don't know that this would work, but it feels like the thing that would have made it more fun, I guess, is to hold the line as hard as we could on not being BI. And I think part of that is you have to raise less money. We didn't raise a ton, all things considered. Certainly not in this era. I mean, over the course of Mode's life, over the 10-plus years, we raised $80 million, which, certainly in AI world, is nothing.

Dori: Pre-seeds are doing 4 million, 5 million right now.

Benn: Right, right. And even in our time that wasn't a lot, but it was enough to get to where you need VC-scale outcomes. We needed to have paths to $100 million. Our board wanted to see a path to $100 million in revenue. And again, we could piece together that math, and it wasn't like going to be impossible, but it's really hard to get there on this like technical workbench thing.

And so I think that like the thing that you potentially could do is if you could build a really like-- And I think there still honestly is space in the market for this product. Like there still is a kind of a Mode shaped hole in the market. Hex fills it to some degree. But again like the move towards BI stuff doesn't quite close it. And I think, you know, they're very focused obviously on the AI thing which maybe that just replaces all of this and who knows, none of this matters.

But I do think there is still a real need for like look, just a SQL tool that lets you do some of the stuff that we let you do and has like good visualizations on top and this kind of workbenchy thing that I think could have been a pretty stable and good product but it would have been slow growing, the market would not have been huge.

I think the market would have been, you know, you could have gotten to a $100 million business, but we'd have had to get there in a slightly different way. And so I think that would have probably been the different thing that I would have done: just be like, we are going to really stick to our guns on that.

Again, it may have completely tanked everything. It may have been like this is actually unsustainable, it doesn't work at all. So I don't know. But there is also something where it's like--

One of the other dynamics when you are a technical tool that drifts into becoming BI is you alienate people, because everybody's seen it.

Everybody's like, oh, here they go again. They're off to make money with the people with the real money and they're not going to care for us anymore. And no matter how much you say you do, and no matter how much you try to, like, yeah, you do kind of end up serving the people who pay you money.

And so I think there is something to being like, no, we are like dedicated to these people, and continuing to be the champions of the same people. It's a slightly more fun thing to do, I think, building a product for people that just really want you to build the thing, and you're building the thing for them, and it feels like you're all in this together. Whereas there is a thing that happens when you drift into "and now we're an enterprise tool."

The people who like thought you were building for them now sort of feel adversarial. And so I don't know, I think it's like that's maybe also naive to think that you could ever maintain that over however long. But I suspect if you were to ask like the DBT folks, they would say the fun part of DBT was the early stages where it was like very much a community thing.

And then it became: they raised a bunch of money and they have to sell a bunch of things. That is how a business works. I do not have any objection to that. That is of course the thing they should be doing. But it is a different dynamic when your customers see you as big company making money versus company building a product we love because we're all in this together.

Dori: Yeah.

CL: So I'm hearing, in the apparel universe, you might put a stake in the ground, like, we're building for this very niche group, people like us. Right? And then not go through this identity bloat that slowly accelerated. I guess it was also like the puzzle you mentioned that fit together between Mode and ThoughtSpot, if it's going to be a full-fledged BI thing. Right? Having the coverage for everything.

Benn: Yeah, yeah. I mean, if I could write an ideal version of that story, I think it is something kind of like that where it's like you stick really firmly to your guns on who it is that you sell for. And ultimately if that market's not that big or whatever, you end up getting acquired by somebody but maintaining the sort of distinct brand that you have. And I think that's really what this market needs is it's "not giant behemoth product that does everything."

It's a little bit more like "suite of products that integrate well together." And "integrate well together" may simply be like permissions and like OAuth, but very little else, that maintain their own kind of brand identity. It's a little bit more of like a conglomerate of related things such that, oh, when people say, we like Mode, but we need some more draggy, droppy stuff, we instead say, well, we have a store next door that sells that. You get a discount, like, here's a coupon.

As opposed to being like, oh, our product also does that, because I think you end up with this. The analogy I've used before is like, say you're a great menswear brand and you're like, half the world is not men, we should sell women's clothes. You don't change all of your styles to be like unisex. You create a new brand.

And I think that there is a temptation in these sorts of things to be like, ah well, we're building a product that was for half the world. To build a product for the other half of the world, we need to like, expand our existing product to be sort of acceptable to everybody. And I think this is a place where you don't make great menswear and women's clothes by mashing the brands up, you make two separate brands.

Dori: It's interesting because I'm thinking, like, as a practitioner, when I think of Mode and RStudio, which is now Posit, they started off as that practitioner thing. But then they brought in more BI, more self-serve. And even though I didn't lose functionality on stuff I could do, it felt like I did, because it just felt more crowded and more noisy. It was trying to do two things and doing neither thing as well as if they had just been dedicated.

Benn: Yeah. And there's a psychological thing that seems to happen with your relationship with your customers that way too. And this, again, I think DBT is a particularly good example of. DBT did all these things where, like, look, they have this core product, they have the open source thing. As they added stuff to the paid side, or even just changed the pricing on the paid stuff, which had absolutely no impact on the open source stuff, a bunch of the open source people kind of started to revolt against it.

Like their world was unaffected. There was no particular reason why anybody-- Like, it shouldn't have mattered at all. It's like, oh, you updated stuff on the paid stuff and I'm using the free thing. These aren't connected at all.

Dori: Yep.

Benn: But there is a sense of impending rug pull. I think people suddenly start to get suspicious that, like, it's not a slippery slope, but it's like, I feel like I am now standing on a slippery slope and at some point all of this is going to go away. And maybe it doesn't, I don't know.

But like, in your example of RStudio, RStudio starts to build all these other things. Even if RStudio itself doesn't change, there is this effect that people start to be like, wait, are you taking this away from me? Like, you've proven that you have other incentives in your head. Therefore maybe one day you'll like, abandon me too.

And so I think, like, that's a really hard thing to avoid in this world, in the BI analytics thing. And we tried, but I don't think we did. And I don't think anybody who really ends up in that drift does.

CL: Yeah, well, I guess in the open source world, not just data, it's all the same thing. Like, people are super suspicious about that. But I want to go into our next topic, actually. You wrote extensively over the years about the modern data stack, the kind of best-of-class tools approach you mentioned, and also this potential bundling and consolidation that we're seeing.

But in the entire data ecosystem of data engineering today, what is the hardest part that nobody's really talking about or solving?

Benn: My answer there is a little bit like--

The overarching problem is really hard. And I think we, in some ways, with the whole modern data stack thing, got distracted into thinking the problem was the infrastructure. And in reality the problem is that analysis is hard.

My rough view here is, I don't know that like analysis actually has proven to be all of that useful. The version of the story to me is in the early 2010s, when all of this stuff became popular, a lot of it was based off of like people at Facebook and people at Google and people at Airbnb are doing this incredible stuff and, oh my God, it's changing the trajectory of the businesses and stuff like that.

And there became this sense, and this is especially true if you look at B2B SaaS companies, that, oh my God, we have data and the data is really valuable and like, let's just stick a bunch of analysts on it and like, we will find insights and this will change the trajectory of our business. And it didn't work.

And people seemed to react that it didn't work because, like, the tooling wasn't there yet. And it's not literally like we didn't have the query tools, but it's like, data was too messy, the pipelines weren't working. It was like, we hadn't connected all the sources, we didn't integrate enough of the data sources. It was the kind of thing where it's like, well, we continue to build what was essentially the modern data stack underneath data teams.

The documentation's not there. At some point that will get good enough such that we will now be able to like unlock the tremendous value that is inevitably contained in the data that we have. And I just don't think it was there. Like it was there for some companies and it's there for Google and Facebook and for companies that have this enormous scale where you can find these sorts of patterns that have like millions of dollars in them.

But if you're a B2B SaaS company with a thousand customers, the alpha isn't really there. It's not to say there's nothing useful in it, but you can probably learn the same thing by going to talk to 10 customers.

Dori: Yes.

Benn: And so maybe you could do something really valuable with it. But to find something in the way that a thousand customers and 10,000 users interact with your B2B SaaS app, it's really hard. Like that analyst job, "go find valuable insight in those numbers," is really hard. That's just a really hard ask: go discover something that will change this business. Like there's just a lot of noise. There's not much signal, there's not many people, there's not that many patterns.

It's messy. There's all these like weird org dynamics. And so I think like there's just a lot of expectation of what we would get out of that. And the real problem we haven't solved is like we don't know what to do with any of this stuff. And I think a lot of the modern data stack was essentially like this kind of distraction where we couldn't solve the problem at the top of it.

And so therefore we were just like, well, let's make the pipelines better, let's make the stack cheaper, let's make it able to process more. Like just solve all the technical noise underneath it. Because those were sort of obvious like frustrations and bottlenecks, but they were ultimately bottlenecking a problem we couldn't solve anyway.

Dori: Oh Benn, I just feel your preaching so much. I'm just having flashbacks to so much of my work where I've had to talk to PMs in leadership and been like, "sure, we can do an experiment. It's going to tell you nothing. We don't have scale. I can do Bayesian, but there's only so much it will say. And like, what would this meaningfully change?"

Like, I think a lot of my emphasis, coming in and in the later part of my career, which is still relatively short, has been on the "so what?" Like, what do we want to do? What do we want to change? Because we can find needles in a haystack. But is it going to be useful? Do you need a needle?

Benn: Yeah. Sean Taylor had a blog post from a while back that gets at this. So Sean Taylor's like a data scientist, used to work at Facebook and now is at OpenAI and started a company called Motif. In between, he had this blog post with some quote from an old boss of his that said something along the lines of, "if the experiment matters, I don't need statistical significance."

Like it's just obvious. If the thing is actually meaningful enough for me to care, it's going to be really, really obvious which one is better. And I think that's kind of true of experimentation in a lot of these B2B SaaS business type things generally: if the difference is something that matters, it will show up in a meaningful way, where it's like, yes, this experiment is the right thing. You're gonna know from talking to people.

You're not trying to detect these tiny bits of noise like in a billion Google searches a day. You're like, people love one and they hate the other one. And all of the chasing, oh, this is statistically significant. Nobody seems to notice anything about it. And all this sort of stuff, it's like, yeah, it's statistically significant because like you sort of force the numbers that way but like it doesn't really matter. And it certainly doesn't matter enough for us to be spending any time on.

Dori: A question here. So building off of this for developer infrastructure tooling and other tools that are more usage based. So maybe your front end, your UI isn't as clicky but your API calls, you do start getting scale, other types of usage. I have seen in my experience, I'm curious what you think about--

You can start doing more experimentation and you can start doing modeling there, but it becomes more about product improvements and technical than necessarily user experience.

Benn: That's not a world I know a ton about, but I would suspect that's true. Like, every product now is like an AI product. And I suspect that's also very true there, where the things taking actions are like, I click a button or I send a message to a chatbot and a million things happen in the background. For those things, experimentation is probably also much more useful.

The thing where I think experimentation is tough is when it's relatively uncommon, and relatively uncommon can still be like a handful of times a day, but like relatively uncommon user behaviors. And I don't know, you see this a lot. There is this like weird dynamic with the experimentation stuff on, again, like mostly SaaS companies or like kind of small consumer products, where it's like, well, nobody even noticed this change, but it seemed to have made these like dramatic differences.

Like it's a little bit like the psychology experiments where it's like we realized, oh, if we give 30 college kids an extra piece of broccoli on their plate with lunch, then actually 30 years later they're twice as likely to make more money. You're like, no, that doesn't seem like that could be possible. If the statistics say that, then something is wrong with the numbers, or it's just that there is too much noise in that for you to be able to do it.

And I think like there's a temptation a lot of times in some of this type of like smaller-scale experimentation to like ignore causal explanations, or to really reverse engineer them to say like, "well, nobody saw the thing, but they must have changed the way they thought about this thing." It's like, I don't think it's that complicated.

Dori: To your point, you don't need it, you don't need causal.

CL: So I'm hearing, it seems that your main point for this is that the hardest problem is really we are applying technology to the wrong problem or like the unsolvable problem or one that shouldn't be solved by data. Is that what you're saying?

Benn: It's not that it shouldn't be solved by data, it's that I think extracting valuable information from data is hard. The analysis is hard in the sense that it's like look, we have all these numbers, we have a bunch of like patterns of things that are happening. And what we're trying to do is figure out like smart things to do because of that. I think that's just a very hard thing to do.

And the accessibility of data tooling did not make, to me anyway, the skill that is required to do that equally accessible. The gap that-- Sure, there was a technology gap. In 2010, there was a technology gap where the everyday company did not have the ability to build fancy data infrastructure. And that was a thing that held them back from being able to do what Facebook did. Absolutely.

But also what was holding them back was one, they didn't have as useful a data as Facebook did. And two, Facebook was able to recruit incredible talent. And the average company cannot recruit that same talent. And I'm not sure that the Facebook talent could find all of like the useful insights at random B2B SaaS company either.

But the combination of a much harder problem with like less incredible talent, because you're not Facebook. I think it's just really, really tough. Like, the tooling doesn't help you there. Again, Facebook is attracting the very top talent. If you're giving people who are not that harder problems to solve, it's like, yeah, of course it's not going to come out as well.

And I don't think we really recognized that. I think it was like, oh, well, you just get people the right tools and, like, all this stuff is relatively easy. It's like, no, it's actually very hard.

In some ways the best people were also playing on an easier mode because the problems were more accessible and more directly in front of them. As opposed to if you're doing this in a smaller place.

CL: Okay. So I guess the takeaway is that in addition to investing in technology, talent and broccoli are equally important.

Benn: Yeah, I mean the real thing, I think, is that the problems are different. It's just not the same. You're not solving as hard of a problem. Or like, again, the Facebook and Google ones are easier to extract value from.

I don't know anything about these worlds, so this may be totally wrong, but it's basically like, if you were walking into the financial markets of 1900 and you're like a quant and you're saying, go figure out how to make money, it's probably easier than walking into the financial markets of 2025 and saying, go figure out how to make money. Because this thing has been like optimized to hell to figure out how to make any bit of money out of it, by a bazillion hedge funds that, like, employ very, very smart people to do it.

Whereas in 1900, if you're someone who comes in from, like, having seen 2025 and like, oh, I know 100 ways to, like, play this game and do this thing and, like, make money. And so it's like, that's a harder problem to do that now than it was then because we've squeezed all this juice out of the thing.

And I think it's not perfectly analogous because, like, the B2B SaaS company or the average everyday company hasn't squeezed juice out of things the way, like, it's not that there's more juice to be squeezed, it's that, like, Google just had a very juicy lemon, and everything else is kind of a dry lemon. And so you're asking people to, like, squeeze a lot of juice out of this very kind of dry lemon. And it's like, that's just not going to work that well.

If data is the new oil, we should recognize that we're sitting on different calibers of oil fields.

Dori: Yeah. Another way to summarize it, just from my own experiences, is it's expectations versus feasibility. And on top of that, when we get to the talent, being able to ask the right questions, since it's a little bit more obscure, of like, do we want this, and making sure that you're focused on the right things for your company.

Benn: Yeah. There's like, to your point of expectations, these early companies that did it well, the ones that we tried to model everything after, had every advantage, really. Like, they had sort of the natural, the oil was better, they had the talent, they had tooling because they had built a bunch of internal stuff. And that set expectations for what these teams could do, like, oh my God, Facebook's data team made Facebook Facebook, so they should make us Facebook.

And it's like, well, they actually had three or four things that were just structurally different than what the average company has. But there was this expectation of data teams as people who had this uncanny ability to figure things out and predict the future and all that kind of stuff, which it wasn't like, literally they're going to do the same thing that Facebook did. But I do think there was some amount of--

To your point, that that was the expectation. It wasn't like, y'all are just like, giving some reporting. It's like, y'all should find real stuff because all these other people did. And so it set the expectation of what these teams were going to be able to do as something that was quite high.

Dori: Yeah. To switch gears a little bit, because I could definitely noodle on this for a while. But what's the hardest part of data engineering today that you think no one's really talking about?

Benn: The best answer I think I could give is it's like it's messy and I'm not exactly sure how to phrase that, but everywhere I've ever been and every place I've ever seen just ends up with this giant like soup of a thing that is layers and layers and layers of tech debt basically that nobody knows how to unwind. And maybe the better way I would put this, I had to think about this a while back, is the cause of that to me kind of is data teams, and data engineering too, largely exist without a sense of production.

Dori: Hmm.

Benn: Like, there is production kind of where there's okay, this is a dashboard that executives are going to look at. But so many things get created that are like in production for a minute. It's like, oh, you can ask a question and I share an answer with you, and that is like in production. But is it next week? It could have changed.

If I come back to that report and push run on it a year later, am I saying that's still good? Like certainly not. Can you do that? Yeah. Will you think it's good? I don't know. Will it be? It might be.

And so like in engineering, there typically isn't stuff that exists in this like weird self-expiring middle state. That's like, is it in production? Well, how old is it? Nobody looks at software and has to check how old something is to know whether or not it's still like in production.

Dori: It's like, does it work or not? That's really it.

Benn: But with data stuff, it is in production the moment you send it, but then it's slowly expiring. And so that creates a ton of loose ends and just this really messy world that nobody can really clean up, because like, do you get rid of the thing that's a month old? Well, I don't know. Maybe people are still using it. It still could be valuable. It's still kind of right. Let's leave it. Like, you don't have a very tight sense of what is the stuff that you're really taking care of and what you're not.

And so I think that creates just this giant mess of, like, everybody talks about, oh, we've got this enormous mess of DAGs and these data pipelines that are all unwieldy and like nothing is documented. And it's like almost impossible to solve that unless you solve, like, what are the boundaries around which we are saying we will continue to support things versus not.

And data is like perpetually in this, like, well, it worked when I sent it to you, but that's the only time I ever know. And so it's just this like spiraling mess of things that nobody can ever get their head around, and nobody knows quite what's what. And like, does X match Y? I don't know, we'll argue about that in the meeting of, like, why do your two numbers look different? It's like, well, because they came from production at different times. That's like a kind of impossible state to unwind.

CL: Right. I guess what I'm hearing is that in software, the industry has a standard or boundary about what production means, given SLAs or whichever way we maintain those artifacts, we have a shared understanding of that. Data doesn't quite have that standard or shared understanding across different teams.

Benn: It does in places where you do data as an application. Where it's like, okay, this is powering a thing. This data goes to a place and like it's not reporting. It's like this is a pipeline that feeds itself into this like machine that then does something else.

CL: It's part of the product, right?

Benn: Yeah, either like a true product or just like an internal thing. Those things seem to work pretty well. And I think part of that is because they do have boundaries. They do have, like, all right, we know what the endpoints are. Everything that does not go to those endpoints, we just don't treat as part of that, and like, that's the thing.

But a lot of the systems, again where it's like more analytical or like answering questions, you don't really have that destination that you maintain because like there's a new question every day, like, okay, it creates a new branch. But also like, even when you have those like production pipelines that run through analytics teams, they often become things that other people are like, well, people know this works, so I'll just build branches off of it that aren't a part of that like core product.

But like, ah but why not? Because I know the pipelines all feed the same stuff. And so it's like you end up with this really messy-- Again, you would never really do this in an engineering, like a proper engineering context where you're building all of these half supported appendages for a minute on top of production stuff that you're like, yeah, I'll just leave it there for forever.

Like you want to kind of keep that stuff cleaner. And there's just like not really a sense of doing that with data work. It might be a little bit cultural, but also I think it's like that's the nature of the work. To me, any job function that functions on top of documents is the same way where it's like, what is in production for a bunch of PM documents? Ask the PM if it's still up to date.

Dori: Yep. I've seen like the sources of truth just become whatever was in the deck last. It's like, "oh, it was in this table. This is now the source of truth."

Benn: And I don't know how you avoid that. Like everybody wants to have like, oh, this is a living document. And you're like, is it though? Because people forget to update it and that doesn't mean it's right anymore. Like that's something that's useful about like code is that what is in production is what is in production. It doesn't matter what anything said.

Like you can read it, it's right there. Whereas, is this document up to date? It's a representation of what is in production somewhere else. And so like code is sort of self-documenting in that way, whereas all of the data stuff is not.

It's just a different dynamic. And I think that creates this place where the data as engineering work, the analogy doesn't work anymore. And I don't think we've really figured out how to do it. And for the most part we just muddle through it.

CL: Yeah. So I guess in this messy nature, things will break and then failure will happen. What's the most painful bug or failure you've seen in production?

Benn: One example. These are like, funny is the wrong word, and this one is very much not funny because it has real effects. Somebody wrote a query, it was like a customer support thing, that was supposed to rank the customers that were most in need of checking in on, essentially, the people they were supposed to call.

It was like, these are the people we think were most at risk of, it wasn't churning, but like people who basically we thought had the best chance of buying our product because they were most at risk of a thing. And like salespeople were calling these people and were like, "these people we're talking to don't seem good. It seems like they don't need the thing we're trying to sell them at all."

And then like the data science team that had basically built these models to predict like who they should be calling was like, no, no, no, trust this. Like, we built this very carefully tuned model, all that stuff. And eventually, I don't know, like sometime later somebody realized that the query just sorted the result the wrong way.

So it's like the model was good, but the result was sorted ascending instead of descending. And so literally they had been calling the people who were like the worst possible customers, you know, like, oh oops.
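A minimal sketch of that failure mode, using made-up account names and a hypothetical risk_score column; the original was a SQL query, where the same flip is just ORDER BY ... ASC versus DESC:

```python
import pandas as pd

# Hypothetical output of a propensity/risk model; names and scores are illustrative.
accounts = pd.DataFrame({
    "account": ["Acme", "Globex", "Initech", "Umbrella"],
    "risk_score": [0.91, 0.72, 0.18, 0.05],  # higher = more worth a call
})

# The bug described above, in miniature: ascending order puts the LEAST
# at-risk accounts at the top of the call list.
buggy_call_list = accounts.sort_values("risk_score", ascending=True)

# The intended behavior: descending order, so the highest-risk accounts get called first.
intended_call_list = accounts.sort_values("risk_score", ascending=False)

print(buggy_call_list["account"].head(2).tolist())     # ['Umbrella', 'Initech']
print(intended_call_list["account"].head(2).tolist())  # ['Acme', 'Globex']
```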

So like that sort of stuff happens and it's very bad. To me, the stuff that's a little more pedestrian, okay, the real problems that happen, is honestly like: you show up in a meeting and two numbers don't match. It just happens all the time. And it happens so much that you're just like, why am I looking at any of this stuff? What does any of this mean? Why can't you just--

And like, there's good reasons why it doesn't match. And you can also be like, well, Google Analytics counts days in this way. And like a user to Google Analytics is X. But there is something very like unnerving to trying to answer any sort of question when you ask it three different times and you get three different answers.

Dori: Yeah.

Benn: Even if there's a good reason why that is. And so I think like that happened. That's partly because the messy thing we're just talking about happens all the time. And it's just this inevitable mess that you just have to sort of accept. But some people don't.

Dori: Yeah, yeah. A little bit from this and what we were talking about earlier, how relevant do you think BI tools are going to be in a world with LLMs? You asked yourself this question rhetorically earlier in our chat, but I'm curious to hear what you think.

Benn: So my story on this is--

I do think that BI tools will become a lot less useful, but not because AI will solve the data problem for us. I think we'll just start to get answers in other ways.

So basically like the example I've used with this before is say that you're like an executive that wants to understand what your customers like or don't like. This is the fundamental question that people always have. What should we be doing? What do people say about us? Whatever.

The way you might do that now is, okay, go ask, do a bunch of analysis on how people are using our product and like, put together a report and blah, blah, blah, and give me things to show me what's good and bad. That takes time. It's hard. You end up with a deck that has a bunch of sort of seemingly contradictory things and there's like, not a clear story.

And like, maybe that's the correct representation of reality, but like, it's a little bit like, okay, well what do I do with this? And you're like, I don't know. It's a bunch of numbers that point in different directions. And in an LLM world, okay, we also have a bunch of support tickets. We'll put the support tickets into a giant chat bot and say, what are customers most upset about? And it'll just spit out five answers. Pretty confident.

And it will give you some citations to particular customers that will be like, they are yelling about, look, this one customer said they're super mad.
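A rough sketch of that "chat with your support tickets" idea, assuming a generic chat-completion API: ask_llm, the ticket fields, and the prompt wording are all illustrative stand-ins rather than a real system, and a real ticket corpus would need chunking or retrieval rather than being inlined into one prompt.

```python
from typing import Dict, List

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM chat endpoint you actually use."""
    raise NotImplementedError

def top_complaints(tickets: List[Dict[str, str]], n: int = 5) -> str:
    # Inline each ticket with an ID so the model can cite specific customers.
    corpus = "\n\n".join(f"[{t['id']}] {t['customer']}: {t['text']}" for t in tickets)
    prompt = (
        f"Here are customer support tickets:\n\n{corpus}\n\n"
        f"What are the {n} things customers are most upset about? "
        "Cite the ticket IDs each point is drawn from."
    )
    return ask_llm(prompt)
```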

If you talk to execs now, almost inevitably when they have some strong opinion about something that is going on in their world, it will be because someone told them that.

It's like, I heard from three customers, they're really mad about this thing. And it's like, no matter what the numbers say, almost inevitably the thing that really sticks with them is like, they heard a customer story. I don't think that's necessarily wrong, but that has a certain, like, emotional resonance that seems to drive a lot more decision making than numbers. Again, I don't know that's necessarily like the wrong thing.

CEOs are much more attached to the thing they heard from the customer they were sitting across from, trying to do an upsell and getting yelled at, than whatever the dashboard says.

Dori: Yep, yep.

Benn: And I think if you start to give people access to tools that are like, I can just chat with all of my support tickets and it'll tell me stuff and it will express it in a way that is English and therefore probably has a bit of emotion attached to it and I can read the tickets. Like, which one is the CEO going to use?

Are they going to go to the dashboard and the data team and ask a bunch of questions and get this kind of mixed report, or are they going to ask the thing that spits out an answer in 30 seconds that has like, here are the five things they're most upset about. Here are five citations. That's what it is. Like they're going to start relying on the other one.

And so I think the thing that happens with BI, if something material happens to it, is we just start to try to answer questions more directly by basically querying unstructured data, which is a more straightforward thing to do, because you could ask a chatbot about anything and it'll give you a pretty decent answer and it's like not that wrong.

It scales, where it's not like, I talked to 10 customers and I just got some anecdotes. No, this is an aggregate of every customer's opinion, and it's really easy and fast and kind of convincing. And so, like, it seems hard for me.

And if that is the competition of like who gets the CEO's attention, effectively, like the user researcher that has read every support ticket and every customer communication and can effectively summarize them or the data analyst that can go off and produce a hard to understand report, I think the user researcher is going to start getting a lot more attention.

And so it's like that is basically like something other than BI. And so yeah, I think like the risk to BI is not, "oh my God, we all get replaced by like analysts who write SQL queries." It's more like we stop trying to do a bunch of math and instead just like ask for the vibes.

Dori: So data people listening, become better storytellers, I think is a good way to summarize that too.

Benn: Yeah.

CL: Or convey the vibes.

Benn: Yeah, I am a believer that vibes will become more important. I think that AI makes vibes more important. What we do with that, I don't know.

Dori: Yeah, well it goes back to I think our earlier bit about how much can you really get from the numbers from a lot of these smaller B2B SaaS. So then it's like well if it's going to be vibes anyways, if they're getting the same like quality of decision and it's instantaneous, they don't have to wait for somebody to write a query and get back to them and then it's in words they can understand and then easily ask follow up questions. That's a different world.

Benn: And, like, the other part of this. So say you're a B2B SaaS thing. Forget AI for a second: if you were a CEO who wanted to understand your business and you're like, oh my God, we have to make real changes, and the options are "we did this super deep data dive into every usage pattern" or "I just sat down over the weekend and read a thousand support tickets," I think people are gonna be like, oh, you read a thousand support tickets? I would much rather hear your opinions about the product now, having read that, than having spent a week trying to look at numbers.

The problem is that reading a thousand support tickets takes a long time. And what if you have 10,000, and all this stuff? But I think nobody would really say that the signal is less in those thousand support tickets. People would actually be like, the signal is way higher there. It's just hard to extract.

And so I think if that's easier to extract, which it certainly is now, why would we not shift towards that type of signal as opposed to the signal that is kind of this like exhaust that comes out of people's usage pattern?

Dori: Yeah, it's like direct feedback versus, to use an economics term, "revealed preferences" that we're getting from some of the data.

Benn: Yeah. And I mean, people will then be like, but revealed preferences are truer. And I'm like, kinda. But that requires a lot of interpretation, and we're not that good at that interpretation. There are some places where I think that is true. Like, "actually, nobody ever used this product" is probably not a sign that it's useful. But it's pretty high-level stuff.

Whereas if you're just trying to say like, what do I do to fix my business? The stated preference of what people seem upset about is probably a better place to start than like your attempted inference of revealed preference through the buttons they're clicking.

Dori: Or you may not have all the things captured. You could have a lot of noise. You don't have enough scale to really get at the inference part.

Benn: Yeah. The other thing I would add is we have spent like 30 years building an enormous amount of infrastructure to collect quantitative data, to collect numbers, because we had stuff we could in theory do with it: event streams and all of this telemetry stuff about what people are clicking. Like, this website that we are currently on is instrumented to hell about everything that we do on it, because it is feeding some data pipeline somewhere that people in theory can do analytics on.

None of that stuff existed before. It exists because we had stuff we could do with it. Once we started to have all this data infrastructure that could process it, we were like, whoa, how do we collect even more of this stuff? Because, again, in theory there's information in it.

I think the same thing can happen with like this unstructured stuff where we haven't bothered to collect a lot of it because what do we do with it? Like all of the tools now that sit on top of customer feedback are designed to sit on top of existing sources of information, sitting on top of customer calls, sitting on top of support tickets, because that's the stuff we had.

But there's like a feedback loop where once we can make use of it we'll be like where do we get more of it? And I think like there's a bunch of pipelines that could be built to collect more of that information that makes it a lot more complete than it is today. Like a very easy example of this is you could certainly imagine a website having like a pop up that will pop up and be like we will give you $5 to your Venmo account right now if you record 30 seconds of what you think of our website.

You would never do that today, because, what, am I going to pay a thousand people? Am I going to pay $5,000 for a thousand 30-second snippets that I can't do anything with? Probably not. But if you can just automatically stuff all of that into an LLM, then a thousand bucks or five thousand bucks for 1,000 30-second snippets from people about what they think is tremendously valuable.

And like maybe that mechanic doesn't work, I have no idea. But like you can imagine a hundred of those sorts of things where it's like actually how do we collect all this information now that we have something to do with it.

CL: Right. So more potential value extraction from unstructured things that we previously hadn't collected. Right? But that being said, I think you're still working on somewhat structured things, like the benchmark you're working on. Tell us a little bit more about that.

Benn: Yeah, so a side project type of thing I've been working on recently-- As you all I'm sure know, there are lots of benchmarks to figure out if AI is good at various things. Most of those right now are either general knowledge, take the SAT, or how well can it reason through these logic problems. But there's also all this domain-specific stuff about how well can it code, how well can it use a terminal, how well can it be a doctor, whatever.

There is not really one for proper analytics work. There are some for text-to-SQL type of stuff, like, can it take these five tables and write a SQL query that aggregates these numbers in the right way? But the type of work we're describing, especially when it's messy, in these contexts where we have a database with tons of tables and it's all confusing--there's not anything that represents that environment terribly well.

And so we've been working with some folks, and CL has very generously contributed some work to this as well, and would love to keep doing that, of like trying to build basically a set of problems that are more representative of the real world here, where it's like you are an analyst. Here's a Snowflake database. It's got tons of messy tables and tons of loose ends and like duplicative stuff. And it's not like there's three tables and one's called customers.

And I say, how many customers do I have? It's like there's a hundred tables called customers. None of them are labeled well. Some of them have numbers, some of them don't. Some of them are the same, some of them are different. How many customers we have, I don't know. Like, that's much more representative of how this really works. And so it's attempting to create problems like that in an environment where it's like you can't quite as obviously reason your way through it and see how well various AI robots can solve these problems.
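To make that concrete, here is a rough, hypothetical sketch of what a single task and its grader in a benchmark like this could look like. The real ADE-bench format isn't described in this conversation, so every field name and number below is an illustrative assumption:

```python
# Hypothetical sketch of an ADE-bench-style task; not the actual benchmark format.
from dataclasses import dataclass, field

@dataclass
class AnalyticsTask:
    prompt: str                       # the ambiguous question the agent is asked
    warehouse_snapshot: str           # identifier for the messy database snapshot
    expected_answer: float            # reference answer used for grading
    tolerance: float = 0.01           # allow small relative differences
    notes: list[str] = field(default_factory=list)

task = AnalyticsTask(
    prompt="How many customers do we have?",
    warehouse_snapshot="snapshots/messy_warehouse_clone",
    expected_answer=4218,
    notes=[
        "Dozens of tables mention 'customer'; only one is canonical.",
        "Several near-duplicate tables disagree with each other.",
    ],
)

def grade(agent_answer: float, task: AnalyticsTask) -> bool:
    """Pass if the agent's number is within the relative tolerance of the reference."""
    return abs(agent_answer - task.expected_answer) <= task.tolerance * task.expected_answer

print(grade(4220, task))  # True: within 1% of the reference answer
```

A suite score would then just be the fraction of tasks an agent passes.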

CL: Yeah. And then what we see is that when we start measuring things, things get improved, right? From the beginning of ImageNet all the way to SWE-bench. So this ADE-bench project is really bringing real-world data analytics problems into a controlled environment where people can experiment with different agent settings and so on.

Benn: Yeah, yeah. And so part of it is like, what's measured is improved. And I think the aim would obviously be, if labs start using a benchmark like this, then they will start to basically build models that try to do well on these scores. A lot of these labs now very much seem to teach to the test. Like, I don't know, Gemini 3 came out today, and the top of the announcement is like, here's how we score on all the tests. So clearly that is a thing they care about when they build these models.

But also a lot of it is the people who are building agentic tooling, so the Hexes of the world, or this bunch of analytics chatbots. They don't have great ways to evaluate this themselves. They have their own internal benchmarks, but those aren't published. Plus, if you are evaluating these tools, it's really hard to know which ones are good.

Like, a lot of it is very vibey, where it's like, I tried this one, I tried that one, this one seemed to give me better answers in the three tests I gave it, so sure, we'll go with it. And this is a way to attempt to give a little bit more of a framework around whether you can actually see which of these tools work.

CL: Yeah. Can you give our listeners a sense of how well the foundation models are doing for these real world problems we're now having in the benchmark?

Benn: Sure. So the first version of this has, I don't know, 60 or so tasks in it. For the foundation models, the number itself I would not say is terribly meaningful, but I think the scores were around 40% on the first pass of things.

In some ways that is a tuned number. Ideally, once we have the complete suite of tasks, I'd kind of want this thing to land around 25%. You want to know that you're not giving it stuff that's impossible, but you also want it to have some real room to improve.

But again, you can basically make that number as high or as low as you want, because you can give it something trivial, like, here is a problem where the problem is a missing comma, go solve that. It should be able to solve that. Whereas here's a very complicated thing that requires you to navigate a number of different tables--that kind of stuff is obviously way different.

But even giving it things like missing commas, they don't always get that right. To be fair, models tend to spiral, especially when you have bigger and more complicated projects. They tend to be like, I found this thing, oh, let me go look for all this other stuff.

It tends to confuse itself because the agents here are things like Claude Code, and they tend to load in a bunch of context looking for what's wrong. And then it seems like when they do that, it's like, all right, I am looking for an ambiguous problem, I found a bunch of things, I'm going to go, like, futz around with them.

I was just showing this to somebody a couple days ago, and it solved the problem. Literally, it was just a missing comma. It fixed the missing comma, but it also deleted all the configuration files, so nothing worked. And it was just like, why'd you do that? But that's the kind of thing they find themselves doing from time to time.

Dori: Yeah.

Benn: And so, you know, sometimes the really complicated stuff is hard and sometimes really easy stuff has plenty of pitfalls, it seems as well.

Dori: Yeah.

CL: Yeah. But I think in no time, we'll see a lot of improvement in this area now that you've created this benchmark and then more people are playing with that. So we're going to put you in the Data Debug round. These are quick fire questions, short answers. Are you ready?

Benn: Great. Let's do it.

CL: First programming language you loved or hated.

Benn: This is like a stupid answer. I hated it. But, like, when I first learned how to do JavaScript, like, to use it at all. It was the first thing that, like, you created something. Like, it made a thing appear on a page and you're like, oh, my God. Like, I have made the computer do a thing.

Dori: Yeah.

Benn: With other stuff, like, I learned R in college, and it was like, okay, it's math. You write a script and it spits out a number. That's kind of fine. But the first time I, like, built a couple of things-- I was trying to build a map. It was a map to figure out how expensive weed was in different states.

Dori: Haha!

Benn: Yeah, it was for a friend. I don't remember. There was a website called priceofweed.com and so I ended up basically, I don't know why I did this in hindsight, but I learned how to do it through that. And it was like, this is like, oh, my God, I made a thing and it made a picture. And that's pretty cool.

So it was like, I hated it because it didn't make any sense. And I don't know why there were so many curly braces, but it was cool to make a thing.

CL: All right, tabs or spaces?

Benn: Spaces. I use the tab key, but I want it to make spaces. I don't know if that's the same.

CL: It's still the same. Yeah.

Benn: All right.

CL: Biggest bug you've ever shipped into production.

Benn: This wasn't that meaningful. But back when I was at Yammer, we were doing user agent parsing. So we had to figure out what, like, devices people were using, like, iPhone or Android, or what browser people were on. And I misspelled browser. I left out the S, so I just spelled it "Brower." And it ended up basically baked into all of the downstream stuff from it.

And at some point, it became so prevalent that it was like, there's no way we're going to actually update all these things, because it was in other tools and stuff like that. It was in dashboards. And so it was just like, this is just Brower forever. I'm sorry, don't ask why. So it's just--

Dori: Just cultural.

Benn: Yeah. Not like a big bug in a meaningful way, but there were times we were like, why is this spelled wrong? And it's like, don't worry, because it just is.

CL: Great. Next one. The go-to data set you use for testing anything for a data project.

Benn: NFL touchdowns. I have this scraped data set of every touchdown in the history of the NFL. It's useful because it has all these variables in it: a categorical variable like the type of touchdown, the yards, the quarter. And there's a kind of distribution to it that looks fairly normal.

There's some trends over time or, like, passing touchdowns have gotten longer. So it's actually very useful. There's dates on it where you can look at it by year, by month, or by day. It's like, actually, like, a very useful thing that was not super big. It had some variability to it that had actual real patterns.

You could ask it real questions where you can sort of anticipate the answers. Like, the average passing touchdown was probably longer than the average rushing touchdown. I don't know. It wasn't the flower petal sizes.
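As a toy illustration of the kind of sanity check a data set like that supports (the column names and numbers here are made up; the scraped data itself isn't shown in the conversation):

```python
# Hypothetical mini-version of the touchdown data set, just to show the idea.
import pandas as pd

touchdowns = pd.DataFrame({
    "season": [2001, 2001, 2023, 2023],
    "td_type": ["pass", "rush", "pass", "rush"],
    "yards": [12, 3, 38, 5],
})

# A question whose answer you can roughly anticipate:
# passing touchdowns should average longer than rushing touchdowns.
avg_by_type = touchdowns.groupby("td_type")["yards"].mean()
print(avg_by_type)
assert avg_by_type["pass"] > avg_by_type["rush"]
```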

Dori: Yeah. What is one lesson from outside of tech that influences how you build?

Benn: Okay, this is a quote that I like. It's about writing. It's a Jia Tolentino quote. It came from a friend who went to, like, a Jia Tolentino webinar, so I don't know if it's written down anywhere. But she was talking about writing, and she had this line that was, you have to write it over and over and over until you can stand it.

And I think, like, there's this thing that happens with people, like, I write stuff and I hate how it sounds and whatever, but, like, I'm just always gonna hate how it sounds. It's a little bit like how I hate how my voice sounds on a podcast. That one's like, if you listen to yourself enough, you actually stop hating how it sounds. Which is, like, kind of a weird thing.

But I think the writing thing is the same where people kind of assume, like, the stuff that you write, if you don't like it, it's because you are too critical of your own stuff. And the way I took her point was, no, actually, like, you should continue to be critical until you like it enough. That, like, that's a little bit of a cheat to say, like, well, I'll just never like my own stuff. It's like, no, you will.

You just haven't gone far enough. And I think that's kind of true of a lot of things that people make. You should make it until you're proud of it.

And, like, the "I want to stop because I'll always see the imperfections," it's like, well, then fix the imperfections. That if you like it, then you know it's good. Because, like, yeah, your bar is high, but that doesn't mean your bar is too high. There is a point at which it becomes painful. There's a point at which it's no longer fun, but, like, getting through that point is the point. That's how you make it good as it becomes where it's no longer fun to be doing this anymore.

If it's fun the whole time, then you probably haven't made it as good as you can make it.

CL: All right, what's one hot take about data you're willing to defend on the podcast?

Benn: I mean, it's not that useful. I don't know. Like, it's overrated. It's very hard. Maybe that's the other thing: people underappreciate how hard it is. There's an inverted version of your question about the biggest mistake I've made, which is, what's the biggest success that I've had? And the answer is, there are not very many.

I've done this for a long time. And a lot of people who've done analytics work have done it for a long time. And you ask them, what's the thing they've done really well? What's a place they've really made a difference? And most people can come up with two or three things that are their big success stories, the things they found that were really valuable. You have three big successes in a decade, and that doesn't feel bad. That doesn't feel like a bad batting average. Which is kind of nuts. But I think that's indicative that this is just hard.

Yeah, I made some dashboards and things like that. I made a dashboard that a lot of people use. People have those stories. But where's the time you found something that was really insightful, that genuinely made a difference to the business? The answer is, it's less than once a year.

CL: Yeah.

Benn: And I think that's, again, as just representative. This is very hard.

CL: Maybe we should create a data touchdown data set to represent data success.

Dori: You made me think about, what would mine be?

Benn: Yeah, I mean, I've asked this question sometimes before. I don't remember exactly how I phrased it, but I do think it's kind of an interesting thing: what are the big successes that data people have? And it's not a super common thing, but I don't know.

Like, VCs do this. They have one success in their career, and they make a bazillion dollars and retire. So it's not like there aren't jobs where one home run is all you need. I don't know that that's quite how data stuff should work, or how it does work. But that doesn't mean it's bad. I think that distribution is kind of the reality, and perhaps it's worth it, but it's the distribution that we must accept.

CL: Cool. And then what's the nerdiest thing you've automated in your own life?

Benn: I've written a bunch of scrapers for things that are, well, okay, I don't know. They're like various fancy Google Sheets. I've made a bunch of fancy Google Sheets. They're not so much scrapers, they're more like Google Sheets that have way too much formatting fanciness.

Dori: It all comes back to Excel.

Benn: Well, as a somewhat unrelated story. So for like four months last year, I worked for the Harris and Biden campaign. Part of my job towards the end was building the election night forecasting. It was basically live election results, which is typically somewhat of a vanity project. But, like, running against Trump, there was basically an expectation that Trump was going to come out and declare himself the winner. And we wanted to have a real, kind of tight view of what we actually thought was going on, so that if it was very close and he said he won, we could say, our numbers say XYZ, here's why we don't think that.

So it was actually a really critical thing. We built the entire thing in Google Sheets. There was a pipeline underneath it that did a bunch of scraping of, like, precinct data and AP-- I mean, it's a pretty impressive apparatus that people built. But I built kind of the reporting infrastructure on top, which was just the world's most complicated Google Sheet.

And it like worked surprisingly well, which was a little bit disappointing to be like, oh, I guess we could have just done all this stuff in Google Sheets all along. But yeah, Google Sheets, I became a fan. You know, you could do a lot of things in Google Sheets. There was a brief push to try to start the Google Sheets engineering team.

CL: Yeah.

Benn: On the campaign, but we were all too depressed.

CL: Wow! Okay, so favorite podcast or book that's not about data.

Benn: It was recommended to me by a friend. I recently read Stay True. That's it. It's a memoir.

CL: Oh. Yeah, it's a Taiwanese author. Right?

Benn: Yeah, yeah, it's a memoir about-- I mean, it's very sad. It's a memoir about a guy who was in college and his best friend got murdered. But it's just very good. Would recommend.

CL: Yeah. Final question. Where can listeners find you?

Benn: Mostly the blog. If you want to get yelled at once a week about nonsense, it's benn.substack.com. Beyond that, there's a whole suite of Benn properties on the internet that you can find on Benn.website. There are various links to the conglomerate of Benn-dot domains that exist, but most of them are just redirects to silly things.

But yeah, the Substack is really the only place that I'm a regular participant.

CL: Thank you for all the writings you've done for the community. And thank you again for being on the podcast. It's awesome to have you.

Benn: Yeah, for sure. Thanks for having me.

Dori: It's been an absolute pleasure.