Generationship
31 MIN

Ep. #56, Vibe Coding for Data with Mark Brocato

about the episode

On episode 56 of Generationship, Rachel Chalmers sits down with Mark Brocato, founder of Mockaroo and creator of Fabricate, to explore the evolution of synthetic data in the age of AI. Mark shares how a simple internal QA tool grew into one of the most widely used synthetic data platforms and discusses how agentic AI is transforming software development, testing, and data generation.

Mark Brocato is an entrepreneur, software engineer, and creator of Mockaroo, one of the world's most widely used synthetic data generation platforms. He later founded Fabricate, an AI-powered synthetic data platform acquired by Tonic AI in 2025, where he continues to lead product development. With more than two decades of experience building developer tools, Mark specializes in synthetic data, AI-native software, and scalable engineering systems.

transcript

Rachel Chalmers: Today I'm thrilled to have Mark Brocato on the show. Mark's an engineering leader and entrepreneur, best known as the founder of Mockaroo, one of the world's leading synthetic data generators, launched in 2014.

The idea for Mockaroo came while Mark was watching QA engineers struggle to test complex life science workflows at a startup called BioFortis, inspiring him to make realistic test data easier for everyone.

With over two decades in software development, Mark has built tools for developers at Sencha, Layer0 and beyond. In 2024, he launched Fabricate, the AI-powered synthetic data platform that was acquired by Tonic AI in 2025, where Mark continues to lead its development.

A Ruby, JavaScript and Rust developer, he divides his time between Sparta, New Jersey and Tallinn, Estonia. Mark, thank you so much for coming on the show.

Mark Brocato: Thank you so much for having me, Rachel.

Rachel: Nobody starts a company for fun. Tell us exactly what stresses you were under that forced you to start Mockaroo.

Mark: Two things, and I may push back on the idea that nobody starts a company for fun. Maybe at least people start a company for release. So, like many developers, I was working at a company that had a very niche product in life sciences that was a very expensive product. And on a good day you might have like 10 users, but those users are paying a lot.

So the company was doing pretty well. But I sort of had an itch to scratch in general to build something that would be more impactful in terms of a lot of people using the tool, not that the work we were doing wasn't impactful. It was in a very special way. It was, you know, cancer research and all that. But I wanted something that was like, I'd get a lot of feedback from folks.

So that was in the back of my mind. But at my day job I basically was responsible for all the development, including the QA. And like many products, life sciences products have very complex workflows and it takes sort of a lifetime of learning to understand how to use the product like a scientist would. It requires significant domain knowledge.

And if you're testing a product as was done back then, sort of monkey testing where you would fill out a screen manually, if that's step one of a ten step workflow, by the time you get to the tenth step doing it that way, your test is sort of invalid, your findings are invalid, the way that you judge the UX and the outcomes, totally invalidated by the poor data that's coming in.

So I saw these libraries that were out there like Faker and Forgery in the Ruby world, and they were very simple. They helped you create realistic looking names and ages and all these things. And of course those were accessible to developers, but not necessarily QA engineers who weren't going to write code.

So I wanted to just build an experience around that type of tool that anybody could use. And it took on a life of its own.

It was used internally. That company was very gracious. They allowed me to essentially throw it up on the Internet and see what happened. And almost immediately out of nowhere, people just started using it. I have no idea where they came from. I answered a few posts on Stack Overflow back in the day saying this could be a solution to your problem, people looking for fake data for things.

And it bootstrapped traffic from there and it took off as a product pretty quickly, almost out of nowhere, like a thing that you couldn't repeat. So it was very gratifying for me to turn that into mostly a side project for many, many years and eventually a company.

And it definitely helped me scratch that itch of like, I want to do something that a lot of people use and a lot of people see and something that's simple, that people can hop into it in a few minutes, understand how it works, get some value out of it and then go about their day, you know, with some problem solved.
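For readers who have not used a Faker-style library, the idea Mark describes can be sketched in a few lines. This is only an illustration in Python's standard library, not how Faker, Forgery, or Mockaroo are implemented; the name lists and the `fake_person` and `fake_people` helpers are made up for this example.

```python
import random

# Hypothetical seed lists; real libraries ship much larger dictionaries.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret", "Alan"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton", "Turing"]


def fake_person(rng: random.Random) -> dict:
    """Return one realistic-looking but entirely made-up person record."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "name": f"{first} {last}",
        "age": rng.randint(18, 90),
        # example.com is reserved for documentation, so no real inbox is hit.
        "email": f"{first}.{last}@example.com".lower(),
    }


def fake_people(count: int, seed: int = 0) -> list[dict]:
    """Generate `count` records; the same seed always gives the same rows."""
    rng = random.Random(seed)
    return [fake_person(rng) for _ in range(count)]


if __name__ == "__main__":
    for row in fake_people(3, seed=42):
        print(row)
```

Seeding the generator keeps a test run reproducible, which matters when a bug only shows up with one particular row of data.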

Rachel: Even the name sort of expresses that joie de vivre. Instead of Forger and Faker, it's Mockaroo. It's a jamboroo of mock data.

Mark: Yeah. One of my side projects is music and recording things. And there was an application back then, Vocaroo, where you could record vocals or singing and whatnot. So I just took the "aroo" from that. It was very common back then. There was file sharing that was like "file-hippo." There was always like a thing and then an animal name. So it took like two seconds and it worked.

Rachel: You don't have any Australian connections though, do you?

Mark: No, but you're not the first person that asked that. You're like the hundredth person that's asked that.

Rachel: You're appropriating my culture. Haha!

Mark: I literally, I think that's the like, Kangaroo crossing public domain road sign that I took as the logo. I had no idea that anyone would ever use this. So it was like, can I make a thing that looks like a product? You know, very innocent. Haha.

Rachel: Yeah. And that's like the classic story of an incredibly successful product is it scratches your itch, you think it might scratch someone else's, you toss it up there, you answer a couple of questions on Stack Overflow, and suddenly a gajillion people are using it.

That's the lightning striking that makes me as an investor, that makes the hair on the back of my neck stand up. Because you try to create the conditions for that in a lab and nothing happens. You just play around and put a road sign on the product and suddenly everybody's using it. So kudos.

Mark: Yeah. I mean, so much of what we do in industry is almost like justifying our existence. Like, okay, we have an idea and now we need to build out features to gain more users. But that sort of original, hey, it solves a problem and no one can deny that it solves this problem. That's a great start for a product.

And that's what Mockaroo was like. It was absolutely on day one, helping my internal team, so it would probably help others. And, you know, that's how things get started.

Rachel: I love it when new shiny things appear in this way. How did it become Fabricate?

Mark: Ah. So years of watching people use it in ways that were beyond my expectations. So Mockaroo was really about let me build one CSV file. And it got popular enough that it reached this inflection point, like many tools do, where it would be used for things that it probably shouldn't be used for. And sometimes that's the death of products and then sometimes you can sort of react to that.

So people would build out entire relational models painstakingly through Mockaroo and really suffer through. I had never envisioned that use case, but it was amazing the dedication that people had to building out these complex things with Mockaroo. So when I was sort of between jobs I finally got the opportunity to say, okay, what if I were to re-envision Mockaroo to support entire databases in one click?

And that was the original vision of Fabricate. And essentially that's what Tonic bought. But in my first year of Tonic, it changed a lot. Fabricate, when I first conceived of it, it was like Mockaroo on steroids. It was everything Mockaroo could do, but all the relational integrity between tables and really fast. It was built with a Rust core instead of a Ruby core.

But what Fabricate became pretty quickly after I joined Tonic was entirely AI driven agentic data creation. So now instead of what you have to do with Mockaroo and Fabricate, which was you'd have to configure a system. You'd have to understand all the buttons to click and all the settings.

Now you just talk with an agent and say, "this is what I want my data to look like. Maybe here's an API contract, here's a SQL schema. Make it happen." And it figures out how to make it happen.
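The "relational integrity between tables" that Mark mentions for the original Fabricate can be illustrated with a small sketch. It assumes a hypothetical two-table schema (`customers` and `orders`) and is written in Python rather than Fabricate's Rust core; it is not Fabricate's actual algorithm. The point is only that child rows draw their foreign keys from parent rows that already exist.

```python
import random


def generate_database(n_customers: int, n_orders: int, seed: int = 0) -> dict:
    """Build two related tables where every order points at a real customer."""
    if n_orders > 0 and n_customers < 1:
        raise ValueError("orders need at least one customer to reference")
    rng = random.Random(seed)

    # Parent table first, so its primary keys exist before any child row.
    customers = [
        {"id": i, "name": f"Customer {i}"} for i in range(1, n_customers + 1)
    ]
    customer_ids = [c["id"] for c in customers]

    # Child table: the foreign key is always drawn from existing parent ids.
    orders = [
        {
            "id": i,
            "customer_id": rng.choice(customer_ids),
            "total_cents": rng.randint(100, 50_000),
        }
        for i in range(1, n_orders + 1)
    ]
    return {"customers": customers, "orders": orders}


def orphaned_orders(db: dict) -> list[dict]:
    """Return orders whose customer_id matches no customer (should be none)."""
    known = {c["id"] for c in db["customers"]}
    return [o for o in db["orders"] if o["customer_id"] not in known]


if __name__ == "__main__":
    db = generate_database(n_customers=10, n_orders=100)
    print(len(db["orders"]), "orders,", len(orphaned_orders(db)), "orphans")
```

Generating tables independently, one CSV at a time, is what makes this hard by hand: nothing guarantees that the ids in one file line up with the ids in another.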

Rachel: Tell our listeners a little bit more about Tonic AI and how Fabricate fits into what they're building.

Mark: Yeah. Tonic has three products. And so the two that came before Fabricate, which were developed in house, were Tonic Structural and Tonic Textual. And they both basically allow you to take sensitive production data, anonymize it, and then use it in a lower environment by people who wouldn't have access to that production data.

So like sensitive healthcare records, sensitive banking records. If you're a developer trying to reproduce a problem that occurred in production, you really want to get access to production, but you really can't. The laws just say you can't do that. So Tonic's solutions, Structural and Textual, help you anonymize that data so that you can use it in a lower environment.

Structural is all about relational databases and structured data in general, whether it's Mongo or something like that. And then Textual is documents, and Textual is more often used for training AI models. AI models all train on text. And so this is sort of like a solution for companies that are training their own AI models on all of the documents that they've accrued over time.

Fabricate slots in the middle of them, but with a completely different angle. Fabricate is all about creating data from scratch. It's data when you don't have data. Versus those applications, which assume you have production data and you just need to get a hold of it.

So Fabricate is like, I have a schema, but I either don't have access to data, and no matter how well you anonymize it, I just can't touch it. Or I don't have enough data yet because it's a new feature that hasn't been launched. Or I need to simulate what a year's worth of usage is going to look like.

Fabricate creates something from nothing. So it handles both the structured and the unstructured data. So both sides of those two applications, but always starting with nothing.

Rachel: Can you give us a real example from your customer base of how this kind of synthetic data can help QA engineers solve problems?

Mark: Yeah absolutely. I mean, and there's such a wide variety of uses of Fabricate because it's agentic and it doesn't force much on the user. So we've got people creating everything from like, mocking a whole relational database to mocking a stream, like an incoming stream of events, maybe from a Stripe or something like that, to--

We had a customer that's literally training an AI model to detect, like, fraud in contracts or problems in contracts, where all they want is to upload some example contracts and have us spit out a lot more that are like that. So the use cases are such a wide variety.

They always are sort of grounded in this: the quality of your tests and the quality of your software are really dependent on how realistically you're simulating the actual behavior of your users. So it all starts with high quality data. Sometimes you want to simulate what actually happens in production, and sometimes you want to simulate edge cases that could happen in production, but would be very deadly if they did happen in production.

So simulating real production is like what Textual and Structural are great at. Fabricate can do that as well. But Fabricate also allows you to invent scenarios that might happen, edge cases, even broken data. And so it's all about improving the quality of software.

Or one use case that's common for Fabricate as well, that's not for the other tools, is for folks that are demoing software, either internally, but probably more impactfully to prospects where I know I'm going to go pitch to this, you know, large retail company, I know their products. I want to simulate what their life is going to look like using my analytics tool or my AI tool.

And so I want to simulate a year's worth of their data that will resonate with their users when they see me demo the software with the data that came out of Fabricate. The users never know anything about Fabricate, but they see what their future could be like if they were using the piece of software that's being sold to them.
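The edge cases and deliberately broken data Mark describes can be sketched as a small mutation step applied to otherwise clean rows. This is a hedged illustration only: the mutation functions, the `is_edge_case` marker, and `inject_edge_cases` are hypothetical names invented here, and nothing below reflects how Fabricate produces such data.

```python
import copy
import random


# Each hypothetical mutation breaks one field in a way a test should survive.
def corrupt_email(row: dict) -> None:
    row["email"] = "not-an-email"


def blank_name(row: dict) -> None:
    row["name"] = ""


def negative_age(row: dict) -> None:
    row["age"] = -1


MUTATIONS = [corrupt_email, blank_name, negative_age]


def inject_edge_cases(rows: list[dict], fraction: float, seed: int = 0) -> list[dict]:
    """Return a copy of `rows` with roughly `fraction` of them deliberately broken."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    result = copy.deepcopy(rows)  # leave the caller's clean data untouched
    n_bad = round(len(result) * fraction)
    for index in rng.sample(range(len(result)), n_bad):
        rng.choice(MUTATIONS)(result[index])
        result[index]["is_edge_case"] = True  # marker so tests can find them
    return result


if __name__ == "__main__":
    clean = [
        {"name": "Ada Lovelace", "age": 36, "email": "ada@example.com"}
        for _ in range(10)
    ]
    for row in inject_edge_cases(clean, fraction=0.3):
        print(row)
```

Keeping the broken rows flagged makes it easy to assert that the software rejects exactly those rows and accepts the rest.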

Rachel: That's very cool. Being able to create an empathic imagined world. That's very cool.

Mark: Yeah. And that's essentially what launched Mockaroo as a product. Back in the day, you know, Mockaroo had very humble beginnings. And I was actually on my honeymoon when I got a call from somebody very high up at Salesforce.

Rachel: Wow.

Mark: That wanted to use Mockaroo at a scale that was like ridiculous compared to what Mockaroo could do. Mockaroo could generate like a thousand records and they were like, well, I need gigabytes of data.

And of course, you know, like a good entrepreneur, I was like, "yeah, we can do that. I'll figure it out." It sounded like this might be valuable. And that person was very kind and sort of coached me how much I could charge for that sort of a thing. And that was Mockaroo's first sale. It was actually one of its biggest enterprise sales before I made sort of $1, it was that one thing.

And then I had the amazing hill to climb of that not taking over what the product was like--

Rachel: Right.

Mark: You hate to land a whale as your first customer. That's like such an anti pattern. And probably most investors would say, no, don't do that.

Rachel: Yeah, try not to be a feature of Salesforce. Haha.

Mark: Yeah. But I got very lucky and that wasn't the case. And right after that I sort of launched the like, you can pay with your own credit card and it was a very cheap thing and it got a lot of customers and it diluted the impact of Salesforce.

Rachel: Yeah, yeah. No, that is actually one of our standard pieces of investor advice. Don't land the whale too early.

Mark: Yeah.

Rachel: And then ChatGPT happened. We're all living in the post ChatGPT world. What surprised you the most about how LLMs handle data sets?

Mark: Yeah, it evolves constantly. So I've been watching LLMs over the last few years and initially it was, "you'll never be able to create realistic data with LLMs at scale. Like it spits out one record at a time, sort of painstakingly."

But then LLMs that were capable of being the basis of agents sort of arrived with probably GPT-4 and then eventually Opus and things like that, and they're really able to write the code that generates the data. That's the tactic that we take at Tonic. And they're accelerating so fast now that you have to think about software in a very different way.

I think classically, SaaS, you thought about really understanding the use cases and how to design a system to allow users to learn and do those use cases. And you wanted to try to keep the capabilities as sort of narrow as possible. In reality, I think all software is going to be agentified. And so now what the software can do and do well is far beyond what you as a designer probably designed it to do.

You have to think in terms of, like, what capabilities can I give an agent to solve this very broad class of problem? I meet with so many different customers of Fabricate, and the ones that are actually the most satisfied are the ones that are most apologetic for how they're using Fabricate.

Inevitably it comes up that a customer will say, now this is working really well for me, but I sort of have to apologize because I know this is not one of your core use cases. Those are the ones that are most satisfied because they're sort of asking what they believe to be difficult and unrealistic for a product to be able to do.

And like, lo and behold, it does it because we've built it to be AI native to where we just sort of understand the broad case of I need to be able to create data, I need to be able to understand what the user is conveying, I need to be able to understand schemas, and I need to be able to iterate until I get something good, and we unleash it with the tools to be able to do that.

And so customers come in and ask for, like, "I need a patient cohort that has this really weird sort of edge case. And I don't need the output in like, CSV. I need it in this format that almost nobody supports, you know, some variant of HL7, can it work?"

And like, you know, a minute later it spat out exactly what they want. That was a thing that was hopeless years ago and is now sort of table stakes for anything that's, you know, in the AI age.

Rachel: And that's kind of a perfect use case for it because that's like going right back to the transformer roots of what the LLMs were built to do.

Mark: It is. That's actually what I've told our sales team. We should almost have a sort of pithy campaign around that where there is a growing backlash around AI and AI slop and AI fabricating a future that doesn't exist.

Well, this is an appropriate use of AI when you want realistic, but definitely fake data, because you definitely don't want real data, real user data, this is in the wheelhouse of AI. So we should almost sort of pitch that as like, this is one of the rare uses of AI that is fundamentally good to use AI for.

Rachel: Yeah, I love that. Robots, Transformers in disguise. Haha.

Mark: Yeah, yeah.

Rachel: You yourself have shifted from this config base to this agent front end. Has that widened the aperture for the people who could use it? Has it become easier to reach greenfield projects? Has it become easier for business users to jump on this?

Mark: All of the above, yes. I mean, one is there's no learning curve anymore and we sort of invite users. In fact, people might have to learn to be more lazy when using AI.

Rachel: Haha!

Mark: More greedy.

Rachel: Oh, I am born for this moment. Haha.

Mark: Some of us are. I am too. But some very polite users will not have unrealistic expectations when in fact AI can deliver some pretty amazing results. And it actually often works better to just assume the AI can understand you like a person and do what you will in order to make this thing happen.

So the accessibility has skyrocketed and anybody can pick this thing up. We've trained the agent to be inquisitive and to sort of pull the user through how best to work with it in order to be able to achieve a goal.

The best thing also about having agentic products is they're their own help desk, they're their own help chatbot in that you can ask the agent that you're using to create the data, "how do I use this application?" It can fill you in on details of how to reset your password and things that are outside of the actual core use case.

And then also use cases have just exploded because Mockaroo was row and column data, Fabricate was lots of rows and columns. And then Fabricate, when we added AI, was an AI studio for synthetic data of any kind. Documents, data streams, Kafka topics, databases, ad hoc formats.

Like any sort of thing that you needed a database for or data for, it absolutely can do. And I think it actually does the more non-traditional, way-out-there sort of cases better and more impressively than it does the, "I need to generate a database of a thousand tables."

Rachel: Very cool. Mark, looking ahead, what is your boldest take on where synthetic data is heading in the next couple of years?

Mark: So it's uniquely difficult to say because the world is moving very fast now.

And I think the whole way that we're developing software is being turned on its head and automated to an extent that we never thought before. I think it's fair to have, and actually productive to have sort of unrealistic expectations of software going forward. Because the way that you need to build software going forward is sort of AI native, where it's more about what skills and tools to expose to an agent to solve a very broad class of problems than it is to target very specific narrow use cases.

So where synthetic data is going to go, one thing that's going to drive it is certainly wherever this blast wave of AI is going over time. And one of the reasons actually I decided to join Tonic was the Textual product, which I thought was absolutely right at the right time of everybody's going to be leveraging and training models to work on the documents they've accumulated.

Like enterprises have a lot going against them when it comes to competing with startups. You know, anybody that's building AI native has sort of a leapfrog on an enterprise, but what an enterprise has is data. And they better capitalize on that.

And we sort of talk to a lot of startups who are Fabricate prospects who have this existential question. They don't have access to data because they're a startup and buying the data is prohibitively expensive or impossible. So they always bring up: Can Fabricate be used to create data that I can use to train an AI model?

And this is such a fun philosophical question because it's like, is this the beginning of model collapse? Because you're training an AI model to detect stuff that another AI model created.

Rachel: It's the worm Ouroboros eating its own tail.

Mark: Yeah. In a broad sense I don't think it's a great idea. But in a narrow sense there is an appropriate use case for it, which is if you're using an expensive model to create really high fidelity data so that you can train a cheap model to identify that data fast and, you know, in an inexpensive way. That I think is a valid use case of AI to create data to train other AI.

Other than that, I think it's the beginning of sort of model collapse and a sort of existential problem for the growth of AI. But startups will continue to disrupt as time goes on, if not more now than ever. So I think we'll have to find a valid way of using synthetic data to bootstrap new AI efforts.

Some customers have sort of told me their idea is to bootstrap that effort to the point where they can get enough real data to then start shifting the training over to real data. Hey, building something from nothing is always difficult and requires more luck than it does skill probably.

So every startup faces its sort of potential doom to failure, but there will be some lucky folks that manage to do this sort of bootstrapping. So that's certainly one way is like synthetic data somehow being a valuable grounding for AI training.

Because I'd hate to see a world where only the established companies could survive because of the data they've accumulated and then making that data that they've accumulated in some cases in ill gotten means from the general public more and more valuable. I would rather have startups have a crack at the at bat and bring balance and competition to the world.

Rachel: Mark, you're so knowledgeable and you have such broad perspectives. What are some of your favorite sources for learning about AI?

Mark: Oh boy. I don't know if this is gonna sound sort of hackneyed, but I was a long-time Twitter user back when it was called Twitter and I still sort of gravitate towards X. Maybe my feed is just polluted now with all these AI enthusiasts, but I find a lot of the things I have to go research from that as a feed.

Reddit too. Although Reddit I think has a lot of sort of fear and negativity towards AI in there. Reddit is fear and negativity about everything, but I get a lot of just sort of my reading digest from X and then wherever that points to. It winds up being many different blogging sites and whatnot, but just sort of keeping a finger on the pulse of what is the latest thing that came out that maybe needs to be on my radar.

So it's not a particularly adventurous answer, but it still is a pretty darn good aggregation for adventurous and sort of positive developer community stuff amidst all of the other problems that social media has. I'll leave those alone. But as a developer tool I still find it quite useful.

Rachel: Have you played around with Mastodon or Bluesky at all?

Mark: I did. I was cross posting to Bluesky for a while. I never got a ton of engagement from there. I also cross post to LinkedIn, but that's a sort of special crowd. So no, it hasn't really taken hold. Like, Hacker News and Slashdot too. Sometimes they're sort of tried and true things but I just actually feel like the feed from X points to enough things to where I can go do deeper reading and exploration there.

Rachel: If I make you Prime Minister of the Solar System and you get to control how everything goes for the next five years, because that's definitely what prime ministers do, and that's how that works. In five years, you've been able to influence the world in ways that you think it should be influenced. What does the future look like?

Mark: Oh, boy. Part of me thinks that the future is very bright. And what I would try to make happen if there was some magic way to make it happen is, these AI tools are really rewarding for people that have ideas and maybe have shortcomings in their ability to make those ideas happen.

The key is now that the barrier of entry to seeing an idea through to fruition is ever lower. Like, that's been the whole story of coming up on 20 years in software development is the barrier to entry got lower and lower and lower with AWS and SaaS and all these things to where you could, out of a garage, build a company that could sort of take on the world. Now it's even lower. With AI, you don't even need a staff or a very broad knowledge base. You just need an idea.

Rachel: You don't even need a garage. Haha.

Mark: Don't even need a garage. Yeah, don't need a place, but you need an idea. You need passion, you need creativity, you need a little bit of fearlessness. So if there was a way that I could continue to unlock the benefits of that for the adventurous and for the people that are creative, I would love to do that.

But I realize the world is not all just a bunch of people that tend to be creative. Like, there are craftsmen too, who take great pride in, they don't come up with the ideas, but boy, do they see them through to fruition. And boy, do they build the best version of whatever. And I would hate to see there not be a place in the world for those people as well.

I know some people have gotten such fulfilling careers out of being the world's best programmer who writes the best code. And sadly, I think the value of that is moving on as AI writes perfectly serviceable code. But the people who are really benefiting right now are the folks that have a surplus of ideas and were maybe hamstrung by their ability to code those ideas. Well, no more. Now they can feel free to experiment as well. So what I'd hate to have happen is that we sort of become reliant on AI for ideas. I really do feel like the creativity is what makes us human.

I'm happy to yield some of the productivity to the machines, but I would love to be able to exercise my own creativity and have other people do that and continue to use AI as a tool to tighten the feedback loop on that creativity.

So, like, I have an idea, now I can build it, test it, use it, feel that it was a bad idea, move on, adjust it, and then have something that is really fulfilling and that's not necessarily just work. Like, I see a lot of people use AI in music.

I wish I could do that. I can't sing, but I play guitar. So for years I wrote songs and I would record things and they would always just be instrumental. I could right now, if I wanted to have an AI vocalist that would sound like my favorite vocalist and finally put lyrics over my songs. And that'll never make any money or anything, but it would be very fulfilling.

So I love the fact that AI can be this tool that helps people express themselves creatively. I just hope that it doesn't remove people's ability to become craftsmen and actually do the work themselves, when you find fulfillment in that.

Rachel: Yeah. I worry that people vibe coding and not having to come up through the craftsman journeyman way of understanding how systems affect other systems and the larger implications of the code they're writing means that we're eating our seed corn. Like, if there's no jobs for junior devs, then how do people grow into senior devs?

Mark: Yeah, I think you could sort of have the perspective that that's always been what software development has been about. As we've developed higher and higher level languages, there was always the senior dev, now I find myself as a senior dev, who would say, "if you never coded in Assembly or you never wrote STE, you know, you don't know. If you never managed memory directly, what do you know about programming? You should not be given access to production."

But so many applications have been built by people who never had to manage memory. You know, they make valuable contributions to software.

Rachel: That's why we have Rust. Haha.

Mark: Right. That is why I chose Rust, because I never did manage memory until I needed to. And Rust doesn't allow me to mismanage it. But that's a different hot topic.

But yeah, I think the traditional junior developer that learned how to write some code and now is going to write a lot more code. There probably isn't room for that anymore. But what there is is now a junior can learn the impact of their decisions on design and architecture and UX much quicker because they can have an idea and in minutes they can test that idea.

Like they can have the thing built and see how it affects real users, see how it makes them feel when they use it. Your ability to have an idea and exercise it and test and have that very tight feedback loop is now way better than it ever used to be.

I saw a post recently that was like the pessimistic approach to that, which was, the reality is your organization doesn't have a lot of good ideas. And so now we're lowering the barrier of entry, of testing all these stupid ideas. And so you've created a whole new bottleneck that's fundamentally not good for your organization.

But that's too pessimistic. I'm an optimist, so I like the idea of learning from your mistakes. And it's just a different kind of mistake. Instead of learning from your mistakes, like your code doesn't compile, it's now learning from your mistakes of like this architectural decision or this decision about user experience had a negative impact or had a positive impact or affected other things in the way that you brought up.

Rachel: Yeah. Last question, Best question. A generation ship is a starship that takes longer than a human life to get to its destination. It's a lifeboat. It's a biosphere. As part of your prime ministerial privileges, you get your own generationship. What are you going to name it?

Mark: Oh, naming and caching and off-by-one errors are the two hardest things in computer science. Shiparoo? Haha.

Rachel: Haha! I love it.

Mark: It's never done me wrong, so I'll go with the "aroo," I suppose, in the end.

Rachel: It's jumping through wormholes. Boing, boing, boing. Haha.

Mark: Yeah, it's jumping with a whimsy and a stolen Australian public domain--

Rachel: A stolen valor. Mark, it's been a delight to have you on the show. Enjoy Shiparoo and have a great rest of your day.

Mark: Thank you so much for having me.