Library Podcasts

Ep. #2, Data Literacy with Maura Church of Patreon

Guests: Maura Church

In episode 2 of The Right Track, Stef speaks with Maura Church of Patreon. They examine data literacy within orgs, the democratization of data science, and tactics for integrating data scientists into product teams.


About the Guests

Maura Church is the Director of Data Science at Patreon, whose work spans product analytics & experimentation, business analytics, machine learning, and business intelligence. She is a Harvard graduate and was previously a product quality analyst at Google.

Show Notes

Transcript

00:00:00

Stef Olafsdottir: Hello, Maura and welcome to The Right Track.

Maura Church: Thank you.

Stef: Could you tell us a little bit about who you are, what you do and how you got there?

Maura: So my name is Maura and I run the data science team at Patreon.

Patreon is a platform for creators to get paid by their fans.

We're about eight years old as a company and I run a centralized data science team of eight people.

How I got here? I initially thought I wanted to be a software engineer, so I did software engineering, but I was working on a really small software team with like one other person and it just felt very heads down, very like having blinders on, you know, and I really wanted to be in a role that was a little bit more high level and kind of seeing the whole business.

So I got a job at Google and worked in data science and analytics there, fighting spam and abuse, and then came to Patreon about five and a half years ago, started as a data scientist and now run the team.

Stef: Nice, okay, that is actually a fun story about how you thought it was a little bit too heads down being in software development.

What was the thing that made you think this would be less like that?

Maura: Yeah, I think if I had been at like a modern software company, like a big tech company now, it probably would have been a different introduction to software engineering 'cause software engineering can be like quite collaborative.

But I was thinking that like data science, or like at the time analytics, would allow me to like kind of have my hands in different parts of the business at the same time instead of just being like, okay, build this feature or like work on this little piece of code.

And so it was this idea that data could like connect different pieces rather than being so granularly focused.

But to be fair, I think it was just kind of a weird introduction to software engineering, 'cause a lot of software engineers have like quite collaborative processes where they're involved in many parts of the business.

Stef: Yeah, I think that's a very good identification but I also think it is a good identification that data science tends to be a cross-functional role, and if you are a good team member as a data person you will want to have your fingers everywhere and have your connections everywhere.

Maura: Yeah.

Stef: So yeah, the world is lucky that you accidentally got kicked off in a software team that was quite introverted I guess.

And now we have you working your data out, which is great.

Could you maybe kick us off a little bit by telling us an inspiring and a frustrating data story?

Maura: Do we want to start with the fun part or the hard part?

Stef: Okay, let's start with the fun part because that's fun and then go with frustrations because that'll lead us to how can we fix it and all those questions.

Maura: Great, so my fun and inspiring data story is, at Patreon we've been doing randomized experimentation, A/B testing for let's say like four years or so.

And we're a pretty small company, like 250 folks. So we're not a big, you know, Netflix-sized experimentation product company, but we last year ran the kind of most important and biggest experiment we've run, which was to make a major change to the layout of the Creator page.

And if you don't know about Patreon, the creator page is where the creator tells you about why they're on Patreon, what they're offering, what their prices are, why you should join them as a patron and be part of their creative community.

So it's kind of a sacred place for creators, and making changes to that page is a very risky thing to do, because not only does it drive the majority of business and revenue, but it's also like the creators' creative space. And so when we make changes to it, we have to respect the way that they want their page to look, and artistic people like tend to know how they want things to look.

So we made a change that was informed by data which was to change the entire layout of the page to be one column and kind of have the prices right upfront.

And the experiment was a huge win. It sent way more money to creators.

And the other cool thing about this inspiring story is we did our first holdout group at Patreon, which is like where you take a really small percentage and you keep it in control for a long period of time so that you can study like more longitudinal effects on the metrics that you care about.

So it like was a technical win because we got to do our first holdout group, which was really cool.

It was a business win 'cause we sent like millions more dollars to creators, and what I'm most excited about is it was inspired by what we knew from the data about like how patrons looked at prices and looked at tiers.

So inspired by the data, huge business impact and we got to do some like technically cool experimental stuff.
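[A long-lived holdout like the one Maura describes is commonly implemented with deterministic hashing, so a fixed slice of users stays in control across the whole experiment's lifetime. This is a minimal sketch, not Patreon's actual system; the salt, bucket count, and group names are hypothetical.]

```python
import hashlib

HOLDOUT_PCT = 2  # e.g. keep ~2% of users in a long-lived control slice

def bucket(user_id: str, salt: str = "creator-page-test") -> int:
    """Deterministically map a user to one of 100 buckets."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def assign(user_id: str) -> str:
    b = bucket(user_id)
    if b < HOLDOUT_PCT:
        return "holdout"  # stays on the old experience for months
    return "treatment" if b % 2 == 0 else "control"
```

[Because the assignment is a pure function of the user ID, the same user always lands in the same group, which is what makes longitudinal comparison of the holdout slice possible.]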

Stef: That sounds like that is a very inspiring data story, I have to say.

I might leave it to later but I think probably everyone listening might be curious to hear more about the tools that you used for the holdout group and the A/B testing and things like that.

So leaving that and planting that as a seed for maybe revisiting it later.

Maura: Love it.

Stef: So what is your most frustrating example of a day at work?

Maura: Yes, so in thinking about like how data is broken or how it has been broken at Patreon, I think that it's been broken in many different ways over the five years I've been at the company which is kind of fun to see because the way that your data breaks can actually tell you about how mature your data processes are.

But my favorite and most frustrating example was a few years ago, maybe two or so years ago.

We had an outage on what we call payments day, which is when most of our payments at Patreon, so tens of millions of dollars, process on the first of the month.

So as a patron, your patronage renews on the first of the month; right now that's our current payment schedule, and that causes a lot of challenges of scale because you have like huge amounts of load going through a payment system.

So we had a pretty bad outage and part of the outage was that we dropped all of our experimental flags.

So we had a bunch of experiments running at the time and we took all the users that were in experimental groups and put them into control.

And when a data scientist hears that they're like, "Oh crap, all of my variants are now like put back to control."

Not only are the experiments broken, but users are seeing really weird experiences, because suddenly they had a flag one day that they don't have the next day.

So we were like up at 7 a.m. on a Saturday morning running scripts and queries to try to figure out who was in which group and reset them, and the resulting impact was that the analysis for many of these experiments was either like super buggy or we had to redo the experiment and just cost a lot of time and a lot of data strife for us.
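[The recovery Maura describes, figuring out who was in which group and resetting them, amounts to diffing a historical snapshot of flag assignments against the post-outage state. This is an illustrative sketch with made-up user IDs and variant names, not Patreon's actual tooling.]

```python
# Hypothetical snapshots of experiment flag assignments, keyed by user ID:
# what the flags looked like before the outage vs. after they were dropped.
before = {"u1": "variant_a", "u2": "variant_b", "u3": "control"}
after  = {"u1": "control",   "u2": "control",   "u3": "control"}

def dropped_assignments(before, after):
    """Users whose non-control variant was wiped back to control."""
    return {u: v for u, v in before.items()
            if v != "control" and after.get(u) == "control"}

to_reset = dropped_assignments(before, after)
# to_reset gives both the blast radius (how many users were affected)
# and which variant each user needs to be restored to.
```

[The same diff answers the impact question Maura raises below: its size tells you whether the bug touched ten people or millions.]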

Stef: Oh my goodness.

So I'm curious, 'cause you mentioned, okay, you're up on a Saturday at 7:00 AM fixing things. Who is involved in that? You know, who has the fire extinguisher, who is the first person to pick that up?

And like who was running which scripts and on which databases and how big was the team at that point?

Maura: The data science team was I think probably three or four of us and the folks fixing it were like a mix of backend engineers who were able to kind of figure out where these flags, the kind of previous historical versions of these flags were stored and like how we could retrieve them and then data scientists trying to understand what was the impact?

Looking at our analytics logging, like how many people saw the control experience that they weren't supposed to see?

Like what was the overall kind of size and magnitude of this issue while we were resetting the flags?

Because part of like any big data bug I think is understanding the impact, right? If it's like a really bad bug, you'll want to understand is it affecting 10 people or is this affecting like millions?

And part of that on the analytics side is understanding if we could kind of keep these experiments and continue to analyze them, or if we need to like start fresh from ground zero.

Stef: And which one was it?

Maura: It was a mix. We had some experiments we had to like completely rerun which is always such a, it's just painful.

Like we've had many experimental data bugs where the result has been rerunning the experiment, and you learn a lot by how painful that process is, to say like, this experiment is a no-go and we have to start fresh. But yeah, we had to rerun some, and then there were others that we were able to save.

Stef: Yikes, this must have been pretty frustrating. At the same time, you know, events like this, I find, sometimes tend to bring people together.

Maura: Totally.

Stef: I mean, always you never want to have that experience, but when there's a fire going on everyone's like, "We'll work together to fix it."

Maura: And you have a good story too.

Like I think most data scientists have experienced this, where you think you're running an experiment, turns out you showed the same experience to both groups, and then you go to do the analysis and you're like, why aren't these metrics different?

And I feel like you only really care about checking the logging that people are seeing different experiences, if you've had that awful experience like running an A/A test accidentally.

So I do think it, like, it puts you on a higher alert and it makes you care more about doing things the right way and knowing what it looks like to do things the right way.
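[The check Maura alludes to, confirming from exposure logs that people in different groups really are seeing different experiences, can be a very small piece of validation code. This sketch uses hypothetical log rows and layout names purely for illustration.]

```python
# Hypothetical exposure log rows: (user_id, assigned_group, experience_shown).
log = [
    ("u1", "treatment", "new_layout"),
    ("u2", "control",   "old_layout"),
    ("u3", "treatment", "old_layout"),  # treatment user saw the old layout
]

EXPECTED = {"treatment": "new_layout", "control": "old_layout"}

def exposure_mismatches(rows):
    """Users whose logged experience doesn't match their assigned group."""
    return [u for u, group, seen in rows if seen != EXPECTED[group]]

def looks_like_aa_test(rows):
    """True if both groups were shown the exact same set of experiences."""
    seen_by_group = {g: set() for g in EXPECTED}
    for _, group, seen in rows:
        seen_by_group[group].add(seen)
    return seen_by_group["treatment"] == seen_by_group["control"]
```

[Running a check like this at experiment launch, rather than at analysis time, is what catches the accidental A/A test before weeks of data are wasted.]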

Stef: Yeah, I think that's a good point and in addition to like having these things happen, it's typically like a prompt to do future things a little bit better.

So ideally you want these things. Like, you know, maybe we should gradually adopt chaos data science, like chaos engineering, where you artificially introduce chaos into your systems.

Yeah, yeah, yeah, yeah, like how do you deal with that?

And what can you learn from that? That'll be funny.

Anyway, thank you so much for sharing that.

You talked a little bit about, or you mentioned before, that you've been thinking about the different ways that analytics are broken or get broken.

I know you also talked a little bit about how data trust is also sort of a complex thing.

That's not necessarily about data being broken.

I would love to hear you talk a little bit about that sort of how does your data break?

And then maybe, segue that into the trust thing.

Maura: What I was thinking about is how interesting it is that our data breaks in different ways than it did before.

So I would say, you know, three years ago we would have really full data outages, like for example our core revenue and process payments reporting.

We would just miss a full day of reporting or we'd like miss copying over, you know, 20,000 rows.

And those are very painful bugs like that is broken and you have to build systems to fix that.

And now what's interesting is I think we struggle more like today, our analytics are more often broken by--

We have all these different systems and they're inter-playing with each other and sometimes they update at different times.

And so our breaking today looks more like inconsistency like someone saying, "Hey, this dashboard looks updated here, but this one looks like it's two days behind. What's going on?"

And so it's kind of like an inconsistency, and then also an education gap, where things aren't actually broken, they're trustworthy, but the end users of the data within the company need to understand kind of what gets updated when.

And I think the other way is that things often look broken because they were built for a very specific purpose, and someone's trying to use that resource or that data product for a different purpose.

So there's also this kind of like this looks broken because it's not answering exactly what I want and it's kind of just like broken by the way that it was designed rather than the underlying data being untrustworthy.

Stef: Yeah, this is a really good identification and I think it touches a little bit on data literacy and not only data literacy for the data that you have but also data literacy for the data that you do create.

I think there are definitely these cases where you designed it and it worked really perfectly for the case that you needed back then, and now you should have something completely different.

But I think, yeah, data literacy for designing data is really interesting.

Can you talk a little bit about that?

Maura: So at Patreon, one of the things with data literacy and the way that we are trying to design things is that our data is very frequently evolving.

Which I think if you're in a high growth startup your data should be evolving, right?

Your databases are changing, your analytics needs are changing, and so part of what our role is as a team is to like keep up with that evolution of data, right?

If there's like a new data source that's available or a new backend table changes in some way we need to adapt our analytics system to it.

But part of like, I think a hard part of data literacy at a company that has a somewhat mature data science team is saying, okay, "This is the way we used to measure this thing and this report is two and a half years old."

We don't do that anymore. Like we, this report is now deprecated.

And deprecation is a part, I think, of strong data literacy, because otherwise you're going to, well, we definitely have this problem right now.

Like you have a sea and a mountain of dashboards, and 20% of them reflect the way that the business used to think about measurement.

So I think building strong data literacy across the business is also helping people understand like what has changed over time.

I do our metrics onboarding for all new employees and I think like really savvy employees will say, "Okay, these are our key performance indicators today as a business, but what did they used to be?

Like what did we grow out of?

And what does that tell us about where Patreon has come as a business over the past seven years?"

So I think that's part of it is like, helping paint the picture of how the business has changed and how the data has evolved over time.

Stef: That's a really fun story. Thank you for highlighting that.

But also I love that you do the metrics onboarding for every new Patreon employee.

How frequent is that and how does that work?

Maura: I think we do them every month now.

And so every new employee has a kind of like week-long onboarding series, with security in it, and product, and their manager, and part of that is a 25-minute metrics introduction.

And we talk about, it's a great opportunity to like start them young or like starting them when they're just getting into the business and say, "hey, data matters to us here."

"Here are the most important metrics of the business."

"You're going to hear these acronyms all over the place, probably your team goals will ladder up to these in some way."

So it's a bit of education and then it's a bit of like starting the self-serve journey.

So like, hey, if you're looking for data within the business, here's where you go and here's where you ask questions, and no question is too stupid.

So kind of like beginning to spark that data curiosity in new employees and make sure they know like, "hey Patreon is a place where we use data well and we use it responsibly and like, here's how you can kind of help us foster that culture of data at Patreon."

Stef: I love that and I mean, you talked about, you've now talked about both.

There are all of these dashboards that are completely outdated, and having good data literacy means, among other things, that a person can spot if a dashboard is outdated or if that's no longer the thing that the company should be watching. How does that happen?

Is that then internal marketing of analytics and reports?

Is it some sort of a documentation for like what are the metrics that you're currently looking at?

How do you keep that data literacy going on?

Maura: Yeah, I'd love to hear ideas from teams that feel like they've figured this out.

We're still figuring it out.

We have kind of like regular cleanups and audits and I think the way we're really trying to get at this is to really build towards sources of truth.

Like, Hey, if you're on this product team here are the three dashboards that are really relevant to your domain.

Or if you're on marketing like here's your key funnel dashboard that's going to help you figure out what's going on.

So it's like building those sources of truth and investing in them.

Obviously, it takes time to build those kind of like big robust kind of comprehensive data resources for people.

And then having these like regular, I don't know quarterly cleanups or so, which are not the most fun thing in the world.

But if you were in an office and you could order some pizza and get everyone some like soda, spend a few hours just going through your spaces, deprecating old things, finding out what's unused, that's how we've been approaching it so far.

Stef: Wow, this is a really fun story.

And so this is both a data team sort of governance role, and also like a collaboration, syncing with the teams on what they need and making sure they know what they should be looking at.

Yeah, that makes a lot of sense.

Maura: Yeah, and this kind of comes back to trust too because I think a big thing that can drive down trusting data is like you open a folder called Revenue Reporting and there are like seven different revenue metrics for one month, and that's where people start to lose confidence if you have kind of like old crufty versions of metrics hanging around.

And at Patreon what's interesting is, I think a big part of it is we have a really complex business.

We have like, we're kind of a payments platform but we also have some network dynamics.

We're also kind of a SaaS company 'cause we're selling to creators, so it's a pretty complex business, especially for new hires to come in and understand.

And so we actually do have kind of different versions and different flavors of the same metrics.

So one of our key metrics is called Total Membership Volume, basically total processed payments, and there are all sorts of flavors of TMV.

Like you can talk about TMV from creators in Germany, or like TMV from podcasters or TMV retention, TMV acquisition and so part of folks coming in and learning to trust the data is understanding like, hey these aren't actually 20 different definitions of the same thing.

There are different flavors that correspond to different ways to understand the business.

So I think a big part of building trust in data is again, that education piece of here's how the data looks at Patreon and here's how it might look different than this other company you maybe just came from.

Stef: Yeah, this is sparking a lot of questions for me.

I know we have a lot of other exciting things to talk about, but do you see any of those things, the clean-outs and the audits and, I guess, the standardizations--

Do you see them happening also organically and sort of peer-to-peer or would you say that the data team owns that a little bit?

Maura: I kind of see our role as owning that and I'd love to encourage more kind of peer to peer ownership of that.

I think that where I do see it as like really good strong product teams will say, "Hey, this is how we define this metric and this is where the source of truth is."

And the PM will be like really involved in that.

The PM the designer, the engineering manager will really own that definition and say, "Okay, here are the five old dashboards we used to use. Like let's deprecate those."

But I think a lot of that has to be strongly in collaboration with the data team.

And I generally find that the data team tends to drive that and say, "Hey, actually, the way we're defining this metric is maybe not the best way given what we're trying to achieve," in collaboration with what we call the tripod, the PM, the engineering manager and the designer.

Stef: The tripod.

Maura: The tripod.

Stef: I love that. So that's the PM, the engineering manager and the designer, did you say?

Maura: Yep.

Stef: Where does the tripod come from?

Maura: I think it's like a mix or permutation of the Spotify model.

I actually don't know enough about product management to know like, where did this specific definition of the tripod come from.

But we definitely are in the kind of like Marty Cagan school of product development, so somewhere in that lineage I would guess is where that term came from.

Stef: So this is sort of migrating us a little bit into org structure. Although maybe before we migrate into org structure, because I think a lot of people can learn a lot from all of the things that you have already built at Patreon and how you've structured the teams, let's wrap up a little bit on sort of the data trust and data issue conversation.

Can you talk a little bit about sort of those underlying reasons for why people don't trust data?

"I don't trust this data." Why do people say that?

Maura: I think there are a few reasons.

So one that I see pretty frequently is because the data is surprising in some way and the person doesn't yet know enough about the business to understand why the data is trustworthy.

So for example at Patreon: most businesses look at like cohort retention curves, like how do your new groups of users retain over time?

And what's really exciting about Patreon is we have awesome creator retention.

Creators make more over time if they stay on Patreon, which is great.

But if you were to come from another business and look at our cohort retention curves, you're going to be like, "I don't trust this data, this looks wrong. It looks unlike any other cohort curve I've ever seen."

And so part of it is like not understanding the business context to believe the data.

I think another that's very common, and anyone that works with stakeholders and works in data science will know this: people sometimes have a story they want to tell. And if the data is not telling that story, they will say "I don't trust the data" as a way to kind of like put the blame on the data scientist, rather than just facing the truth and trying to understand rigorously what's going on.

And then I also think I've seen people say "I don't trust the data" when they don't trust the person telling them the data.

Whether that's the data scientist or the PM or the analyst.

And part of I think being a good data scientist is building trust right?

Being rigorous, being candid, understanding the data, and if you don't have that trusting relationship with the person, you're less likely to believe what they're telling you, even if it's true.

And then of course like the fourth reason would be the data is actually untrustworthy, right?

You have issues in your data quality or it's bad.

And so, yeah, but at Patreon at least I think we're in a place or I'm pretty proud we've gotten to a place where most times the underlying data is accurate and it's one of the other reasons that's driving that statement.

Stef: How did you get to there?

Maura: Lots of years of hard work.

I think, when I talk to other data leaders and folks trying to fix data quality I think the biggest thing we have done in the past few years to improve it is focusing on where it really matters.

Like you're never going to have 100% of your data warehouse be 100% reliable or if you are like, you should start a data company and help other people do that.

But at Patreon we focused on like, there are a few key metrics that we need to get 100% right, 100% of the time and start there and make sure those systems are like really well-documented and reliable and robust and have alerting and error reporting and then like branch out to the other areas of the product.
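[The "alerting and error reporting" on a handful of key metrics that Maura mentions can start as simply as a freshness check over pipeline metadata. This is a hedged sketch; the metric names, load timestamps, and 26-hour threshold are invented for illustration.]

```python
from datetime import datetime, timedelta, timezone

def stale_metrics(loads, max_age=timedelta(hours=26), now=None):
    """Key metrics whose pipeline hasn't landed fresh data recently enough."""
    now = now or datetime.now(timezone.utc)
    return [name for name, loaded_at in loads if now - loaded_at > max_age]

# Hypothetical freshness metadata: (metric_name, last_successful_load).
now = datetime(2021, 6, 1, 12, tzinfo=timezone.utc)
loads = [
    ("total_membership_volume", datetime(2021, 6, 1, 4, tzinfo=timezone.utc)),
    ("creator_signups",         datetime(2021, 5, 30, 4, tzinfo=timezone.utc)),
]
# creator_signups is over two days stale here: alert on it.
```

[The point of the approach is scope: run strict checks like this only on the few metrics that must be right 100% of the time, rather than over the whole warehouse.]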

And I think the other thing that's important with how we've gotten to trustworthiness is understanding that there's that statement like, "All frameworks are wrong, but they're useful in some way." Or something.

Like technically like, when you look at metric definitions or how you're defining data like all data is probably wrong in some way, but if you stick to a definition, it's useful for interpretation.

Like as long as your interpretation of it and the actions you're taking from it are not grossly off base because of the underlying definition it doesn't need to be 100% accurate.

Again, the ideal is that's 100% accurate but I think that many people strive towards that impossible ideal without actually getting 20, 40, 80% of the way there along the way by making improvements.

Stef: Yeah, I think that is a really good identification.

I've seen a lot of companies try to do a sort of a holistic, you know, fix their data type of thing.

Sort out your analytics stack in two weeks or something.

And it's been my experience that that is very unrealistic, and that the teams that get to a better place apply a different strategy.

Which I just wholeheartedly agree with.

Maura: Yeah, and I think that ties in a lot to data literacy.

So for example at Patreon, like one of the areas of our data warehouse that is not great is our API data.

So our data about integrations with Patreon built on top of our API.

And the important thing is we know that, we know that that's kind of a crufty area of our warehouse that needs a lot of love. And we can tell that to people; we can say, hey, if you're querying on top of this area of the warehouse, it's a minefield, it's a big spaghetti mess, and the queries are super complicated.

And so if you can communicate that to people, that is also building their trust, because they're saying, "Oh, I am working with a data science team that knows where the potholes are and where the issues are."

And that I think is an important first step. It doesn't mean you have to fix 100% of the things all the time, but being able to tell people like, hey, here be dragons, there's some stuff here that's going to be a little wonky, is a big part of building that trust and literacy.

Stef: Yeah, I love that.

That is I think a really good point, educating people on what the data actually means.

On that note, we've talked about the data literacy, we've talked about data trust, and we've talked about who works with who a little bit, we've mentioned who was on call on a Saturday morning when all of the control, all of the experiments just broke down entirely.

Can you walk us a little bit through the data org structure at Patreon?

And I am curious particularly to hear, how does basically data work with product and how does data work with engineering?

How does data work with finance? Are the data individuals integrated into the teams?

And then you've also talked a little bit about sort of people querying things. Who is doing that?

Who is doing the querying, and what is the scale of self-serve at Patreon?

Maura: So today we are, we're a centralized organization.

So everyone's reporting into one data science org and has data science managers that understand data science.

And then we have what's called the Product Data Science Model, or what I call our staffing model, which means that within the centralized data science team, data scientists are staffed to partner teams for a certain amount of time.

And that time could depend on how long that initiative might need a data scientist for.

It could depend on that data scientist's interest in getting like knowledge and expertise in that area.

It could depend on teams changing missions, whatever.

So kind of within the data science team we have a few folks that are staffed to product.

So they're working with product teams with PMs with engineering managers, with engineers, with designers to build better product.

We have some data scientists that are staffed to our Go-to-Market teams, which is like marketing and brand and account management and sales, and kind of working with those functions to measure what they want to measure.

And then we have some folks that are working with our finance and operations and FP&A and legal teams to get them all the data they need to make great decisions.

And then we also have folks that are working on, I would call it a little bit more horizontally.

So like an ML engineer and folks doing machine learning, where they're building tools, services, models that are going to be leveraged across that spectrum.

Technically that's four pillars but we only have eight people on the team.

So people are wearing many different hats, right?

Like you might go from building a machine learning model one day to like building a dashboard the next because we're still a pretty small team and a pretty small company.

So you can't get too specialized.

Like if we had one person whose entire job it was to make dashboards like they'd run out of dashboards to make.

So that's the way we are today.

In terms of the product side, what I found is, at least at Patreon, different product teams have different needs for data scientists.

So for example, at Patreon we have a fraud team that's helping identify and fight fraud on the platform.

Since we process huge amounts of payments, that is a way more core data-science-focused product team than a team that is building our merchandise product, which is pretty new and allows creators to add merchandise to their membership.

Now, with merchandise, we still need to understand: how is it being used?

What's adoption look like? What's retention look like?

But those are easier data science questions than, like, let me build this machine learning model that's going to fight fraud in real time on the platform.

So we staff based on like relative data science need and kind of like where strategically data science can have the most impact within product.

Stef: Yeah, that makes a lot of sense.

So there is a, would you say, you have a hybrid of centralized versus integrated?

Maura: We do yeah, we have some sort of weird hybrid.

There are trade-offs to each, but being centralized, the nice thing is that data scientists, first of all, have managers that understand data science, right?

If you have a data scientist reporting to a product manager or to an engineering manager, it can limit their career growth if that person doesn't understand kind of what a data scientist looks like, versus a senior data scientist, versus a staff data scientist. And then the other reason is because it allows us to have this cross-team collaboration on methodologies and tools and packages we're using.

And a data scientist can say, "Oh I did an incrementality analysis over here and I think we could apply that to this cool area over here."

And you don't get that if you have data scientists kind of like fractured across the company.

Now the downside of course is that as a centralized team you sometimes are missing the business context of what you're working on.

So you have to be sure as a data scientist that you're building that context, those relationships, you're attending the product team meetings, you're sitting with the marketing manager and learning about their goals, which obviously takes like more work in relationship building than you might have, otherwise, if you were embedded.

Stef: This is all about culture building and again, we're talking about relationship building and it's such a similar thing to building that data literacy.

It's such a similar thing to building the data trust.

Just making sure you have like people everywhere.

You're sort of sneaking your data everywhere.

I mean, there are definitely also, you know, you mentioned there's a downside there of the centralization, but then it seems like you have already optimized for that by having what you call the staffed model, right?

Then you have the benefits of having people integrated into the product and engineering teams.

Can you talk a little bit about that as well?

Maura: So for the staff model I think that, I mean--

We've just, as I mentioned like Patreon has changed, we've changed pretty rapidly and so that's part of the challenge of being staffed is like you might spend six months working on the Patreon experience side of the product and then switch into the creator side of the product and suddenly you have to learn a bunch of new tables and a new kind of part of the product org.

But I think the benefit is that at least every PM has a data scientist that they can go to and that they're partnering with.

And I think that that PM/data scientist partnership, just like PM and design, and PM and engineering, is so important.

That when a PM is going into a product space and saying "What problem are we solving? Who are we solving it for? How are we going to solve it?"

That they not only have the data at their fingertips to help guide them but they have a data scientist next to them who can advise them and say, "Here's how we should measure this. Here's how we should think about the performance of this area of the product."

And so that's how that's working with that kind of staffed model within product.

Stef: Can you tell me a little bit about, you've mentioned, there can be also some happy accidents when you have that model.

Can you tell me a little bit about that with the integrated data scientist?

Maura: Yeah, the happy accidents: I think you end up, or at least we have ended up, creating data scientists that have more of a product mindset, and then PMs and product teams that have more of a data mindset.

Like I think that if data scientists were completely embedded in product it kind of would be like, your role is to do the data science stuff.

But because we have the staff model, data scientists are thinking a bit more strategically about like, what are we trying to build?

I at least have an expectation that the product data scientists on my team have strong opinions about our product strategy and about what we're building.

And then I also have the expectation that PMs have strong opinions about our data and will come to us and say, "Hey, this area of our data stack needs some love," or, "Hey, this metric doesn't make any sense."

So I do think having what I call the balance of power can kind of help us push on each other's orgs in a healthy way.

So data scientists can kind of be the truth tellers: if you have a PM reporting on a goal, a data scientist can be like, "That's totally sandbagged."

Or like, "That's not true, that's just not how it is."

And sometimes you can't say that if it's your manager who's reporting on the goal.

And same with like product coming to data science and being able to offer us like really candid, sometimes critical feedback.

It can be harder if that's within your org.

So I do think that there's like a healthy tension that it can create in helping the org kind of like push each other to be better.

Stef: Yeah, I think you're spot on, and just to share a nugget of what I've been through myself: we did a very similar journey at QuizUp, where it started off with a single person.

And then that person supports a bunch of other people in making better decisions.

And then you need more people to support all of the people, and gradually you build a team of people, and then at some point we fully split up.

So there was no centralized team.

I mean, there was a weekly session, like a discipline meeting, but ultimately we were part of other teams, and that was a good experiment because it helped us learn that we really needed something centralized.

And so both of those models taught us that we needed the other one as well.

And I really appreciated that learning for everyone on the team, so everyone understood the impact of being entirely disconnected from the product teams and also the impact of not being able to support each other in tooling and ideas and things like that.

And I think that brings me to something I know we've talked about a little bit before: where do they sit?

And I know that's maybe not a COVID-friendly question, but I think it can be translated to: which team meetings do they attend?

Which team forums do they join? Which sync meetings do they attend?

Which knowledge sharing events do they attend?

Can you talk also a little bit about that, how that works in the Patreon model?

Maura: Yeah, so I joke about the seating thing, because when we were in the office I was very insistent that data science sit as close as possible to engineering and to product.

Because I do think, you know, again, in non-COVID times, there's a magic that happens when you can walk by a product manager's desk and see that they have an Amplitude chart up, and you're like, "Oh, what are you looking at?"

And they're like, "Oh, I'm looking at conversion on our mobile web flow."

And you're like, "Oh, what are you seeing?"

You know, and you can strike up this conversation about data.

So when we go back into an office like I will continue to try to have our data scientists sit near our product teams where possible and there is kind of like magic about being co-located in that way.

Stef: I remember this feeling.

Maura: Yeah, you're like, I see you looking at data. What are you looking at?

Stef: And I really remember this feeling. I'm so happy when I see charts up on people's screens, and I'm like, "My job is going well." It's a great success.

Maura: Or if you walk by a meeting room and you see that someone's presenting a dashboard that you've worked on, that's really exciting.

I do think for the, you know-- It's a fine balance when you're in the kind of model that we're in, which is sort of this hybrid because if a data scientist attended every single product meeting for the product they're working on they would not have any time to do data science.

Like there's a lot of discovery and delivery and scrum meetings and things that are happening that the data scientist doesn't really need to be there.

So we help data scientists figure out: what's the most high-impact use of my time?

Like which meetings should I be in? And in many cases that's meetings about performance and goals.

How are we going to measure things? How are we going to instrument things?

Those kind of initial meetings for like, what is this product even going to look like?

But I definitely am one of those data science leaders where, you know, once a quarter, I'll just look at what my team's calendars look like and see what people are going to, and then see what meetings we're not in.

Where maybe it would be useful to have a data science voice at that table?

Like maybe there's a strategy meeting where data science should be more involved, or maybe there's a retro where we didn't talk about this bad analytics bug because there wasn't a data scientist in the room.

So I'm definitely a fan of calendar stalking in a healthy way occasionally, and of asking data scientists: what aren't you involved in that would be a valuable use of your time to be involved in?

Stef: That is a really insightful way of seeing things.

So you're saying there are two ways: one in which you, proactively as a data science leader in the company, just look at what's happening.

And then also you support your people by asking them if they have already identified something that they're missing out on.

Maura: Exactly.

Stef: That makes a lot of sense.

And when you identify things like that, and of course like, Patreon is a very data-driven organization.

In general, there's a lot of respect for data and you've had good success making data-driven decisions.

I think though in many organizations that is not the case; they're at a different maturity level.

So I'd be curious to hear you talk a little bit about, you know, when you identify those opportunities, who do you go to, as Maura, today at Patreon?

And what would you maybe advise to someone who might not necessarily have the same amount of data buy-in already in the company?

Maura: So today for me, like as Maura, I go to our product team and our product leads.

Whether that's like our directors of product or our product managers directly and say like, here's an opportunity where I think data could inform this decision.

If you don't already have that permission, though, to freely put data where you think it's most useful, then I think it's important to find the people in the organization that do want to use data, that do believe in data, and build those relationships first.

Because what I found is if someone doesn't want to use data to inform their decisions, it can be very, very hard to convince them otherwise. And it can sometimes take that person looking at other people who are using data and having success with using it and building strong arguments and building strategy in order for them to want to use it themselves.

So I'd say like find the people in your organizations that are excited about talking to a data scientist and start there and see what you can do there.

Stef: Yeah, that is very good advice.

You always have to just convert the people who are the closest, and it's going to be low effort, high impact every time.

Maura: Yeah.

Stef: Thank you for sharing that about sort of the Patreon team structure.

I also wanted to talk to you a little bit about hiring for these roles, because I know you have been very explicit about the responsibilities of the roles and sort of, yeah--

Could you just share how you have gone about finding people for these roles who are actually passionate about product analytics? The context being that it can often be difficult to find people who are analytics specialists but are actually insightful about and passionate about product, versus just SQL or something like that.

Maura: Totally, it's hard.

I think that my approach to it has been, and a lot of this is from either my own mistakes or from seeing other folks in the data industry make these mistakes, to be very explicit and to say, "Hey, for this role, the majority of your job is going to be SQL and querying and working with PMs and building relationships, and then building a super impactful product that's going to be used by millions of people across the world."

And if they light up at that, then like, that's great.

And if they say, "You know what? I'd rather spend most of my time using what I just learned in my fast.ai course and making neural networks."

Then I'll try to say: that's awesome, I'm super excited for you, that sounds interesting, and this is not the role for you.

What's interesting about where Patreon is at right now is we actually have, as I mentioned, data scientists wearing many hats.

So you could come in in a product analytics role and still spend a portion of your time doing machine learning or making data visualizations or writing Python.

Like everyone on the team is fluent in Python and SQL.

So we're a backend-skilled team, right? People are writing DAGs or writing ETL, but as we grow that will change.

Like we will have roles that really are 95% analytics and roles that are really 95% machine learning.

And so I think that's what I try to do: I just try to be really upfront with candidates about it.

I think titling roles is also a really interesting thing.

So we've titled our roles, for example, "data scientist, product analytics," which, if you were to see that title four years ago, you'd be like, what's a data scientist role that also has analytics on it?

But now it makes way more sense, right?

You are bringing the rigor and background of data science and statistics and model building.

You're applying it in a product context and for a product like Patreon that has some productionized models, some ML, but we're not Instagram, right?

We're not making algorithmic home feeds. A lot of your work is going to be on the more analytics side.

Stef: I love that. Yeah, I think you've hit the nail on the head with--

You're basically search engine optimizing your job ads too; people are looking for data science roles.

Maura: Yeah, if someone wants to do, you know, core research data science and they see analytics on the title they're going to be like, "Ugh, I don't want to do this."

And so I think that is also part of it is like we want to find people that want at least a portion of their role to be in analytics at least for these like product data science roles.

We have other roles in the team as well.

And so we want to have people that have that analytics mindset, who take a question and say, "Okay, what can I answer with a query?"

And that query is just as valuable as spending the day working in scikit-learn.

Stef: Yeah, exactly. I think you're also touching on that reaction of "I don't want to do that, it has analytics in the title."

I think you're touching on something that I find really interesting.

So when I kicked off the data science division at QuizUp, that was 2013 and data science and data scientist as a title was super young as a concept at that point.

I feel like maybe there's like a caste system of titles.

Maura: Like some gatekeeping, yeah.

Stef: There's some, yeah, there's some gatekeeping.

And I'm wondering, have you given that any thought? Like, why do data scientists think it's better to be a data scientist than to be an analyst?

And I'm not saying that it is, just to be absolutely clear, or that everyone thinks that, but there's sort of a mindset thing there.

Maura: It is, yeah. This could be a whole separate episode; it could be a whole book.

I think it's really interesting and I think what's cool is I've seen it start to change.

Like I've seen people, first of all, there are companies that have unified their ladders.

So if you look at Airbnb and Lyft, like they have data scientists that are doing this broad range of work.

Some working like way more in the research side, some working way more on the analytics side, and they're all called data scientists.

I think Lyft also has a separate real analytics org. But Airbnb did the same thing.

They said, "Hey, we're all data scientists and we have three separate tracks."

And Patreon is the same; everyone on the team is called a data scientist.

So I think that the gatekeeping, part of it, I think, came from salaries and compensation, saying, "We're going to pay a data scientist more than an analyst because those skill sets are more valuable."

But what's happening I think with data science is it's becoming pretty democratized.

With the amazing tutorials online and the education and open-source software available, anyone can learn the core data science skills.

And so I'm kind of seeing a world where someone who's thoughtful about strategy and understands logging and data and understands how to build a really complex metric is on par with someone who has done their first logistic regression tutorial.

So I'm beginning to see that change and I think that there are people in the industry too, that are saying like, "Hey you can do analytics work and be in data science."

And there's this cool, really broad spectrum of ways to use your data science background and your data science tools, and the way that you're applying them is really, at the end of the day, what your job is.

So I'm seeing it start to change, at least it has, I think, in the past five years. I think the gatekeeping is already artificial.

'Cause I think if someone comes in and says, "Hey I know how to query and I'd love to learn how to write Python."

I say, awesome, cool, let's teach you about pandas.

So let's teach you some Python skills and if you have someone who's like, "I've only worked in machine learning and I don't know anything about SQL query optimization."

I'm like, cool let me teach you how to write a great efficient SQL query.

So I'd love to see people learning on both sides of the stack, because I think at many companies the full range of skills is going to have impact.

Stef: Yeah, that's very well put.

I think it's also interesting to think about a good analyst will never just give you the answer to your question.

And that is a specific skillset.

And I think when you have good analysts working with product teams then you will be building better products for sure.

Maura: Absolutely, yeah.

Stef: I wanted to maybe take this a little bit into, I guess then who else is involved in analytics?

So you are doing all sorts of data science and all sorts of analytics, and you have experiments going on, you're making big bets, you're analyzing feature releases and things like that.

Can you talk a little bit about what is involved in that process and who does what?

How does the PM behave? Who opens which tools and things like that?

Do PMs open SQL? Do engineers open Amplitude, et cetera?

Maura: So today at Patreon, I think that the PM is deciding like what we're building and engineering is deciding how we're building it, right?

There's like the what and the how. Obviously design is very involved there as well.

On the analytics side, it's sort of similar, like the PM should have an informed opinion about what we're going to track and then the data scientists and the engineers should help understand how we're going to track it.

So today, as you know, Patreon is using Avo, so there'll be an Avo branch that's suggested by an engineer or by a data scientist, and we even have some PMs making their own Avo branches, which is exciting.

And engineers own the implementation of that analytics, and then usually a data scientist is QAing that implementation and making sure the logging is coming through.
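That QA step, checking that newly implemented logging actually comes through with the spec'd shape, can be as simple as pulling a sample of the new events and comparing them against the tracking plan. A minimal sketch in Python (the event name and property names here are invented for illustration, not Patreon's actual tracking plan):

```python
# Minimal sketch of QAing a new analytics event against its spec.
# The event name and required properties below are hypothetical.

SPEC = {
    "event": "merch_item_added",
    "required_props": {"creator_id", "item_id", "price_cents"},
}

def qa_events(events):
    """Return human-readable problems found in a sample of logged events."""
    problems = []
    for i, e in enumerate(events):
        if e.get("event") != SPEC["event"]:
            problems.append(f"row {i}: unexpected event name {e.get('event')!r}")
            continue
        missing = SPEC["required_props"] - set(e.get("properties", {}))
        if missing:
            problems.append(f"row {i}: missing properties {sorted(missing)}")
    return problems

sample = [
    {"event": "merch_item_added",
     "properties": {"creator_id": 1, "item_id": 9, "price_cents": 500}},
    {"event": "merch_item_added", "properties": {"creator_id": 2}},
]
print(qa_events(sample))  # flags row 1 for its missing properties
```

In practice this check would run against a sample pulled from the warehouse or the analytics tool, but the idea is the same: compare what was logged to what was planned before anyone builds dashboards on it.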

When we think about querying, it's a little bit different.

Patreon historically has been kind of a place where like only the data scientists write queries.

And then we have a couple of really savvy engineers that will write queries too, and we're starting to open that up and say, okay, we'll teach you how to query.

We'll kind of show you the layout of the database, you know, go have fun.

And that's a delicate balance because if you go too far then you end up with 400 people using the data warehouse and making 400 definitions of their own favorite metric.

So there's this fine line, a kind of seesaw of democratization, where we're trying to inch towards more query writers.

So at Patreon we're sort of on that journey to PMs writing more of their own queries and engineers writing more of their own queries, and we're beginning to cross that line.

Stef: Can you talk a little bit about then the other tools involved like maybe-- I'll leave it at that as an open question.

Maura: So today our analytics is all off of a big Redshift warehouse; that's where all of our data is flowing into.

And in terms of instrumentation, you know, we're using Avo for event definition, and then that's flowing to Amplitude, and then we have a copy of the Amplitude data in our data warehouse in Redshift.

And then we also have a copy of our production data, and we pipe in all other sorts of data sources, like stuff from Salesforce and from other third parties that Patreon is using.

And then the data science team has a bunch of what we call data science tables that are built through ETL and Airflow.

So on top of our production data, our Airflow data, and all of these other things, we build these, I'll call them beautiful, analytics tables.

So that if you want to understand how much a creator made in a month, that's a one-line query instead of like a 20 line query on top of the production tables.
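The idea behind those analytics tables can be sketched in a few lines: an ETL job pre-aggregates raw payment rows into a per-creator monthly rollup, so the downstream "how much did a creator make this month" question becomes a single lookup instead of a multi-join query over production tables. (The table shape and field names below are invented for illustration.)

```python
from collections import defaultdict

# Raw production-style rows: one per successful payment (hypothetical shape).
raw_payments = [
    {"creator_id": 1, "month": "2021-05", "amount_cents": 500},
    {"creator_id": 1, "month": "2021-05", "amount_cents": 1000},
    {"creator_id": 2, "month": "2021-05", "amount_cents": 250},
]

def build_creator_monthly_earnings(rows):
    """The ETL step: roll raw payments up into a (creator_id, month) -> total table."""
    table = defaultdict(int)
    for r in rows:
        table[(r["creator_id"], r["month"])] += r["amount_cents"]
    return dict(table)

# After the ETL runs, the analyst-facing question is a one-line lookup.
earnings = build_creator_monthly_earnings(raw_payments)
print(earnings[(1, "2021-05")])  # 1500
```

In the real stack the rollup would be a SQL transformation scheduled in Airflow and the lookup a one-line query in Mode, but the payoff is the same: the aggregation logic lives in one vetted place instead of being re-derived in every report.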

So that's the stack we're using and then I'd say Amplitude--

Most of that query writing is happening in Mode, which is a tool we use for query writing and visualization.

And then Amplitude is being used by PMs, designers, engineers, data scientists, and marketers to understand our behavioral analytics, so conversion and event tracking and traffic and all that good stuff.

Stef: That is a good stack, very juicy.

Maura: Lots in there.

Stef: Yeah, lots in there.

When someone decides to go into mode rather than pointing and clicking somewhere, in Amplitude for example, what is their role and why would they do that?

Maura: Today, it's usually a data scientist or one of those other people in the organization that's querying or learning to query, or trusted to query, building a report for something that isn't available in Amplitude.

So like a lot of our revenue data or that third-party data we're not sending to Amplitude.

And so that's the kind of creation flow that's happening in Mode, but we also have a pretty big consumer flow in Mode, with people using Mode reports to get data about the things they care about and to drive decision-making.

And again, we kind of separate that: Amplitude being this really behavioral analytics space for the activities happening on the app, the way users are flowing through the site, how creators and patrons are behaving, and then Mode tying all of that data into Total Membership Volume and revenue and retention and all those other kinds of juicier details.

Stef: Yeah, exactly, what do you want to put in the board report and all those things?

Awesome, I would love to hear you--

I mean, this is a juicy stack, like we've talked about. Can you talk a little bit about how the industry has changed over the past couple of years?

I'm definitely curious to hear about this from the stack perspective but I think those are typically very intertwined.

Like what are the industry standards for how we do things and things like that.

And I'm curious to think about even just the past couple of years, because things are moving so quickly, but if you're thinking about a different type of timeline, I'm curious to hear your thoughts on that as well, on some shifts.

Maura: Yeah. I've been talking to the team a lot about how, if I were to start a new data science team today, it would be so much easier.

You just have all of these tools; there's been an explosion in the tooling space.

The thing that I'm most excited about as a data quality nerd is like all of the tooling for data quality.

Avo obviously plays a role in that, Amplitude has products for quality, and we're even using Redshift in a different way than we did three years ago, thinking about how it's organized, how it's vacuumed, and how the data's in there.

So I've seen that change: for each particular point between where your data gets generated by the user and when someone uses it, there's now a tool you can put in that's going to make that process smoother.

So Mode, for example, we started using Mode about three years ago.

And the reason for that is because we were still using Tableau, and you were either in this very professional, robust Tableau business space, or you were writing a query in a terminal and then sending someone the results in Slack.

So there's now a whole suite of tools where you can get this query-to-chart flow very quickly.

So that's really how I've seen it change: a focus on quality, and then companies and tools taking each little step of this entire data consumption process and saying, let's build the perfect solution that's going to make this step easy and hook up to the steps on either side as well.

Stef: Yeah, that makes sense.

I feel like there's a section there. We're basically talking about how so many things have happened in the last two to five years that relate to data quality and tooling and things like that.

And then in addition to that, I think all of those have also really driven the need for data literacy; it's sort of a self-reinforcing thing.

We want more self-serve analytics because we saw that, with all of the data we had, we could ask so many questions, but we just had to throw more people at that problem.

So we need more people to be able to answer their own questions.

And that, I think, maybe also drove the data quality needs and the tooling needs a little bit, because we were trying to bring more people into answering those questions than were full-stack experts in the data stack.

Maura: Yeah, I think there's something really interesting there.

I was talking to someone about how there are different models of where you think you should bring quality into the process.

So for example, there's like the "let's log everything and we'll like clean it up after the fact."

And then there's like the "let's only log like really perfect clean things."

And then you also have the middle framework of "let's do a bunch of transformations."

I think dbt kind of gets at this: let's solve the quality issue in the middle.
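That "solve the quality issue in the middle" model looks roughly like a dbt-style transformation layer: raw, messy logs go in one side, and a normalized, trusted model comes out the other. A toy sketch of such a cleaning step, with field conventions invented here for illustration:

```python
# Toy "transformation layer" step: normalize messy raw event rows into a
# consistent shape before anyone queries them. Field names are illustrative.

def clean_events(raw_rows):
    cleaned = []
    for row in raw_rows:
        # Different clients logged the user id under different keys.
        user_id = row.get("user_id") or row.get("userId")
        if user_id is None:
            continue  # drop rows we can't attribute to a user
        cleaned.append({
            "user_id": int(user_id),
            "event": row["event"].strip().lower(),  # unify casing/whitespace
        })
    return cleaned

raw = [
    {"userId": "42", "event": " Pledge_Created "},
    {"user_id": 7, "event": "pledge_created"},
    {"event": "orphan_event"},  # no user id, so it gets dropped
]
print(clean_events(raw))
```

In a real warehouse this would be a SQL model rather than Python, but the trade-off is the same one the three approaches describe: the mess still gets made upstream, and the middle layer is where it gets paid down.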

And I think what you're getting at which is kind of like fascinating to think about is in some ways the explosion of data tooling is a response to this huge push towards data democratization which is like, "Let's open up data to the whole company."

And we opened up data to the whole company, and we're like, "Oh God, it's terrifying, it's messy, it's really disorganized."

How are we going to build tools to fix that all along the way?

So that's what I mean by that spectrum of democratization: you have to be careful with how much you open up data. And for the record, I'm very supportive of really open, transparent data cultures.

I think it drives better decisions, it drives better products, but if it's open and it's a mess and it's hard to navigate and it's untrustworthy, that's certainly not driving better decisions. Then you're going to make the wrong decisions.

But it's interesting to think about the explosion of tooling as like a response to a push for data democratization and for everyone kind of being a data citizen, writing their own queries, making their own metrics and all of that.

Stef: Yeah, exactly.

And I love this framing; I totally agree with you, there's this scale.

And there are definitely pros and cons of all of these models.

The interesting thing I also find about it is there are specific roles that are passionate about each one of those models.

And they have different types of backgrounds and they have different types of skills and they have different types of belief.

And, I want to say, they also believe in other people to different degrees.

And in the ability of self-serve analytics a little bit. Can you touch on this a little bit?

Like sort of maybe if you want to place Patreon on that scale and shed some light on like what are the goods and bads of these three?

Maura: Yeah, do you mean where you put quality, or do you mean how much you democratize?

Stef: I mean, that's a-- thank you for asking that question.

I mean, yeah: the "log everything, fix it afterwards" one; then the one where you log thoughtfully and have something reliable, with people that are very rigorous but also some lenience; and then the "okay, just try to get some stuff in here" one.

We're not logging everything, but just ship it, ship it. And then you have all of the dbt models, the transformation layers.

Maura: We're somewhere in the middle.

But what's funny is if you had asked me four years ago I would say, log everything, fix it later.

Like we don't care, just like put the logs in the product, log every event, log every property, we'll deal with it.

And then we dealt with it, in that we really had an issue: things were really dirty, and we had to clean it up, and it was painful.

And now, especially with Patreon at this scale, with our volume and our growth, at a certain scale you can't log everything. Or maybe you can, but it can be very expensive to log everything, and very expensive to spend all the time figuring out all the things that you're logging.

So we're moving a little more towards "log thoughtfully, consolidate when you can."

Obviously we're trying now to log correctly which I think is a big part of that as well.

But then we do have quite a lot of transformations and, you know, ETL and a kind of analytics brain that's saying:

"Okay, we logged 100 things. The business right now really cares about these 20 things. So let's put them in like a really nice shiny table. And then these other 80 we'll kind of just like dump in a big table somewhere."

So we're moving up the spectrum.

I don't think logging is ever going to be 100% perfect, but I don't think we'll get to the far end of the spectrum either, where we log three things instead of a hundred.

But we are trying to be more thoughtful about it.

Stef: I love that. The other thing is you're sort of shifting the problem.

Maura: Yeah, yeah, you're shifting a problem.

I think there's a relationship-with-engineering piece that's important here, because some people don't trust their engineers to log well, or they've been burned by things being logged incorrectly.

So part of it is also being able to trust engineering to log things, and giving engineering the tooling and the education to know how you want to log analytics as an organization.

But I think part of it is that some people may have this desire for very tight control because there's not enough trust built between the teams.

Stef: Yeah, I love that. I think that's right and it's a mutual thing as well.

It's really interesting. I think this is a good segue into: what do you think people's biggest misconceptions are about how data and product and analytics work?

Maura: At Patreon, I think one of the biggest misconceptions, which I'm trying to actively fight against, and which I see in other orgs as well, is that data science should be brought in after the fact.

Like, bring in data science once you've launched the product, except obviously for the data scientists who are building the product itself, building predictive machine learning or whatever, rather than data science being part of the product development process.

And I think if you have a great data science team your product managers will want them in the room early.

'Cause they'll say, "Hey, like Maura she knows a lot about the different areas of the product and who's using what. So we got to have her in the room when we're talking about what to build."

So I think that's one big misconception: that you only need a data scientist after you've launched a thing and you want to study how it's doing.

The other is something that I've recently changed my mind about, which is that sometimes, for strategic initiatives or for big bets, you don't need to be so granular and rigorous about the precise lift in user retention that we're going to get from this huge big-bet feature.

Because if you measure it with that fine-grained lens, you're going to get to a local maximum; you're never actually going to get to this huge change in user behavior.

So I've been trying to push myself to think a little bit more about what level of granularity, what kind of rigor and super-thoughtful measurement, is needed.

Because when you're doing something strategically sometimes you just got to like go for it and see what happens rather than A/B testing everything or getting in there with like a really fine tooth comb.

Stef: I love that. I think that's also a good thing for anyone who's listening who is kicking off their data team, or even just kicking off their product team.

Both of those are something that really is worth keeping in mind.

On that note, though, we're about to wrap up. What is the thing that you would want to leave someone with: the first thing that teams should do to get their analytics right?

Start paying down their analytics debts, or take something good off?

Maura: I think the first thing to do to get your analytics right is figure out what you care about as an organization, as a business.

And those are hard, long discussions.

They're discussions that look like would we care more if retention fell off after 12 months or after three hours?

You really got to figure out what matters to the business.

Think about if you had different metrics as different goals what would you build differently?

I'm kind of obsessed with that idea that your metric actually, in some ways, determines what you build, which can be very dangerous when you're designing a metric, 'cause if you have the wrong one you're going to build the wrong thing.

So that's the first thing I'd think about is like what does your business care about?

Have those discussions with product, with design, with engineering, with your leadership team, and that will help you get started in measuring the right things.

Stef: Perfect. Thank you so much.

Maura: Thank you, Stef.