Ep. #53, Community Metrics with Georg Link of CHAOSS

APR 1, 2020

27 MIN


Guests: Georg Link


about the episode

In episode 53 of JAMstack Radio, Brian joins Georg Link of the Linux Foundation CHAOSS Project to discuss the importance of community metrics, which metrics are most useful for modern projects, and what open source tools are available for collecting and analyzing metrics.

about the guests

Georg Link is an open source strategist. He co-founded the Linux Foundation CHAOSS project and is the Director of Sales at Bitergia.

show notes


transcript

Brian Douglas: All right, welcome to another installment of JAMstack Radio. On the call we've got Georg Link.

Georg Link: Hi there. My name is Georg Link. I'm from Omaha, Nebraska. That's where I live by choice, but I come originally from Germany so it's been a long way.

Brian: Awesome. Do you want to explain your background professionally, and what you're doing today with the community?

Georg: Of course. I'm an open source strategist, and my mission is to help organizations and open source projects become more professional in how they use metrics and analytics.

I want to help everyone understand their projects and communities. This came out of research.

I was at the University of Nebraska at Omaha earning my PhD and looking into how organizations engage in open source, and metrics and the idea of community health kept coming up.

So we started the CHAOSS Project under the Linux Foundation. CHAOSS is short for Community Health Analytics Open Source Software, and there we talk about metrics and analytics, define what they all mean, and build software.

After I graduated, I started working at Bitergia. The company has been providing this as a service for almost eight years now.

Brian: That's awesome.

When I think of open source, I think of the random NPM packages that I install into my projects, and then every now and then I'll pop in there and see what's going on with the GitHub commits.

Where did you see this need for community health and metrics for open source projects? Are we talking like, these small packages?

Or are we talking large TensorFlows and all the other large projects on GitHub?

Georg: That's an interesting question, because the need for metrics is at very different levels and different people have different needs for metrics.

They have different reasons for starting the metrics journey. One reason comes from the community itself: contributors want to understand who is contributing, who's doing what, and "How's the project doing?"

They want to recognize their different contributors and all the different types of contributions that are being brought forward.

Another reason to start a metrics journey comes from an organization, like a foundation or a company, that wants to understand what is happening in the projects it relies on, or wants to work with and engage in. There it's very much about mitigating risk, as well as identifying the areas where it can have the most impact when joining these communities.

It ranges: you can have large projects where you want to understand what is happening, but also small projects, where you want to identify, "Of all the projects I'm relying on as a company, are any out of date or no longer being updated?"

There are many metrics that you can look at.

Brian: Do you mind, can we talk about specific metrics? Some real life use cases?

Because I'm curious, I know that not every contribution to a project is going to be code related.

So I'm curious, which metrics do you find most useful for projects?

Georg: I'll come back to the CHAOSS project here.

We are looking at all the different kinds of metrics that you can possibly think of, and we started this project three years ago just listing everything out.

Over the years we have organized them into different kinds of metrics, grouped logically into working groups, because that's natural for how we think about and engage with the metrics.

So the working groups we have in the CHAOSS Project are the Diversity and Inclusion Working Group, where we look at "How do we understand the diversity of the contributors? The different types of contributions that they make? How do we understand how welcoming is the community, do people stick around? Do we have any biases there?"

Another working group is the Evolution Working Group.

It started out as the growth, maturity and decline working group, where we are looking at metrics around "How is the project overall evolving? Is this a growing project? If you look at the number of contributions, commits, issues, are we maturing and leveling off? Or is this project in decline?"

I'm not saying this is good or bad, we just want to understand where the projects are and some projects have reached their goal and now the people involved are moving elsewhere, and that is perfectly fine.

Those are some metrics there. We have two more working groups, one is risk and one is value.

Then there's one final group with meta metrics, or common metrics, that we share across all of these working groups.

Brian: The value group, as well as identifying whether projects are growing or declining, seems super valuable.

I think a lot of people look at hype on Twitter and the interactions there, but also at GitHub pushes and commits and how long PRs stay open.

That all seems like valuable information, but I think most people use anecdotal evidence to make a statement. One of the popular questions I've heard over the last five years is, "Is Ruby dead?"

The language itself is synonymous with a lot of things, and meanwhile you see the growth in JavaScript.

How do you take that value and compare it to JavaScript, or compare one project to another, say one HTTP requests library against another?

Georg: Comparing is something that we hear a lot about, because how do you know something is good if you don't have a baseline to compare it against?

Among the tools we are building in the CHAOSS project is Augur, software developed at the University of Missouri in cooperation with the University of Nebraska at Omaha.

Comparing between communities has been a feature from the very early stages of prototyping this tool.

The thing I want to mention though, is we've tried-- Not me personally, but we have people on the team that have tried to look at all the metrics that they collect.

We went to the GitHub archive to really have a long history there, to see "Can we identify trends? Can we compare metrics?"

One of the issues we ran into is "How do you segment or group the projects that you want to compare with each other?"

Because open source projects are so different and every project works a little differently, they use GitHub a little differently, or their mailing lists, or whatever tools they use. The same metric means something different depending on the context of the project.

Comparing between projects becomes difficult, and that's something that we're working on and we're thinking about.

But the much more useful comparison is comparing it against your own history.

If you're looking at a project, you want to start tracking metrics and then see over time, "Are there any upticks? Are people leaving? Do we have a module that is not being maintained anymore?"

Whatever the metrics are that are critical to you.
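To make "comparing against your own history" concrete, here is a minimal Python sketch. It assumes you already have monthly counts for one metric (for example, active contributors) for a single project; the function name `flag_anomalies`, the three-month window, and the 50% drop threshold are hypothetical, illustrative choices, not anything defined by CHAOSS, Augur, or GrimoireLab.

```python
from statistics import mean


def flag_anomalies(monthly_counts, window=3, drop_threshold=0.5):
    """Return the indexes of months whose value falls below
    `drop_threshold` times the average of the preceding `window` months.

    Hypothetical helper: the window and threshold are illustrative defaults,
    not values recommended by CHAOSS.
    """
    flagged = []
    for i in range(window, len(monthly_counts)):
        # Baseline is the project's own recent history, not another project.
        baseline = mean(monthly_counts[i - window:i])
        if baseline > 0 and monthly_counts[i] < drop_threshold * baseline:
            flagged.append(i)
    return flagged


# Monthly active contributors for one made-up project.
active_contributors = [12, 14, 13, 15, 6, 14]
print(flag_anomalies(active_contributors))  # → [4]: month 4 fell below half its recent average
```

A flagged month is only a prompt to look closer; as Georg says, the metric still has to be read in the context of the project.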

Brian: That's really interesting. Can we dig a little deeper into Augur, which you mentioned?

I'm familiar with the name. Maybe not deeply familiar, but I've heard of it before.

This whole comparison angle really comes through.

You mentioned two universities, one being Nebraska at Omaha.

Is this coming out of, and funded by, university research? Or are open source maintainers, or maybe the Linux Foundation, specifically looking for this and funding it? Or is it all of the above?

Georg: It is something that gets quite a bit of attention.

The project was funded initially by Alfred P. Sloan Foundation, and I don't currently know if there are any other funds involved since I'm no longer with that research group.

But I know that Sean Goggins, the professor who leads the research project there, has worked with Remy DeCausemaker from Twitter.

If you go to the year in review on Twitter, that is powered by Augur on the back end.

Augur collects the data and exposes it through an API, and then the team at Twitter built a nice interface that uses that data.

There's a nice presentation from the last CHAOSScon, which we can probably put in the show notes, where they explained how it works.

Brian: Yeah, for sure, we'll link that.

Georg: That's the Augur project.

Brian: Let's talk a little more about tools within this space.

The organization also maintains something called Cauldron, which I had an opportunity to play around with at FOSDEM.

Actually, can we just talk about that, and then maybe talk about some other tools that are in the space?

Georg: Yeah. Cauldron is based on a tool that we have been developing for many years now, GrimoireLab, the CHAOSS GrimoireLab project.

It's one of the founding projects of the CHAOSS project, originally developed by Bitergia, the company I work at now.

They've developed this toolset, which is multiple Python libraries.

One for collecting data, one for enriching the data, one for managing identities because people use different usernames and different email addresses when they contribute to an open source project.

To resolve all of those issues, the GrimoireLab project has several tools.

Then the new service, the Cauldron.io service that Bitergia is developing, is a hosted solution for all of these tools that is nicely integrated and has a nice interface.

Where you can just go to Cauldron.io and you can type in whatever your GitHub organization is and it pulls in all the repositories and collects data.

We go to the git log to get the git history, and then to the GitHub API to also get information about issues and pull requests.

You can immediately... well, not immediately, it takes some time to collect the data.

But within a couple of minutes you can start seeing some metrics there.
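Georg mentions that GrimoireLab includes tooling for managing identities, because people contribute under different usernames and email addresses. The sketch below is not GrimoireLab's code; it is a minimal, hypothetical Python illustration of the underlying idea, assuming each contribution record is a `(username, email)` pair and that any two records sharing a username or an email (case-insensitively) belong to the same person. A union-find structure lets chains of shared identifiers merge transitively. Real identity matching is usually more careful, since a shared or generic address would wrongly merge unrelated people.

```python
def merge_identities(records):
    """Group (username, email) records that share a username or an email.

    Hypothetical helper: matching is case-insensitive and transitive,
    so if A matches B and B matches C, all three land in one group.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    first_seen = {}  # (kind, normalized value) -> index of first record using it
    for idx, (username, email) in enumerate(records):
        for key in (("user", username.strip().lower()), ("email", email.strip().lower())):
            if not key[1]:
                continue  # ignore empty identifiers
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups = {}
    for idx, record in enumerate(records):
        groups.setdefault(find(idx), []).append(record)
    return list(groups.values())


SAMPLE_RECORDS = [
    ("glink", "georg@example.com"),
    ("georglink", "georg@example.com"),   # same email as the first record
    ("georglink", "g.link@example.org"),  # same username as the second record
    ("bdougie", "brian@example.com"),
]
for person in merge_identities(SAMPLE_RECORDS):
    print(person)  # two groups: the three "Georg" records, then the one "Brian" record
```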

Brian: Very cool. So this is community-sourced too? If we both look at the same project, will those metrics already be there, or do I have to use my own API keys to get that kick-started?

Georg: That's the interesting part about the Cauldron platform, that we store the data and anonymize it, and then everyone shares the same data sets.

If you go and look at your project and I go a day later, the data is already there.

Then I just need to get the difference of what happened since you last looked at it.

That's a really cool feature, because once all the projects are in there, you can get the data almost instantly.
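The shared-dataset behavior Georg describes (the first visit collects everything, a later visit only fetches what changed since the last collection) is a common incremental pattern. Here is a minimal, hypothetical Python sketch of it; `SharedDataset` and the `fetch_since` callable are illustrative names, not Cauldron or GrimoireLab APIs, and a real collector would also need to handle items that were edited or deleted after being fetched.

```python
from datetime import datetime


class SharedDataset:
    """Hypothetical shared cache for one repository's activity.

    The first refresh fetches everything; later refreshes ask the data
    source only for items newer than the latest timestamp already stored.
    """

    def __init__(self, fetch_since):
        # fetch_since(since) -> list of (timestamp, item); since is None on the first call
        self._fetch_since = fetch_since
        self.items = []
        self.last_seen = None

    def refresh(self):
        new_items = self._fetch_since(self.last_seen)
        self.items.extend(new_items)
        if new_items:
            self.last_seen = max(ts for ts, _ in new_items)
        return len(new_items)


# In-memory stand-in for a git log or issue tracker.
source = [
    (datetime(2020, 3, 1), "commit a"),
    (datetime(2020, 3, 2), "commit b"),
]


def fake_fetch(since):
    return [entry for entry in source if since is None or entry[0] > since]


dataset = SharedDataset(fake_fetch)
print(dataset.refresh())  # 2: the first visit collects everything
source.append((datetime(2020, 3, 5), "commit c"))
print(dataset.refresh())  # 1: a later visit only fetches the new commit
```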

Brian: That's awesome. That's awesome that all this is out there in the open.

I'm curious, this is all based around open source, so is this all open source? Is there some foundation as well around just community health?

Georg: That's a fair question, and maybe I'll untangle all these different names and organizations involved here a little bit. The CHAOSS Project is hosted by the Linux Foundation.

Brian: Gotcha.

Georg: That is the umbrella where we all get together in this CHAOSS project.

And CHAOSS is short for Community Health Analytics Open Source Software. The CHAOSS project has two parts to it.

One is the metric side, where we identify metrics and we talk about how they can be used and we specify how you collect the data and try to establish some standard, so that when you and I look at metrics and anyone else listening to this podcast looks at metrics, that we're talking about the same metrics and we can start comparing them.

That is the metric discussion in the CHAOSS project.

The other part is the software side, and the CHAOSS project has three active software projects.

Initially we had a fourth one, but it's been put in the attic. Augur is one of the active projects and GrimoireLab is another.

Then we have Cregit, and Cregit is basically git blame, but much more granular.

You can see who added this variable, or who edited this variable last.

It's not line-based like git blame; you can go down to the tokens. Then the fourth project would be Prospector, which was developed at Red Hat.

The cool thing about Prospector is that you can look at a portfolio of projects, give different weights to your metrics, and get a red/yellow/green signal for how those projects are doing based on the parameters you define.

So at the top we have the Linux Foundation CHAOSS project, under it these three software projects, and different organizations like Bitergia and others engaging in these software projects as well.
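Prospector itself is in the attic, so the following is not its code. It is a minimal, hypothetical Python sketch of the portfolio idea Georg describes: it assumes each metric has already been normalized to a 0 to 1 scale (higher is healthier), combines the metrics with user-chosen weights into a weighted average, and maps that score to a red/yellow/green signal. The metric names, weights, and the 0.7 and 0.4 cut-offs are illustrative assumptions.

```python
def health_signal(metrics, weights, green_at=0.7, yellow_at=0.4):
    """Return (signal, score) for one project.

    metrics: metric name -> value normalized to 0..1 (missing counts as 0)
    weights: metric name -> relative importance chosen by the user
    Hypothetical helper; thresholds are illustrative defaults.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted average of the normalized metrics.
    score = sum(w * metrics.get(name, 0.0) for name, w in weights.items()) / total
    if score >= green_at:
        return "green", score
    if score >= yellow_at:
        return "yellow", score
    return "red", score


weights = {"activity": 2, "responsiveness": 1, "bus_factor": 1}
portfolio = {
    "project-a": {"activity": 0.9, "responsiveness": 0.8, "bus_factor": 0.5},
    "project-b": {"activity": 0.3, "responsiveness": 0.6, "bus_factor": 0.2},
}
for name, metrics in portfolio.items():
    signal, score = health_signal(metrics, weights)
    print(f"{name}: {signal} ({score:.2f})")
```

Changing the weights is how you encode what matters to your organization; the same data can produce different signals for different users.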

Brian: It sounds like there's a lot of people who are pushing this forward.

I'm curious, with all this effort, what are some limitations around getting community metrics and identifying healthy projects?

Georg: The limitations that we face are typically around ethics.

"Is it the right thing for us to collect data about what our contributors are doing?"

There have been some laws recently, like the GDPR and CCPA, that have really made us think about how we treat the data about what individual people are doing.

But the thing is that as long as you have a valid reason for looking at this data, the law is typically OK with you doing it as long as you let your contributors know that you're doing it, and that they have a way to opt out at some point or somehow.

Another challenge, as we discussed earlier, is that the metrics are very specific to each project. You cannot really compare projects just out of the box. You can, but you have to look at the metrics in the context of the project.

So when you are looking at the metrics, I cannot give you five metrics that apply to all projects and tell you, "Is this project healthy or not?" That's something we've been asked for in the CHAOSS project time and time again: "Just give me the five metrics to look at," and we just can't.

Our approach now is to say, "We define the metrics. We can tell you how to collect the data and what you can look at, but then it's on you to tell the story of the community. You now have more objective measures that can back up your story or identify anomalies you were not even aware of, because communities on those projects can grow so big that you don't know what's happening everywhere."

Brian: I prefer that approach too, especially when you talk about ethics and the possibility of gaming metrics.

I know the green squares on your GitHub profile can lead to bad behavior.

If I feel I need to make sure there's a commit on this project every day, or at least once a week, that's not necessarily a good metric to look at. That's why I appreciate projects like CHAOSS: they take a broader look at the health of the project.

It's not just commits, but things like "How many issues are there?" or "Are we talking about this on a forum, or on a third-party platform like Twitter?"

Or whatever it is, whatever is the value for that project itself.

I appreciate that it's not a strict guideline, because I think you could definitely push people towards one thing and then open source turns into not a fun place to be in.

Georg: From the very beginning, people have talked about the idea of gaming metrics.

Once you start measuring something, you shape the behavior around what you're measuring.

It's super important not to focus on just one key metric, because it will stop being meaningful once people start gaming it.

Brian: Are there any metrics around the quality of the open-source project?

Some projects, they have great contributing guides and then some projects have very lacking contributor guides.

Some projects have a Discord, but some projects have no way to communicate other than email or GitHub issues.

Is there a way to grade that as far as quality goes?

Georg: That's an interesting question.

The working group in the CHAOSS project that deals with those kinds of questions is probably the Diversity and Inclusion Working Group, where we look at, "Does this project have a code of conduct?"

That is an indicator of whether the project is taking a stand and saying, "We want to be welcoming and inclusive. We want interactions that are enjoyable for all of us." Then we also look at the things you're talking about: are the communication channels open and welcoming?

And, "Are we excluding people because of that?" We also look at types of contributions, because well-rounded projects do not just have commits.

You also have event organizers, you have people doing bug triaging, you have quality assurance and testing, localization if you have multiple languages, you need documentation.

There's so many different things that need to happen in an open source project, so a good quality project makes sure that they pay attention to all of these different areas.

That's the Types of Contributions metric in the Common Metrics working group. I'll just mention it because we finished it last week, and from that one metric you can derive multiple metrics.

Brian: Excellent. We've run the gamut explaining CHAOSS and different types of metrics.

But I'm curious if there's a listener that's out there that wants to get involved, wants to get this applied to their project, or just wants to know more about what y'all folks are doing within this realm, where do they go?

Georg: The best place to find CHAOSS is the chaoss.community website, and there's a page there, chaoss.community/participate, where we list all of the meetings. We are a very synchronous community, with a lot of phone calls or Zoom calls where we talk about what we are working on and how we are working.

We have work sessions where we spend 40 minutes thinking through a metric and defining it, or you can just come and ask questions if you want to start your metric journey or you're stuck at some point.

We also have calls for the different software projects, with hands-on sessions on doing things with GrimoireLab and Augur.

That's the best place to find where the community is, and then we have a mailing list there.

We're also on IRC. We're not so active there, but I'm monitoring it, so I can always point you in the right direction.

Brian: Excellent. Cool, Georg.

I really appreciate you coming on and chatting about the CHAOSS project, I'm hoping that the listeners have gotten a lot of really awesome information that they can go and find in the show notes and click the links, and then discover how they can improve the quality of their projects.

But for now, I think I want to transition us to picks. These are things that we're jamming on, this is JAMstack radio so we jam a lot.

These can be music picks, tech picks, all of the above: the stuff that keeps you going throughout the day. I see you have picks, but if you don't mind, I'll go first.

Georg: Of course, yeah. Go ahead.

Brian: My first pick is Gridsome. I've been doing a lot of work on a project at GitHub, and we're using a Vue static site generator called Gridsome.

They call it the modern static site generator, and it looks a lot like Nuxt, but it's quite different.

It feels a little bit more like Gatsby, but honestly I don't know.

I can't compare everything one to one, but if you're doing JavaScript and you ever wanted to try Vue, I think these static site generators are really good introductions to frameworks like React and Vue.

I've loved it because I'd never done any Vue beyond small tinkering, like copying and pasting or massaging some existing code.

But I was able to jump directly into this project, even though I didn't make the first commit.

But as far as getting the project moving forward and feeling productive as a senior engineer, I had no issues.

I do recommend checking out Gridsome as a project to get started for some of these marketing apps or static sites.

My second pick is Netlify Analytics. I recently just got on board with this tool, and we've talked a lot about metrics.

Netlify Analytics, if you're using Netlify to deploy a site it's a one click install to make this work.

You don't have to put any tokens or cookies onto your site itself, as you would with something like Google Analytics, because Netlify looks at this data on the edge.

On their CDN they're able to infer unique visitors and some other data, and also sourcing where your users are viewing from.

I've just started tinkering on that on one of my projects, and I found that super useful.

I like the fact that I don't have to migrate analytics tokens and put it in cookies and do all this weird stuff to my site just to find out who is using it.

With that being said, I have one last pick. Listeners of the podcast know I like cooking, and I make quite a bit of bread.

I've recently just picked up a June oven. It's weird, I've seen the June oven on Instagram ads and Facebook and stuff like that for a while, and I finally pulled the trigger.

They came out with a new version which was cheaper than the original version.

It's still a cool $600, I think, or somewhere around $500 to $600, but I use this thing every day. It's changed the way I approach cooking.

It's a smart oven and you're able to-- There's a camera inside of it so you can see the food.

To be honest, I'm not sure how useful the camera is; I haven't really used it very often. But I use the oven itself all the time.

It also comes with a built-in thermometer, so if you want to cook chicken or rare steak or whatever you're looking to cook, you just plug the thermometer in.

It's changed the way I approach cooking. It's also supposed to do dehydrating, which I haven't tried yet, plus slow cooking and roasting.

Just check it out, it might be a super frivolous buy for some people but I tend to cook every meal at home so it's been super useful for my family.

I'm happy to share that jam pick. Georg, do you have some picks?

Georg: I do, I just want to ask about the oven. When you say it has a built in thermometer?

Brian: Yes.

Georg: So for the chicken, do you actually insert that into the chicken? Or is this like--?

Brian: Yeah. It plugs into the side of the oven so you can take it out to wash it. You plug it in, put it inside the chicken, and then--

I didn't know this, but chicken has to reach 165 degrees Fahrenheit or higher to make sure it cooks out all the bacteria, if you want to be specific.

But the best part is that you plug the thermometer into a chicken breast and it comes out as a juicy piece of chicken that's safe to eat.

I usually obliterate chicken and cook it way too long and then it's super dry, but I've had only great results from using a thermometer. Who'd have thought?

Georg: We cook a lot at home, so maybe I should look into that.

Then Netlify Analytics is another good pointer, because we use Netlify. We just switched to using Netlify.

My pick is Riot.im, an instant messenger built on the Matrix platform.

In the CHAOSS project I've never been able to monitor our IRC channel, and now with Riot I get the messages even when I'm offline, and I have the same software on my computer as on my phone.

It's helped me to stay on top of what's happening in the IRC channel for the CHAOSS project, and it's been a game changer for me.

Brian: Awesome. Thanks for sharing, I will definitely check it out.

One of the things about CHAOSS, and I don't know how much we touched on it, is that being able to introspect all these different platforms seems super useful, and I guess you're already doing that with this other tool.

Georg: Exactly. The GrimoireLab project has 30 different data sources, including IRC, Slack, and other platforms, so that you get a holistic view of the community.

Brian: Cool. Georg, thanks again for coming on to chat about the CHAOSS project and telling us about all these cool tools involved within that space.

Listeners, keep spreading the jam.
