O11ycast
38 MIN

Ep. #82, Automating Developer Toil with Morgante Pell of Grit

About the Episode

In episode 82 of o11ycast, Ken and Jess chat with Morgante Pell, the visionary behind Grit, an AI-powered agent designed to automate developer toil and technical debt. The discussion covers the evolution of AI in coding, the challenges of building and deploying AI agents, the future of combining code awareness with production awareness, and Grit’s acquisition by Honeycomb.

Morgante Pell is the founder of Grit, an AI-powered agent designed to automate technical debt, code migrations, and upgrades for developers. Grit was recently acquired by Honeycomb, where Morgante now contributes to R&D, combining code awareness with production insights. He began his journey in AI after struggling with backwards-incompatible API changes at Google Cloud.

Transcript

Morgante Pell: So at the time I was working at Google, working on Google Cloud's various API services and Terraform providers, and I was constantly struggling with the fact that we kept breaking our APIs, right?

We made lots of backwards incompatible changes and customers really hated this, right? They hated having to go through the work of upgrading.

I'm sure many people listening have had the painful experience of seeing Google ship a new API version and ask you to make that change.

And I wanted to find solutions for not making developers do this work and saw what was happening with AI, what was happening with large language models, and thought this is really a great opportunity to take away this developer toil and replace it with something computers can do really well.

That was the inspiration for originally starting Grit, back in 2022.

Jessica "Jess" Kerr: 2022. So was that right after ChatGPT came out to the public?

Morgante: Right before, actually. I started Grit about nine months before ChatGPT launched.

Jess: Oh, oh nice. Okay. So you didn't have the same level of tooling we suddenly do now when you started trying to automate this toil away?

Morgante: We had most of the tooling actually because, you know, OpenAI was originally just an API company, right?

They actually had some of the early versions. At the time, DaVinci was the codename for their models.

And then we were using those as the models our very early versions were built on top of, as well as various open source language models.

Like, the language models weren't first available with ChatGPT; we were just building on the API.

Jess: Okay, so you had them. We didn't know about them as the general public?

Morgante: Yes, we were building relatively early and then saw this massive explosion in interest after we first started working on it.

Jess: Cool. So I know the models have gotten a lot better at coding in the last few years.

How did it work then? Were you able to usefully upgrade code bases?

Morgante: We could do very targeted changes, but they made a lot of mistakes.

Sometimes we can talk now about AI being like a junior developer. At the time AI was like maybe a kindergartner who had read like one textbook on programming. So you had to really guide it through the changes. It was very, very basic and made lots of really silly mistakes.

Jess: How did you deal with that? How did you constrain it?

Morgante: So we did a lot of work on error recovery: making a change and then verifying that change is correct.

The nice thing about working with programming languages is we have lots of great static analysis tools that can tell you often if you made a mistake, right? If you had a little typo or you forgot to declare a variable and these are the kind of mistakes that these early AI models would make a lot.

We just feed those to a static analyzer, figure out what went wrong, ask the model to correct itself, and improve from there. So we had to build a lot of guardrails early on to get something usable out of these models.
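For a concrete picture of that loop, here's a rough sketch in TypeScript. The generatePatch, applyPatch, and runStaticAnalysis helpers are hypothetical stand-ins, not Grit's actual internals; the point is just the shape of the verify-and-retry cycle.

```typescript
// Sketch of a self-correction loop: apply a model-generated change, run a
// static analyzer, and feed any diagnostics back to the model for another try.
// generatePatch, applyPatch, and runStaticAnalysis are hypothetical helpers.

interface Diagnostic {
  file: string;
  line: number;
  message: string;
}

declare function generatePatch(task: string, feedback: Diagnostic[]): Promise<string>;
declare function applyPatch(patch: string): Promise<void>;
declare function runStaticAnalysis(): Promise<Diagnostic[]>;

async function changeWithRecovery(task: string, maxAttempts = 3): Promise<string> {
  let feedback: Diagnostic[] = [];

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Ask the model for a patch, including any analyzer feedback from the
    // previous attempt so it can correct its own mistakes.
    const patch = await generatePatch(task, feedback);
    await applyPatch(patch);

    // Typo'd identifiers, undeclared variables, etc. show up here.
    feedback = await runStaticAnalysis();
    if (feedback.length === 0) return patch; // change verified, we're done
  }
  throw new Error(`Could not produce a clean change after ${maxAttempts} attempts`);
}
```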

Ken Rimple: And you're building this company, you're starting this company with a goal. At that time, what was your initial goal for what you were going to launch with?

Because I know that can grow and change as you're starting down the path.

Morgante: Yeah, the original mission was to destroy technical debt.

So we thought every code base should be completely modern, completely upgraded. You should never have to spend time dealing with stuff that was basically archaic and on old versions. So we thought we should be able, with AI, to build something that's going to do that completely automatically for you, and everyone's going to focus on the fun feature stuff.

Jess: Now that we have great agents, is it easy?

Morgante: It's a lot easier than it was, for sure. I'd say it still is not easy. Like, we still don't have AGI, right?

We still have lots of stuff that agents don't know about and can't do completely autonomously. But for sure over the last three years it's improved massively.

Like stuff that I frankly never thought was going to be possible is now almost easy.

Ken: What are some of those things that are easier now than you thought they would be?

Morgante: It's really good at actually exploring code bases autonomously.

So going and searching, going through web resources, reading documentation, like you can build an agent today that you just throw it at some docs, ask it to incorporate, you know, a new SDK into your code base and it's going to do a pretty decent job of that.

And at the time I thought we'd have to do a lot of special guardrails to get that to work well and certainly at the time we did, but we've taken those guardrails off over time.

Jess: So you built an agent that was specialized at doing these upgrades?

Morgante: Yep, doing upgrades, code migrations, basically all of the backlog tasks that are not really core feature development.

Jess: How did that work as a business?

Morgante: Yeah, we went through a lot of iterations of the business.

You know, at some point we were basically just doing professional services, of saying, all right, you've got some big React upgrade you need to do, your engineers don't want to spend time on it, you can just hire Grit to come in and do that upgrade for you.

And basically, the same way you'd hire a consulting company, you'd hire an agent to do that: it gets added to your GitHub repo and takes care of it.

That worked well for a while but you know, honestly I was never really excited about starting a professional services company. I didn't want to do consulting my whole life.

So we also worked on a model where you could just, you know, have the Grit agent deployed as part of your team and just toss it a task, right?

Give it a fixed number of tasks to work on and it would continuously work on them.

And where we landed before the acquisition was it would just be working on, you know, up to five tasks at a time, for example.

Jess: Okay, so what did people find it really useful to have your agent do?

Morgante: Migrations was definitely the core use case.

Upgrades to some extent, though I think upgrades are not as necessary in some code bases, but people were doing a lot of internal changes.

They were trying to do style guide conformance. People were trying to roll out new component libraries.

You know, I got really into the front end world, even though I originally had an infrastructure background.

We found that people really wanted to use it for JavaScript and dealing with all of the changes they're making to design systems over time. Some of them had different use cases for Grit.

Ken: So one of those examples that I could think of that I would've loved to have had Grit for was a development team that, before the pandemic, had built an entire front end in React and picked a component library, and right after they picked it, that library was sunsetted.

Jess: Wah, wah.

Ken: Yeah, exactly. So now I'll spin forward to 2022, I believe it was at the time, and we get this thing, start looking at it, and all of a sudden we have to completely shift or implement a little, 'cause they wanted a tooltip and that didn't exist in that particular library.

But the standard design system that library was based on had since moved forward and had the tooltip. So being able to move off of one component library to a similar one... it sounds like a hard challenge to say pick up and move from this component library to that component library.

Is that asking too much you think? That's probably harder than saying, you know, upgrade to version eight of this library for example, right?

Morgante: It was definitely something I would say you couldn't do in 2022.

Today, I think it's quite possible 'cause we now have models that are multimodal.

So not only can they go make the code changes, they can go look at screenshots of your application and tell you, there's a visual diff here, I haven't done the transfer quite right, I need to go adjust the CSS, right?

So really, that's a task that's very well specified, right? There's no real ambiguity over what you're trying to do.

You want to maintain visual fidelity but use a different set of React components. You could definitely have an agent that goes and spins on that for a while.

It's not going to get it done in five minutes, but give it five hours and it'll come back with something pretty high quality usually.

Ken: So, okay, so you tell the agent to go work on something, it takes its time, it comes back and says I got it.

What is the interaction then with the developer? Like how do you then go back and forth with it?

Is it a chat-based interaction until it gets it right or is there a point where it hands off to the developers to kind of finish it?

Like what do you see as the common usage patterns for people using this?

Morgante: So we tried to make it very much like Grit was a collaborator on the repository.

So the way we work with collaborators usually is pull requests. So when it's done, Grit would open up a pull request. You could go and review that pull request, give feedback.

Grit would action that feedback, push new commits, reply to your comments, and interact that way.

Of course, because it's Git, you can always just check out the branch yourself and make commits yourself if it's, you know, not getting things quite right and you don't want to go back and forth even more. You can also do that.

But it ended up being a pretty good interaction model.
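As an illustration of that pull-request-as-interface pattern, here's a minimal sketch using the GitHub API via Octokit. The repo, branch, and token names are placeholders, and this isn't Grit's actual integration; it just shows the open-a-PR-then-read-review-feedback loop.

```typescript
import { Octokit } from "@octokit/rest";

// Sketch of the PR-as-interface pattern: when the agent finishes, open a pull
// request, then read review comments back as feedback for a follow-up run.
// Repo, branch, and token names here are placeholders, not Grit's real setup.
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function openAgentPullRequest() {
  const { data: pr } = await octokit.rest.pulls.create({
    owner: "acme",
    repo: "webapp",
    title: "Migrate tooltip usages to the new component library",
    head: "grit/component-migration", // branch the agent pushed its commits to
    base: "main",
    body: "Automated migration. Review comments will be picked up as feedback.",
  });

  // Later: pull review comments so the agent can action them and push fixes.
  const { data: comments } = await octokit.rest.pulls.listReviewComments({
    owner: "acme",
    repo: "webapp",
    pull_number: pr.number,
  });
  return { pr, comments };
}
```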

Jess: Okay, so like the GitHub agent is doing now?

Morgante: Yeah, yeah, I think their model is a little more tailored, in that you have an actual workspace that you go in and chat with.

Grit actually never had a chat interface, you just would specify what you want upfront and get a pull request back.

Jess: Okay, where did OpenTelemetry come into this story?

Morgante: Well, originally OpenTelemetry was just something we were using internally to monitor the agent.

So at some point we had to do an OpenTelemetry integration to make sure the agent could be easily monitored, 'cause obviously there are some very long spans and very big traces.

We're talking about a single agent request that's five hours, right? So we want to be able to see everything that's part of that.
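Here's a minimal sketch of what that kind of instrumentation can look like with the OpenTelemetry JavaScript API: one root span for the whole run, one child span per step, so a multi-hour request shows up as a single trace. The span and attribute names are illustrative, not Grit's actual conventions, and an SDK/exporter is assumed to be configured elsewhere.

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

// Wrap the whole agent run in a root span and each step in a child span, so a
// five-hour request shows up as one trace with everything grouped under it.
const tracer = trace.getTracer("grit-agent");

async function runAgent(taskDescription: string, steps: Array<() => Promise<void>>) {
  return tracer.startActiveSpan("agent.run", async (root) => {
    root.setAttribute("agent.task", taskDescription);
    try {
      for (const [i, step] of steps.entries()) {
        // Each LLM call, tool call, or analysis pass becomes a child span.
        await tracer.startActiveSpan(`agent.step.${i}`, async (span) => {
          try {
            await step();
          } catch (err) {
            span.recordException(err as Error);
            span.setStatus({ code: SpanStatusCode.ERROR });
            throw err;
          } finally {
            span.end();
          }
        });
      }
    } finally {
      root.end(); // the root span's duration is the whole multi-hour run
    }
  });
}
```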

Jess: Wow, that's some hard work.

Ken: It's like a session of interactions, right?

Morgante: Yeah, and it's very important that those are all grouped together, right? Because you're not just...

Like, if it fails, you know, four hours down in a call chain, often it's because of some early mistake that was made, right?

And being able to debug those mistakes and understand them is really key to making a good AI product. So that was the original impetus for learning more about OpenTelemetry.

And then we had a handful of customers where one of the migrations they were interested in was moving between observability vendors.

They'd used an older observability vendor that they wanted to migrate off of, and they were interested in having Grit do that work for them.

Jess: Okay, so migrate from an existing observability tool to OpenTelemetry so that you can use any vendor.

Morgante: Exactly, right. Get off the proprietary SDKs.

Jess: I know somehow this story comes back to Honeycomb. How did that happen?

Morgante: Yeah, so we continued the conversations. Originally we were mostly talking to Philip about some of the semantic conventions for LLMs in particular, because, you know, we were adopting OpenTelemetry and we were interested in what some of the conventions should be.

And then eventually Philip reached out, 'cause they were interested in helping people migrate to Honeycomb and to OpenTelemetry. And he was curious if that was a use case that we'd seen before.

And it had been. So we had some conversations. We were originally talking about a partnership, and then it escalated from there to, you know, why don't we just work with Honeycomb?

Jess: Okay, so now there's the Grit agent, which can be added to a repository and it can help you with a migration. Does it become specialized on OpenTelemetry migrations?

Morgante: That's the plan: to make it the best OpenTelemetry migration expert in the world. Because, you know, we've got a great team at Honeycomb that really deeply understands OpenTelemetry. They invented modern observability and can really go into a lot of the details there. We want to take all that knowledge and feed it into the Grit agent so that you can get this really great migration done for you really quickly.

Jess: Nice, nice. And then you also have access to that pool of people who really understand OpenTelemetry, to help you along with that migration?

Morgante: Yeah, exactly.

Jess: Did you find that it helped to have someone who knew the agent well, working with it?

Morgante: Yeah, certainly we had to provide a fair amount of expertise to customers, like: here's how you can prompt the agent well, you have to be more specific about things, you should mention particular details.

So there is certainly some guidance that everyone needs when they're first using AI that we had to provide.

Jess: Oh, tell me more about the things people need to know when they're first using AI. That sounds useful.

Morgante: Yeah, it is useful.

I think a lot of the advice is similar to what you might tell a first-time manager: you know, people are not mind readers.

You need to communicate expectations clearly. You need to give them full specifications for things, you need to give them corrective feedback.

Some of the things that are specific to AI are, you know, make sure that if you want it to look at a file, you mention that file very explicitly, like what the file does.

If you want it to follow a particular convention, don't let it just figure out the convention. You have to include that in the prompt.

Give it guidelines for how it should know it's been successful. That's one of the biggest things, because sometimes the agent will do too much, and you'll find some of the newer models that are out are really smart and really hyperactive and interested in doing as much as they can.

So you'll come back and find not only have they upgraded your React app, but they've also added internationalization or something else along the way, right?

Ken: You're absolutely right about this. Yes.

Morgante: Yeah. So you kind of have to tell them, you know, once you've done this, you're done. Don't go any further.

Jess: Oh, when you're done. That's always a good thing to know as a programmer.

Morgante: Yeah, the definition of done, right? That's a classic thing that PMs like to include and certainly you should give that to your AI developer too.

Ken: Or it goes on side quests constantly, depending on the tool.

Jess: So tell us about the process of developing and improving Grit. What does it look like to observe an agent in process?

Morgante: It's looking at a lot of data, right? That's one of the biggest things that I think all the great ML researchers and developers do: they just look at their data a lot, right? If you're not actually looking at what's going into the LLM and looking at what comes out, you're not going to know your system well and you're not going to make improvements, right?

And it's not like sort of traditional software engineering, where maybe you can just write some unit tests and as long as it passes those unit tests, your code is going to be fine in production. You very much are testing things in prod all the time with agents.

Jess: Is that looking at it like reading it or looking at it with analysis?

Morgante: Both. You certainly want some summary analysis when you look at, you know, large scale data, and at Grit we had dashboards to understand, you know, what's the success rate on agent trajectories, how many pull requests are being merged, that sort of information.

But in a lot of cases it is actually just going and looking, you know, span by span, to understand what's going into the LLM and what's coming out. 'Cause very frequently you would find, oh, the LLM went off the rails and started making some mistakes, and it turns out we just introduced some completely incorrect context and it was just going off that context, and that was a bug on our end that we could have prevented.

And you don't really find these things unless you're actually looking at the exact data going in and out.

Ken: So if your dashboard shows that you've got like a huge number of files modified, you might want to go back and find the trace or whatever that says here's what went in.

Oh, look at this. Like, it was a very vague request and so it went on its giant side quest, and maybe we want to make sure we don't let them... I don't know if you, like, modify their answers.

Like if you submit something in a prompt, do you sometimes come back to the engineer and say, can I get a little more context around this?

Like how do you prevent it from going on these crazy quests?

Morgante: Yeah, so sometimes Grit will reject a request and say, you know, this needs more detail. And we had to build a fair amount of engineering into making sure that wasn't too hyperactive, that it wasn't constantly rejecting requests. So, building that filter there.

And then a lot of it's also cases where the prompt was okay. It wasn't a great prompt, but it was, you know, sufficient.

Like, certainly Grit should have been successful, but we just didn't have the right information in our engine to get the right context.

And just going back and doing some engineering work to make sure we're introducing that. Like, maybe you mentioned a file, but it happened to be in another repository, right?

So maybe they were linking to a GitHub URL, and a GitHub URL, if you just go to it through plain HTTP, is going to give you a 404, right?

So we had to do the work to translate that naive GitHub URL into using the API that Grit's already authenticated with, to go and retrieve the content. Instead of ingesting a 404 error into the agent context, which of course is completely useless, we inject the actual content that people are trying to reference.
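A rough sketch of that URL-to-API translation, assuming Octokit for the authenticated call; the regex and helper are illustrative, not Grit's real code.

```typescript
import { Octokit } from "@octokit/rest";

// Sketch: instead of fetching a github.com URL over plain HTTP (which 404s for
// private repos) and feeding that error into the agent's context, translate the
// URL into an authenticated API call and inject the real file content.
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function fetchReferencedFile(url: string): Promise<string> {
  // e.g. https://github.com/acme/webapp/blob/main/src/Tooltip.tsx
  const match = url.match(/github\.com\/([^/]+)\/([^/]+)\/blob\/([^/]+)\/(.+)/);
  if (!match) throw new Error(`Not a recognizable GitHub file URL: ${url}`);
  const [, owner, repo, ref, path] = match;

  const { data } = await octokit.rest.repos.getContent({ owner, repo, path, ref });
  if (Array.isArray(data) || !("content" in data)) {
    throw new Error(`Not a file: ${url}`);
  }
  // File contents come back base64-encoded from the API.
  return Buffer.from(data.content, "base64").toString("utf8");
}
```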

Jess: So it's important to be able to trace that full chain of input.

Morgante: Yeah, exactly right.

If you just looked at the final output, it might just say, oh, I can't figure out how to do this, you know, you haven't provided sufficient information.

But if it turns out, you know, they had provided sufficient information, we have to go into that trace and find where that HTTP call was that was met with a 404 error, to understand what actually happened.

Ken: So how about the LLMs you're working with?

Are you building and training your own LLMs, or are you interacting with things like Claude and other back ends? What do you find that you're using for your analysis?

Morgante: So, we primarily use third-party LLMs. As a relatively small startup with about $7 million in funding, it's very difficult to train a true foundation model on that budget.

You know, these large foundation models have much larger compute budgets than that. So we're mostly using models from OpenAI, from Anthropic, from Google.

And then occasionally we fine-tuned some of our own models for very specialized tasks, like applying a diff or doing certain code search tasks. We could take an open source model and fine-tune it for those things.

Jess: Do you use a variety of models in the agent's processing?

Morgante: Yep, yeah, we use a mix of models. Different tasks, we found, have different viability with different models, and there's always a trade-off between cost, latency, and intelligence.

So for a particular task, we always want to choose the right Pareto-optimal option there.

Jess: How do you do that?

Morgante: A lot of testing. So, you know, for this particular task we've got a set of curated evals for what's worked in the past and what the expected result is, and we score them. And then we can say, all right, let's throw the best model we have at that task, right?

And of course it should in most cases do really well at the evals. So let's throw a really large model at that task, and assuming that goes well, let's try going back to a cheaper model.

So let's use a, you know, much cheaper model and see if it's still able to achieve the same level of success, or at least a level high enough that we're comfortable using that smaller, cheaper model for it.
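Here's a toy sketch of that "start big, then walk down to cheaper models" selection over curated eval cases. The model names and the runTask helper are hypothetical, and real eval scoring is usually much richer than an exact-match check.

```typescript
// Toy sketch of picking the cheapest model that still clears the bar on a set
// of curated eval cases. runTask and the model list are hypothetical.

interface EvalCase {
  input: string;
  expected: string;
}

declare function runTask(model: string, input: string): Promise<string>;

// Ordered from most capable/expensive to cheapest.
const candidateModels = ["big-flagship-model", "mid-tier-model", "small-cheap-model"];

async function pickModel(cases: EvalCase[], threshold = 0.9): Promise<string> {
  // Falls back to the flagship if nothing clears the bar.
  let best = candidateModels[0];
  for (const model of candidateModels) {
    let passed = 0;
    for (const c of cases) {
      const output = await runTask(model, c.input);
      if (output.trim() === c.expected.trim()) passed++;
    }
    const score = passed / cases.length;
    console.log(`${model}: ${(score * 100).toFixed(1)}% of evals passed`);
    // Keep walking down to cheaper models as long as they stay above the bar.
    if (score >= threshold) best = model;
    else break;
  }
  return best;
}
```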

Jess: So you kind of build up a corpus of representative test cases?

Morgante: Yes, yes, you need lots of test cases to be effective at agent development.

Jess: So even when you're testing, it's only like after prod?

Morgante: In most cases, yes.

At Grit we built a system where you could take prod data and basically pull it back into evals that we can run on our pre-prod cluster. So we're not deploying it out to users yet, but it's basically similar: we're still looking at prod data, we're not just looking at unit tests.

Jess: Right, so it's not just I reason about this and I came up with all the test cases that matter. Your users have to tell you what test cases matter.

Morgante: Yeah, so we have to do a lot of curation of the data that's been going through the system already, and yeah, it's very much a case where you need to test in prod to be successful.

Ken: What was that like, when you first started doing this and realized, wow, we had no idea these kinds of things would come up and we have to completely change the way we think about the questions?

Morgante: A lot of surprises along the way. I think some of the stuff that was challenging was the difference between open source work and large company enterprise repositories.

Right, so of course when we were first building the agent, we didn't have any customers, so we started working on these open source repositories, 'cause we could go in there and find, you know, some open source project that has done a React upgrade and see how well Grit would do on that particular upgrade.

And we got pretty good at that, even just looking at open source repos. But it turns out once you deploy this into a company repo, it's a lot harder. They're mostly just much larger.

Most companies have much larger code bases than the average open source SDK or library you're using, and they have a lot more cruft built up from different design decisions made over time and interconnecting systems.

Jess: Oh right, because there is no reaching the target architecture. There is only moving toward it until it shifts every three years as the CEO changes.

Morgante: Yes.

And there are customers clamoring for things, and in most cases I think open source maintainers are a little more willing to say no to features than a company that's about to close a big enterprise sale, which is probably going to ship that feature if it can.

Jess: That's a significant difference.

Morgante: Yeah.

Ken: Yeah, I guess the other thing is, in large corporations there has been, and maybe it's changing somewhat, but there's always that architecture team that approves certain versions of things you can't go past, that decides whether you can or can't use open source in various projects, right?

You have to use this targeted set of libraries only, because we've approved them and vetted them, and that's it. And you can't go off and do your own thing.

How does that affect, you know, the rules? I guess it's really how specific you make your requests, right? More than anything else.

Morgante: Yeah, how specific can you make requests and then also what sort of style and linting rules do you have?

So one of the things that helps is when a lot of these guidelines are actually codified into analysis that can run automatically. Grit would be able to do that really well.

I'd say one of the hardest things was customers we dealt with who had some new guidelines, but their actual testing process was still entirely manual.

So if a developer makes a change, it's: throw up the PR, deploy it to a staging environment, and wait a few days for a team of manual QA testers to come back and tell you what you broke, with basically zero automated testing.

And that was a really big struggle to get Grit working with because our initial approach was assuming that we'd always have some sort of automated feedback that we could run many times versus having to wait for a human to come back and give you all the feedback.

Jess: Yeah, an expensive human.

Morgante: An expensive human, right? You know, in some of these cases they actually had a pretty low-cost approach. They were doing it overseas and stuff, so it wasn't super expensive.

They were fine with us like putting up a lot of PRs, they weren't going to push back on that.

But it still was really slow. I would say that was actually the biggest problem: Grit would make a PR, and it's, like, another week before it actually gets feedback on that PR.

Jess: Right, so there's a level of care that a human puts into that. And also humans usually, we try to test it ourselves, which the agents can do decently now; they can spin up the web app and actually go click on things and look at it.

Morgante: They can do that. They can do that well if there is a spec for what the app is supposed to do and how it's supposed to operate.

I think the challenge we had with some of these more manual organizations is that there's often a reason they haven't adopted automated testing yet: they don't really have a clear specification of exactly what should be done on every screen.

So it's not something that we can really feed into the agents, right?

Ken: We're even moving too fast to write any kind of tests. What, are you crazy?

Jess: So, we dunno what it's supposed to do, so how could we write tests for it?

Morgante: Yeah, or it's tribal knowledge: you know, we've got these QA people who have been here for years and they kind of know how it's supposed to operate, but it's basically on feel. They've used the application long enough that they have a sense of it, whereas the agent trying to figure out everything from first principles is not going to work as well.

Ken: Conway in action, yeah. I guess it really is like there's the way systems are organized and the way people are organized.

I'm thinking, like, there are certain types of things where you change your organization to fit the pattern, and then there are other organizations that will not change.

You have to fit the pattern to the organization. I dunno if I even have a point here, but it's just, like, because I know there are things that can be intractable in some organizations, where you just can't get past some very manual steps that are completely variable.

Morgante: Yeah.

Ken: So I'm sure that poses lots of challenges.

Morgante: Lots of challenges and all this technology is sociotechnical.

But certainly for agents that are trying to be deployed and collaborate with engineers and collaborate with development teams, a lot of the organizational social dynamics come into play as well.

Jess: Oh yeah. Like so Grit was out before all the agents that we have today.

How did the engineers in the organization react, at some of your early customers?

Morgante: Mix of reactions. You know, we were always bought by engineers.

Like, we had champions who were excited about using Grit, usually 'cause they knew they had a lot of technical debt that they just wanted to get burned down, and they didn't want to have to make the case to management for why they should pause feature development for X amount of time to do that.

So they said, you know, let's just add this agent. It'll work in the background on that.

So certainly there was a lot of excitement about what Grit could do, but there was also skepticism about how Grit would do it, right?

Was it going to be making lots of bad changes? Was it going to follow their guidelines and conventions?

Generally speaking, my experience has been engineers hold AI to a higher standard than fellow engineers.

Jess: Ah.

Morgante: There are a lot of cases where Grit makes a small mistake that, if a junior developer were making that PR, you'd kind of just overlook and just tell them, you know, fix this next time.

But they'll take that as emblematic that Grit is not going to work well in that particular case.

So we had to overcome some of those objections and make sure that Grit was always performing at a really high level.

And the biggest thing we did is try to guide people into doing things that we knew Grit would be successful at, versus them having high expectations for things we knew Grit couldn't do.

Jess: Ah, that's interesting. People hold the AI to a higher standard. I mean, I'm not going to hurt its feelings, so I'm much more willing to tell it to knock it off if it's doing something I don't like.

Morgante: Yeah.

If you're working with a human, there's a presumption of intelligence, right? You know that they're at least trying, they probably know something about what they're doing. They were hired at your company. You presumably are somewhat confident in your interviewing process. The AI, you have no real idea where it came from. You have no idea if it's actually smart enough to do the task.

So you might use any evidence as a bit of confirmation of your presupposition about what its level of ability is.

Jess: So we don't give it the benefit of the doubt?

Morgante: No benefit of the doubt. Exactly.

Jess: Ah.

Morgante: And the really hard thing with this is we found that if an engineer has a particularly bad experience with an AI, they'll basically write off AI for another three to six months before they're even willing to try it.

And it'll be, you know, six months before maybe someone can convince them to try it again. You can talk to engineers where the last time they actually used any AI tool for coding was when GitHub Copilot first came out, many years ago, and it had some bad completions and they said, ah, this doesn't work.

They turned it off and they've never looked at it since. Or they're only starting to look at it again now.

And then obviously AI's been improving much faster than that. So sometimes these people will come back and be astonished by how much better it is, and they're like, "Oh, I should have been using this for the last two years. It's so much better than it was when I first tried it."

Jess: So there's a bit of an early adopter backfiring happening there, of: just because you tried it and it sucked doesn't mean it sucks now.

Morgante: Yeah, exactly. And you know, most software is pretty bad in its 0.1 version, right? You have to understand that and be willing to try the future versions.

Ken: There's that whole belief system you have in certain things. I remember people who absolutely loved Ruby on Rails and they did it and the groundswell of support was because of the love of it, right?

I mean people were enthusiastic and it was magical, right? And I think that early AI did not leave you with that feeling, it left you with "What the hell am I doing here?"

But it has gotten better to the point where now, you know, I am working with Cursor on this latest talk project I'm working on.

And even though I may feel that some of it is like being in a boxing ring, and it's like the ninth round, you know, whatever it is, and I'm staggering towards the finish line, I'm going back for more. Because I'm going back for, "Hey, can you try this one thing for me? Can you do this experiment with me?"

And I'm getting results. And those results are enough in frequency and useful enough that it has value.

Whereas I would've been looking through a billion things, re-remembering things I had forgotten about Spring Boot, about various queries like a JPA query or whatever.

I know what I need to get done and I'm asking you to do it for me. And I've had the experience to know when it's going off the rails, but I don't have to do all that extra legwork that I used to have to do that might take me a week to do.

Within a day I can get a lot of it built up and stood up for me. And yeah, I went back and forth like eight or nine rounds, but those nine rounds took a day. They didn't take a week or two weeks or three weeks.

Morgante: Yeah, I mean it's certainly been a massive accelerator in my development work.

And even if I hadn't been working in an AI company, I would've been adopting this very quickly.

I think you hit on something important there also. The reason that developers were maybe more enthusiastic about Ruby on Rails, the early versions, than about early AI is that Ruby on Rails is open source.

It's very hackable, right? If you see something that's a flaw, or it's not working quite right, you can go into the source code and fix that.

Ken: And they did.

Morgante: And they did, right? That's the entire rise of open source and how the open standards came out. Whereas, you know, especially early AI, it's not like that, it's a very black box.

It's like, "Here's the completions, here's what it's doing for you. If that's not what you want, good luck."

And one of the things we did with Grit was try to change that. We realized that especially the senior developers, who are the main evangelists for technology within their organizations, really want that ability to customize and modify and improve.

And we tried to build tools that gave them that power to go in and do things like define standard guidelines for how the agents should operate, or give it tools that it can use to operate on the repository.

So if you want it to be able to, like, run a particular command or search for files in a particular way, make that all hackable, because developers love that and it also makes the agent experience better.

Jess: Oh yeah, true. There's something immensely more appealing about using an agent now that I can implement an MCP server to make it able to do whatever the heck it is I think it should be doing.

Morgante: Exactly.

And I think that's one of the reasons we're seeing a huge amount of interest in MCP. It's just, it's finally a way for developers to go back to that open source ethos of hacking and building up something quickly that's going to improve the tools that you use every day, right?

If I'm using Cursor to write my code and I want Cursor to do something new in the Honeycomb repo, I love being able to build an MCP server that's going to be able to do that for me versus being stuck until Anthropic releases a new model.
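For anyone who hasn't built one, here's roughly what a minimal MCP server looks like with the TypeScript SDK, exposing a single custom tool over stdio. The tool itself (a runbook search that isn't wired to anything) is made up for illustration, and the SDK's API may differ slightly across versions.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Minimal MCP server exposing one custom tool to whatever agent speaks MCP
// (Cursor, Claude, etc.). The tool body is a made-up placeholder; the point is
// that you can hand the agent a capability it didn't ship with.
const server = new McpServer({ name: "team-tools", version: "0.1.0" });

server.tool(
  "search_runbooks",
  { query: z.string().describe("What to look for in the team runbooks") },
  async ({ query }) => {
    // Placeholder: in reality this might hit an internal wiki or docs index.
    const results = `No runbook index wired up yet; you searched for "${query}".`;
    return { content: [{ type: "text", text: results }] };
  },
);

// Serve over stdio so an editor or agent can launch it as a local subprocess.
await server.connect(new StdioServerTransport());
```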

Jess: Yeah, I think you're right. Maybe that's why I'm suddenly excited about it and maybe like the killer app for developers is being able to write our own app.

Morgante: For sure.

I talk to a lot of dev tool founders and I always say if you don't give developers somewhere to write some code with what you're building, you're probably DOA because developers want to write code, they want to hack things, they want to add things. Let them.

Ken: Yeah, the zero-code concept. I know when that started proliferating everywhere, I thought, this is not the answer.

I mean there are some nice things you can find out from agents that can let you know what to do, but you need agency on your own to make things happen and to be creative and to have degrees of freedom.

Jess: That's a good point that these days use agents but share agency with them.

And part of that is either download and run or write the tools that you need them to have. And another part is take control of the prompts.

If they're not doing what you want, fricking tell them what you want them to do. And the other part is, "Well, that wasn't it. Delete it."

Ken: Start again.

Jess: Let it go.

Morgante: Roll back and retry.

Jess: Yeah.

Ken: There's another angle to this too, which is, and I think about like craftsmanship, right? Like I think about people who are building.

So years ago there was someone in our family who would restore Rolls Royces and things like that and there were no tools to do what they needed to do, so they built their own tools and they got it done better.

And the coolest thing about this community, the tech community and open source especially, is that when we don't see a tool, we go build one, right?

So that's wonderful and that's always been what's been powering a lot of these innovations.

But then on top of that, now we have this child that we're dealing with and communicating with all the time that says, "I'm going to do this for you. I'm going to go."

And it goes and does it, and it comes back, and it kind of does it, and we have to teach it and we have to almost shepherd it and mentor it, almost like we're shepherding and mentoring someone who's a journeyman learning how to be a craftsperson.

And it's a weird kind of meta thing that you're telling the AI, "Hey, you're doing this wrong, what you should do is X when you're doing Y."

It really is going to be a challenge I think, and it's a good challenge to have, of how we capture that knowledge and help instruct our tools to be better.

I know it's a weird thing but it's like we need to do that and it seems like people who are getting better at these agent type tools are getting really good at telling the agents how to behave, right?

Morgante: Yeah, you're really good at mentoring an agent, basically: helping it to learn and helping it to grow.

Yeah, it's certainly not where I thought my skills and expertise would go, but it's certainly helped a lot too.

Jess: Nice. As a closing question, I want to ask you now that Grit is part of Honeycomb and you are part of Honeycomb R&D, what are you excited about?

Morgante: So I'm really excited about what we can do combining Grit's great codebase awareness with Honeycomb's great production awareness, right?

I was talking earlier about how at Grit we had to test in prod. I think a lot more people should be looking at their prod data when they're developing their applications, to understand how users are behaving, what failures you have, all this great production data that Grit didn't really have access to.

Grit had access to the code base, so we could make some changes, but we'd throw it over the wall. We didn't really know how that code was actually operating.

Now that it's part of Honeycomb, we have that full end-to-end life cycle. We understand the code base really well, Honeycomb understands production really well.

What can we build when you combine those two together?

And I think there are a lot of opportunities, beyond just the LLM migrations, to do things like looking at traces and maybe proactively fixing errors, and going in and doing some performance improvements.

There are lots of things that are ripe for that improvement.

Ken: That's exciting.

Jess: So maybe you can go along fixing the tech debt that matters and also show why it matters.

Morgante: Yeah, we can show: here are some spans that you had, here's a trace that was way slower than it needed to be, and we can go and just issue a PR from Grit that fixes that, and then show you the actual drop in latency that came from that.

Ken: Oh, and it's kind of like making new developers on call. You're making Grit on call.

Morgante: Yeah, exactly.

Ken: Right? Grit's on call. I just did something, sent it off. It's a new thing I built for you, says Grit. Or a new change I've done for you. I'm going to monitor and make sure nothing broke.

Morgante: Exactly. And previously, when Grit did those PRs, you know, the Grit agent didn't know what happened after it was deployed, right? We just hoped the code was right.

But now we can go and look at the Honeycomb data and see what happened, and see if we didn't do something quite right.

The Grit agent can even repair itself.

Ken: That's pretty incredible.

Jess: Nice. Can the agent after one run, like leave notes for its future self?

Morgante: Yeah, so one of the big things we developed was a knowledge system for the agent to learn from mistakes it made along the way, feedback it got from human developers on the pull requests, you know, directions it was given, and retain that knowledge for future pull requests it was generating.
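Here's a toy sketch of the general idea of notes persisted between runs and folded back into the next prompt; the file layout and helpers are invented for illustration and don't reflect Grit's actual knowledge system.

```typescript
import { promises as fs } from "node:fs";

// Toy sketch of an agent "leaving notes for its future self": append lessons
// (reviewer feedback, mistakes, directions) to a per-repo file and prepend
// them to the prompt on the next run. Not Grit's actual knowledge system.

interface Note {
  createdAt: string;
  source: "review" | "mistake" | "direction";
  text: string;
}

const notesFile = (repo: string) => `.agent-notes/${repo.replace("/", "__")}.json`;

async function recordNote(repo: string, source: Note["source"], text: string) {
  const path = notesFile(repo);
  // Missing file just means no notes yet.
  const existing: Note[] = JSON.parse(await fs.readFile(path, "utf8").catch(() => "[]"));
  existing.push({ createdAt: new Date().toISOString(), source, text });
  await fs.mkdir(".agent-notes", { recursive: true });
  await fs.writeFile(path, JSON.stringify(existing, null, 2));
}

async function notesForPrompt(repo: string): Promise<string> {
  const notes: Note[] = JSON.parse(
    await fs.readFile(notesFile(repo), "utf8").catch(() => "[]"),
  );
  if (notes.length === 0) return "";
  return (
    "Lessons from previous runs on this repo:\n" +
    notes.map((n) => `- (${n.source}) ${n.text}`).join("\n")
  );
}
```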

Jess: Sweet.

Morgante: Yeah, that's it. Certainly I think it would've been non-viable as a product if we hadn't built that, 'cause you know, you can direct the agent once, and if it doesn't learn and you keep having to correct it every time on the same things, you're definitely going to get frustrated and turn it off.

Jess: True. Great. Morgante, if people want to hear more about you, about Grit, about agents, where can they go?

Morgante: So go to morgante.net to follow me on all social media and I'm sure I'll also be blogging on the Honeycomb blog going forward.

Jess: Great. Thank you so much for joining us today.

Morgante: Thanks for having me.