Scaling Engineering Management w/ Twilio, LaunchDarkly, Atlassian, Heroku Heavybit
The strongest and most resilient engineering organizations are those that simultaneously manage technical debt in a sustainable way, encourage a culture of documentation and individual empowerment, and incentivize leaving things better than they were when you found them.
In this special fireside chat, John Kodumal, Co-founder and CTO at LaunchDarkly, and Jesper Joergensen, VP Platform at Twilio, explore the realities of today’s complex software systems and the teams behind them, while sharing their own approaches to building resilient and sustainable engineering cultures.
John Kodumal: I’m John and I’m the CTO and one of the co-founders of LaunchDarkly, which is a 300-person company based out of Oakland and now a little bit more distributed than we had been previously.
Before that, I was at Atlassian for about six years. I joined when they were about 150 people and left at about 2,500 people. I lived through that stage of growth at Atlassian. And then prior to that, I did a PhD in programming languages and type systems and stuff at Berkeley. Jesper?
Jesper Joergensen: My name is Jesper Joergensen. I’m at Twilio and I’ve been here for three and a half years. I’ve seen Twilio go through growth from about 1,000 people when I joined in 2018 to about 5,000 people right now. So that by itself has been a wild ride.
I spent almost 10 years at Salesforce, and most of those years I spent at Heroku. And before that I was at BEA Systems. If anybody remembers them, they made a Java application server back in the 2000s. So my entire career has been very much around developers and developer platforms, with a pretty strong focus on enterprise developers and that boundary between startup developers and enterprise developers, especially at Heroku.
Steady State Teams
John: So I’ll kick it off with a topic of conversation that Jesper and I were talking about yesterday. Jesper was mentioning that something that was top of mind for him was this evolution into something that we were calling ‘steady state,’ or the transition from a team that’s operating like ‘ignite the rocket engine’ to one that’s ‘flying continuously.’
I was hoping you could talk a little bit about that. What are you seeing and how does it impact the team?
Jesper: It was something that you and I had a good conversation about yesterday. One of the observations I’ve had over the years has been that especially from, let’s say like 2008 and forward, all startups really embraced this ‘be an owner’ mentality. You own it, you operate it. So you go in, you set up a startup, you start with like one team, then you add a few more teams. You follow this mentality of “we’re going to write code, we’re going to own the code and we’re going to operate the code and we’re going to move really fast.”
The problem is that at some point, if you become successful… If you don’t become successful, it’s great because you don’t have to worry.
John: It doesn’t matter.
Jesper: One way you can talk about this ‘steady state’ is really when you get customers and those customers start using your services. In many cases, the moment that happens, it’s already written for the future that your service will be around for longer than any one of the engineers that are working on it at that point in time. The thing is that was actually not how you started.
You started with engineers that are living and breathing the code and knowing every nook and cranny of the code. Documentation and maintainability, and sustainability, and trainability are second thoughts, as they should be.
The thought that I had was, how do you ensure that you can transition at the right point? Because if you don’t, then debt will just start accruing from that point going forward. You’ll lose your first engineer. You have to train your first new engineer. Maybe that’ll be fine. It’s just one person that has to learn all this stuff, but that will soon escalate. And the harder it is for you to sustain a flow of people through your team, with a single service, the more you’ll start grinding to a halt and having issues.
Then that gets exacerbated by microservices. Now you don’t just have one app that you’re projecting to your customers. You have dozens and dozens of these mini products within your organization that all have at least implicit, and hopefully explicit, interfaces and promises that they’re making to other teams in your organization. And then this stuff can really start grinding you to a halt if you don’t get ahead of it.
Investing in Sustaining Services
John: One of my first questions on that is, as you’ve been through this so many times, when do you think the first indications are that you should start investing in the practices of sustaining a service? Is it when your first customer comes on board? Or is it a little bit later than that?
Because I think when your first customer comes on board, you’re still not assured of success, and you’re just trying to move super fast and throw things over the wall, and you’re not investing in the service catalogs and documentation, or asking, “How do I reduce the bus factor here?” Because you just don’t even have the headcount to support that. So when do you think is a natural place to start thinking “we need to start investing in the long-term here”?
Jesper: Yeah. That’s obviously a really difficult question. In those early days, days matter. And what you spend your days on can make or break you, if you take your eye off the ball and don’t focus on feature development and moving your product forward. So this is a very difficult question to answer.
It’s probably not the first customer. You probably need a little bit more momentum before you do it. But I will say that one thing you can do for yourself, even before you start actually putting work towards it is to try to manage your culture. Because what you want to do is you want to make sure that you will embrace doing it, when it’s time to do it.
Another problem that often happens in startups is that you build your culture around the things you don’t need initially. So you build a culture defined by all the things you don’t need. So you can build a culture around, “we don’t need sales.” You can build a culture around, “we don’t need QA” or “we don’t need project management.” You can build a culture around, “we don’t need to write comments because the people who own the code and the people who wrote the code, they understand it.”
I’m not saying anybody does that consciously, but just think about that.
If you start with a culture that says, “Well, one day we’ll have to do that,” it becomes easier to do it the day it’s necessary and you won’t resist it.
So at least doing that and being aware of what are the things we’re not doing right now that we will have to start doing at some point. I think even the awareness of that is actually a great place to start.
John: It brought to my mind this idea of, there are things that are reasonable sacred cows like, “Okay. This won’t ever change,” and those tend to be more cultural things. We value the idea of continuously shipping and keeping master green so that we can always deploy. We’re going to maintain that forever, versus things that could change as the company evolves.
This idea of, “we’re going to build our culture around like, we don’t have salespeople,” that definitely resonated with me because that was Atlassian in the early days. Lo and behold, they have salespeople now and that was a hard shift for them. So I like that idea of saying, “Okay. Here are the things that are like, we’re really going to be intentional about. These really are a part of our culture,” versus things that need to be revisited almost on an annual basis at a startup.
“We don’t have salespeople or we don’t have QA even, or we don’t have project managers,” don’t necessarily seem like great things because once you have those people, they’re going to start reading the old documents and reading all of the blog posts about how you were so proud to not ever need project managers because they’re so wasteful. And then the first project manager comes in and it’s like, what am I doing here?
Your Customer and Product Roadmap
Jesper: As we think about how teams mature, the notion of feature flags, and of using feature flags to deploy continuously, is, I would say, fairly well ingrained, at least as a concept, right now. But then when we’re talking about companies like LaunchDarkly, when you build that as a service, with all these teams out there who are starting early, then building up and then adding feature flagging at some point, how do you get insights into that progression and their maturation?
I was really curious how you’ve seen that and how you’ve been able to incorporate that into your product and your roadmap?
John: That’s a great question. I think that’s one of the things I love about building developer tools. From your own customers, you get to see everything. Like you, I’ve been in this space for basically my entire career. So you always get to look at your customers and when you’re at a company like Heroku, like Twilio where your customer base goes from tiny startups to Fortune 100, Fortune 20, then you get to see the gamut. You get to see everything in terms of how people develop software.
We have that at LaunchDarkly now. One of the things that we’ve been able to see at organizations is, how feature flag adoption has gone from a company that’s ‘never done it before’ to ‘relying on it.’ And then we’ve even seen the bad side of that, because feature flags can be misused and abused. We’ve also gotten a good deal of insight into what goes wrong? What happens when you use feature flags and don’t clean up after yourself or what happens when you have no best practices and people are using feature flags as a crutch for everything?
So that’s been really interesting to see. And it’s a thing that you can use to directly inform your product. So for example, we’ve invested a lot in LaunchDarkly on functionality to help you reduce technical debt. I mean, honestly, even us dogfooding our own product, we’ve gotten into a state sometimes where we have flags of, “Hey, I don’t know who owns this. I don’t know what this flag does. It’s been rolled out to everybody for eight months.”
We even had a situation where we’ve built something and it was supposed to be turned on for everybody and we just forgot to turn it on for everybody. So it was like 20% of the customer base had the flag enabled and it was just sitting there for weeks at 20%. So we’ve seen the good and we’ve seen the bad, and it’s really been valuable in terms of informing what we try to build and how we try to adapt to get everybody to a state of adopting feature flags and being in a good place with it.
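The two failure modes John describes here (a flag nobody owns that has been fully rolled out for months, and a rollout forgotten at 20%) can be caught with a simple periodic check. What follows is only a minimal Python sketch under stated assumptions: the `Flag` record, the `review_reasons` helper, and the 14-day and 90-day thresholds are all hypothetical, and none of this is LaunchDarkly’s actual data model or API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Flag:
    """Hypothetical flag record; not LaunchDarkly's real data model."""
    key: str
    owner: Optional[str]   # None means nobody is known to own the flag
    rollout_percent: int   # share of customers with the flag on, 0-100
    last_changed: date     # last time the rollout setting was touched


def review_reasons(
    flag: Flag,
    today: date,
    stuck_after: timedelta = timedelta(days=14),  # assumed threshold
    done_after: timedelta = timedelta(days=90),   # assumed threshold
) -> List[str]:
    """Return human-readable reasons this flag deserves a second look."""
    reasons = []
    age = today - flag.last_changed
    if flag.owner is None:
        reasons.append("no owner")
    if 0 < flag.rollout_percent < 100 and age >= stuck_after:
        reasons.append(f"partial rollout stuck at {flag.rollout_percent}%")
    if flag.rollout_percent == 100 and age >= done_after:
        reasons.append("fully rolled out, candidate for removal")
    return reasons
```

Running a check like this on a schedule and routing the results to the owning team is one way to turn “we just forgot” into a visible to-do item.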
Jesper: It’s one of the areas of debt that I’ve seen sneak up on people because you tend to think about feature flags as a good thing for a long time. You’re really proud. You’re like, “Using feature flags is amazing. We can deploy code without breaking customers, and then we can just flag them afterwards. This is really cool.”
And then it’s a little bit further down the line that you have feature flag sprawl, and you really have a hundred different behaviors of your own product, because it all depends on what the flag combinations are set to. You start realizing that you have a problem.
What have you seen amongst both yourself and your customer base? What are some of the most excellent practices that you’ve seen teams exhibit in order to keep that under control, whether you’ve turned it into a feature in your product or not? What would you say is like, “This team figured it out. This team instilled a certain process or practice that solved for that”?
John: I love that question too. I would say there’s been a bunch of process things that teams that have done this well have adopted. One of them is pushing the flagging concern into the planning phase, being deliberate about what flags are being introduced and what the use case is.
For example, at companies that have adopted this really well, it’s a planning concern where the PM and the engineers are like, “Do we leverage feature flags in this project? And if so, what are we trying to get out of them?” So that could be like, “We’re running an experiment and we want to measure the impact of this feature on a specific KPI or metric. So we’re going to flag it so we can create a control and experiment group.” That’s one of the things that we’ve seen.
Another thing that we’ve seen is that a deliberate cleanup process is incredibly important. And that actually requires work. I think one of the things we talked about is the adoption of modern delivery practices, some of the new things that teams are doing now, feature flags being one of them. But when I look at each of these things, there’s always a way that they can be abused. There’s always a bad state of them.
Knowing that and knowing that you need to invest to avoid that, I think is incredibly important. A great example is continuous integration. 10 years ago, continuous integration wasn’t ubiquitous. It wasn’t something that everybody did, but the bad version of continuous integration is like a test suite that takes 24 hours to run. And now everything is blocked waiting for the 24-hour test run to finish.
“I’ve got 3,000 flags, 2,500 of them are active. 500 of them have been introduced and we have no idea what they do, but they’re not being called anymore. And we don’t even know where they are in the code base. Some of them are mission critical and we don’t have any idea how to clean them up. And we don’t have anybody maintaining them.” That’s what bad looks like.
One of our objectives in building LaunchDarkly is what are all the things that we can do to prevent you from getting into that bad state? Because we know that left unchecked, you are going to get into that bad state.
Jesper: Were there any cool features that you put into the product that help with that?
John: Definitely. One of them, we actually launched at GitHub Universe a couple years back. We just find references to feature flags in your code base and then show them in one place. And that’s really useful. I think more broadly speaking, we’re introducing more and more around the lifecycle management of a feature flag. So like what fraction of people are getting this and then which ones are inactive, which ones can be removed from the dashboard.
And then moving that into a workflow of here’s all the things that you should look at and here’s what you need to do to get them out of your code base. I almost feel like at some point people are going to ask for a Kanban style thing of, you can only have this many flags in prod and we’re not going to let you create more because we don’t want to let you shoot yourself in the foot. So to introduce another one, you’re going to have to close one out. That seems like an inevitability to me.
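The Kanban-style limit John imagines, where you have to close out one flag before you can introduce another, amounts to a work-in-progress cap on live flags. Here is a minimal Python sketch of that idea. The `FlagRegistry` class and its method names are hypothetical and do not correspond to any existing LaunchDarkly feature or API.

```python
from typing import List, Set


class FlagBudgetExceeded(Exception):
    """Raised when creating a flag would exceed the live-flag budget."""


class FlagRegistry:
    """Hypothetical in-memory registry that caps the number of live flags."""

    def __init__(self, max_live_flags: int) -> None:
        if max_live_flags < 1:
            raise ValueError("max_live_flags must be at least 1")
        self.max_live_flags = max_live_flags
        self._live: Set[str] = set()

    def create(self, key: str) -> None:
        """Register a new flag, refusing if the budget is already used up."""
        if key in self._live:
            raise ValueError(f"flag {key!r} already exists")
        if len(self._live) >= self.max_live_flags:
            raise FlagBudgetExceeded(
                f"{len(self._live)} flags are live (limit {self.max_live_flags}); "
                "retire one before creating another"
            )
        self._live.add(key)

    def retire(self, key: str) -> None:
        """Remove a flag once it is out of the code base; frees one slot."""
        self._live.remove(key)  # raises KeyError for an unknown flag

    def live_flags(self) -> List[str]:
        """Return the live flag keys in sorted order."""
        return sorted(self._live)
```

The design choice is the same as a WIP limit on a Kanban board: the cap does not clean anything up by itself, but it forces the cleanup conversation to happen at the moment someone wants a new flag.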
Jesper: I think it’s important for people to realize that the value of software is that it’s homogeneous. You have many customers that are using one thing and every time you add yet another feature flag, and you add another combination of ways your product is used, you’re diluting the value of your software. If you have 10,000 customers that are using it in a thousand different ways, do you really have a piece of software?
John: I think that’s true. I do think that there’s like a counterpoint to that, which is, are there ways in which product experiences need to be more tailored towards specific audiences? Maybe not as much in the B2B space, but in B2C you think about things like personalization, delivering different product experiences to different consumers and the value that provides.
I think there’s a version of that in the B2B SaaS world. If that turns into a sprawling code base with a million factorial combinations of different product features, that’s a bad thing. In the B2C space, it could just be a business need to do that.
I guess one anti-pattern that I see there is, are you introducing a feature flag because your product team can’t make an opinionated decision about how something should be built, so you just have it both ways by flagging it or are you actually delivering a unique experience that needs to be there? Or is this just like a rollout thing where you need to retire it right after? I see feature flags being used for all of those things.
Jesper: Yeah. I’ve seen them especially in B2B. They’re being used when you’re lazy and you don’t want to build an actual configuration option for the customer. So you’re talking about personalization. Obviously, most products have some admin interface. Some admin panel and settings panel. You go in and you can turn it on. Do you want this or that?
Every time you do that and you give it to the customer, you realize that now you are creating permutations of your product that all have to be tested. But if you do it that way, you’re being honest with yourself about the actual set of combinations you’re creating. You’re a little bit more incentivized or a little bit more likely to go test it properly and so on. An anti-pattern I’ve seen is teams will go out and they’ll just have a feature flag for that, and a customer will file a support ticket and somebody will flip the flag.
And then there’s another behavior, which is a bad customer experience, but it also means you probably aren’t properly testing all sides of it. It also means that there’s not full visibility across your entire population, from your customers and salespeople to support people, that this combination exists. Now, it’s not out in the public docs and becomes a little bit more hidden. So it’s eventually going to sneak up and surprise you at some future time.
John: One of the value propositions we provide comes up when people who have built their own feature flagging system come to us. “We have feature flags. It’s a bit in the database, it’s a hash table or something like that.” And it’s like, that’s great. But are you recognizing the downstream implications of different experiences being delivered to different people? Your support team is not going to have visibility into that database configuration bit that’s on the account level that you’re using for feature flagging.
So when they get a support ticket, how are they going to understand what varying product experience has been delivered to that customer and how can they react to it? And that tends to be like an aha moment for teams that are like, “Oh, I get it. It’s a piece of infrastructure that we can use to deliver different product experiences.” But it’s also, “there’s this collaboration layer that needs to exist, that the product has to provide otherwise downstream, the weight that’s generated from different flags being on for different people doesn’t really get understood. It doesn’t really get accounted for.”
Jesper: Going back to this difference between ignition and controlled burn, LaunchDarkly is definitely past the ignition stage at this point. So it’d be interesting to hear from you, in your engineering organization, what are the sets of activities or things that you feel you do for a given team, for a given part of your code base, or for the whole code base, that make it easier for you to accept churn and to offer a consistent product while engineers join or leave your team? What are some of the top practices that you put in place there?
John: Candidly, I have to say, it snuck up on us. You always want to be able to anticipate what’s coming next. I don’t think we adequately anticipated the steady state and we’re feeling it. This year we’ve basically doubled our engineering team in size. We realized that things that were just tacit knowledge for a group of people weren’t documented. And there were just certain behaviors that we expected everyone to operate on.
I mentioned in our conversation the other day, daily standups. We never wrote down that every team was expected to have daily standups, but we just assumed, “Hey, we’ve been doing this.” As we’ve grown new teams, at least somebody that’s been at LaunchDarkly for a while and has gone through the ritual of the daily standup will say, “Hey, shouldn’t we have a daily standup for this team?” We found new teams were like, “We have a daily standup. Wait, we’re using a tool for that? Oh.”
We of course had done the thing of automating things. We do daily standups through this tool called Geekbot, which is all driven through Slack. The standups are actually tied to some of the metrics that we’re tracking, specifically around how much time we spend keeping the lights on. So when squads weren’t using that same mechanism, we just had missing data for their teams.
To your question, I’d say, effectively, it all comes down to writing down what it is that you’re doing, how you’re expecting teams to behave and what norms there are to reduce the bus factor, and onboard people more quickly, more rapidly and get them up to speed as quickly as possible.
It’s been hard. There are very few engineers that want to sit down at the start of their day and say, “I think I’m going to write down everything that I do.” That’s a hard pill to swallow. It’s hard to sell it to teams sometimes. One of the things that we’ve done recently that I think has been really helpful is, we defined this document called Product Principles.
Unlike Twilio, we basically have one code base in one product. I’d love to hear how things differ at Twilio. But one of the things we defined was, “here’s how the product should operate. Here’s a mental model of the product. And as you extend the product, unless there’s a good reason, it should follow this mental model.”
It’s really tactical, hands-on stuff. So for example, say we create a new entity in LaunchDarkly, where an entity can be something new in our data model, like a feature flag or an environment or a project. When you create a new entity in LaunchDarkly, it should be versioned. There should be an audit trail that’s visible to customers. It should have a REST API. It should work with our role-based access control system.
We wrote down all of those rules so that when an engineer comes on to the team and is starting to build new functionality on top of LaunchDarkly, it feels the same as the rest of LaunchDarkly because it works in a certain way. It’s like defining the rules of physics in the product. And if you can define the rules of physics in the product, then somebody that’s interacting with your product can feel comfortable interacting with a new part of the product because the same rules apply. Gravity works one way here and it should work the same way over here.
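Rules like these can also be written down in a form that a review checklist or CI job can evaluate. The following is a minimal Python sketch that encodes the four rules John lists (versioned, customer-visible audit trail, REST API, role-based access control) as a checkable specification. The `EntitySpec` type and the `principle_violations` function are hypothetical names invented for illustration, not anything LaunchDarkly has described using.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntitySpec:
    """Hypothetical self-description a team fills in for a new entity."""
    name: str
    versioned: bool
    customer_visible_audit_trail: bool
    rest_api: bool
    rbac_integrated: bool


# Each principle pairs an EntitySpec attribute with the rule it stands for.
PRINCIPLES = (
    ("versioned", "entity must be versioned"),
    ("customer_visible_audit_trail",
     "entity must have an audit trail visible to customers"),
    ("rest_api", "entity must have a REST API"),
    ("rbac_integrated",
     "entity must work with role-based access control"),
)


def principle_violations(spec: EntitySpec) -> List[str]:
    """Return the rules this entity does not yet satisfy, in listed order."""
    return [message for attr, message in PRINCIPLES if not getattr(spec, attr)]
```

A spec that returns an empty list follows the same “rules of physics” as the rest of the product; a non-empty list is a prompt for either more work or an explicit, documented exception.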
So that was one of the things that we just rolled out recently that I’m hoping is going to have a pretty big impact. I’m curious in the world of Twilio where things are a little more, it sounds like heterogeneous in terms of the products and the teams building them, are there things that are common unifying factors?
A Culture of Documentation
Jesper: It’s definitely a really interesting difference. It’s the difference between having an engineering organization and a product that’s defined by mostly one code base or one repo versus many. Both at Twilio and Heroku, it’s distributed. So there’s just lots of code bases. It’s not necessarily a matter of size of what you’re building. It’s interesting that in places like Google, they’re really trying to drive things through a monorepo and through a little more rigor around that.
But both Heroku and Twilio have had a fairly similar culture around this. And it means that it makes it easier for you to start to build a new service and pick new technologies and be free. But it also means that you can’t write a file inside a repo that describes how to contribute so to speak, because the contributions are all over the map. So I like what you were doing with principles. It’s one way of trying to do something that’s a little bit more global.
We’ve definitely had discussions in both my jobs about just how rigorous, how much consistency we want to impose. I found that taking a page from something like open source has been helpful. If you look at any good, fairly successful open source project that has a repo, the authors of a project like that are forced to document it in a way that assumes that any random person might want to come and contribute.
So they really have the extreme version of not being able to impose anything on anybody. Or at least impose anything in terms of who’s going to come and be interested. Once they engage and want to contribute, then they can impose the policies, right? But before that even happens, they’ll have to make the documentation and information about the project rich enough that people feel compelled to contribute.
So I take a page from that and we’ve encouraged people to think about their code bases in that way. Not just if you’re hiring and somebody joins your team, but also if you want to encourage what we call guest engineering. Somebody from another team comes and contributes to your code base. What is the documentation that is needed there?
Another thing that I found, and this is specifically when you start having microservices and you have a lot of services that are internal facing, it’s very easy to not be as disciplined about those services as you would be with a product. For LaunchDarkly, I can go to your website and read the documentation about how to use it including API documentation, the developer facing documentation.
If you have an internal service, what I’ve seen is that user docs can easily get mixed up with system docs. So the documentation about how we were running the service is mixed in with how would you use this service? A first place to start is to separate it into those two buckets, because then you can better judge the quality independently.
So you can look at, is this service easy to use? Can we actually document how you might use it? Is this service easy to understand? Can somebody come in and actually operate this service and make changes to this service? What would we need to document in order to make that happen?
That’s been the first thing that we’ve been focusing on and evaluating. Another thing that we’ve also spent some time on is thinking about tests, not just to prevent things from breaking when we deploy, but also as documentation of what the service is supposed to do. It’s been really encouraging when we see developers invest in writing tests and they come back and say, “This is going to make it much easier for the next person to understand this. I wish this was here when I got here.”
John: I’m curious if the motivational tactic of allowing other people to contribute is enough of a motivation to get people to engage in those behaviors, to write great documentation on how to work on the service or how to contribute to it and how to run it, or are there other forcing functions that you have to apply to get the teams to do that kind of work?
Jesper: This goes back to what we were talking about earlier. I think you can do a lot of different things there, but at the end of the day, building it into your culture early on is going to be the most impactful thing you can do. So build it into your culture that you document. Something along the lines of a Ralph Waldo Emerson quote-ish like, “You leave the world a better place than you found it.”
Think of it as just an objectively good thing to do. As you learn, you document and then the person after you won’t have to learn the same thing all over again. And interestingly, that’s something that should be ingrained in anybody who has been learning the scientific method and has been studying engineering.
People in the computer industry should really entice that kind of behavior. It’s a matter of encouraging it with culture. Same with how you don’t want a culture that says, “Sales is not for us.” You also want to early on talk about how these things are valuable, even if we’re not doing them yet. It’s easier to tell yourself, “we should be doing it. We’re not doing it yet. We’ll get to it,” rather than give yourself excuses saying, “The reason why we’re not doing it is because we don’t do that stuff around here.”
If you give yourself a rationale for not doing it, that’s going to stick. Then, when it is time to really prioritize it, you’re going to continue to not do it because now you’ve built sort of an excuse into your culture.
John: If I had to go back and change anything about our culture, it would be to bring in the idea of writing as an important part of the culture from the earliest stages. One of the things that we’ve discovered is, it’s really hard to retrofit that into a culture that didn’t have it before because you don’t hire for it. You don’t test for it as a competency in the staff, in hiring.
If there isn’t this established baseline of, “Here’s what our expectations are around what you write and how you communicate information in a written way,” it’s hard to reboot that or change the direction of the company when that hasn’t been established. If I ever started another company, I think that’s one of the things that I would change from day one.
Jesper: At Heroku we used to talk about Readme Driven Development, doc driven development, and that’s also something other companies out there are doing. I find that there are two things about documentation that are important. One is when you work backwards from documentation, it’s just a great way of building something. It’s shorthand for forcing yourself to understand your users. Obviously, there’s user documentation.
The other thing that I think is important is that it becomes a check. One of the problems I’ve seen is, you write code and the code exhibits a certain behavior, and now you have a user and the user starts using your service. Now, the question is, is it behaving correctly or not? We’re not talking big picture. It could even be, should you use single quotes for formatting JSON strings.
But it behaves in a certain way and somebody starts relying on the behavior. And then later on you find you have a debate about, “Well, it wasn’t supposed to behave in that way, so I changed that. And then I broke my customer.” Or the other way around. The customer says, “You can’t change it. I expected it to behave in this way.” Having documentation gives you the third leg of that stool.
So you can have the documentation that explains how it’s supposed to work. And then you know that if you wrote your docs that it’s supposed to work a certain way and the service actually does not work that way, then it’s easy because it’s a bug and you’ll go fix the service. Now, it can also happen that you have a doc bug where the service is actually behaving as it should, but it’s not documented correctly.
But holding yourself to the standard of then fixing that and then getting it documented correctly is very valuable. It increases trust with your user, whether that’s an internal or external user.
This is why I like to think about documentation as code. A practice I’m seeing more and more out there is where you have documentation checked into a repo, maybe the same repo as your code base. You can make a pull request on it and say, “We want to update how this is supposed to work.” You’re going to express that through a change to the documentation. If you accept that pull request, now you have a delta between how it actually works and how it’s supposed to work. And then you can go and either do all that in one pull request or do that in a subsequent one. You can have that workflow and you can have that audit trail afterwards.
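One lightweight way to get the “documentation as a check” effect Jesper describes is to make the documented examples executable, so a mismatch between the docs and the behavior shows up as a failing check instead of a debate. The sketch below uses Python’s standard `doctest` module for this. The `format_greeting` function and its documented behavior are made up purely for illustration, and this is only one possible approach, not the workflow used at Twilio or Heroku.

```python
import doctest


def format_greeting(name: str) -> str:
    """Return the greeting this (made-up) service promises its users.

    The examples below are the documented contract. doctest runs them
    and reports any example whose actual output differs from the docs.

    >>> format_greeting("Ada")
    'Hello, Ada!'
    >>> format_greeting("")
    'Hello, stranger!'
    """
    if not name:
        return "Hello, stranger!"
    return f"Hello, {name}!"


def documented_behavior_holds() -> bool:
    """Run the docstring examples and report whether all of them pass."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    tests = finder.find(
        format_greeting,
        "format_greeting",
        globs={"format_greeting": format_greeting},
    )
    for test in tests:
        failed += runner.run(test).failed
    return failed == 0
```

With a setup like this, a pull request that changes the documented example without changing the code (or the other way around) makes `documented_behavior_holds()` return `False`, which is the delta between “how it works” and “how it’s supposed to work” made visible.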
John: I love that. We did that to our docs. We open sourced our documentation as a Git repo. You can submit pull requests. You can also go into our Gatsby site and suggest edits. You don’t get a ton of contribution that way, but the fact that you can and that it’s backed by Git and you can go back and look at the documentation, circa 2018 or something like that, is super valuable.
Jesper: I love it. And you hold yourself publicly accountable that way, right? There have been situations where we receive a support ticket from a customer saying, “I swear, it used to not work this way. What changed?” Especially as you grow bigger and bigger, the support engineer that’s answering their request will tell them, “Well, I looked through the docs and it is supposed to work that way.” And the customer counters, “Those docs changed. I swear.”
Investing in Shared Infrastructure and Productivity
John: We can check our docs from 2019 when we thought it worked the other way. It’s much better than going into the Wayback Machine to see what the docs looked like in 2019.
The topic we were talking about earlier is, when is it appropriate to invest in going into the steady state? Tangentially from that, I have this question for you, which is, when does it become appropriate to start investing in shared infrastructure and productivity internally?
A concrete example of this is at some point in the scale of a company, you start building internal tooling. You might even have a developer productivity tool or a team of people who are tasked with making everybody else more productive. They don’t build anything for the customer, but they’re developers. They’re building infrastructure for the developers within the company. When does that happen in your opinion? When is an appropriate stage for that versus like a premature optimization consideration?
Jesper: That is a huge question. If you look at the big companies out there today, you’ll see that the cultures drive them in different directions. Some will resist it and some will rely a lot on it. A company like Microsoft started as a platform, and so it has, like, platform in its life. But you have a company like Amazon that is in a weird way not a platform company, in that it’s all about building value for customers all the time. But then they’ve taken that cultural value and used it to make them good at building products for themselves as well, which then become platform products.
You have companies who set out to become really good at building products. If we get good at building one product, then we can build probably number two, number three, number four, number five, and then that’s how we’re going to really grow. But I think it’s important there to realize if you really want to do that, you have to really silo it and treat it as a product end-to-end. It has its own market, its own products, its own technology stack.
I think a more common scenario as a company is, you build product number one and then product number two and then you start distilling something out of it and realize that you’re building a secret sauce and you’re distilling that into a platform that then becomes a competitive advantage. That’s probably the most common way to strategically think about why you would want to build a platform.
So the when follows that. When you start feeling like you have something you can extract, that would be one way to make that decision. Now, there are other ways to think about this decision. I would say in the software world we live in, most companies and teams that I’ve seen are probably not as focused on automation at the next layer out as they should be. It always comes a little bit too late. And you always end up doing a little bit too much manual work.
You told me you were running multiple instances at LaunchDarkly. Sometimes people think about platform as something big and shiny, but platform can just be more common tooling, or how you deploy different instances in a way that’s fairly automated so you don’t really have to do a lot of work. A great example would be if you went and addressed a new market opportunity by deploying LaunchDarkly one more time, but it’s totally manual. Every deployment is kind of hand-cranking it. Now you have two of them. At that point you probably should automate it already. But you do of course have to justify it.
So you have to think about, is this a pattern that’s going to continue? Are we just going to have two forever? But even if you’re only going to have two forever, you probably should still automate it. So I would say it’s about being able to do meta programming, as I would call it: stepping away from where you are.
Step out one layer from where you are right now and look at the things you’ve considered manual tasks and start automating them.
As one way, you can platformize this. It’s not the only way, but that’s one way you start building actual code that customers don’t see, but that help you run your systems.
John: I love that because I think when a lot of people think about the platformizing step, they’re primarily thinking about it in terms of, “How much developer effort is it costing to deploy, so we can make everybody more productive?” But I like this idea of thinking about it in terms of, “Okay, if you’re eventually going to grow organically by acquiring somebody else or building a second product, then the abstraction piece of that becomes a strategic or a competitive advantage that you can apply.”
So I love that lens of it as not just improving developer productivity, but also providing a competitive advantage in the marketplace by providing you a set of abstract components or tools or code or processes that you can reuse to get you to a second product or a second opportunity faster. It’s a cool insight.
Jesper: There’s another classic reason or what people will call ‘platform to invest in,’ and that is customization and building an ecosystem around your product. Putting an API so that your customers can customize your product, but also so that partners and third parties can integrate into your product, that’s a place where the word platform is being used a lot as well.
All this sums up to: platform is an overloaded term, so start by knowing which one of these it’s really about. Is this what you’re talking about right now? Then based on that, once you’ve answered that question, you can figure out, is now the time and what’s the ROI?
LaunchDarkly, you started with smaller customers, right? But at this point, you’re working with enterprises. I think of you as a product-led company that ships frequently. Your customers love that generally, but as you moved into enterprise, have you found that the way that you deliver a product has had to change?
Building for the Enterprise
John: You know, I think in some of the earlier days of LaunchDarkly, there wasn’t this tension between the things that we would have to build for enterprises and the things that we would have to build for smaller customers. And I think a piece of that or a good chunk of it was a function of us being in this category creation mode, where we were a new capability that people didn’t really have, except through homegrown systems that, frankly, for the most part weren’t very good.
That meant that a lot of the basic functionality that we had to provide for people was stuff that would be valuable, not only for large enterprises, but also small customers. Even today, when we’re thinking about making a product decision, a decision to build something, we do map that to personas and enterprise versus non-enterprise.
But it’s often the case that even for the enterprise grade functionality that we’re building, one of the realities that we point to a lot of times is, software teams are fractal. So a small organization might have one engineering team. As a company gets larger it might scale out into multiple engineering teams with a layer of management on top of it. And it’ll grow sort of fractally from there.
We’re seeing that more and more as an organizational model that even larger enterprises are following. And that means that we can build functionality that works for smaller teams. And a lot of times that will work fine in a larger enterprise, as long as we’re thinking about how that maps to the fractal adoption of that feature.
All that said, there’s still stuff that when you go build practically, it doesn’t make sense for the smallest unit. For example, enterprise administration delegation, that’s not something that a small startup is really going to get a ton of value out of, but it’s a persona at a Fortune 100 company that you need to build for and a set of functionality that’s helpful for them that you have to tackle.
I would say overall, we’re still in a state where greater than 50% of the functionality that we’re building, is stuff that’s applicable to almost anyone which is kind of a cool space to be in. It avoids that tension between, are you investing in and building for enterprise versus are you trying to go for the product-led growth motion and build for customers at scale?
Jesper: What about the delivery model or the delivery cadence? Have you found that if you keep changing your product, enterprises don’t actually like that, and they’d like to have a little bit more of a fixed version of your product, and you’ve had to roll out release mechanisms?
John: I think that’s one that I had definitely seen a bunch of at Atlassian. When they shifted into cloud, there were enterprises using their cloud offerings that moved from hosting their own Jira or Confluence for example to moving to the cloud model and accepting constant change and hitting a lot of struggle with that. I remember at Atlassian, we used to release on Sundays. So on Monday teams would come in and say like, “What’s fresh in my Jira instance? What fresh hell am I in for today?”
I know that that was something that you experienced a bit at Salesforce as well. We hear a little bit of that at LaunchDarkly, not a ton of it. I wonder whether it’s a function of companies becoming more accepting of cloud-based delivery models or whether cloud-based continuous delivery needs to adapt the other way and be more accepting of hesitance around change being deployed in enterprise software?
I think it’s probably a little bit of push and pull. But one thing I will say is that even with enterprise customers that are accepting of rapid change that you would get from a fast moving startup like us, they do want better visibility around change. So “let us know what’s happening and let us know in advance if you can when something is going to happen,” so they can map that to their ability to manage it on their end.
Jesper: That is so true. I think as you say, it’s really important for smaller companies to know the difference between something enterprises don’t like because they’re in the middle of a transition versus something they want a certain way because it’s a completely valid request and you just have not really been used to dealing with enterprises.
I find it really important to be humble and a great listener. But the hard part is you can’t flip all the way over and go tell me everything you need and expect me to go give it to you. So that’s a fine balance to strike. I was also just thinking specifically in the case of LaunchDarkly, even in enterprises, your customers are probably very development oriented.
One of my experiences from Salesforce was that Salesforce early on had this concept of ‘do not auto enable’ and they still have it. At Salesforce, the end users were salespeople, way back in the time when it was just SFA. Every company had an administrator that owned the experience of the salespeople. And it was very important for us to deliver the capability to the administrator and have the administrator roll it out to the actual end users.
That led to something I see every now and then on Twitter, and that you just referred to: when Salesforce comes out with its release notes three times a year, it’s like a thousand-page document. But the administrators actually like reading it because they finally get to discover all the great things and answers to past problems that have now been delivered to them. It becomes their job to roll it out to their customers, which they’re really good at and proud of.
Those are some things that definitely are different in terms of delivering to enterprises compared to what we were doing at Heroku with startups.
John: I love the idea of having the administrator persona be somebody that you target deliberately. You have product managers that are servicing that persona. That’s so incredibly valuable especially if you’re like almost anyone building a developer tool.
At some point, you’re going to sell to the enterprise. And if you want to sell to the enterprise, understand that there is a person using your product that doesn’t give a crap about feature flags.
All they care about is rolling out that tool because somebody has told them that it’s valuable for whatever reason. They want to roll that tool out to their entire organization and make sure that from a compliance perspective everything is buttoned up. From a control perspective, they’re able to do what they need. They can audit changes. They can add and remove administrators. They can do administration delegation to the different team members.
All of that to say, there’s so much that you have to build in your product to service that persona. And if you’re not thinking about that persona, you’re going to struggle selling to a company with a thousand developers or 5,000 developers or 50,000 team members within the organization.
Jesper: Yep. I agree. And I don’t really feel like we have any excuses anymore. I think that that’s changed over the last 10 years. 10 years ago, I think we had a wave of SaaS products being available and it was unclear how enterprises would work and adopt them. And the people who built those products were learning at the time what enterprises needed, what large organizations needed.
But that’s no longer the case. We have that knowledge now, and I’ve seen a lot of startups build new B2B SaaS products that very early on have some of these basic enterprise features. SSO is an obvious one.
Organizations and teams are another, like management structures within the product that help an enterprise or a large organization. Like you said yourself, it grows fractally. It gives them the management constructs that they need.
John: It’s becoming cheaper to build that functionality. Not only is it becoming more well-known what you need to sell, but it’s becoming cheaper to build it. There are more SaaS offerings that help you get to that point. Take compliance as an example. When we started, I remember we were selling to an enterprise customer early in the days of LaunchDarkly and they asked us if we were SOC 2 compliant.
I’m going to admit, I had no idea what SOC 2 compliance was at that point in time. And I was hemming and hawing like, “Ah, I don’t know. I think we could do it pretty quickly. We focus a lot on security here. We could probably get there pretty quickly.” Then the person on the other side was like, “Well, it’s pretty obvious you don’t know what you’re talking about because SOC 2 is relatively difficult for one thing, and it requires six months of continuous operation to get your type 2. So you’re clearly making this up.” That was the world back then.
Nowadays, seed stage startups, through better tooling and more mature engineering capabilities, get those compliance things out of the way.
Compliance is just one example. Across the spectrum, from role-based access controls to audit logging, these are things that it’s understood you need to have. It’s becoming easier and easier to not only justify the investment in doing them early on, but also to just execute and build on them.
Emerging Cultural/Technical Trends
One of them as you pointed out, like the shift into more remote and hybrid. I think there’s going to be a bit of a snapback for a time. I think people think that this hybrid model is going to be easy and they’ll just roll into it. Once people start rolling back into the office, people are going to start realizing just how challenging it is to support a meeting where 10 people are in a room and 10 people are remote.
That’s actually really difficult to do. So there’s going to be a bit of a snapback. I’m not sure how people are going to adapt to that, but I think that that’s a reality that we’re going to have to sort out. Another thing that I think about is this fractal model of software teams. I think we see this more and more with teams becoming more and more integrated in how they operate continuously.
From a development side, the idea of continuous delivery and how it impacts how developers do their job is relatively obvious. DevOps is moving. Now, there’s this idea of, “let’s throw security into the mix and have them operate continuously.” So now we have DevSecOps and it’s like, “Wait, PM should be able to get into the game too.” So should we have like dev sec PM ops? Is that a thing? Then it’s like, “Well, what about designers?”
I think the reality that we’re moving to is this continuous operation model, where we have rapid small batch change, operating continuously, using tools to create faster feedback loops. I see that becoming a discipline that’s embraced across all the facets of software delivery beyond just the dev teams or the ops teams. I think PMs will be much more mature about looking at a piece of software running in production, knowing how they can extract useful information from that and use it to develop better tools or security for example.
We’ll be operating more continuously. I see that as a trend in the next five to 10 years too.
Jesper: Maybe I’m a little overly optimistic about this, but to take like this notion of the repo where you used to just have the code in there and then now we’re talking about having the docs in there. But you should have the analytics scripts in there that the PM is using to drive dashboards and you should have of course the design and all the front end stuff in there.
And it’s important to know that it’s not like you could deploy all that necessarily. It’s like a unit of product where you have all the artifacts that actually represent the product but also the processes around the product, represented as artifacts in a single place. And it doesn’t necessarily of course have to be one repo. You can use GitFlow type mechanisms to then drive change around it, and to drive collaboration around it.
I’ve seen a lot of that at Heroku, where I felt like it was almost still in its initial stages and it has a long way to go. I know GitHub, the company itself, is also very big on that, and I’d love to see more of that.
Defining Technical Debt
We have two mechanisms that we use right now, and I guess this would be large-scale debt or big picture stuff. One is that we maintain a risk register. So up and down teams, starting at the team level. But it involves PMO leaders, high level engineering leaders and so on, to evaluate a certain risk item and then recording it and then having a global register for all of that. Then that allows us to bubble it all the way up and influence top level priorities, which then basically creates room to go do something about it afterwards.
What usually happens is you’re sitting in a corner screaming and something is way off. But meanwhile, the marching orders are coming from far away that you just got to keep building features. So that was one way that we send a signal up and down so that we can find the right place in the organization to create the space needed to address these risks. So that was one way.
The other thing that we’re doing is we have an operational maturity model. So at any one point in time, we have a mechanism to tell us, a methodology to tell us how healthy a given service is. It doesn’t necessarily translate to us stopping the line. If there are some issues in the service, it’s still a judgment call.
That gives us the data to say, “Okay, we have a service over here that’s not where it needs to be health-wise.” Unhealthy here means maybe it doesn’t have test coverage. Maybe it doesn’t have documentation. Maybe it doesn’t have other things. Maybe it doesn’t have the right kind of performance testing, or maybe it doesn’t have the right kind of scale testing. We’re not really sure if it has enough scaling runway. Those are some examples of what we would capture. And then we can take action based on that.
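To make the idea of an operational maturity model a little more concrete, here is a minimal sketch in Python. It is not the methodology described in the conversation; the check names, the `ServiceHealth` class, and the scoring rule are all assumptions chosen for illustration. It records which of the items mentioned above a service has, reports the gaps, and ranks services so the least healthy come first. What to do about a gap remains a judgment call.

```python
from dataclasses import dataclass, field

# Hypothetical checklist, taken from the examples mentioned in the conversation.
CHECKS = ("test_coverage", "documentation", "performance_testing", "scale_testing")


@dataclass
class ServiceHealth:
    """Health record for one service (illustrative, not a real schema)."""
    name: str
    passed: set = field(default_factory=set)

    def gaps(self):
        # Checks the service has not satisfied yet, in a stable order.
        return [check for check in CHECKS if check not in self.passed]

    def score(self):
        # Fraction of checks satisfied, between 0.0 and 1.0.
        return len(self.passed & set(CHECKS)) / len(CHECKS)


def least_healthy(services):
    # Sort so the services needing the most attention come first.
    return sorted(services, key=lambda s: s.score())


billing = ServiceHealth("billing", {"test_coverage", "documentation"})
search = ServiceHealth("search", set(CHECKS))
ranked = least_healthy([search, billing])
```

Here `billing.gaps()` lists the two missing checks, and `ranked` puts `billing` ahead of `search`, which is the kind of data that can feed a prioritization discussion.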
Enterprise Features You Should Prioritize
John: Yeah, enterpriseready.io captures a lot of them. I’ll add in on the compliance front. Compliance is infectious, so you basically have to do it. What I mean by that is, if somebody is SOC 2 compliant, they’re going to require all their vendors at some level, the majority of their vendors to be SOC 2 compliant or get exceptions for them. So that means if you’re selling to someone that is SOC 2, which is the vast majority of enterprise companies, I would wager at this point, then you need to be SOC 2 as well. So that’s sort of like a necessity.
I’ll go out on a limb a little and I’ll tie this into what Jesper was saying about GitFlow and storing everything as code. That level of change management is what enterprises want from all the software they use. So if you think about building towards that, you’re on the right track. For example, in Git, you can create a pull request. So you can audit a change before it goes live. Customers want that from all of their tooling at the enterprise level, which means approvals are a thing.
That means you need to stage multiple changes without having conflicts and be able to have approvers in either internal or external systems accept or reject those changes. You need to be doing that in a world where multiple collaborators are operating on a piece of data simultaneously. Git has audit logs like commit history. You need audit logs in your product. A good analogy is, we basically all need to build Git into our own data models because that’s what enterprises want.
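As a rough illustration of "building Git into your data model," the sketch below shows a settings store where a change must be proposed, approved by someone other than the author, and recorded in an append-only audit log. It is a minimal Python sketch under stated assumptions, not how LaunchDarkly or any other product implements approvals; `ChangeRequest`, `AuditedStore`, and the field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChangeRequest:
    """A proposed edit to one setting, analogous to a pull request."""
    author: str
    key: str
    new_value: object
    status: str = "pending"  # pending -> approved | rejected


@dataclass
class AuditedStore:
    """Key-value settings where every applied change is approved and logged."""
    values: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)  # append-only, like commit history

    def propose(self, author, key, new_value):
        return ChangeRequest(author, key, new_value)

    def review(self, request, approver, approve):
        if request.status != "pending":
            raise ValueError("request was already reviewed")
        if approver == request.author:
            raise PermissionError("authors cannot approve their own change")
        request.status = "approved" if approve else "rejected"
        if approve:
            # Apply the change and record who changed what, and when.
            old = self.values.get(request.key)
            self.values[request.key] = request.new_value
            self.audit_log.append({
                "at": datetime.now(timezone.utc),
                "key": request.key, "old": old, "new": request.new_value,
                "author": request.author, "approver": approver,
            })
        return request.status


store = AuditedStore()
request = store.propose("alice", "checkout.enabled", True)
store.review(request, approver="bob", approve=True)
```

The sketch deliberately ignores conflicts between simultaneous changes, which is the harder part of the problem described above.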
I think what’s interesting about the ‘blank as code’ movement is not the fact that it’s actually stored as code, but rather that right now we have great processes for managing change in code, and we are desperately asking for the same in all of our other tooling. And the only way we know how to get there right now is by expressing everything as code.
I don’t think everybody really wants like your HR policies encoded as code or stored in a repo necessarily. But you do want to be able to say like, “What was my HR policy a year ago and how has it changed?” And Git tooling would enable you to do that.
Jesper: That is such a great point. That was a big deal between myself and a few other folks over at Salesforce because we were focusing on APIs and people developing to the APIs and then that kind of led into Apex and Visualforce and all the various ways you can develop at Salesforce. At first, we looked at the ability to have clicks not code and be able to just be in an interface and change everything. That was just a great thing. It was only goodness.
But as we got more and more exposed to enterprise requirements, enterprises would come back to us and say, “We can’t tell what changed.” Then that turned into a gigantic backlog of feature requests. We needed to audit this change. We started thinking about that and being developers, having a developer heart beating in us, we go, “We’ve seen this before. There’s somewhere in the world where they built tools that do this really well.” It’s called version control. And the only requirement is that what you track is a file.
Well that actually turns out to be a complex requirement if you build an app, if you build a system that has its own configuration inside itself. So I started working with colleagues over there on how to externalize that. I want to express these changes that people want to make to a Salesforce configuration in files, because if we can express it in files, now we can externalize everything. There’s a lot of good work done towards that.
John: I dream of a world where there’s like a SQL database that you can use in a production application and that has Git-like operations attached to it, but that doesn’t exist.
Jesper: We became PCI compliant at Heroku. We were of course very Git-centric already. So that was more of a no brainer. What the team did was build all the instrumentation and integrations into Git. So whenever it was, “Hey, we need to capture this other thing. We need to catch this other thing. We need this report to exist,” we would then just build it by extracting what we needed out of GitHub, so that the workflow was always GitHub, seen from a developer’s perspective, with small augmentations and small enhancements, but as little as possible. That worked beautifully.
SOC 2 Compliance
John: SOC 2, we got in year three or so. But now, I’m seeing companies get it much earlier. You have to remember, we started LaunchDarkly about seven years ago and the world has changed pretty rapidly. And now it’s more table stakesy. So I’m seeing like year one, year two teams get it. There are companies like Vanta and others that make it easier.
To the broader question of beyond compliance, the feature set, we’re still constantly working on that. We just launched approvals this year. Rollback is something we didn’t build into the data model and that’s hard to retrofit in. So we have to take the time to retrofit that in.
As an engineering leader, I think one of the things to think about is it’s almost like you can’t overinvest in the data model even at early stages in a startup because you can’t change that stuff very easily later. We had multiple iterations where we for example added the idea of an organization as a layer above every entity in the data model. And it’s like, that’s so incredibly painful to do that migration. Just like do it upfront. If you think you’re going to succeed, you’re going to need it. And if not, the time required to adjust your data model to be forward thinking is not going to put you out of business at an early stage.
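The "organization as a layer above every entity" idea can be sketched in a few lines of Python. This is only an illustration under assumed names (`Organization`, `Project`, `projects_for`), not LaunchDarkly's actual data model. The point is that every entity carries an owning organization from the start, so queries can always be scoped by it rather than retrofitted later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organization:
    id: str
    name: str


@dataclass(frozen=True)
class Project:
    id: str
    org_id: str  # every entity carries its owning organization from day one
    name: str


def projects_for(org, projects):
    # Scope every query by organization so one org never sees another's data.
    return [p for p in projects if p.org_id == org.id]


acme = Organization("org-1", "Acme")
globex = Organization("org-2", "Globex")
projects = [
    Project("p-1", acme.id, "web"),
    Project("p-2", globex.id, "mobile"),
    Project("p-3", acme.id, "api"),
]
acme_projects = projects_for(acme, projects)
```

Adding `org_id` up front costs one extra field; adding it later means migrating every existing record, which is the painful migration described above.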
I’m curious to get Jesper’s take on it. I see it from many of the enterprises that we talk to that are moving into that model and you hear different names for it. I love holistic software development or integrated software development. I wish I had the Twitter following to turn that into a movement or something. Maybe in some of the stodgier enterprises, there’s a cargo culting of that in the same way they’re cargo culting agile, where they say they want to be that, and then they’re like, “But we have to have them all sit separately and be in different meetings because of reasons.” And then you just end up with people paying lip service to the idea of having those integrated teams.
I’d say it’s a trend. I’m curious if Jesper sees that as well.
Jesper: Yeah, I do. I think it’s going to be really hard, honestly. I saw it at Heroku when we started hiring designers. Of course it originally was just the founders doing the design and Heroku was very design oriented. That makes it big shoes to fill once you start trying to build the design team. I hired one of the first designers that started at Heroku. I think almost by necessity because it was Heroku, they became very code-oriented. Everything was just happening in Git repos all the time.
But then that ended up becoming a positive, or we ended up having that integration by default. Maybe we missed some other things, but I think that the design group that grew out of that became an incredibly strong and incredibly innovative design team, not just in terms of the designs they created, but in terms of how they worked. I know some of those people went over to GitHub and built GitHub Actions and other things after that. It has definitely left a legacy.
I look at that and go, “How do you replicate that or how do you make that the norm?” I think that’s going to be really hard. Design tools have evolved so much since then too. I don’t really feel I’m up to date on them either. I don’t really know where that one is going. I feel like I’ve seen a version of this working at some point in time and I see the value of it, but I’m not really clear on where it’s going to go.