July 22, 2014
I'm not really going to talk about either of those.
My premise is that great documentation is what drives developer adoption.
If you really want your technology, your product, your tool, to be picked up by other developers, code is good, but English (or your language of choice) is better.
In 2004, these were the exciting, hot Python web frameworks. I would be surprised if anyone other than Alex has heard of any of these. These were all in use, people were using them. There was an active Python web community, almost overly active.
Today, the picture is very different. There are really three, and none of them are on that original list. Some of that is just technology change. The change over a decade. But why were these three successful in the Python web community over all of the other ones that were around 10 years ago? Over the ones that could've been around?
If you look at all of these, there's a lot of differences. But one thing that they have in common is that they all have substantial, well-written, clear, great documentation. I don't think that's a mistake. I don't think it's a mistake that in this evolution of web development in the Python community that the ones that emerged successful over hotly contested ground turned out to be the ones with really great documentation.
Let's talk about Django's docs for a minute, because that's what I'm the most familiar with. So Django's about 80,000 lines of Python. It's 120,000 lines of English. There's more human writing there than there is computer writing. To put that in a bit more of perspective, here are some well-known great works of literature.
Django's about four times the length of the New Testament, it's longer than Infinite Jest, one of the famous long works of fiction. We're actually about a third of Proust, which kind of blew my mind when I re-ran these numbers. Notably, Django's documentation is ten times longer than what a publisher would tell you your first manuscript should be.
So if we brought Django's documentation to a tech pub, the first thing they would tell us is to cut 90 percent of it. Length doesn't tell the whole story, quality is obviously important, too. Time and time again you hear from people, "I chose Django because the documentation was great." "I saw there was a great community." "I saw great documentation." "The strength of the documentation was originally attractive to me."
It's very rare to hear people say things like, "I was really impressed by Django's ORM, so I..." That's not common, but documentation definitely drives that.
We put a high premium on documentation at Heroku because we put a high premium on developer adoption.
A happy and active developer is one of our key success metrics and documentation is one way that we get there. You hear this term "growth hacking" passed around. The idea behind growth hacking is you want to do things that increase your conversion.
You have a certain number of people come to your site, come to your tool, and then some of them will attrit. We've discovered one of the most successful things that we can do, is really good documentation. We recently launched some new getting started guides and can actually directly measure the impact that these new guides have on conversion, from people looking at the guide, to signing up, to deploying their website, to becoming a paid customer.
We can track that through, and we can directly note the changes and improvements so these guides translate directly to better conversion. Looking outside of Heroku, one thing that I think about a lot when I think about documentation is when Stripe first came around.
I was already doing "e-commerce-y" stuff. I had tools that worked. I had no real reason to switch. Yet Stripe comes out and launches these tools, there were some things, some fine touches. On the documentation page they have a live, working example that lets you see exactly what the product looks like right there in the getting started documentation.
In their examples, your personal API key is actually embedded directly into the examples if you're browsing the docs while you're logged in. Which means that you never have to deal with this problem of, "I copied and pasted the code but now I'm using the testing tool."
It's these little touches that make people think, "If the documentation is this good, imagine how good the product's going to be." I really think that these types of steps, these types of activities, translate directly to better adoption. When you're in a community like this that is making tools for other developers, that's what success means.
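The API key touch described above can be sketched in a few lines. This is only an illustration: the example command, the URL, the placeholder key, and the function name are all invented here and are not taken from Stripe's documentation tooling.

```python
from string import Template

# Hypothetical doc snippet: the command and URL are made up for illustration.
SNIPPET = Template("curl https://api.example.com/v1/charges -u $api_key:")

# Shown to readers who are not logged in (also hypothetical).
PLACEHOLDER_KEY = "YOUR_TEST_KEY_HERE"


def render_snippet(user_key=None):
    """Return the example with the reader's own key, or a placeholder."""
    return SNIPPET.substitute(api_key=user_key or PLACEHOLDER_KEY)
```

A logged-in reader would see their own test key in the command and could paste it unchanged; everyone else sees the placeholder.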
Hopefully at this point, I've got you a little convinced that documentation is good, so I'm going to spend the rest of the talk getting practical and talking about how I approach documentation, how I think about the problem, how I write, how I structure what I'm doing, and hopefully give you some advice that can help you think about your own developer documentation.
To get there, I'm going to ask three questions about documentation, specifically around developer tools.
Why do people read documentation? What are our users, the readers of our documentation, interested in? Who should be the person writing that documentation? And what should we be documenting? How should we document?
So why do people read documentation? Well, think about it. When have you referred to developer documentation? Probably a lot of different things come to mind. We read documentation because we want to learn about a new tool. We think about it because we want to learn, we want to level up.
We come back to documentation because we need help, there's something that we're trying that isn't working. A lot of times people reading documentation are reading it in anger, "It's not working, why isn't it working, help me!"
Other developers read your documentation. One of the things I love about Heroku's documentation culture is that our public API is also our internal API, and because we take documentation very seriously, it means I don't need to bother people working on that team with details of how to use some API or some interaction. I can just go to our documentation and look at it.
Documentation is not just written for your end users. It's also written for your colleagues, your fellow developers, and other members of your community.
When we're talking about developer tools specifically, no one can hold every piece of information in their mind. I know, conceptually, how joins work in a Postgres database, but I'm not going to remember the exact syntax every single time. I need to go back over and over and over again to all the developer documentation that I use to look up the specific details of the piece that I'm trying to use right now.
Fundamentally, documentation is communication. So when we ask you who reads documentation, it's kind of silly. Who talks? Who communicates?
The other thing that you realize looking at that list is these are conflicting; the needs of a brand new user who's never heard of web programming before are very different from those of someone who's been using Django for 10 years and needs to know the details of one specific thing.
Great documentation is often at odds with itself. It has to serve multiple people. This means that you can't just write things once and expect them to stick. You need to write them repeatedly in different ways, for different audiences, with different tones, with different levels. You have to be thinking about these users all at once as you think about writing your docs.
We know who we are trying to talk to. Who should be doing the writing? I hear this a lot when I talk to tech people.
"I got into technology because I'm bad at writing." That always cracks me up. I just noticed I've been at Heroku about a year and I've sent 3500 emails, about 10 emails a day, every day, including weekends, many of them quite long.
You're probably the same, you may even send more than me. Now that's not to mention all of the IRC, Hipchat, Slack communications, and text messages. We are writing more and more today than really any generation ever has.
People who say they're bad at writing probably go home and send their mom an email. It's very interesting to me to hear that people are bad at writing since we write every day. Remember, documentation is communication.
Your colleagues are the same people you're writing your documentation to. That's one of the really cool things about working in developer tools is we can empathize very easily with the people reading what we write. Writing documentation for your users is the same as writing an email to a colleague explaining how to use some new thing. There really isn't any difference.
So really anyone can write documentation. It's one of the really key insights here: having a documentation group or a documentation person is a really good way to end up with crappy documentation. However, there's a caveat to this, which is a pattern that I see people use where they say, "Since anyone can write documentation, I know, let's crowdsource it! Let's set up a wiki, things will happen, and we'll have great documentation!"
This doesn't work in open source where you have a highly motivated volunteer community. It really doesn't work in commercial products where users have less of an incentive to help you sell your product. When I see a wiki, it tells me that you don't care about documentation, and if you don't care about me why should I care about you?
There's no substitute for documentation written by people. The people that write your documentation should be the people that write your code. They should be the people that write your marketing materials. They should be the people that write your support systems, your support emails. Everyone involved in your product is a potential documenter.
There's really no substitute for having documentation be a cohesive and collaborative process spread across your entire group.
So in Django we institutionalized this. In our contribution guidelines, we talk about, as most companies do, that when you fix a bug, you also need to include a test. It's pretty common practice.
We extend this in our community and say that not only does a good patch include a test, but a good patch also includes documentation. So when I go to review a contribution to Django, I'm going to be looking for, "OK, does it fix the bug or solve the problem? Does it include a test? Does it document the new feature?"
By institutionalizing that from the ground up we can ensure that our quality doesn't backslide. We're always adding. We're improving our documentation at the same rate that we're improving our code.
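The three review questions above (fix, test, documentation) can be turned into a rough automated check. This is a minimal sketch, not Django's actual tooling: it assumes a repository where tests live under a tests/ directory and documentation under docs/, and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def review_checklist(changed_paths):
    """Report whether a patch touches code, tests, and docs.

    Assumes tests live under tests/ and documentation under docs/.
    """
    found = {"code": False, "tests": False, "docs": False}
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if "docs" in parts:
            found["docs"] = True
        elif "tests" in parts:
            found["tests"] = True
        elif raw.endswith(".py"):
            found["code"] = True
    return found


def missing_pieces(changed_paths):
    """List the parts of a complete patch that are absent."""
    checklist = review_checklist(changed_paths)
    return [name for name, present in checklist.items() if not present]
```

A check like this only tells a reviewer that documentation was touched, not that it is good; the human review still matters.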
We have a similar mantra at Heroku. Jon Mountjoy who runs our Dev Center team famously has said, "If it's not in Dev Center, it doesn't exist." Features don't count as launched until they're documented, until they're documented clearly and explained. Again, this drives a culture where it's not sufficient just to throw something over the wall. You actually have to be able to explain it.
One really interesting corollary to this is that often times in writing the documentation, you discover ways to make your product better. You'll sit down and you'll start writing how to use something, and think, "Oh man, no one's going to do these 11 steps, this is ridiculous. I wonder if I can make it easier? I wonder if I can make this API better."
The act of writing documentation can actually drive improvement back into the thing that you're writing about.
So what should we document? How should we actually document? I have a very specific way I think about writing technical documentation. It's helped me write a substantially ridiculous amount of it, and I even use it in my email, etc.
Good documentation breaks down, in my mind, into four steps you need to follow when you're documenting anything. As we'll see in a second, this applies to anything from the structure of your entire library of documentation down to an individual paragraph level. An introduction, an explanation, reference, and troubleshooting.
Introduction, explanation, reference, and troubleshooting. Introduction is the quick, new hits. When you're introducing a new feature to someone, they should be able to pick it up very quickly. There's a common rule of thumb that if the user doesn't have a positive experience with a product within about 30 minutes, they're going to set it down and never try it again. So introductions at the macro level need to be very quick and easy. You have to balance that a bit; you don't want to cover up things that are going to bite them later.
The most important part of an introduction is that you want to give the feel of the thing. You're not necessarily being comprehensive. Introductions at the micro level can be very short, a single sentence. You just want to give the general concept, the feel, and the understanding.
When you move into the explanation, that's where the bulk of the material is. This is where you move from how a thing feels into making sure that your reader actually understands it. Explain in detail. You're looking here to explain the "why" of the topic, not the "what".
The introduction says, "Hey, this is a thing" and the explanation explains why you might be interested in that thing, how that thing works, and what's actually going on behind the scenes. Then you need to back that up with reference and the really fine-grained details.
If someone really wants to understand the individual specifics of how a query plan is actually constructed, that's not really relevant to most people writing SQL queries, but in some situations, they're going to want to dive really deep. So you need to back up your higher level explanations with really complete, detailed reference material.
This is the stuff that users will come back to over and over and over again. This is where you can really lose that long-term love. If you've got a great tutorial, great introduction, and then none of the specific APIs are fully documented, people are just going to skip right over and miss it. Troubleshooting.
Troubleshooting is the part of your documentation that tries to pre-emptively help people who are running into trouble. This is the part that people are reading when they are angry, so this is the last place you want to sugar coat the truth.
You want to be very direct about problems. You want to address the common things. Much of the time you'll know, when you're writing about some widget that, if you use it in this particular way, it's going to break. Putting that right up there where someone's going to see it is a good place.
FAQs are really tough because that's usually what people reach for when they think about troubleshooting. But the problem most of the time is that FAQs are more accurately 'questions I wish people would ask', 'questions that I anticipate people will ask', or 'questions that I feel like answering'. FAQs are fine as long as they're actually frequently asked.
We're super guilty of this; Django's FAQ is not great. Most of the questions there are literally questions we anticipated 10 years ago and haven't updated since. Do as I say, not as I do.
My big revelation about how this structure works that really helped me figure out how to do this well is the realization that this structure works at every level, from the most macro to the most micro. I'll walk you through some specific examples to show you what I mean.
Remember, we're talking about introduction, explanation, reference, and troubleshooting.
Django's documentation has an introduction, introductory material, it has an explanation, and various different topic guides. Each topic has some associated reference, and there's a bunch of in-case-you're-having-trouble troubleshooting material. Pretty easy to see at the macro level, but let's zoom in and let's look at one individual document.
At the document level, this structure works, too. You start with a little overview of the topic, so this document is going to explain middleware. You give details. You give explanation. You give usage. You provide detailed APIs, cross-references, and where to go next.
You've learned the basics, now here's the next step and any sort of common problems, notes, warnings, etc. So here's a specific document, again from Django. We start with an introduction, this is a few paragraphs, it talks about the topic at a high level.
Then the individual sections of this document have explanations of how to do specific things. How to turn on and off middleware, how to write middleware, how to deal with certain types of responses, and individual explanations of common scenarios.
Then there's reference material. Each individual function, each API exposed by this larger API, gets its own individual bit of reference.
Then, finally, at the end there's some guidelines pieced together from common problems that people run into, troubleshooting at the end of the document.
We can start with an overview of what we're going to talk about in this part of the document. We're going to give some common tasks, some example code. Again, references, detailed APIs, cross-references, and we can have warnings, pitfalls, etc.
Here's a section from that document, this is on turning on and off middleware. We start with a brief introduction. If you sort of know what's going on and you just need a reminder, this one sentence will give you everything you need.
Then there's a detailed explanation. This is how you do it. This is the default. This is how you change things. Reference, in this case, is scattered about. It's cross-referenced to some detailed reference documentation so every single function, class, and method is a link to another document that explains what that is in detail.
Then there's this little section at the end. There's a common problem with this API that people run into a lot. This list is in order and the order matters. That bites people a lot, so rather than putting that somewhere else or leaving it out because it doesn't make the thing look good, you just drop it right in there. This is what could go wrong and this is how to fix it.
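To see why the order of that list matters, here is a toy model of layered middleware. It is not Django's implementation; every name in it is invented for illustration. Each entry wraps the ones after it, so the first entry sees the request first and the response last.

```python
def build_handler(view, middleware):
    """Wrap view so the first entry in middleware is the outermost layer."""
    handler = view
    for factory in reversed(middleware):
        handler = factory(handler)
    return handler


def tag(name):
    """Return a middleware factory that records when it runs."""
    def factory(next_handler):
        def handler(log):
            log.append(f"{name}:in")
            response = next_handler(log)
            log.append(f"{name}:out")
            return response
        return handler
    return factory


def view(log):
    """Stand-in for the view at the centre of the layers."""
    log.append("view")
    return "ok"


def trace(order):
    """Run the view through middleware named in order; return the call log."""
    log = []
    build_handler(view, [tag(name) for name in order])(log)
    return log
```

Swapping two names in the list passed to trace changes which layer runs first, which is exactly the kind of surprise worth a warning right next to the setting.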
You can zoom even farther and look at an individual paragraph or couple paragraphs, one specific element. Again, you can give the basic usage, you explain the details of that usage, cross-references, return types, arguments, defaults, and, "If this didn't work, try..."
Here again, from that same document, is one individual function. This is the process_view function, so we start with the overall call signature. It's very quick, very high-level, and gives the overview of what we're doing. Explanation, what does this method do? When is it called? How is it called?
Reference, again, is in the form of links and some specific details about what the arguments are and when they get passed. This particular thing has a note about some specific situations, again, troubleshooting guides.
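The same four parts fit inside a single docstring. The sketch below uses a hypothetical TimingMiddleware class; the docstring wording is mine, written to show the layout, and is not copied from Django's documentation. The call signature matches Django's process_view hook.

```python
class TimingMiddleware:
    """Hypothetical middleware, used only to illustrate docstring layout."""

    def process_view(self, request, view_func, view_args, view_kwargs):
        """Run just before the view is called.

        Explanation:
            Called once per request, after the view has been chosen but
            before it executes. Returning None lets processing continue.

        Arguments:
            request: the incoming request object.
            view_func: the function about to be called as the view.
            view_args: positional arguments that will be passed to it.
            view_kwargs: keyword arguments that will be passed to it.

        Note:
            If this hook never seems to run, check that the class is
            listed in the middleware setting and where it sits in the
            order.
        """
        return None
```

The first line is the introduction, the Explanation and Arguments blocks cover explanation and reference, and the Note is the troubleshooting.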
So documentation is fractal at every single level, no matter whether you're sitting to structure an entirely new library or whether you're just trying to write or explain one particular API method.
At every level, you can use this format. Introduction, explanation, reference, and troubleshooting, to help guide what you write.
If this bears some resemblance to the five paragraph essay structure that you learned in high school, that might not be entirely a mistake. This is a little more specific to technical documentation, but it really gives you a way of structuring and thinking about what you write. Structure lets you focus on the content and not think about where to start, how to start, or what to do.
Why do people read documentation? Everybody reads documentation, and they do so for different reasons. We have to document at different levels, with different detail, different clarity, and different audiences in mind.
Who writes documentation? Developers write the best documentation. The same staff that's building the feature should be the ones explaining it to users. That's what leads to the absolute best quality, easiest to use, and best to read.
The best documentation is fractal. It lets you dive in at any level and see the same features at additional detail the more you zoom in.
So go out there, improve your documentation, and thank you very much.
Heroku does starter projects where we have candidates in for a day or two, sometimes three days, to do some simulation of what it would be like to work at Heroku. She was asking about whether we include documentation as part of that project.
You know what, it really depends. There really isn't a one-size-fits-all approach to the "starter project." I would say, for many things, probably not because most of those are prototypes planned to be thrown away a little earlier. Then it's hard to document something that doesn't exist yet.
That said, one of the most important parts of a starter project is presenting the results to your potential colleagues, whether that's in stand-up fashion or in a written form. So certainly, written communication is a critical part of that.
If you can come in and you can do something really amazing technically, but can't explain to everyone else on your potential team what you've just done, the odds of getting a hire are... What I mean is that it had better be some pretty amazing technical work for people to want to hire you if they don't understand what you just did.
Is documentation itself a project of its own, or do you try to spread that amongst a bunch of different projects?
I think your case is super complicated because of the open source angle. You can't necessarily go out to a bunch of different projects and say, "Here's how you document. Do it and let me know when you're done!" It doesn't work that way, so that's tough.
We face this problem a lot with Django's docs in that other tools that we wanted people to use with Django weren't particularly documented up to our standards. We would usually just do it ourselves. We would reproduce or we'd re-explain how to use this common tool that we're pushing people to use. Sometimes you can work to push that upstream but sometimes the values don't really match there.
If I'm trying to produce the polished product that I want people to use, I think there's a value to having that all in one place. There's something that's inherently frustrating about, "I'm not going to tell you how to do that. Go see over there."
It's like the DMV, you know, that's the third floor, window seven. If you can provide a coherent experience by syndicating, that's probably easier. I think you have to think about that experience in order to get the best stuff, the best results.
If I had direct control over those subrepositories, I would certainly want to standardize on the same sets of tools. I deliberately didn't talk about tools because I think for the most part, tools don't matter that much. We make too much over the specific tool choice.
Certainly if you're trying to coalesce a lot of different things, then that's where tools do matter. Not in that any specific choice is better than another, but in that you kinda want to make sure you're using the same thing so that people can collaborate across those boundaries. It's always those boundaries between multiple projects, especially in the open source world, that get really frustrating.
You can find documentation on Django and documentation on Postgres, but using both of them together where they intersect is where no one really thinks is their responsibility. That's where collaboration is really important.
Your question is about having a dedicated technical writer, a technical documentation team. Are you asking do I think that that's good or bad or how do I see the interplay?
So, I do think that having a dedicated staff or a technical writing staff, and having that be their job is an anti-pattern. I don't have a lot of data to back this up, mostly anecdotal. My experience has been that that is a very tense relationship.
There's a reason we have this idea of DevOps. As companies settle into this pattern of, for example, developers fighting operations, they always point fingers and blame the other. I've seen the exact same thing happen around documentation.
That said, Heroku does have a documentation team, Jon. I put his picture up there. He runs the Dev Center team, who's responsible at a holistic level for our documentation. I think the model that seems to work the best is certainly it's appropriate to have people who are accountable for the quality of the documentation and for the tooling, and for copy editing, tone, consistency, and help. But expecting that team to do all the work themselves I think is a fool's errand.
I see that very similar to my team's role and security's role within product development. It's not every time someone wants to check a password we have to come in and write the code. We just set some standards, give advice, review, and we help. But it's not like we're actually writing all of the security code, if that is even a thing within the company.
I think documentation is a similar function where it needs to be somebody's responsibility. Ultimately someone needs to be accountable for it, but that doesn't mean that you can just throw it at them and say, "Here, I built a thing. Document it."
Forum interaction is probably documentation. The problem is that it's not very long-lived. It's funny, the main reason I write documentation is so that I don't have to have the same email threads over and over again. I think that's probably true within companies as well. If you keep seeing the same thing come up from your support team, it's probably better to write it down somewhere rather than making people write the same thing over and over again.
I guess forums drive interest in documentation. It certainly helps you figure out what people are frustrated by.
I assume you're talking about when you're the one writing it. But how do you know if it's good? Get someone who doesn't know the thing, sit them down in front of the computer, and have them use it. Then stand back and say nothing. Don't touch their computer, just watch. It is one of the most excruciating experiences that you'll ever have, and it's incredibly valuable.
Do that with just a couple of people and you'll learn so much. I was just at a workshop called Django Girls in Amsterdam, and I watched about 40 women paired with a mentor trying to learn Django. And Heroku, actually, it was the trifecta for me.
I learned a lot. I sent a lot of emails after that workshop. It's excruciating watching people try to work through your documentation, but you will learn an amazing amount from it.
The question was about keeping documentation up to date. It's definitely very hard to do. That was the original goal behind enshrining documentation into Django's contribution guide from day one. Adrian and I decided to open source Django in about March. Between March and July, we wrote mostly documentation. We were pretty happy with what we came up with, but we knew if we just dumped it over the wall, sooner or later it was going to get out of date, and neither of us really wanted to sign on for an indefinite time period of maintaining changes to that.
So we baked in to contributions and into the culture this idea that your work isn't complete until it's been documented. I think that the only way you can keep it up to date is if you make sure that it moves in lock-step. I don't think that's that radical a suggestion.
We fought this fight around testing. There was always this question, "How do we know the tests stay up to date?" If your testers write a bunch of tests and then developers change the APIs, all your tests break, and so we answered that by continuous integration. We test every time. We change our tests in lock-step with our code and I would argue you can easily do the same thing with documentation.
There are the three steps to every change. Do it, test it, write about it.
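One way to keep written examples in lock-step with code is Python's standard doctest module, which runs the examples embedded in a docstring and reports any that no longer match. The talk does not say Django does this; the sketch below only illustrates the idea, and slugify, shout, and docs_are_current are hypothetical names.

```python
import doctest


def slugify(title):
    """Turn a title into a URL slug.

    >>> slugify("Writing Great Documentation")
    'writing-great-documentation'
    >>> slugify("  Do It  Test It ")
    'do-it-test-it'
    """
    return "-".join(title.lower().split())


def shout(text):
    """Upper-case text. The example below is deliberately out of date.

    >>> shout("docs")
    'DOCS!'
    """
    return text.upper()


def docs_are_current(func):
    """Return True if every example in func's docstring still passes."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    # Pass the function in explicitly so its examples can call it by name.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        # Discard the failure report; only the count matters here.
        failed += runner.run(test, out=lambda text: None).failed
    return failed == 0
```

Running a check like docs_are_current in continuous integration makes a stale example fail the build in the same way a broken test would.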
What do I think about a documentation-first approach? I use it on a lot of things, especially API design. I'll often write the documentation for the thing before I build it. It really helps you shake out all the questions and usability.
Sometimes it's not appropriate. I'm not dogmatic about it, but it certainly can help a lot.
For non-developers working on documentation, do we require them to learn Markdown? Well, Django's documentation isn't written in Markdown, it's written in something even wonkier called reStructuredText.
At Heroku, it's not much of a problem since basically everyone at Heroku has some development background. Our product people write SQL, our marketers write web apps, our support staff writes and maintains their own ticketing system, so it's not really a problem for us at Heroku.
For Django, we would usually tell people go ahead and document it. If you get it wrong, we'll help you with the markup. We have a lot of contributors who don't speak English as a first language, and so for them, it's not the format, it's actually the language. Then we'll say the same thing, "Do it as best you can and we'll help you with the editing and the language." I think that's a similar role to having an owner of the documentation. You can help with the nuts and bolts and stuff.
What do I see as the pros and cons between using a CMS for documentation versus having it in a repository using Markdown or reStructuredText?
My personal preference is more towards the repository, because I'm a developer and I want my version control. I want my diffs and I want to use my text editor. There are advantages to both. I think if you buy into this idea that it's the developers who are writing the documentation, it makes the most sense to use the same tools to have that writing right in line with your code.
If you buy into that, it probably tends to lean a little bit more towards treating language similarly to the way you treat code. There are certainly advantages, especially if you're dealing with a much larger potential set of contributors to lower that barrier to entry to make it like a click-to-edit.
I've been dreaming for years about something that will meld those two. GitHub's click-to-edit stuff is getting pretty close to a world where you can have both going on. I think that would be the ideal situation, but I think there are arguments to be made each way. That was a pretty wishy-washy answer.
Images, tables, charts, all that stuff, those are things that are super overlooked and often really make a difference, they make your documentation look great and professional. A good diagram used correctly can explain something in a way a ton of text can't. It poses all sorts of weird technical challenges.
I remember the first cut of Django's documentation had a bunch of diagrams drawn in OmniGraffle, a drawing tool for Mac, and then we got a bunch of contributors who didn't have Macs, and they said, "How do I even..." It tends to be neglected in developer documentation because of the tooling problem, but I think that it's pretty valuable and we should spend more time on it.
That 'How It Works' stuff is a really interesting thing because there's, especially when you're dealing with developer products, a blurry line between marketing and developer documentation. That's not "really" documentation in that it's maintained by a different team, but is that really a differentiation that actually matters?
I think that line can blur a lot. To have people at your company who can help you with that visual user experience aspect of documentation is really valuable. It's certainly something that Django's docs could and would have benefited from.
How do you deal with that abstraction? I gave the example of the middleware documentation in Django, which is pretty funny because the thing that we call "middleware" is not, by the common industry definition of the term, actually that thing. I didn't really know what middleware was when I wrote that API and I pulled the word out because it sounded cool and it wasn't actually what I built.
We make up words all the time when we're dealing with technical products. In the case of the dyno, containers were basically nonexistent in developer parlance at the time. I suspect that they just had to make something up to call them. The key to dealing with made up words or words that you're using incorrectly or what have you, is to be very consistent about their use and to really make sure that you understand them clearly. That way people who read your documentation will eventually internalize that term and learn it from context.
The other thing that really helps is linking it wherever possible. Almost everywhere in our documentation, the word "dyno" will link to a page about what a dyno is, and then goes into some detail about it. So when you're using those made up words or special terminology, reinforcing what it means continuously with links, with asides, with parentheticals, is important.
You have to do it so often it seems super redundant to you writing the documentation, but remember that your users are going to be diving in from all sorts of different directions, and you have to be able to serve them wherever they come in.
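Linking a term everywhere it appears is easy to automate. The sketch below is an assumption about how one might do it, not a description of Heroku's tooling; the glossary contents, the URL path, and the function name are invented.

```python
import re

# Hypothetical glossary: lowercase term -> page explaining it.
GLOSSARY = {"dyno": "/glossary/dyno"}


def link_terms(text, glossary=GLOSSARY):
    """Wrap each whole-word glossary term in an HTML link."""
    if not glossary:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in glossary) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        word = match.group(0)
        return f'<a href="{glossary[word.lower()]}">{word}</a>'

    return pattern.sub(replace, text)
```

Because the match is on whole words, plurals such as "dynos" would need their own glossary entries in this simple version.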