MAR 3, 2016 - 58 MIN

DX and Open Sourcing at Netflix


In this Speaker Series presentation, former cloud architect for Netflix and now Battery Ventures technology fellow Adrian Cockcroft discusses the pains of early adoption, the decision to open source, and the evolution of scalable cloud architecture.

  • Introduction
  • DX and Open Sourcing Netflix
    • Serverless Computing
    • Open-Source Program at Netflix
    • Developing Netflix as a Tech Brand
    • Cloud Prize
    • Microservices at Netflix
    • Meetups and Recruitment
  • Q&A
    • On Being Too Far Ahead
    • Successful Open-Source Trends
    • Patterns of Failure
    • How Substantial is Serverless Computing?
    • Transitioning from Netflix to VC


I decided I wasn't sure what you would want to hear exactly, so I don't have lots and lots of slides. Although there are, in fact, very many slides if you find my SlideShare account. In fact, I have two SlideShare accounts. I have my Netflix one, which contains very many slides about Netflix and the Netflix cloud architecture, and then another one about Battery Ventures and what I've been doing there.

So, I can start off by explaining my slightly strange job. What do I do? I work for a venture capital firm, but I'm not one of the people that writes checks. I'm a technology advisor. I do due diligence on deals. Fairly obviously, I'm the one in the audience asking awkward questions like, "Does it scale?" and "What language did you write it in?" rather than "How many sales reps do you have?" and "What's your ARR?" and all of those cool things that everyone else asks, though I do sometimes ask those questions, too.

And then for our portfolio companies, I provide advice and help, and for some of them I act as a consultant to the CTO. And then I spend a lot of time networking with interesting people, which is all of you tonight, by definition, because you're here. I'm looking for people that are starting companies, people that have interesting ideas or viewpoints, people that want to use new technologies, and generally building a network that we can draw on at Battery when we have an idea or a question, or when somebody puts up one of those logo slides in a pitch with somebody's logo on it, and we go, "We've heard of them. We know somebody there."

We can call them and get the backstory of what's really going on. So, that kind of thing. I also spend some time tinkering with technologies. I learned Go a year or so ago, and I've been writing code in it: I have an open-source project that I've been tinkering with, an actor-based network written in Go that simulates microservices and tries to visualize them. I use that mostly to keep my hands dirty with some code, to play around with a few things, and to try out a few ideas that haven't quite happened yet but that I think other people should be playing with.

I spend a lot of time at conferences. I'm running some conferences. And then I do lots of presentations at companies. I did an internal event at NetApp yesterday. I did presentations at quite a lot of large banks and big companies that are big end-users, and some of the large Bay Area companies too.

I also have a relationship with most of the big cloud vendors at some level, where I can give them ideas or try to figure out what they're doing that's interesting. So, that's it. And slideshare.net/adriancockcroft is the main place where you'll find more slides than you could ever bother to read.

This is my main slide for the evening. I'm just going to talk through a few things that I think are interesting, and we can go to Q&A relatively early and talk about anything you want to, or drill in on any of these subjects.

DX and Open Sourcing Netflix

To briefly run through the history of what happened at Netflix: around 2009, it looked like we could probably get what we needed running on AWS. 2010 was the year we moved the entire front end onto AWS. When we started, we had about 30 or so front-end web servers, web app servers, basically monster Java engines that spat out web pages and everything else, and a few back-end systems.

It was about 100 machines in 2007 when I joined Netflix and had increased a little bit by then. I think Netflix is now about 30,000 machines or something, probably about 100,000 cores, something in that range. 2010, we moved the front end. 2011, we figured out that it looked like Cassandra was a decent bet for our back end, so 2011, we basically migrated the systems of record, the truth stuff out of Oracle and into Cassandra.

I was going out and talking to people about this, and they were mostly baffled at that point and saying they thought we were crazy. In 2012, we got to the point where we started open sourcing bits and pieces of this, and the questions went from, "What are you doing?" to "How does that open-source project work?" and "When are you releasing more bits and pieces?"

During 2013, I ran a project. Well, I ran a contest. I ran an open-source contest where we tried to give out prizes to people for contributing to our open-source program. I could go into that in a bit more detail. It worked. I don't think it quite worked as well as we wanted it to, but it worked well enough.

Probably the most notable outcome of that was that 10 minutes into Werner Vogels' keynote at re:Invent, with 9,000 people in the room, I was onstage handing out prizes to the 10 people that won our contest. It's interesting standing in front of 9,000 people when you're not used to audiences quite that large.

That worked out reasonably well, but at the end of 2013, I saw an opportunity to spend more time on a larger scope and move to the VC world. So I moved to Battery Ventures at that point and tried to take a lot of these ideas up to the enterprise and the enterprise transformation that's been going on.

Somewhere along the way, we discovered that part of what we were doing was called DevOps. And a bit later, we discovered some of it was called microservices, too, but those labels got applied later on. We didn't invent the labels. Other people were independently inventing similar things at the same time.

The situation now is a little bit different for cloud and a lot of these technologies. We were pathfinding in 2009 and 2010. We were inventing stuff from scratch, reading papers, trying to write things that didn't exist. Now the problem is that there are 10 different ways to do it, and you're rummaging around on GitHub trying to decide which project actually has some followers and might still be there in six months, or you might have to keep hopping from project to project. We've moved from famine to feast in some sense, and part of the problem is deciding which container scheduler you want, when there's a new one every week.

The problems have moved around a little bit, but that's quite fun.

Serverless Computing

I'll come back in a bit and talk about what we did with the open sourcing. Serverless computing is a relatively new topic. It's on my list of things that are happening in 2016, and it didn't really have a name as such last year. A little over a year ago, Amazon came out with AWS Lambda.

Most people were baffled to start with. Then a few people started figuring out what you could do with it. Amazon started extending it, and they've added features so they can emit events from just about every piece of AWS now that you can process using these Lambda functions.

That was interesting, and I started seeing people building things out of it. The people that were building things out of it were coming back with some interesting statistics like, "Well, I took this thing that was costing me $100 a month in AWS fees, and it's now costing me less than $1 a month, and it works better."

I told that to a few people after hearing that story, and somebody else said, "Yeah, my thing went from a few hundred dollars to a few dollars a month, too." And then somebody else said, "Well, I'm feeling left out. I only saved 80% of my money instead of 99%, so I obviously was doing something wrong."

There's something interesting about having everything on demand and metered in 100-millisecond increments, rather than having to order machines, scale them up and down, or just leave them on all the time.
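The savings come from paying only for metered execution time instead of for idle capacity. A minimal Python sketch of that arithmetic follows, assuming each invocation is rounded up to a 100 ms billing increment (how Lambda was metered at the time of this talk). The prices, the workload, and the server's hourly rate are hypothetical placeholders, not quoted AWS rates.

```python
import math

# Hypothetical placeholder prices: substitute your provider's current rates.
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_REQUEST = 0.0000002
HOURS_PER_MONTH = 730  # 8,760 hours a year / 12 months


def billed_ms(duration_ms: float, increment_ms: int = 100) -> int:
    """Round one invocation's duration up to the next billing increment."""
    return math.ceil(duration_ms / increment_ms) * increment_ms


def on_demand_monthly_cost(invocations: int, duration_ms: float, memory_gb: float) -> float:
    """Pay only for metered compute (GB-seconds) plus a per-request charge."""
    gb_seconds = invocations * (billed_ms(duration_ms) / 1000) * memory_gb
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST


def always_on_monthly_cost(instances: int, hourly_price: float) -> float:
    """Pay for every hour the machines are up, whether busy or idle."""
    return instances * hourly_price * HOURS_PER_MONTH


if __name__ == "__main__":
    # Hypothetical workload: 1M calls a month, 250 ms each, 128 MB (0.125 GB) of memory.
    lambda_cost = on_demand_monthly_cost(1_000_000, 250, 0.125)
    # Hypothetical comparison: one small server left running at $0.10/hour.
    server_cost = always_on_monthly_cost(1, 0.10)
    print(f"on demand: ${lambda_cost:.2f}/month, always on: ${server_cost:.2f}/month")
```

With these placeholder numbers, the on-demand version costs under a dollar a month against about $73 for a server left on around the clock. Real savings depend entirely on how idle the always-on machines actually were.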

I'm seeing a few things bubble up on the end-user side. I've seen some frameworks around it. There's a company called Serverless, and there are a few other frameworks and similar things around. And then in the last few weeks, we saw Google come out with Cloud Functions, or whatever they call it, but basically it's roughly the same thing. And then yesterday, IBM came out with their own version of this, and now everyone's saying, "Well, Microsoft is apparently working on it, too, but hasn't released anything yet."

We've gone from this being an obscure corner, one of the many, many things on Amazon's list that people were poking at and saying, "If I used this, it would be proprietary, and I'd get locked into it," to something everyone is doing.

Now that everyone's doing it, we just need to abstract up a level, and we'll have some portable APIs. There's something happening there. It's still pretty raw, and there's a lot of tooling needed. For example, if you have a normal system, and it calls some of these Lambda functions, there's no monitoring.

You're not getting updates once a minute from this thing that only exists for three seconds.

Most monitoring tools go, "I have no idea what happened in there. I'm not looking at it. I can't see it." There's a bunch of problems with monitoring and just how you architect it, getting people's heads around it, so I think that's an interesting new area.

Whenever you see something go to one extreme, you should look at the other extreme to see if it looks interesting.


If you're interested in microservices, keeping everything really small, with one page of Node.js code being your service, then at the other extreme there are teraservices: it's getting easy to get terabytes of memory, or it will be later this year, and what is that going to do?

It hasn't quite happened yet, but it's getting easier and easier to get multiple terabytes of memory in a machine.

In particular, AWS is coming out with the X1 instance type. They announced it, but they haven't shipped it yet. And they say more than two terabytes available by the hour, so you'll be able to try something out on a machine with two terabytes in it. It might take a little while to boot and get your data into it, so you might need it for more than an hour, but that remains to be seen.

But that's a pretty hefty machine. It doesn't just have lots of memory; they said roughly 100 cores as well, so it's a four-socket monster machine available by the hour, and I think that is going to give people the tools they need to try to build things that make sense at that scale.

It's very hard today. If you actually wanted to create an open-source project that plays around with terabytes of memory right now, you'd have to go find somebody with a spare machine costing a few hundred thousand dollars who would give you access to it, whereas once you can rent one by the hour, it's probably about the cost of lunch.

Those are things that I think are interesting. If you're looking at on-premises options, there are a few techniques now for putting terabytes of memory in cheap machines by using high-density flash DIMMs, which is what one of the portfolio companies I've been working with does. That's the stuff that I've been playing around with that I think is interesting.

Open-Source Program at Netflix

I'll dwell a little bit more on the open-source program. I'll tell you how that happened at Netflix, and then we can go to some questions.

The basis for this was that Netflix started off using open source; that was the first thing. And then we thought, well, we'd like to contribute some fixes back to the community, so we needed to understand how to do that. Ultimately what happened was we got our lawyers to talk to the Apache Foundation lawyers, and Netflix signed a contributor agreement at a company-wide level that basically went to all of the engineers and said, "If it's Apache-licensed, you can use it. You can contribute fixes to it. You're covered, that's easy. If it's another license, come and talk to us, and we'll talk to the lawyers and figure it out."

Almost everything we needed was Apache-licensed. That was the first phase, and then the project, like the training-wheels project for learning how to do this, was really Cassandra. We started using Cassandra. Cassandra's written in Java. We had a building full of Java engineers at that time, a few hundred engineers, including the team that was evaluating what our next-generation, cloud-based storage database was going to be.

We liked the architecture of Cassandra and Riak, but we had no Erlang engineers, and Riak is written in Erlang. That's probably the main reason we went, "Let's see if we can make this one work, because we can actually get our heads inside it and figure it out." I think we had half an Erlang engineer. He claimed he knew Erlang, but I think Ben Black said, "No, he doesn't know Erlang," when he met him once.

You've got this environment where there's a body of code that was really designed to run in data centers. Cassandra, at that time, was a bit data-center focused, and there were a few things we needed to do to it. There were features we needed, and one of the problems was that it had a very crude "yeah, you can kind of run it on AWS, but it doesn't really work right" model. You couldn't replace nodes properly, among other things. Anyway, we went and started fixing things.

We started working with DataStax, and we started contributing fixes back through DataStax. Along the way, we ended up with two Cassandra committers, committers on a top-level Apache project, both of whom now work at Apple. But for a while, we were contributing fixes back in, and we owned the evolution of parts of Cassandra.

I think a really good way to look at it is we learnt how to be a partner in a group of people that were collaboratively building a thing.

DataStax obviously was contributing a lot of the fixes and project management, but there were people from Twitter and Apple and Netflix and lots of other companies putting things in. That was the first point, and we got used to it and we understood what we were doing.

Developing Netflix as a Tech Brand

Then came the next phase. There are actually some slides we published somewhere with the original email sequence; I'll try to remember roughly what it said. Jordan Zimmerman sent an email saying, "I'd like to open source this library which talks to ZooKeeper, but it's not really part of anything else. It's really its own thing. So, how do we do that, and should I go talk to the lawyers about it?" The response from my manager was, "If you think the lawyers are going to improve your code quality, go ahead. But otherwise, yeah, just do what you think's right."

He basically created the Netflix GitHub account and set things up. We formed a group to discuss what to do. But he'd open sourced stuff before; he was an experienced developer and knew what he was doing. So we just trusted him, and then we all followed, copied, and learned from that.

We formed a little team, a biweekly hour-to-two-hour meeting of anybody that cared about open source. Usually about this many people attended, 20 or 30, as the team grew. And we discussed what we were going to open source. I got involved at that point by saying:

"Well, we've got a platform. We've got a cloud platform here that we've built, and if we released all the pieces of it, we would be releasing a useful thing that is much bigger than just one or two bits and pieces."

Effectively, you can look at me as the product manager for Netflix OSS; that was most of my contribution. Actually, the word "NetflixOSS" was one of my contributions, because we couldn't spell most of the names that Netflix engineering came up with for their projects.

We started with Cassandra, and then there was Astyanax and Archaius and a bunch of other things, Priam, lots of Greek mythology. And then the team that liked Norse mythology came in, so we had a bunch of Nordic names, and then other random names appeared.

At this point there was some buy-in to there being a conscious effort to create an open-source program, and at roughly the same time, I'd started doing external presentations about the architecture. So I knew there was demand, people asking, "How did you do that, all of these different pieces you've got?"

We actually set up a conscious program to create a technology brand around Netflix. You can do this accidentally. I mean, your main product is people watching movies or whatever, and then there's a technology brand that geeky people in particular tend to associate with Netflix. Maybe the people sitting there watching TV don't, but there's some halo effect between them.

We actually set out to deliberately do this, so I worked with our PR team. I went to the chief product officer, Neil Hunt, and I said, "By the way, you know we're doing this, right? We're creating a technology brand for Netflix. We're going to open source a pile of stuff, and it's going to be a thing, right? And it's likely to have at least some presence." And he said, "Yeah, that sounds good. I'm on board with that."

Part of this, early on, was to get management buy-in and not just do this as a sneaky thing that crept around the side. Then the next step was to try and figure out, well, there are lots and lots of projects. There's lots of code. How do we release different pieces of it? There are foundational pieces that need to be out there first, because the next pieces build on them.

There's layers and layers and layers of platform. There's a few standalone projects, but you think of it as layers of code, and we needed to get out some of the most basic stuff, so we started off in a few places around Cassandra, because we knew we had a community that was there.

The Cassandra client library for Java, for a long time, was Astyanax. There was a Cassandra client library called Hector that didn't work properly, and Hector was Cassandra's brother in Greek mythology, and Astyanax was Hector's son. So that's where the name came from: the son of Hector. Then we released Priam, named after Cassandra's father, which is the code that monitors Cassandra, makes sure it does backups, scales it, and does all that stuff.

We started off in the storage areas and built out from there, releasing more and more pieces. Early in 2012, I think, we released Asgard, which is a better version of the AWS console that we used, with our opinionated ways to release code and build things baked in.

The day after we released it, somebody from the Obama for America campaign, whose name I've forgotten, was looking for something like that and noticed it. So, literally the day after we put that thing out, he grabbed it and started using it, and a bit later on, the engineer who had put out this open-source code realized that the Obama campaign was depending on his open-source project and slightly freaked out. Probably a bit more than slightly freaked out.

That was an unintended consequence of releasing stuff, that it could end up being used by something that might matter. So be careful. Think about it.

There were a number of places that started figuring out how to use this, and we released more and more pieces. There are actually over 100 projects now on the GitHub account, and they've started retiring a few of the ones that have got a bit old and aren't being supported. So they're trying to get a little bit more of a life cycle, and they've built some tooling for managing your open-source contributions.

One of the most recent projects is something that Andrew Spyker has created, and I forgot what he called it, but they use it to monitor and manage. It's a dashboard for all your open-source contributions, and it keeps track of whether people are keeping on top of the pull requests and all that stuff, so you can go and see how things work.
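The talk doesn't describe how that dashboard works internally. As a purely hypothetical illustration of the kind of responsiveness metric such a tool might track, this Python sketch flags repositories whose pull requests have gone unanswered longer than a threshold. The `PullRequest` record, the function names, the repo names, and the 14-day default are all assumptions, and the sketch does not call any real GitHub API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional


@dataclass
class PullRequest:
    """Hypothetical record of one pull request; not a real GitHub API type."""
    repo: str
    opened_at: datetime
    first_response_at: Optional[datetime] = None  # None until a maintainer replies


def stale_repos(prs: Iterable[PullRequest], now: datetime,
                max_wait: timedelta = timedelta(days=14)) -> List[str]:
    """Sorted names of repos with at least one PR unanswered for longer than max_wait."""
    return sorted({
        pr.repo
        for pr in prs
        if pr.first_response_at is None and now - pr.opened_at > max_wait
    })


def unanswered_counts(prs: Iterable[PullRequest]) -> Dict[str, int]:
    """Number of unanswered PRs per repo, e.g. for a simple dashboard table."""
    counts: Dict[str, int] = {}
    for pr in prs:
        if pr.first_response_at is None:
            counts[pr.repo] = counts.get(pr.repo, 0) + 1
    return counts


if __name__ == "__main__":
    now = datetime(2016, 3, 3)
    prs = [
        PullRequest("repo-a", datetime(2016, 1, 20)),                       # ignored for weeks
        PullRequest("repo-b", datetime(2016, 2, 1), datetime(2016, 2, 2)),  # answered next day
        PullRequest("repo-c", datetime(2016, 3, 1)),                        # still fresh
    ]
    print(stale_repos(prs, now))
    print(unanswered_counts(prs))
```

Keeping the threshold as a parameter lets each project team decide what "keeping on top of pull requests" means for them, rather than hard-coding one policy.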

Facebook has built a similar tool; again, they've got a large-scale open-source program. So, that was coming along quite well, and I started doing more talks about the open-source program: these are the pieces, this is how they fit together, and these are the stupid names and what they actually mean.

Cloud Prize

For 2013, I was thinking, how do you turbocharge this? How can I push this to the next level? I came up with the idea of having a prize. Part of this was that Netflix had run a prize quite a few years before, on algorithm research for recommendation systems, and anyone in the AI and recommender world knows about the Netflix Prize, the $1 million algorithm prize, which actually changed the direction of AI research globally.

The tentacles of that prize spread out very deeply, and it was almost pure luck. We managed to set a problem that was easy enough for people to play with, and then we set a goal of making it 10% better. Everyone immediately got to 8.5% better and then got stuck, and we had chosen 10% arbitrarily. We could've given away the $1 million prize the next week, or we could've never given it, but it took three or four years to scrape all the way past 9.9%, until a bunch of teams got together, and finally we gave out that prize. The algorithms were used internally.
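"Better" in the Netflix Prize meant a lower root-mean-square error (RMSE) on predicted star ratings, measured against Netflix's own Cinematch system, and the 10% goal was a relative reduction in that error. A minimal Python sketch of the scoring arithmetic follows; the baseline and team scores in the example are made-up numbers, not contest data.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predicted and actual star ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement_pct(baseline_rmse: float, candidate_rmse: float) -> float:
    """Relative reduction in RMSE versus the baseline, as a percentage."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


def target_rmse(baseline_rmse: float, goal_pct: float = 10.0) -> float:
    """RMSE a submission must reach to be goal_pct better than the baseline."""
    return baseline_rmse * (1.0 - goal_pct / 100.0)


if __name__ == "__main__":
    baseline = 0.95   # hypothetical baseline RMSE, not the real Cinematch figure
    candidate = 0.87  # hypothetical team score
    print(f"{improvement_pct(baseline, candidate):.1f}% better; "
          f"10% needs RMSE <= {target_rmse(baseline):.4f}")
```

With the hypothetical baseline of 0.95, a team at 0.87 is about 8.4% better and would still need to reach 0.855, which shows why the last point or two took years: each remaining step is a very small absolute reduction in error.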

Even before the prize was handed out, we figured out what the main algorithms were, and it did make Netflix better.

So, based on the idea that Netflix, as a brand in the techie community, already had a prize that had been successful, I thought maybe I can kind of steal that idea and have an open-source prize. And the way the prize was structured was interesting.

There are a few things that might not be obvious. One is that you entered the prize by forking a GitHub repository. If you go to the Netflix GitHub site and find the Cloud Prize repository, it includes the entire legal rules for the prize, all Apache-licensed. So if you need the rules for a global prize contest, they're Apache-licensed on GitHub, and you can fork them. In fact, Canonical did fork it and ran their own contest using a fork of the Netflix Cloud Prize, just to put that out there.

A very good lawyer worked with me to create this prize: all of the legal stuff you need to say who has to do what, and everything you need around running a global prize. Global prizes are really annoying, because some countries have rules under which you can't do certain things. If somebody in Montreal had won the prize, we would've been in trouble because of French Canadians or something; there are some weird rules about Canadians and prizes.

You can skirt around that and assume it's probably not going to be a problem, but there are definitely some problems with trying to do anything on a global basis. So we put the prize out, and we tried to come up with something that we thought would motivate people. We had 10 different prizes in different categories for different contributions, like best new monkey, best contribution to testing, and best contributions to other areas. I can't remember all of them.

I wanted to spread it fairly wide. I forget whether the prize money was $5,000 or $10,000 per category. We called up AWS and twisted their arms, and they agreed to announce the prizes at re:Invent, so that set the timespan. This was a limited-time prize. We ran it from March to September, which gave us time to figure out who won by November, so we could announce it at re:Invent.

Then we got Amazon to cough up some AWS credits. I think there was a $5,000 AWS credit or something for each of the winners as well. And then we made trophies, and the most fun thing was a little beepy robot toy that I got from Bleep Labs.

Dr. Bleep makes little anthropomorphic, robot-y, beepy toys, and he had actually made some trophies for South by Southwest a few years before. I called him up, and he made little cloud monkey robot toys, and when you push a button, they beep and flash and they're fun. We ended up spending a few thousand dollars making those. Actually, it turns out people were way happier about the cloud robot than the prize money.

So, what went right, what went wrong? It launched really well. We had a write-up in Gigaom and things like that. We got some contributions, but we didn't get the contributions I thought we'd get, and part of this was that it turns out that the Netflix OSS at that point was a little bit too early. It was a bit too hard to use.

The onboarding for this open-source platform was: first, read all of these Netflix tech blogs. Eventually, I coined the term "technical indigestion" to describe this process, because you can't consume these things. There are too many of them, with too much information in them. And then download these 15 projects and try to assemble something, just to try out your one thing in the middle of it.

Microservices at Netflix

I'm going to jump forward. One of the prize winners was Andrew Spyker of IBM, who won the grand prize and now actually works at Netflix. This is part of the strategy: we identify the interesting people, and we might end up hiring some of them. We ended up hiring two people from IBM as a result of running the Cloud Prize.

And Andrew created a thing called Zero to Docker, a Docker-ized implementation of the Netflix toolset with everything shrunk down to be as small as possible, so you could plausibly start it on a laptop with a reasonable amount of memory. If you've got a 16-gig laptop, you can definitely start this stuff.

We didn't have that on day one, and that was part of the problem. We didn't have the base AMI available, for a bunch of embarrassing internal reasons: we were running on an incredibly hacked-up kernel that no one wanted to admit to, release, or build anything on, and we were trying to transition to Ubuntu, which took a long time.

There was a bunch of internal stuff, which meant that one of the key pieces we really needed to make this easy to use wasn't available. But once we got all the pieces there, it was actually pretty easy to use. As for the current state, Pivotal started getting interested in it around that time, and the Spring Cloud team in particular figured out how to use it.

Then some people at Netflix started using Spring Cloud, and now there's really joint development going on between Pivotal and Netflix. Basically, if you're doing microservices in Java, you should probably be using Spring Cloud because it's probably the most mature environment there for doing this, and you automatically get a lot of Netflix stuff.

And then you can get Zipkin in there, and there's a whole bunch of other pieces that just come in because of the way the Spring Cloud stuff all bundles together. I don't find code with @ annotations all over it particularly readable, but if you're in the Java community and that's what you're used to building, then it's probably the best current microservices framework for Java.

We've now got quite a lot more pull. It's got to the point that I thought it would get to a couple of years ago, so one of the lessons from this is it's hard to time things. When is the inflection point where something is going to take off?

Marten Mickos has a great rule of thumb: if you have an open-source project, if you can download it, figure out what it does, and do something useful in 15 minutes, it will go viral.

All right, that was his rule of thumb for MySQL. It pretty much describes why Docker took off so fast, and it explains why Netflix OSS took a long time to take off: it was weeks of work to figure out how to build these things. That said, there are a few big end-users of Netflix OSS running right now. One of them is Nike, for the back end of a lot of their Internet of Things devices, Nike FuelBands and things like that.

They have a large-scale, global problem with large amounts of traffic and lots of people, and if you squint at it, it looks pretty much like the Netflix API problem. Netflix is a global API for managing a large number of customers and all the things they're doing. So, that made sense to them, and with the help of lots of consultants, lots of work, and all the projects that we provided, they ended up finding enough glue to build out a clone of Netflix OSS at scale.

The other big one was the IBM crew that I gave the Cloud Prize to, and they used it for building the Watson services. The first thing they did was build a toy application on AWS using it. Then they ported it to SoftLayer to understand how that looked, and then they put some of the Watson services on it.

If you were using IBM Watson at least a few years ago, a whole bunch of those services were built using the Netflix OSS architecture. And from IBM's point of view, they were trying to learn microservices and get their heads around what they were doing. Of course, when we did meetups and all these people from IBM turned up, it was like, "What? Why are they here?" They were the last people we expected to turn up at our meetups.

Meetups and Recruitment

Speaking of meetups, we had a very good series of them. We labeled them like TV episodes, so there was "series one, episode one," "series one, episode two," and the next year "series two, episode one," "episode two," "episode three," and so on.

Quite often 200 or 300 people would come. We'd staff it with all our Netflix recruiters who would all be there. And it's a recruiting event without being too blatant and trying not to proposition people too hard in the moment. But definitely providing an opportunity for people that are interested in getting hired or figuring out what Netflix wanted to do.

I'm often asked, "What's the justification for the Netflix OSS program?" And there are really four things. The first obvious one was we wanted to recruit the best open-source developers we could find, and that worked. We started being able to recruit people that wouldn't otherwise have thought about Netflix, because we were obviously a safe place.

If you already had an open-source project, you could come to Netflix, and we would let you work on it. The Netflix employment contract was actually explicitly written with an open-source clause to say, "Anything you do in your own time or anything you did before you joined Netflix remains yours." Netflix only takes responsibility for the things that are on the official Netflix account that we've agreed is part of Netflix Open Source. Making that very clear makes it easy for people to join and leave.

Some companies have very restrictive policies on what they let people do in their spare time. Our approach helped us recruit some very senior open-source developers. The other thing was that building a team of people who run open-source projects tends to attract more people like that. It sets the example, and word gets around.

Then the other thing, which was a little subtle: the way Netflix does compensation is mark to market, which means they look at what it costs to hire a particular type of engineer, and once a year they take all the people they've already got and mark them up to that amount. That's the key insight about the pay system.

If you work for a big company, good luck getting that past HR, but what it does is give engineers an incentive to learn things that make them more valuable. Netflix started breeding cloud engineers, who were valuable but invisible. Then it started breeding cloud engineers with visible open-source projects next to their names, and we sent them to conferences.

Now they become very visible and very marketable, their market value goes up a lot, and we give them a large raise, because to retain them we have to pay what it would cost to hire somebody like that. After a few people went through that, people internally realized, "Yeah, I now have an incentive to work evenings and weekends to keep my open-source project running, because it's actually going to contribute to my bottom line. I'm now a more valuable, more visible engineer."

The whole phenomenon of GitHub becoming a developer's resume starts to play out a little bit.

Some of the people we hired, we just didn't need to interview in the usual way. You read their GitHub account and go, "Yeah, it's obvious they've done all this stuff." You don't have to put them in front of a whiteboard and ask them to code some stupid algorithm. They've obviously already shown what they can do.

So open source in the recruiting space helps. We were looking once for some test engineers, so I said, "Well, we open sourced a piece of test code." It was a plugin for one of the Java stress-test environments. I forget which one. And so, we went and looked. "Oh, it's been forked by about 100 different people. Who are they?"

There were about five or six people there, so we just told the recruiters to go find out who they were, contact them, and see if they wanted a job, because they'd shown enough interest. It becomes a very powerful way of recruiting, retaining, and rewarding people, with a lot of powerful feedback loops. That's one of the reasons we did it.

Obviously, another reason was just the corporate feeling of giving back to the community. We were consuming a lot of software that we weren't effectively paying for, and we felt it was important to contribute back to the community in return. There's another, more subtle thing: if you're out early, pioneering and building something, everyone else who comes along later tends to build something different and go off in a different direction. Then you end up in a dead end, and at some point you have to back out of the technology you've dead-ended yourself on and join the mainstream.

This is a constant problem with being a pioneer. You can get stuck in a dead end.

What we wanted to do was validate what we were doing by putting it out there and making sure we had people following us. If enough people follow you into a dead end, it isn't a dead end; it just becomes a branch that keeps going. There was a conscious plan to create a big group of fast followers, such as Nike and the IBM Watson team. There's a whole page of people doing this stuff on the Netflix site. That validated what we were doing.

We could see which projects they used and which ones they ignored, so we could start steering away from things that weren't working out. It also helped us test our architecture in public and see whether we had good ideas, because there might be better ways of doing things, things we had overlooked, or new things that had come along. That was the third reason.

I'm now blanking on what the fourth reason was. It'll come to me. That long-term architectural validation, I think, is really the key thing, because if you're building something and keeping it very private, you don't really know whether it's any good or whether there are better ways of doing it.

Those were the main reasons, plus, I guess, the technical PR side of it. It gave me something to talk about at conferences, and then I started farming out conference invites to lots of other people. There are now about 10 or 20 people at Netflix who regularly go out to conferences and give talks on projects of various kinds. Let's move into a Q&A.


On Being Too Far Ahead

There was a talk at an API conference where half of the audience had a deer-in-the-headlights look, and the other half were going, "I think you're solving a problem that I haven't got yet." This was about their injectable API endpoint technology, and people are only now starting to understand why that might be interesting.

There were a few cases where we got out ahead of ourselves. There's always a group of people at the leading edge who will figure out how to adopt something. The question is, how do you find that group, how do you grow it, and how soon does it become the bulk of the market? When do you hit the mainstream? There are a lot of different approaches to figuring this out, but right now, at Battery Ventures, we're looking at open-source projects and companies that are based on open source, which is a whole other topic.

"How do you make money out of open source?" is my current question. I'm trying to understand the activity level of the projects these companies are working on. Is it growing? Is it shrinking? Are other people piling in? If a project has a parent company, is that company contributing 100% of the R&D in the project, or 10%, or something in between?

How many check-ins come from other organizations is an interesting metric, and it really tells you a lot. Somebody was asking me earlier about ecosystems. You know you have an ecosystem when the core is smaller than the rest of it, when there are more contributions from outside than from the core. That's when you start to get the feedback loop that makes an ecosystem.
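The outside-contribution metric can be made concrete. The sketch below is a hypothetical illustration, not a tool Battery Ventures or Netflix is known to use: it assumes each commit has already been attributed to an organization (for example, by author email domain), and it applies the rule of thumb from the talk that an ecosystem exists once outside contributions outnumber the core company's.

```python
from collections import Counter


def outside_share(commit_orgs, core_org):
    """Fraction of commits authored by anyone other than the core company."""
    counts = Counter(commit_orgs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts[core_org]) / total


def looks_like_ecosystem(commit_orgs, core_org):
    """Rule of thumb from the talk: more outside contributions than core ones."""
    return outside_share(commit_orgs, core_org) > 0.5


# Hypothetical commit log, one organization name per commit.
commits = ["core", "core", "bigbank", "startup", "university"]
print(outside_share(commits, "core"))         # 0.6
print(looks_like_ecosystem(commits, "core"))  # True
```

Tracking this share over time, rather than as a single snapshot, is what shows whether other people are piling in or drifting away.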

A real ecosystem is where people actually create startups that wouldn't exist unless you had done that thing: ancillary companies that start sitting around the edge of the larger ones. If you look at Docker, there's Weave and ClusterHQ and Rancher, people who came into being after seeing, "Yeah, there's an opportunity to build something around the edge of the Docker ecosystem." So that's one of the ways you can tell.

Successful Open-Source Trends

There are a few general trends. One really strong trend I started noticing came from just looking at all the companies coming in: new companies building new things. We talk to all kinds of people, but of the brand-new software people were building, I noticed that most of it was written in Go. Then I noticed that three-quarters of it was written in Go, and I don't know whether it's up to 80% or 90% now, but as far as I can tell, most new things are being written in Go.

That's one reason I wanted to learn Go. I was like, "What's going on here? Why is that?" It's a low-friction, really, really quick way of building things that work. And it's a very nice tool for getting stuff done, right? The second biggest community is probably Scala, with the Akka/Play framework, that stuff. I'm seeing a fair amount of activity there.

Then there are other people building things in Java or Erlang or JavaScript or Node or whatever, so there is other stuff out there. But I think it's almost all Go, then a substantial amount of Scala, and then random stuff; the rest of it is largely noise. So, that's a couple of observations.

I'm seeing that trend grow, and I haven't yet seen anything pop up as the next thing after Go. We're looking at Rust, maybe, but I haven't seen anything written in Rust yet. So, that's one trend.

The other trend, from the open-source point of view, is that it's now very hard to build a product that isn't open source, certainly in the enterprise-IT space. Not being open source makes your go-to-market much harder. There are two paths. You can build a SaaS product, and then no one really cares whether it's open source, because it's a service. Or, if you're building something that people are expected to build with, the days of proprietary libraries and proprietary installed software are basically over.

There may be a few exceptions in some places, but you have to live in a world where your code is open source. If it's going into enterprise IT, there's a free tier, then a startup tier, which is cheap, and then "call us for enterprise pricing." And "call us for enterprise pricing" is defined by needing LDAP integration or Active Directory or something like that. It's ridiculous how clear a boundary that is.

The reason for that makes a lot of sense. Enterprises have roles that are allowed to do things, and they have policies. Those roles are embedded as entries in LDAP directories, so you have to be in a particular LDAP group to do a particular thing.

If you're buying software that does that thing, you have to be able to interface in to say, "Yes, that's what that does." If you saw, Docker Datacenter was announced today. That is Docker with LDAP integration, so you can control who can do what with it. Enterprises will pay for that. You can even open source your entire implementation of your LDAP thing, and they'll still pay for it. Because their model of support and compliance is that you have to have a chain of responsibility that goes back to a vendor or something like that.

They expect to pay for role integration, role-based access to things. So, if you're building a product, make sure you build the idea of roles into it. Even if you're not interfacing to LDAP, it's a powerful technique to have, and then you can integrate those roles into LDAP and charge money for it. I think that's the trend.
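As a minimal sketch of building roles into a product, assume a hypothetical set of permissions and roles, plus an enterprise tier that maps directory groups onto those roles. The group DNs, role names, and permission names are all made up, and the code does not talk to a real LDAP server; it assumes group membership has already been looked up.

```python
from dataclasses import dataclass, field

# Hypothetical product roles and the permissions each grants.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "operator": {"read", "deploy"},
    "admin": {"read", "deploy", "manage_users"},
}

# Hypothetical enterprise-tier mapping from directory group DNs to product roles.
LDAP_GROUP_TO_ROLE = {
    "cn=deployers,ou=groups,dc=example,dc=com": "operator",
    "cn=platform-admins,ou=groups,dc=example,dc=com": "admin",
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)        # roles assigned in the product
    ldap_groups: set = field(default_factory=set)  # groups already fetched from the directory


def effective_roles(user, enterprise_tier):
    """Product roles, plus roles derived from directory groups in the enterprise tier."""
    roles = set(user.roles)
    if enterprise_tier:
        roles |= {LDAP_GROUP_TO_ROLE[g] for g in user.ldap_groups if g in LDAP_GROUP_TO_ROLE}
    return roles


def can(user, permission, enterprise_tier=False):
    """True if any of the user's effective roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in effective_roles(user, enterprise_tier)
    )


alice = User("alice", roles={"viewer"},
             ldap_groups={"cn=deployers,ou=groups,dc=example,dc=com"})
print(can(alice, "deploy"))                        # False: free tier ignores the directory
print(can(alice, "deploy", enterprise_tier=True))  # True: group maps to "operator"
```

Because the role check already lives inside the product, directory integration is only an extra mapping step in the paid tier, which lines up with the pricing boundary the talk describes.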

I've seen a few companies that haven't open sourced their software basically being designed out of deals. If there's anything remotely similar, the customer says, "I'm sorry. Nice product. Sorry it's not open source." I'm hearing that from big enterprises and banks all the way down, so it's getting much, much harder not to do open source.

Now the question becomes, "Well, how big's the market?" You can't charge as much, so you take a $1 billion market and turn it into a $100 million market. The people who were making money out of it before get cratered, and you have to figure out how to make money from part of a $100 million market. But what you put on your slide when you went in for funding was, "This is a $1 billion market, and we're going to take half of it," and you end up with half of a $100 million market. That's part of the problem.

What you're doing, though, if you do it right, is outsourcing a lot of your technology development and product-market fit, because your customers are modifying your product to make it fit their market.

You don't have to guess what it's going to be. I mean, I used to work at Sun, and we'd build software, and we'd put it out there after working on it for two or three years, and the customers might start using it a year later. So about three years after you had an idea, somebody would actually decide it was a bad idea, right?

Now people will tell you what your good ideas are by building them for you on your own product, so you can use them yourself. You have to leverage the economics of open source: scale, and outsourcing the ideas, the engagement, the development, the testing, and all of that hardening and porting. Your community, your ecosystem, will do that for you, so your cost of engineering should be radically lower. If all of the engineering for your supposedly open-source project is being done in-house and no one else is contributing to it, you're not taking full advantage of it.

Patterns of Failure

Let's talk about a few failed Netflix projects, projects where the engineer just got too busy to handle the pull requests, and there just wasn't enough engagement there, and people stopped using it and stopped contributing to it, so that's a pretty obvious failure mode.

You can also get tied up in legal things. I was at OSCON last year, hanging out near the Capital One booth, and people were walking past going, "What project's that? I thought that was a bank." No, it is a bank. They had 50 people at OSCON, and that blew everybody's mind. People asked, "What are you doing here?" "Well, we've released an open-source project." And people just walked away with their mouths open.

It was actually quite fun watching people's minds get blown as they walked past the Capital One booth. They've got a DevOps dashboard, and the idea that you could do DevOps at a bank was also blowing people's minds. But the name of the project is something unpronounceable: Hygieia.

That's because by the time they'd fought their way through the legal battles to release this thing, they had no energy left to fight the battle with marketing over what it should be called. In big companies, there's a lot of resistance, internal resistance, to doing these things. You're seeing some companies get stuff out there, and it's a huge effort to get the first few projects out. And they have to show enough success to do it again.

There are probably quite a few stillborn projects, stillborn ideas, things that just didn't make it out because people ran out of time or got ground down by lawyers, management, process, and all that compliance and risk stuff. People start worrying about who's liable for what, those kinds of things. So, that's the big-company side of it.

There's a lot of good code in big companies that should be out there, and that's gradually fixing itself. On the startup side, as I touched on a little, if you end up being the only people contributing to a project, and you open sourced it because you thought you had to, you probably used the AGPL, which, as far as I'm concerned, is code for "I didn't want to open source it in the first place."

The Affero GPL license basically says you can't do anything with this except read the code, really. Those are the ones where I think that they've missed it. You're not really building a contributing ecosystem to your project if you're using AGPL and you're being too constrained on it.

I think Apache's the right license. It's a carrot rather than a stick: people are comfortable with the project because they could fork it if they needed to. But they don't want to if it's active, because then they'd own a branch on their own and have to keep rebasing onto the mainstream.

If you have an active Apache-licensed project, it will not fork unless some random political thing happens, though there are certainly cases where that's happened. Picking the wrong license for the wrong reasons, or open sourcing because you think it's the checkbox you need on your pitch deck: that's the wrong reason for doing it.

I like to see projects that are built in public. Some of the most fun projects we did at Netflix were the ones we announced when there were just a few interfaces on GitHub, nothing you could run. There were a few samples saying, in effect, "We think it should look like this," and then we got people to help us build what it needed to be.

We built a DNS abstraction layer called Denominator. If you need a Java API to DNS, this is it. It has plugins to all the DNS vendors, but it has an extremely well-thought-out model of what DNS is, because we worked with four different vendors to refine what that model should be, and the code was written by people from all over the place.
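The design idea behind a DNS abstraction layer like Denominator is a single provider-neutral model with vendor plugins behind it. Denominator itself is a Java library; the Python below only illustrates the shape of that design, and every class and function name in it is hypothetical rather than Denominator's actual API. An in-memory backend stands in for a real vendor plugin.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordSet:
    """Provider-neutral model of a DNS record set."""
    name: str
    type: str
    ttl: int
    values: tuple


class DNSProvider(ABC):
    """Hypothetical plugin interface that each vendor backend implements."""

    @abstractmethod
    def upsert(self, zone: str, rrset: RecordSet) -> None:
        """Create or replace the record set with the same name and type."""

    @abstractmethod
    def list_records(self, zone: str) -> list:
        """Return all record sets in the zone."""


class InMemoryProvider(DNSProvider):
    """Stand-in backend; a real plugin would call a vendor's API here."""

    def __init__(self):
        self._zones = {}

    def upsert(self, zone, rrset):
        self._zones.setdefault(zone, {})[(rrset.name, rrset.type)] = rrset

    def list_records(self, zone):
        records = self._zones.get(zone, {}).values()
        return sorted(records, key=lambda r: (r.name, r.type))


# Registry of available plugins; callers only ever see the DNSProvider interface.
PROVIDERS = {"memory": InMemoryProvider}


def connect(provider_name: str) -> DNSProvider:
    return PROVIDERS[provider_name]()


dns = connect("memory")
dns.upsert("example.com.", RecordSet("www.example.com.", "A", 300, ("192.0.2.1",)))
dns.upsert("example.com.", RecordSet("www.example.com.", "A", 60, ("192.0.2.2",)))
print(dns.list_records("example.com."))
```

The hard part in practice is the shared model itself, which is why refining it with several vendors mattered more than any one plugin.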

That is my model for a really nice project. The ones where you finish it, throw it over the wall, walk away, and never touch it again are just sad orphans. Sometimes they get picked up by somebody else and get a new life. Cassandra's an example of that. Facebook wrote it and threw it over the wall, and then DataStax basically picked it up. Effectively, Rackspace spun out DataStax to pick it up and run with it.

How Substantial is Serverless Computing?

I was in Washington, D.C., in January, and somebody said that the biggest request they had for AWS was FedRAMP compliance for AWS Lambda. That was the top of their list because they wanted to use it on federal projects, and this is big-federal-project stuff. So I think it does apply to large-scale enterprise. It's a good way of building the back-end business logic that does event chaining.

There's a fair amount of stuff like this: somebody does something, pays money into a bank account, and you have to shuffle it around and do things, and there are all these flows that get kicked off by business actions. A customer has one interaction, but there are all these things that happen behind it.

Those event-chaining flows are currently built with all kinds of crap. People copying files back and forth between COBOL programs, and all sorts of junk like that, is what's really going on in the back end of a lot of banks. And that business logic fits really well into the event-flow model.
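To make event chaining concrete, here is a minimal in-process sketch, assuming hypothetical event names such as `deposit_received`. Each handler reacts to one event type and returns follow-on events, which is roughly how event-triggered functions get chained; a real deployment would use a managed service's triggers rather than this local queue.

```python
from collections import defaultdict, deque


class EventChain:
    """In-process stand-in for event-triggered functions chained together."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type):
        """Decorator that registers a handler for one event type."""
        def register(fn):
            self._handlers[event_type].append(fn)
            return fn
        return register

    def publish(self, event_type, payload):
        """Run the chain breadth-first and return the order events were handled."""
        handled = []
        queue = deque([(event_type, payload)])
        while queue:
            etype, data = queue.popleft()
            handled.append(etype)
            for handler in self._handlers.get(etype, []):
                # Each handler returns zero or more (event_type, payload) follow-ons.
                queue.extend(handler(data) or [])
        return handled


chain = EventChain()


@chain.on("deposit_received")
def credit_account(evt):
    return [("account_credited", evt)]


@chain.on("account_credited")
def notify_customer(evt):
    return [("customer_notified", {"account": evt["account"]})]


print(chain.publish("deposit_received", {"account": "A-1", "amount": 100}))
# ['deposit_received', 'account_credited', 'customer_notified']
```

Each handler is small and knows only its own input and output events, which is the property that makes this kind of business logic a good fit for short-lived functions.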

The other thing is, like I said, all of the data points I have so far are particularly for enterprise-type workloads, which are mostly idle. That sales rep automation system you have is only busy at the end of the quarter, at which point it's overloaded with sales reps all trying to get their numbers into the system, so they get paid or whatever.

The rest of the time, it's idle. So, Lambda's perfect for that. It scales to whatever you need as you need it, and it's not there when it's not needed, so you're just paying for data storage, which is quite often small. There are a lot of very idle machines kicking around that have to be sized for burst traffic that only arrives at some point in time; think of the payroll problem. But these applications are small and simple enough that they can be converted, so that's where I think it makes sense.

IoT is probably the worst fit. There are a few event-driven things in the IoT space that make sense to me, but IoT usually means a very constant workload. You have all these devices calling you all the time, at a constant rate. There is no diurnal cycle; they don't get busier and quieter, they just keep calling you, and as you deploy more, they just keep calling. So it's a totally flat workload, and it's relatively predictable.

That workload you want to put on dedicated machines, because it's totally predictable. There are no spikes in it. You sell a few more devices each weekend at Home Depot, more thermostats or whatever start talking to you, and the load gradually creeps up.

Maybe it jumps up a bit more on Christmas Day or something like that; those kinds of things happen. But it's a very predictable workload. For those, tune them down, run them on machines that are 90% busy, and they stay 90% busy. But there's a huge proportion of workloads out there that are very spiky, and I think those are going to move to Lambda relatively quickly. Given the recent announcements, I think the pattern now has an architectural endorsement, since everyone is now doing it.
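The flat-versus-spiky distinction reduces to a rough calculation: if always-on capacity must be provisioned for the peak, its average utilization is the mean load divided by the peak load. The sketch below applies that ratio with an assumed 50% cutoff; the threshold and the sample series are illustrative, not figures from the talk, and a real decision would also depend on actual prices.

```python
from statistics import mean


def utilization_if_sized_for_peak(load):
    """Average utilization of capacity provisioned for the peak of a load series."""
    peak = max(load)
    return mean(load) / peak if peak else 0.0


def suggest_placement(load, threshold=0.5):
    # Assumed rule of thumb: capacity sized for the peak that would sit mostly
    # idle favors pay-per-use (e.g. Lambda); capacity that stays busy favors
    # dedicated machines. The 0.5 threshold is an illustrative assumption.
    if utilization_if_sized_for_peak(load) >= threshold:
        return "dedicated"
    return "serverless"


iot_like = [90, 91, 92, 93, 94, 95]    # flat, slowly creeping load
quarter_end = [1] * 29 + [100]         # idle all month, one burst
print(suggest_placement(iot_like))     # dedicated
print(suggest_placement(quarter_end))  # serverless
```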

You're not locked into AWS if you choose this pattern. You can run it on Google and IBM and soon Azure, and I think Iron.io lets you run a similar thing in their data center. Have we got some Iron.io people? Yeah, okay. Anyone here from Serverless, the company? I think they're in Oakland or something. I don't know.

I've been gradually seeing that get built, and I think it has a lot of attraction for enterprise. It's also a very highly secured environment: you have to have an AWS IAM role, and the security is very, very locked down. This little clump of code can only read from this S3 bucket, it can only write to that DynamoDB table, it only sees what came in here, it can't do anything else, and it only exists for one second.

How would you break into it? The whole idea of code exploits becomes basically meaningless in this environment, so I think there's a place for it in highly secure computing architectures as well, where you build the system on a very, very intrinsically secure architecture.
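As a sketch of the kind of locked-down role described above, the policy below grants only `s3:GetObject` on one bucket and `dynamodb:PutItem` on one table. The bucket name, account ID, region, and table name are placeholders, and a working Lambda execution role would typically also need CloudWatch Logs permissions and a trust policy, which are omitted here.

```python
import json


def least_privilege_policy(bucket, table_arn):
    """IAM policy allowing reads from one S3 bucket and writes to one DynamoDB table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadInputObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Sid": "WriteResults",
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                "Resource": [table_arn],
            },
        ],
    }


# Placeholder names; substitute your own bucket and table ARN.
policy = least_privilege_policy(
    "example-input-bucket",
    "arn:aws:dynamodb:us-east-1:123456789012:table/example-table",
)
print(json.dumps(policy, indent=2))
```

IAM denies anything not explicitly allowed, so a function running under this role has very little it could be tricked into doing.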

Transitioning from Netflix to VC

If you ever get a chance to join a VC firm, do it. The trouble is, it's very hard to get in. I don't know quite how I managed it; it feels like I snuck in one day when they weren't looking or something. But it's a good environment. What I particularly like is that I get to talk to people who have yet to create a company.

Over the last week I've been having a series of conversations with somebody who hasn't even left the company he's currently at. He's whiteboarding stuff, trying to figure out what to build, and that is fascinating. I love those kinds of discussions.

We don't do that many seed investments because we're a bigger fund. We mostly do the A and the B round, but we want to know who the interesting seeds are because we want to keep track of them. So when you come in for your A round, we know you.

It's quite a common quote: "If you want money from a VC person, ask for advice. And if you want advice, ask for money." That's the way it actually works. And so, getting to know people in the VC community way in advance of going and needing money is the way that you actually get money. It's a trust-based business.

A lot of what I'm doing is putting feelers out to people and building up trust on both sides, so that we know them, we know what's going on, we know what state they're in, and we know when they're going to need money. With the people we like, we can go in a little earlier and say, "Do you need money now?" so they don't talk to those other guys. That's the kind of thing we're doing.

Operationally, I was already spending a lot of my time at conferences doing, effectively, technical PR, technology PR for Netflix. So that piece feels the same, except I now fit in more conferences and do them in different parts of the world.

The hands-on stuff I used to do, the real technology architecture, was mostly between groups: I was trying to get different groups at Netflix to coordinate better, and to find good ideas in one place and spread them around. I'm doing that across companies now, so I still do that.

But now my architecture role is mostly across the large enterprises I talk to and the startups and portfolio companies I'm helping. It reshuffled some of the priorities a bit, but it didn't change a lot of what I actually do day to day. I do have this odd position of being a technologist in a VC firm, and there aren't very many permanent employees of VC firms who are technologists. A lot of people are entrepreneurs in residence, which is a temporary position. Thank you.