Ep. #13, Two-Way Video and Beyond with Sarah Allen of Veriskope
In episode 13 of Demuxed, Matt, Steve, and guest host Nick Chadwick speak with Sarah Allen, partner at Veriskope. They discuss implementations of RTMP, how the web has evolved to support video over the years, and the future of video conferencing.
Sarah Allen is a partner at Veriskope, and has been a contributor to open source software for over 15 years. She was an early software engineer at companies like Macromedia, Adobe, and Apple. Sarah also previously worked at Google and the Smithsonian Institution.
transcript
Matt McClure: Hey everyone welcome back to Demuxed. It's been a hot minute.
I opened up Overcast to show somebody where we were, and we are listed technically as "inactive."
So, hopefully this bumps us back into activity. So, anyway thanks everybody for joining us today.
We don't have Phil. As we have done in the past, we replaced him with another person with a delightful accent to give us that culture, you know.
Nick Chadwick: We're international.
Matt: There we go. Not just a bunch of crass yankees.
So for those of you that don't know Nick, he is a Demuxed speaker extraordinaire.
Thought leader in general in the video space. Real time video connoisseur.
Nick: I'm kind of feeling a little intimidated today, because I have previously given talks on RTMP, and now I am regretting it.
Matt: And our guest of honor today is Sarah Allen from Veriskope. She is one of the original creators of RTMP.
This is really exciting and she is also leading where RTMP is going to go from here, which is always a hot topic right now. Thanks for joining us Sarah, this is really great.
Sarah Allen: It's great to be here.
Matt: She also, if you were at Demuxed 2019, she somehow managed to give a historical lesson in a ten-minute slot, because she was one of the late additions to the schedule.
Unfortunately we didn't meet until after like the talk selection, but it was too good not to have on the schedule.
So she was gracious enough to squeeze in within the space we had. So, thank you for that, that was awesome.
Sarah: And there are so many stories I didn't get to tell, so I'm excited to come back.
Matt: So, yeah I think great place to start with just like a history lesson of RTMP and like 1999, what was it like before you all started thinking about RTMP? Set the stage a little bit.
Sarah: So, in the late nineties the way that you did video on the internet was either QuickTime or RealPlayer, and people would come to a page and, if neither plugin was detected, it would be click here to download RealPlayer, click here to download QuickTime. And Flash was ubiquitous.
It was you know something like 98% of browsers on the internet. That was huge, there were hundreds of millions of people on the internet.
So, that was considered huge in the day. Now of course it's most of the people in the world, but in that era, it was a time when regular folks were using the web, whereas five years before that, it was for academics and techies.
And not even all the techies. So, I was still working on the Director team, Director was working on its 3D thing.
Nick: Sorry, what was Director?
Sarah: So, the first Shockwave was made for Director and so the Shockwave player was one of the first Netscape plugins and we worked with Netscape, when they were pre-beta, to build that Netscape plugin and it was also one of the first ActiveX controls.
And then Macromedia decided that its internet brand would be Shockwave and they confused everybody by naming all the players Shockwave.
So, Flash was originally acquired as Future Splash and so they were required to call their files Shockwave Flash and so the SWF file was originally Shockwave Flash, and then later they reacronymed it to Small Web File.
Steve Heffernan: I don't even know that version.
Sarah: Because then like everybody was confused and so we separated the brands, but it was impossible to disentangle.
So, I was originally on the Director Shockwave team and so we were doing multiparty communication for like more the gaming market with the Shockwave multi user server.
They did UDP for fast communication and that's when I really learnt about like firewalls are difficult.
Not that they are actually difficult to configure but that it's difficult to get the masses of the real world people to configure them.
So, the other thing, Jonathan Gay had taken a year off to think about what's next and he came back talking about this idea of doing two-way audio video in Flash, and he said he wanted to do something more exciting than X10 cameras.
Do any of you remember the X10 cameras?
Matt: Vaguely.
Nick: No.
Sarah: So the internet was blanketed with ads for these IP cameras.
And I looked it up, they were the first pop under ads, so when you opened a web page, behind it the ad would pop up, so you would see it when you close the web page and so people weren't as quick to dismiss it.
And if you were browsing for a while, and this is before browsers had tabs, right, so you opened all these windows, you'd end up with like seven different X10 ads under your windows that you would have to close.
So, it was a great exposure mechanism. Somebody at an internet service provider did a study that said X10 camera ads "now account for 89% of bandwidth used by all broadband users."
Nick: So, the X10 ads were the Netflix before it was cool.
Who was Jonathan Gay? If we are going to get it on the podcast, I think it would be helpful just like, because he was a big deal.
Sarah: Yeah, so Jonathan Gay wrote the first Flash Player and was the primary architect of the player technology, and he and Robert Tatsumi were the two founders of FutureSplash, which was acquired by Macromedia.
So, that team grew to be about five engineers working primarily on the player and a little bit on the authoring tool.
And Jonathan Gay was probably more famous at that time for having written Dark Castle and a number of early games when he was like, I think that's how he paid his way through college.
And so I remember Gary Grossman, who added ActionScript, the JavaScript engine, to Flash. Like, he came 'cause he wanted to work with the person who wrote Dark Castle.
Nick: Pretty cool.
Sarah: The story I heard was that John came back and pitched the executives on this idea, and he had done some early prototyping and believed that we could do two-way audio video in less than 100K of code added to the Flash player.
Because the Flash player was then 200K of code. So it's a big deal to increase the size by 50% and at that time the internet was exploding.
So, this was before the bust, at the time that everybody signed off on this. And so they were really looking for places to invest.
And so, whoever was CTO at the time told him that he could pick any five engineers at Macromedia.
And afterwards I heard that the same executive was like, "Oh, I didn't think you were going to pick those five."
Because they were all like each of us were people who could lead a whole team, yet we got a chance to actually build the original version of the two way audio video client server, and so that was pretty fun.
Matt: 200k, that's like your standard JavaScript framework these days right?
Sarah: Yeah, my joke is that engineers these days can't count that small.
Steve: In 2000, a lot of the population was still on modems at that time, I feel like.
Sarah: Yes, so, we had to make stuff work on 14.4 modems, and like I remember at that time, I couldn't get DSL at home.
I had to get ISDN, so that was as high bandwidth as there was available in the Mission in the nineties, and there were only two browsers, right?
There was Netscape, which had the plugin interface, and IE, which had ActiveX controls, and you know, we made it work on Mac and Windows for the client side, and I don't know if you'll remember, but the PowerPC was like one of the requirements.
And the other thing that happened in the late nineties is that they published the SWF file format.
So we went into this project with the assumption that we were going to open the protocol and in fact, while I was still there, we documented the protocol in great detail and the codecs that we licensed, we required them to give us all the information and allow us to make that public.
So, fast forward 20 years, when I came back and I actually like looked at the specification, I was like "Oh wow 2012 like what the heck happened?"
Matt: This one can go backwards but the X10 web camera reminded me of, I mean this is probably right around that time.
I remember we bought a scanner or something at my house and it came with a rebate for a webcam, and I was so excited because I was going to get--
This was back in the day when like a webcam meant that you had a camera somewhere that would upload like a jpeg to some FTP server every 30 seconds and that was like your live webcam and I was very excited about setting one of these up.
So I get this free webcam from this rebate.
Very excitedly plugged it into my computer and the quality was so low that you couldn't even make out my face sitting directly in front of the monitor.
So I knew this was sad, I did not get mad cam, which in retrospect is probably a good idea for like 13 year old me to not have a mad cam.
So, was anybody else doing anything like that? I mean so when you're talking about adding this two way communication into this plugin like was there competition out there?
Like what did that landscape look like?
Sarah: I think part of the reason, and I synced up with John Gay recently, and he said that part of the reason that it had to be two way audio video is because we felt like the one way world was really crowded, right, with QuickTime, you know, Windows Media Server and Real. And WebEx existed, but everything required a download and it was all very purpose built. It wasn't a platform.
There were some collaboration platforms, but they were more around document sharing and chat and less around the video or purely video conferencing things, and there weren't that many of them that weren't dedicated install systems.
Nick: So, when you are working on this for the first time, did Flash even have the ability to play back video then?
Sarah: I don't think so no. People would fake it by having like a series of images.
Nick: Right so the impetus for actually Flash supporting video which turned out to be this massive thing was actually two way communications on day one.
It wasn't actually that Flash could play video.
Sarah: Exactly.
Matt: That's fascinating.
Sarah: I mean I think for a long time I felt like it was this kind of like, because during the development of this like the internet started imploding and people started for the first time in a while being worried about revenue and stuff. Right?
And so for a while I felt like it was this sort of closely kept secret that if we do two way, we will get one way to happen, right? Like, it's sort of intuitive to me that one way is half of two way.
Like, there's a few more features, the buffering stuff, but we added that, in air quotes, totally at the end, because I felt like, partly I didn't talk about it because I thought we would lose the opportunity to do the two way stuff.
If they would be like, "Okay well let's focus on one way" and then we wouldn't have been able to release such a ground-breaking platform, because it's so much harder to add interactivity later.
Matt: Right, so this entire platform, which then basically became the standard way of viewing video on the internet for years, was kind of backdoored in through video chat?
Sarah: Well, yes and no, because the other thing that happened in the late '90s was DVRs.
So TiVo had come out the year before, too, and then there started to be competitors, and so we wanted to mix live and video on demand.
So, there could be a live cast that was recording and you could join late and go backwards in time, and so we were like well, so DVRs can do it, we can do it, it's just bits, but that was revolutionary, and it's still you know people will say well you should have different architectures for live and video on demand, and certainly if you know that you are never going to do interactivity and never going to do live, it's simpler to do stuff.
There's all sorts of shortcuts you can take, but it's so hard to retrofit anything that is timely, and I think that's what we are seeing in the industry is that we've seen a lot of improvements to infrastructure that favor things happening reliably yet with high latency, 'cause you know you can give up latency and get other forms of quality and certainly make the engineering easier.
Matt: It's funny watching us. I think we have actually talked about this on the podcast before, but it's been kind of funny watching the video community go full circle for stateful connections for video. Right?
It was like RTMP, RTMP is painful because you have to maintain a stateful connection to every viewer.
Okay we use streaming over HTTP and then that was kind of the de facto standard, but what if we started delivering things over web sockets and what if we started delivering things over WebRTC to try to like reduce that latency again, and now we're like fully back into maintaining a stateful connection with every viewer.
It's really interesting to watch that like internet eating itself again.
Sarah: It was also like this era, where we were going from you know in the early nineties, it was common to write a specification and then build your code and you have development cycles that were a year and a half and not a lot of iteration right, to the sort of precursors of agile.
There were definitely people doing agile, even though I didn't know those words then.
And there was this whole process at Macromedia about getting the specification signed off early and then you would do iteration during the development, but the spec kicked things off.
And that was a challenge because the execs wanted to see our progress in terms of whether we had written these specifications, and we wanted to get it to actually work before we specified the interfaces. Right?
Because the interfaces would change depending on the characteristics of the platform and so I made up a new process that I called the GOLIDe, which was goal-led iterative development, with a small e at the end.
And so, for every feature we had a goal, and after you met that goal with a prototype, then you wrote the specification.
And so that way they could see progress while the engineers could actually build stuff that worked.
Like those are the kinds of hacks that we had to do to get this thing to happen in a different way because it was hard to make it work you know, in two-way video.
Nick: It's kind of one of those amazingly obvious in retrospect things, that you should figure out how to make it work before you write down how it works.
Sarah: You would think so?
Nick: Basically you had to like invent lean startup in order to get this done.
Sarah: A lot of the stuff that was happening in the late nineties on the web is sort of analogous to what's happened in the last five years with web apps, where building it is not the hard part; figuring out what to build is the hard part.
There were a lot of applications where you just were like, well let's decide what we're doing and then the engineering is clear, but this was not one of them.
Matt: We talked about kind of what the landscape looked like internally here, but like, were other people doing this?
Like, when you are talking about all of these development cycles and getting these out the door, were you racing against anybody else? Was there a video race? Did you know about it, or like, what did that look like?
Sarah: We didn't know about anybody who was working on anything similar, and then I was surprised six months into it, when we were really struggling to get the low latency video and had achieved like 10 or 12 frames per second with some of our early codecs, and I was like, "Ah, that's barely video."
Then I walked in one morning and on the interwebs is like a press release from Yahoo, that Yahoo Messenger 5.0 has video in Yahoo Messenger.
I'm like "Oh, they totally scooped us" and I download it and it turns out that the video is the thumbnail of the Avatar.
So it's like 64 x 64 or 32 x 32 pixels and it's one frame per second.
I was like, "Oh, I didn't know we could call it video at one frame per second. We could've totally released something."
I'm glad we waited to make it great, and they didn't totally scoop us because that didn't really catch on.
Steve: Do you know which codec they were using or which codec you guys were using some of these early--
Sarah: We used-- I actually did the codec negotiations with Isaac Babs.
Matt: Who is Isaac Babs?
Sarah: Isaac Babs was a biz dev guy at Macromedia, and so he actually--
So people don't realize that even though the Web had been around for five years, not many things were on the Web right?
So how do you find a company that makes codecs? Like you can't do a Web search.
You have to talk to people and you know figure out who makes codecs.
So, Isaac did that. Like, he went and found the different codec manufacturers, and you know, he found Nellymoser, with this awesome group of audio engineers, who made a codec for us.
Matt: They looked 'em up like yellow pages or something? How does that work before the internet?
Sarah: You start to talk to people who have codec needs and they introduce you to other people who do codecs, and so for most of the codecs we looked at, just the codec was bigger than 100K of code compressed, right?
And the audio codecs were crazy large, like you know like "Oh this one's 1.2 megabytes" and we're like okay. Next.
So, the way that we got the audio codec, well we already had MP3 decompression in Flash, so that you could play music, so it did another, there was like Shockwave audio that implemented MP3 that added it to all of its Shockwaves in like '97 I think.
So we decided to do a voice only codec which could get it really, really small and so they made a codec specifically for us that balanced like making voice good with code size, and they were great to work with.
Although the reason I have a clear memory of this specification is because we had to write down the codec format, which was very hard right?
You know we ended up writing something down but it was like very, you know well in theory somebody could replicate that.
But there were discussions at that point of open sourcing the Flash player right?
It was like this whole openness win thing that at one point seemed to be where things were going, and on the video side, we really wanted to use H264, which was out then, but they hadn't figured out the licensing terms.
Matt: Is this 2000?
Sarah: This was by 2001.
Steve: Yeah okay 'cause the first codec that was in Flash was Spark, which was H263.
Sarah: It was 263 yeah, so we evaluated 264 and they hadn't figured out the licensing terms, but we were going to put the codecs in the player and we couldn't afford to pay per Flash player.
Steve: Oh yeah, that makes sense.
Sarah: And so if we had put it in the player, Macromedia would have had to pay for every download.
So we went with H263, and I've read people criticizing it being non standard and there were two big differences.
One is most people don't know that H263 and similar family of codecs were written for when you had dedicated systems right and you had video monitors that were specific sizes.
And so the 263 specification says you must support only these widths and heights of video.
So, we were like well can we put other numbers in there? Okay? No problem. That's non standard.
And you know like I remember John saying, well, maybe people will want to make portrait video. Maybe it's not, shouldn't all be 3 x 4, 'cause we'll be doing video conferencing.
And people were "You think really?" and now the iPhone lets you do that.
Nick: We still don't want to do that. We still won't.
Sarah: If you're compositing it with something, it could be great.
So and then the other thing that was non standard about the codec is that they had constant bandwidth, which meant the quality changed over time and when you're doing it over the internet it actually, especially if you want to be downloading multimedia assets too, there's no reason to like artificially have it constant.
So we made another option for constant quality. Otherwise you had this like throbbing visual effect.
Nick: It's amazing how things change and how they stay exactly the same, now we're in 2020 and we're still angsting over what video codecs we can actually use on the web and who is paying the H264 license and how to actually get good video delivered without that quality pulsating on key frames.
So, the more things change, the more everything stays the same.
Sarah: Well I think that we are still chasing the, like CPUs have gotten so fast with so much memory. Then we want to just gobble it up with like--
Matt: JavaScript.
Sarah: More JavaScript, bigger screens.
Nick: It's kind of amazing. You mentioned earlier trying to get it right, and I think one of the things that attests to that is we're still using RTMP today.
It's still a relevant protocol for some of the biggest sites on the modern internet.
Sarah: I remember when Chris Hawkes was a PM at Macromedia-- at Adobe now, sorry, I will make that slip sometimes.
So, we worked together at Macromedia back in the day. He was one of the original PMs who launched the product and stayed for a few years and then left and came back, and so he came up with this idea of spinning out the technology to Veriskope.
And when he first talked to me about it, I was like, "Oh, are you kidding me?"
Because I just assumed it was a dying technology that nobody was using anymore and I don't want to take over this dead thing and then I started looking into it, and I was like "Wow" like YouTube, Twitch, Facebook, like Mux, all of these companies using it for ingest and then there's all these like different applications that are like off the web.
Like, we think about the desktop web and the mobile web being everything, but it's also being used in apps, you know, for some other things that it was made for, but the majority use cases are like, "Oh yeah, that's what I use for ingest."
And then I started really learning that the technology is applicable to a whole bunch of different things today, because, you know, it's just a really nice scalable server that was built to run on a small footprint. You could run it on a Raspberry Pi today.
And most of the stuff that people are building for the desktop and mobile web is too heavy weight to put on little devices.
Steve: We only had an H3 cam that's a Raspberry Pi that's streaming.
Nick: We do. I'm not sure that's on the public engine, though. I'm not sure anyone wants to look at H3 San Francisco.
But I think that's a really interesting point like modern, if we could add quotes around like modern peer-to-peer video technologies like WebRTC, only just now are people actually trying to build embeddable libraries.
There's a recent C implementation of WebRTC that came out of a project called Pion, which was a Go implementation of WebRTC, but just having multiple working implementations of WebRTC that weren't just Google's libwebrtc has taken years and years, and even now they are not fully production ready.
If you wanted to do WebRTC from an embedded device, it is really complicated to get that technology working, whereas with the RTMP spec and a couple weekends, you can get something working.
Sarah: Yeah, I think one of the things that's really fascinating is how many different libraries there are. Like, oh, if you want to have an NGINX plugin, there is this, and if you want to have a standalone thing, here is an open source CRTMP server, and there is librtmp, and there are RTMP implementations in most languages--
Nick: Red5 in Java.
Sarah: Exactly, and I think bigger than that is the tooling. Like, when I was starting to look at it again and didn't recall all the details of the protocol, I could just use Wireshark, and it's got built-in RTMP support.
So, having that ecosystem of tools is very awesome.
Matt: So, correct me if I'm wrong here.
So when we're talking about like sitting down with these specs and trying to implement these things a couple of weekends of RTMP, a constant thing that I've heard is RTMP was reverse engineered to be an open spec or it wasn't like officially an open spec.
It's just a bunch of people reverse engineered the protocol, and that's one of the reasons why it became kind of painful to work with for so long is because there's all these different implementations, because people would just kind of like without the spec they would just kind of shove things in.
You'll correct me if I'm wrong, but I feel like that's something I've heard, like is that a thing and it's like where is that today?
Sarah: That's a great question and many of the implementations, the most widely used ones were released before there was a specification.
So, they could only reverse engineer it, and then in 2012, Adobe released a specification, and I've been looking at the future direction of this protocol, and so we now have a micro site, RTMP.veriskope.com, where you can see Adobe's specification, which we've transformed into HTML backed by Markdown, and so you could go there on GitHub and make corrections and clarifications.
And I'm in a phase now where we're first making some clarifications to the specification, and I've got a little side project where I'm trying to do, like, it's not really a clean room implementation, 'cause I know the specification and I've seen a lot of the code that implements it, but I'm trying to write code just using the spec, and it's really hard.
And what I found is all the information is in there and so far, I haven't found something that is clearly wrong, yet you have to look at four or five different sections and put them together and try to understand what these all are and you know often this, they'll say "Oh the ID blah-blah-blah" and there is chunk IDs and stream IDs and command IDs.
And so it was, I mean it was probably just I would guess given to an engineer to write and they didn't have the cycles to have a detailed review process. So, I have great empathy for whoever needed to write this spec. It was like somebody like a long time after you know original team and it wasn't somebody who is on the original team who was then tasked with writing the specification, but I'm excited to like actually put together some more easy to follow details and RTMPS was never specified.
So, there are two different implementations of RTMPS out there, and it's really straightforward.
There's the HTTPS layer, which works through all the firewalls, right, and then there's just RTMP over a secure socket.
So it's "simple," but it's also hard if you don't know that, like, "Oh, there's two variants, and do I have software that speaks both?"
We should start writing that stuff down; I think it will be a big service to the community.
Matt: Step one.
Nick: I think that's part of the thing that was missing from the spec. I have been following along a little bit with the work you've been doing on this spec--
One of the things that is really nice is just getting some clarifications like do this, don't do this.
When I was working at Twitch, one of the things we kept running into was, we read the spec, we kind of understand the protocol and you know if you are a protocol nerd, there's things like the chunk stream which then provide data for higher level notion of stream.
So there is this lower level and higher level, things can be interleaved so, that you can send a video frame over multiple packets interleaved with audio.
But then you are into questions like, "Can I get a message from a stream on a new chunk stream ID?" and you make decisions there, because that's a little bit unspecified, and the moment you make a decision there, you're going to find the code that made the different decision.
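As a rough illustration of the chunk-stream layer Nick is describing, here is a minimal sketch in Python of reading the basic header that starts every RTMP chunk, based on the layout in the 2012 Adobe specification. The function name `parse_basic_header` is made up for this example and is not taken from any particular library. It only pulls out the chunk header format and the chunk stream ID; it does not answer the interoperability question of which message streams may appear on which chunk stream.

```python
from typing import Tuple


def parse_basic_header(data: bytes) -> Tuple[int, int, int]:
    """Return (fmt, chunk_stream_id, header_length) for one RTMP chunk.

    Hypothetical helper for illustration only.
    """
    if len(data) < 1:
        raise ValueError("need at least one byte")
    first = data[0]
    fmt = first >> 6        # top 2 bits: which message header format follows (0-3)
    csid = first & 0x3F     # low 6 bits: chunk stream ID, or an escape value
    if csid == 0:
        # 2-byte form: the ID is the second byte plus 64 (IDs 64-319)
        if len(data) < 2:
            raise ValueError("truncated 2-byte basic header")
        return fmt, data[1] + 64, 2
    if csid == 1:
        # 3-byte form: ID is third byte * 256 + second byte + 64 (IDs 64-65599)
        if len(data) < 3:
            raise ValueError("truncated 3-byte basic header")
        return fmt, data[2] * 256 + data[1] + 64, 3
    # 1-byte form: IDs 2-63 (ID 2 is reserved for protocol control messages)
    return fmt, csid, 1
```

Because every chunk carries its own chunk stream ID, a receiver can reassemble an audio message and a video message whose chunks arrive interleaved on the same connection, which is the lower level Nick refers to.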
So, people have ended up having to write RTMP servers that are extremely, extremely forgiving of what's actually out in the wild, but then one of the actual things that caused a lot of issues, in my view, is that there were kind of two camps working on RTMP.
There was the camp that wanted to build interoperability and actually build RTMP powered apps and understand the spec and work with it in an open source manner, and then there were stream pirates.
I think the difficulty that came around working with RTMP was RTMPE, that notion that there's an encrypted stream that has a DRM built in, that was extremely sensitive at Adobe and I think that polluted that conversation a lot because half the people using RTMP just wanted to steal content.
Sarah: I think that's a good point, like I think a big part of it is that clarification of intent, right?
So, you know, hindsight is 20/20, right? I have great empathy with people who are in the trenches making these decisions, because it's easy to say things in retrospect, but I think that--
And you know like what I am trying to do now is really that like well let's communicate what Adobe/Veriskope wants. Right?
The DRM stuff is secret and private and you have to get it from Adobe and what not, but I think that got muddied where it wasn't clear to the community what you know there was some threatened litigation and stuff like that, which then I think made a lot of people just kind of want to back off, when actually the majority of RTMP was not in contention, had been documented and so forth.
And so just teasing that apart for the community and then I think, as people are more and more security minded, making sure that we have the security laid out and people can implement that without fear of reprisal.
I think people will always want DRM solutions and those can be layered on top of an open protocol and I think that's what kind of needs to be clarified and I think it would've served the community if that had been earlier, but here we go.
Nick: Yeah, that's something where Facebook has recently switched over to enforcing RTMPS only, or they are moving in that direction for new streams.
It really is that difference between as an individual broadcasting some content, you want to make sure that you know the person on your Wi-Fi or the NSA can't intercept your stream and watch it and that's a completely different requirement from DRM, where you are sending the same video to multiple viewers and you want to have some kind of encryption mechanism that prevents them from just like sharing that content and capturing that content.
So, yeah, I am very excited to get the clarifications, coming from that sort of Twitch background and having worked on RTMP ingest at Zencoder and now Mux.
It's very exciting to actually have a spec that is getting annotated, have some of these details fleshed out, and to know exactly how we should be building that security into our product.
Sarah: Yeah I think the other aspect that I think will become important is who is broadcasting. Right?
So, the other reason is not just that somebody can intercept and see your private feed although that can be an issue if you are doing it for private group, but somebody could pretend to be you.
And we know like we have a lot of confidence that I think is false confidence that when we see somebody in live video that they are who they are, but actually like I did After Effects early in my career, like I well know that we can fake these things. Right?
We can fake what we look like. So, I think as time goes on, people will be wanting to have end-to-end security, so that, you know, the video originated with me if it is coming from my stream.
Matt: It's really-- Deepfakes is an increasing, like it was a problem before I think people like are more scared about it now because of the publicity that deepfakes have gotten recently, but that's really a good point.
Sarah: Yeah I think that's more that, I mean kind of like where they should be asked, like I have been following, HTTPS did not become widely adopted because anybody like regulated it or did some kind of topdown, everyone must use HTTPS.
A lot of the engineers have conspired with each other to make it more and more obvious to consumers that if they are not using HTTPS it's a bad thing. Right?
First it was just like okay let's put up icons, and you know like let's spread awareness about what's safe and what's not, and then like okay let's make it like you have to click a few things before you get through to the non-HTTPS website, and I think that style of introducing security by first introducing the secure features as an option, and then gradually making it seem more difficult and scary to do the non-secure things, before you take it away.
I think that's a way that we can introduce these concepts and if we get people creating things that have some kind of authenticity stamp, then I think--
It's the only way that I think we can really solve the like fake news and fake thing, by actually having some record of the originator of these videos and how it's been mashed up in between.
Steve: Slightly different topic. May I just know, where you feel like the topic of adaptive streaming is going in relation to RTMP?
Like when I am talking to people about WebRTC specifically, like there is a lot of questions around how WebRTC does adaptive streaming, because it does actually aggressively downgrade the quality in order to maintain that real timeness of the stream.
Most people using WebRTC I rarely see go above like 540p video because it's again trying to keep that you know real timeness, and then there's a detail with WebRTC: when it is doing adaptive streaming, it's usually creating all of the different versions on the client as opposed to doing it on the server.
And so like when I'm talking to people about WebRTC versus using RTMP, like some of the things that come up are you know RTMP is everywhere and like actually more likely will maintain the quality that you're trying to go to because it doesn't have all these common protections against dropping from real time.
Just wondering your opinion on where that technology is at, where it's going with RTMP that type of--
Sarah: Yeah so, like and just as a check of context at the beginning, we only had one codec, and that made things easier and more complicated right because it was a limited set of tools.
But we would drop frames right and prioritize audio, because it turns out that if you aren't seeing everything, then it's like less critical than if you hear--
Like, if voice gets behind or like you miss something and you get a pop like it's really bad and people are much more forgiving of something visually missing data, and what I believe is true is, but it would be interesting if there's been any studies in this area that when it comes to video conferencing, the quality of the video is probably more important than seeing every little detail of movement.
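As an illustrative aside (not something said in the conversation), the "drop frames and prioritize audio" idea can be sketched in a few lines. This is only a sketch under stated assumptions: the `Packet` type, the `select_packets` function, and the `max_lag_ms` threshold are hypothetical names invented here, not part of RTMP or any real library. Audio is always kept; video that has fallen too far behind is dropped, and video stays dropped until the next keyframe so the decoder is not handed frames that depend on missing ones.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str            # "audio" or "video" (hypothetical labels)
    timestamp_ms: int    # presentation timestamp of the packet
    keyframe: bool = False


def select_packets(queue, now_ms, max_lag_ms):
    """Return the packets worth sending from a backlog.

    Audio is never dropped. A video packet is dropped when it lags
    more than max_lag_ms behind now_ms; after any drop, later video
    is also skipped until a keyframe arrives.
    """
    kept = []
    waiting_for_keyframe = False
    for packet in queue:
        if packet.kind == "audio":
            kept.append(packet)
            continue
        too_late = now_ms - packet.timestamp_ms > max_lag_ms
        if too_late:
            waiting_for_keyframe = True
            continue
        if waiting_for_keyframe and not packet.keyframe:
            continue
        waiting_for_keyframe = False
        kept.append(packet)
    return kept


backlog = [
    Packet("audio", 0),
    Packet("video", 0, keyframe=True),
    Packet("audio", 20),
    Packet("video", 40),
    Packet("audio", 40),
    Packet("video", 500, keyframe=True),
    Packet("video", 540),
]
to_send = select_packets(backlog, now_ms=600, max_lag_ms=200)
```

In this example the two stale video packets are discarded while every audio packet survives, which mirrors the trade-off described above: a visual gap is tolerated, an audio gap is not.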
So the other thing that's interesting about video conferencing and compression is that most people keep their head still and there is very little motion.
So, you can actually get the streams to be very small, you know if it's set up well and you have that type of video conferencing.
Of course movies are a different thing. Back to your question in terms of where things are going, I think that if your target is desktop video conferencing, compressing multiple streams on the client is absolutely the right choice, right, because you have enormous compute power on the client.
You have no transcoding right, because transcoding increases latency.
It has to and so if you can match all the clients with the same codec and the same, you know like and something that's generated by one of the clients, like you're going to have the best latency. So, that's great.
It's challenging because then sometimes the clients aren't as high powered and you have to make trade-offs here. Like "Oh, I want to move to this new codec that is great for bandwidth, but it's hard to generate multiple streams on multiple clients." So having a solution where you can, you know, make a good guess, right, that one of the clients produces, and then have a transcode on the server makes a lot of sense. When you can support it, supporting multiple codecs and multiple bit rates from clients, it's going to give you the best quality.
And so I think the other thing that's interesting about WebRTC, and I haven't dived in to see how exactly it's implemented these days, but from a specification perspective, it allows lossy transport, right, for the video and audio streams and so you could potentially take great advantage of that from a codec, if the codec is built to support lossiness.
Practically that's very hard to achieve, because you know certain things you need to have, and knowing what's necessary and not necessary is like super challenging right, that's why we are still inventing codecs in this millennium.
But I think that like one of the areas that I am really excited looking at is the QUIC protocol, because it allows for lossy and lossless streams over the same connection, and so QUIC is what is the foundation of HTTP/3 and so that's going through the standards process, because there was an original QUIC and now this has been modified so that it will support all of HTTP and then it will become the HTTP/3 protocol.
And so I think that that has the promise of one of the characteristics of the protocols or something that would actually work well for audio and video streaming.
And I think that I am really excited to like look at layering RTMP on top of QUIC, and I'm also looking at frankly layering RTMP on top of WebRTC, like that could work in terms of like having the semantics of RTMP and just the transport of WebRTC could work well.
But maybe we can leave for it and get maybe if QUIC is out early enough, that might be more suitable because it's really built for two-way streaming.
Nick: I think there's also a question there about new codecs, codecs that are built for, for example, lossiness or even better compression. One of the challenges or perhaps one of the features of RTMP has been that static list of codecs. Right?
There is a set list of things you might encounter in an RTMP stream and there's absolutely no way to add in a new codec to that list.
Is that something you are thinking about changing as part of the spec?
Sarah: That's a great question.
So, yes it was by design to have a very small set of codecs, because that makes it simple to implement and it also, we were worried about the code size of the client.
So, this I think has been a success in creating the interoperability.
Yet some of the codecs are awesome. I don't think I ever anticipated 8K video.
When we did After Effects in 1990, 4K was film quality and so I thought 4K was the max that we would ever do for video, and it turns out well, we can do more and understand how humans see video.
And then VR is coming up, which is even fatter than that right and so you really start to need the, you know better codecs in order to deliver good experiences.
So, I'm definitely looking at expanding that list and I've had some healthy debate and would love to talk to anybody about whether we should have an escape hatch for somebody to specify their own codec, which is a well-trodden debate, but I think that it would be healthy to have that again because as soon as you allow anybody to specify anything, then that affects the interoperability negatively.
But there will be a time you know 10 years from now, there'll be codecs where you go, "I can't believe we're using such old yucky codecs."
Our new like best thing is going to not be awesome 10 years from now or maybe even sooner.
So we started an RTMP channel on the Video Dev Slack and so would welcome people who want to come and bikeshed about protocol decisions.
What I am searching for is really great use cases right?
I want to always make decisions with and this is why and this is how it's going to not just help a particular vendor, but really be the right thing for the community.
And I think that the exciting thing about being situated with Veriskope, which has gotten the assets from Adobe, is that like I have you know the authority to write this new specification and my goal is to just do some short term things first that will address pain points in the community.
The codecs, and we've heard from people running big services that having a redirect in the middle of a stream, which clients would respond to, because when you first connect it can say, go somewhere else.
But then, if you're in the middle of live streaming and you need to drain a server, you know that there is a you know--
Nick: That would be cool.
Sarah: So, that I think is really straightforward to add, it's just a matter of you know sort of figuring out the right syntax for it and where to put it in a communication.
So, relatively small things I think could fit into like having like, I think kind of like a 1.1 spec that would be, where I think all the server community, people who run services would embrace being backwards compatible, but then if a client were to advertise that it supports 1.1, then the server could take advantage of 1.1 features.
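As an illustrative aside (not something said in the conversation), the backwards-compatible 1.1 idea amounts to gating features on what the client advertises. The sketch below is hypothetical throughout: the `negotiate_features` function, the version tuples, and the feature names `mid_stream_redirect` and `extended_codecs` are assumptions made up for illustration, loosely based on the pain points mentioned in this conversation. No such field or feature list exists in the published RTMP specification.

```python
# Hypothetical feature sets per protocol version; not from any real spec.
FEATURES_BY_VERSION = {
    (1, 0): frozenset(),
    (1, 1): frozenset({"mid_stream_redirect", "extended_codecs"}),
}


def negotiate_features(client_version, server_version=(1, 1)):
    """Pick the version both sides support and the features it unlocks.

    A client that advertises nothing (client_version is None) is
    treated as a legacy 1.0 client, so existing clients keep working.
    """
    advertised = client_version if client_version is not None else (1, 0)
    agreed = min(advertised, server_version)
    enabled = set()
    for version, features in FEATURES_BY_VERSION.items():
        if version <= agreed:
            enabled |= features
    return agreed, enabled


legacy = negotiate_features(None)
modern = negotiate_features((1, 1))
old_server = negotiate_features((1, 1), server_version=(1, 0))
```

The design choice here is that the server never assumes a new capability: a legacy client gets exactly the old behavior, and only an explicit advertisement of 1.1 turns on the newer features.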
And I think Veriskope's well positioned because I think that we're the only company that has both client and server technologies 'cause we have the RTMP SDK to, I mean I think there might be some companies that have, I don't know if they have client SDKs.
But in any case, we have a lot of people who use the client SDK and so we have a kind of footprint there and then we have a server that we sell, which doesn't compete with the big services right.
So, some people want to buy a server and that's a different thing.
So I think it's neat to be situated at Veriskope and helping take this forward from a position where we're not competing with most of the people who need the spec changes.
Matt: We've already started dabbling in this, about where things are going, but is there anything else in your mind--
Like what else is you know, we are talking a little bit right now about like 1.1 but what do you see in 2.0, like if you are going to cut to 2.0 in the next five years, like what you think that looks like?
Sarah: Well to wrap up the 1.1 thing, the other thing, which is actually sort of independent of any protocol release, is formalizing how data is sent around, because there are closed caption specifications, there are ad insertion specifications, but how those get put into streams and on the wire could be a bunch of different ways.
And so it turns out that there's a bunch of arbitrarily different things across the interwebs for common data interoperability things.
So, I think having those specified, specifying how multi track audio works, which is the capability of the protocol but somehow like a limitation of certain parts of the ecosystem that I am still trying to figure out.
So, I think those are just kind of addressing the near-term pain points and then in the future, I'm super excited to see whether QUIC will give us the transport that we need, and if we can manage to really like dive into it before they like ink the protocol. It's not just the protocol, it's the implementation.
So, what I've seen historically is that like some standard comes out, and then it doesn't really matter what the standard says, because the implementations don't work. Right? Like that happened with CSS initially. It was just super hard to use CSS until like five years after the standard came out, because there were browsers that just didn't fully implement the spec.
And so if we can kind of like get into it before the standard gets rolled out completely, then I think there is a chance it could work really well.
Because bidirectional streaming is not really, I mean it's an incredibly common use case, but if you are somebody who is used to thinking about how I download documents and display HTML pages, it's an edge case.
And there are probably more people doing bidirectional communication on the internet than there were people on the internet in 2002, but if there aren't new companies really using these new protocols for these use cases, then they're going to end up not really working.
Like I remember, it was an executive that I won't name in the nineties at Macromedia, who said "Why do we have to be involved with standards?"
"We weren't involved in the CD-ROM standards when we did interactive multimedia and that worked out fine."
And that sort of felt wrong to me and I thought about it and I was like, yeah that's why CD-ROMs don't work well for interactivity.
We always had to do this like crazy amount of like let's keep stuff in memory, because the seeks were really slow, and like that's what happens is that like by default what happens for video is VCR controls and the level of interactivity that anybody has is VCR controls.
And so if you want to have better than that, like you know we have to actually exercise for a cause.
So, I would like to see you know a kind of platform on the web that's based on open protocols that allows for the type of interactivity that Doug Engelbart was doing in the sixties.
Steve: And so something that would power some of these interactivities, relatively newer interactions like you know people chatting alongside a Twitch stream or HQ Trivia and things like that-- Is that kind of what you're talking about with this interactivity between video streams?
Sarah: So, that is one kind of interactivity. That's not really what I was talking about.
I was talking about like the ability to actually do like linking across video say.
So, imagine that this podcast was a video and I had slides of the timeline of RTMP and somebody is watching this and they could click on that timeline and then go to a 30-minute video about the history of the internet at that time, right?
This is what we envisioned in the early nineties right, interactive television was going to be like, and it's just never been a thing for reasons, but I think that could be a whole another podcast.
Why did ITV not happen? But I think that now we have the platform for ITV right?
That people were trying to create with dedicated hardware and software in the early nineties.
And now it's the Internet and we have these notions of linking and having additional information to augment our stuff on the web in text and graphics right?
That does not happen in video, and there are some nice implementations of it. I have to say the X-ray stuff that Amazon Prime does right?
There is some sweet stuff that is being done, but it is incredibly fancy engineering, and I think in the future, it should be as easy as marking up an HTML page to do many of those things in video, and people have been trying to do this since before I was born.
Nick: I think at the very least building in that kind of flexibility into new protocols and like getting in like you're saying getting involved with QUIC. Put somebody in the room and saying what about the data going the other way? How do we keep that fast?
At the very least, even if it's not one specific product that wins, it opens up an ecosystem where people can actually imagine whole new things and then have the tools to go build them.
Not be held back by "Oh, this protocol is actually only good for one-way data" or "Oh, you can't really mix different media types over these connections," and yeah that's interesting sort of notion there of being in the room so that those things at least become possibilities.
Cool I think one last request is please can we have Opus, maybe AV1? H265 can come too.
Sarah: Duly noted.
Nick: It wasn't a yes.
Sarah: I won't preannounce the details of our not yet fully designed protocol.
Nick: I will try and get involved in that room.
Sarah: Excellent.
Matt: And one of those rooms that Sarah mentioned was RTMP on Video Dev, if you are not familiar, that's a Slack organization full of smart people that work with video every day.
So that's video-dev.org. There is also a link on demuxed.com.
I think it's in the footer and header, and I think if you just go there and search dev, you will find it.
On that note the other big news we officially as of this morning have signed the contract with the venue for 2020 in San Francisco.
So that's going to be October 7th and 8th at the SVN West in San Francisco. So, right off the BART lines, so we are not in deep Dogpatch.
Nick: Fancy new digs.
Matt: Fancy new digs, yeah it's pretty sweet. I'm really excited about it. I'm not going to lie.
But yeah that's October 7th and 8th, book it on your calendars.
And actually this is the first time I am officially publicly announcing it, but by the time this is actually published it'll probably already be out there in the wild.
So, you retroactively heard it here first.
Sarah: We heard it here first.
Matt: Yeah y'all heard it here first. So, anyway thank you so much again Sarah. It's an honor to get to chat with you about this today.
Sarah: It's a pleasure to be here. Thank you.