In the latest episode of Demuxed, Matt, Steve and Phil are joined by Ryan Damm. Ryan founded Visby to take a light field approach to VR. Ryan explains what light fields are, and reveals some of the interesting things that can be done with them. He shows how today’s display technology can handle light fields and makes the bold prediction that in 10 years light fields will be broadly adopted by TVs and phones using holographic displays.
About the Guests
Ryan Damm created Visby to build foundational technology for next generation displays. Visby is developing a new way of creating, compressing, and displaying holographic light fields for future displays based on virtual reality, augmented reality, and holographic panels. Prior to Visby, Ryan was Managing Director of ICGN and Director of Media Technology at Indoorcycling Group.
Matt McClure: When we're recording this, we're on the way towards Demuxed 2017. By the time this gets to your phone or browser, Demuxed might already be done. But you can find all the recordings on Twitch and YouTube, if that's the case. So go to Demuxed.com. You can check out the schedule. If this is before October fifth when you're hearing this, go buy a ticket and we'll see you in San Francisco. So today I'm really excited to have Ryan Damm from Visby on. If you're a member of the SF Video meetup, you may have heard him, what month was that, Ryan?
Ryan Damm: Oh god, ages ago, lifetimes ago.
Matt: Yeah, probably like nine years ago at this point. Ryan came and gave a talk at SF Video a few months ago about light fields and what they're working on at Visby and how that interacts with the video industry moving forward. So really excited to have you on the show today, Ryan.
Ryan: Thanks for having me.
Matt: So why don't you tell everybody a little about yourself?
Ryan: Gosh, where do I start?
I founded Visby a couple years ago, looking at the VR space and decided that we needed to have more of a light field approach.
We can dig into what that means. Before that, I've been just kind of a video guy at various tech and media-related startups for a while. That's the 10-second version.
Matt: So yeah, let's start off with a really basic question here. But what exactly are light fields?
Ryan: Yeah, that's the basic but million-dollar question, right, and one I apparently don't get tired of answering.
At a high level, light fields are a different way of representing imagery, where you keep every single light ray separate. So you can image all the light that's bouncing around a room, and if you keep every single light ray individually separate, you can do interesting things with them.
I think the phrase came into popular awareness with the refocusing cameras that Lytro came out with a few years ago. Just one application of this light field approach. But of course I think there's a lot more that can be done with it. You know, at a high level, that's what it is. It's looking at the statistics of light and light across a large area rather than through the pinhole of a camera.
Steve Heffernan: So when you say interesting things that can be done with it, what are some easy examples of it? I've seen like refocusing, maybe?
Ryan: Almost all the things you can do with a light field that are interesting involve different ways of slicing and dicing the light rays after the fact. So the refocusing is saying, "okay, we've got this very small, four-dimensional raster of light rays. If we bend them in different ways, we can actually generate different two-dimensional images out of them," right, two-dimensional slices of this 4D raster. When we start talking about large-scale light fields, my interest in them is much more around arbitrary perspective generation. So this gets a little wonky, but
If you're in a VR headset or an AR headset, and you're wandering around, you build and deliver different perspectives to people on the fly. If you have all the light rays separate, you can actually infer what someone would have seen and sort of give them a realistic view from that.
So arbitrary perspective generation, holographic imaging, to actually kind of make it a little more user friendly, the ability to deliver multiple perspectives instead of a flat image on a poster.
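As a rough illustration of what "two-dimensional slices of this 4D raster" can mean, here is a minimal shift-and-add refocusing sketch. It is a textbook-style technique, not a description of how Lytro or Visby process light fields, and the names `make_point_light_field`, `refocus` and `shift_per_view` are hypothetical. It assumes the light field is stored as a small grid of grayscale views indexed as `light_field[u][v][y][x]`.

```python
def make_point_light_field(n_views, size, disparity):
    """Synthetic light field: one bright point whose position shifts by
    `disparity` pixels per step in view position (a stand-in for depth)."""
    centre_view, centre_px = n_views // 2, size // 2
    light_field = []
    for u in range(n_views):
        row = []
        for v in range(n_views):
            view = [[0.0] * size for _ in range(size)]
            py = centre_px + disparity * (u - centre_view)
            px = centre_px + disparity * (v - centre_view)
            view[py][px] = 1.0
            row.append(view)
        light_field.append(row)
    return light_field


def refocus(light_field, shift_per_view):
    """Shift each view in proportion to its offset from the central view,
    then average. The chosen shift decides which depth plane lines up."""
    n_u, n_v = len(light_field), len(light_field[0])
    height, width = len(light_field[0][0]), len(light_field[0][0][0])
    cu, cv = n_u // 2, n_v // 2
    image = [[0.0] * width for _ in range(height)]
    for u in range(n_u):
        for v in range(n_v):
            dy = shift_per_view * (u - cu)
            dx = shift_per_view * (v - cv)
            view = light_field[u][v]
            for y in range(height):
                for x in range(width):
                    # Clamp at the borders so every view contributes a sample.
                    sy = min(max(y + dy, 0), height - 1)
                    sx = min(max(x + dx, 0), width - 1)
                    image[y][x] += view[sy][sx]
    n_views = n_u * n_v
    return [[value / n_views for value in row] for row in image]


light_field = make_point_light_field(n_views=3, size=7, disparity=1)
in_focus = refocus(light_field, shift_per_view=1)
out_of_focus = refocus(light_field, shift_per_view=0)
```

With the shift matched to the point's disparity, all nine views line up and the point comes back at full brightness in the centre pixel. With a shift of zero, the same energy is smeared across neighbouring pixels, which is the blur of an out-of-focus plane.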
Matt: So when you spoke at SF Video, my only frame of reference for light fields, vaguely, was, A, what I'd heard from Adam Brown, one of my colleagues that used to work at OTOY. And he had mentioned it at one point, I'm sure, in the past. But I think my perspective, and I'm sure a lot of other consumers generally, is things like Lytro are the only exposure that people have to the concept of light fields. So how does that relate to this conversation, Lytro cameras and things like that?
Ryan: Yeah, Lytro I think really deserves credit for bringing the idea to the masses, right? And their first application of light fields was that refocusing camera. Not exactly a commercial success, but definitely a very interesting application of this sort of idea. And the reason they started out with these refocusing cameras, frankly, is because it's what they could do. Light fields have been sort of an academic curiosity for a long time, and it sort of bubbled up in the public consciousness a few times. Refocusing cameras was the first.
Lytro retooled recently, like about 2015, and started focusing on VR more and sort of professional cinema applications. And these are just expanding on more things you can do when you start to have these very large rasters of rays. I think if we want to cut through the jargon and just maybe supply some terms, it's holographic imaging. It's ways of creating things that are way more realistic than two-dimensional images. And that's super exciting.
Steve: You mentioned a cinema use case.
Ryan: Yeah, so Lytro's other branch of research is not just on VR and AR. It's on kind of improved imaging for the cinematic market. You can think of it as their refocusing camera on ridiculous steroids. Alex, you've seen pictures of this. It made some news maybe in 2016. They had this 755 megapixel, 350 frame a second, crazy camera thing that they built. Crazy custom optics, lenses and so on. So they kind of have two product lines right now. That's their big cinema, that's meant for traditional, 2D cinema. And then they have their Immerge, which is more for VR and AR.
Steve: Is the customer use case there that somebody on a movie set could be using this camera to capture the film, but then later in post-processing, they decide, "oh, I didn't like that angle perfectly, I didn't like that focus." And they could actually be almost moving the camera person around a space after the fact?
Ryan: Yeah, they can do some amount of kind of physically moving where the point of perspective is. You know, this all kind of traces back to the fact that you're just picking different light rays to generate an image, right? And so in a refocusing scenario, you're sort of re-bending the rays so there's a different applied focal point. In the more cinematic applications, they have really powerful ways of changing the F number of the taking lens or slightly moving the perspective or doing depth matting, just a few of the features they've shown off.
It all ties back to, you have these rays that haven't been kind of summed together, so you can pick them apart and do interesting stuff with them. I don't think we really even scratched the surface of what that means.
They've done some things that are physically impossible to capture. They demonstrated imagery from the equivalent of an F 0.3 lens I think was one of the numbers, which is physically impossible to do.
Steve: What is F 0.3?
Ryan: So the F number on a lens is sort of like the ratio of the aperture to the focal length. That's the nerdy definition of it. But what it really relates to is how narrow is the depth of field? And it turns out there's physical limits on kind of how many light rays you can gather together and make a sharp image. But if you keep the light rays separate, you can sort of arbitrarily mathematically bend them in ways that are optically impossible. And so they're really sort of pushing the boundaries of what's possible in two-dimensional imaging by starting out with light fields, which is interesting. And frankly, I think it's a transitional case for light fields. I think that's interesting.
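For reference, the usual optics convention writes the F number as focal length divided by aperture diameter, the inverse of the ratio as phrased in conversation, so a smaller number means a wider aperture and a shallower depth of field. The symbols N, f, D and NA below are standard optics notation rather than anything from the episode, and the 50 mm focal length is only an illustrative assumption:

```latex
N = \frac{f}{D}
\qquad\Rightarrow\qquad
D = \frac{f}{N} = \frac{50\ \text{mm}}{0.3} \approx 167\ \text{mm}

N = \frac{1}{2\,\mathrm{NA}}, \qquad \mathrm{NA} \le 1 \ \text{in air}
\qquad\Rightarrow\qquad N \ge 0.5
```

Under that bound, which applies to a well-corrected lens in air, F 0.3 sits below what glass can deliver. That is consistent with the point that such an image can only be produced by recombining separately recorded rays in software.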
The reason I started Visby was not to go make better 2D images. It's more like what can light fields do as displays start to change? Because to me it's all about future displays. VR is the first, we're going to get augmented reality in some form or another. But I think eventually these TVs are all going to be multi-perspective.
Phil Cluff: And that Lytro camera is an insanely expensive piece of hardware kit, right?
Ryan: Yeah, you have to like fund a company with a bunch of money to get one of them. It's a 200 million dollar camera or something like that. I mean, it's hardcore R&D, right? They're spending the money to kind of figure these things out for the first time. My understanding is, just I hear through the grapevine that they're working on miniaturizing it and making it cheaper. I don't think it's going to be a mass market. I still think it's going to be a very specialized piece of kit. But something that's not a one-off that shows up in its own truck, right? Fun company to pay attention to. And they're doing lots of cool stuff.
Phil: How does this play into the AR, VR, 360 space? Where's the commonality, where's the difference? We're all very used to, now, we weren't maybe 12 months ago, but now we're all very used to talking about VIVE headsets and that sort of thing and where we see 3D generated worlds and some very rudimentary captured AR, VR video. I can't remember the right word for the hemispherical produced piece of content with kind of a static point of view. How does this play into that, then? How does this change what AR looks like?
Ryan: Yeah, so I think the background is VR right now, which is kind of in market, is basically just a video game technology. And all the hype, the deflation of the hype cycle I think is as
People are starting to realize all the projections around how transformative VR can be really rely on it being something other than a video game peripheral. And we haven't yet seen that happen.
But the imaging chain is based on a video game stack, basically. If you want to go work in it, you're working in Unity, you're working in Unreal. The basic unit of visual information is a polygon, right? And that's fine. And then alongside that you have 360 video, right, the equirectangular or whatever.
We're abusing 2D video tools to wrap them around someone's head. It's still fundamentally a 2D technology.
You can look all around, but it's just a sphere around your head. Maybe it's a stereosphere, if you're really fancy. That doesn't really unlock the power of these displays though. The power of these displays is you can walk around and see new perspectives and really feel like you're in a place. And the problem with the polygon approach, the video game peripherals, is things look like a video game. So the way light fields fit in here is they have the promise to give you that positional tracking. So you can walk around and see new perspectives and feel like you're in a place, but actually gives you something that's photographically correct, not just kind of a bad, early 2000s video game.
And the reason we can do that is we're not necessarily representing the shape of the scene, which is photographically deficient, right? The shape is one thing, but we actually want to know how light interacts with the shapes. And so if you have all of the light rays, then you don't need to worry about the shape. You can just deliver the light rays that someone would have seen, and it should in theory be totally photorealistic. In practice, we're not quite there yet. This is one of those future technologies. But it's really necessary. Otherwise all these things will remain video game peripherals. I think in you guys' first episode, you were like, is VR the thing that's going to die in two years? The next hype cycle, so. So not if light fields come along.
Matt: Yeah, my 3D TV is good for accidentally getting turned on and being blurry. I finally had to remove it from our Harmony remote, because we kept accidentally bumping it on the couch and then like, "why does this look like such shit? Oh, right, yes."
Phil: It was literally only ever good for two-player video games, different screens, literally the only good thing.
Matt: It'd be awkward to share a VR headset that way, so this is not potentially less useful.
Phil: Like, we can't do this with existing VR headsets, right? We're going to have to come up with a different display technology for this as well, right?
Ryan: Well, okay, so no.
Phil: Or are we going to be able to use the same display hardware and just do some very cool stuff with what we captured from light fields to render it?
Ryan: My company's in the process of proving that today's display technology can actually handle light fields.
We've got a little demo. Of course, I can't show you guys today because I traveled light. But the demo runs on a laptop, yeah. But no, we've got something running in an Oculus, and it is photorealistic, and it is six degree of freedom. Still very early days, but it can be done. There's just a lot more work to do before it's kind of commercially viable. But it does work. These displays can draw two million pixels, which end up being light rays poured into your eye on the fly, and if you are clever, you can generate the light rays that came from the real world and give someone the right view. So I think these are functionally holographic displays, right, they're capable of giving you six degree of freedom imagery.
It's just the software stack is way behind. And that's kind of what Visby's working on. That's part of what Lytro's working on. It's what OTOY's working on. And there's a few other companies. I wouldn't say it's a race. I'd say that we're all trying to figure out what it's going to take to get these things to be practical. There's a series of pretty tough technical challenges that still have to be tackled.
Steve: Yeah, so I think I understand, so the promise here is that basically something that feels like a VR video game, but it's real life, right, like you're actually looking at real images. What needs to happen, because the demos that I've seen around light field have been in one place. And so the camera's set up in one place, and you can move your head around where you are, you can maybe move as much as a few feet. What needs to happen to where you can actually walk around in an environment?
Ryan: So this brings a really good point. I think that light fields are really critical for the future, but they're not going to be the entire future. There are advantages to having a polygon representation of the world or a point cloud representation. And being able to walk around anywhere you want is actually one of the big advantages of the polygons.
Because if you know the shape of an object, you can now synthesize a view of it from any direction.
A pure light field approach, in which you're actually sampling light rays and then trying to re-bend them and draw perspectives from them, you have to actually capture the light rays in the first place. So you're actually limited by the physical extent of your capture device.
How many cameras do you have wired together, basically. So if you want somebody to be able to walk around a whole room, you'd sort of be stuck having to put cameras around the edges of the room. And that's possible, and we've been playing around with that. We were talking to an equipment company yesterday about how many cameras can we pack in a small space and what would that look like? But when you start to have those kinds of conversations, it's very clear that there's sort of a limit to how physically far you can go. Now, I think it's worth it if you can bring the real world in and it looks realistic, right?
You're not limited by the subject matter. You're not only shooting on green screens, you're not only filming very, very carefully lit subjects. But you can say, "let's go transport someone to somewhere in the real world and have it look good. But by the way, if you take more than two steps in any direction, everything breaks." So I think it's kind of going to be the trade-off.
Phil: What hardware are you using right now? What is your capture setup?
Ryan: Oh, it's incredibly embarrassing. We're a minimally-funded startup, so we're using a bunch of knock-off GoPros. That's not expensive, it's a Xiaoyi Action camera. They're actually better than GoPros but they're cheaper. Therefore they look inferior. But they're fantastic. You can synchronize them all. But they're like $200 cameras. So we have 50 of them in a big grid. So it's effectively a 100 megapixel imager. That part sounds pretty cool. But I am wrangling like 50 tiny micro SD cards at the same time, so feels like a big step back.
Matt: So it's definitely consumer-ready, is what you're saying.
Ryan: Oh, yeah, absolutely, yeah. There'll be a signup link I can send you guys, you guys can order 50 cameras at a time and wire them together yourselves. You're ready to go.
Matt: You just need 50 SD card readers or
Ryan: Patience and tweezers, and you're good to go. The workflow's awesome.
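The "effectively a 100 megapixel imager" figure is easy to sanity-check. The sketch below assumes each camera contributes a 1920×1080 video frame, which is a guess and not something stated in the conversation. The camera count and the rough $200 price come from Ryan's description, and the function names are hypothetical.

```python
CAMERAS = 50                # "we have 50 of them in a big grid"
PRICE_PER_CAMERA_USD = 200  # "they're like $200 cameras"
FRAME_WIDTH = 1920          # assumption: 1080p frames from each camera
FRAME_HEIGHT = 1080         # assumption: 1080p frames from each camera


def array_megapixels(cameras, width, height):
    """Total pixels captured per synchronized frame, in megapixels."""
    return cameras * width * height / 1_000_000


def array_cost_usd(cameras, price_per_camera):
    """Cost of the cameras alone, ignoring rig, cards and cabling."""
    return cameras * price_per_camera


print(array_megapixels(CAMERAS, FRAME_WIDTH, FRAME_HEIGHT))  # about 103.7
print(array_cost_usd(CAMERAS, PRICE_PER_CAMERA_USD))         # 10000
```

Under that assumption the grid lands at roughly 104 megapixels per synchronized frame for about $10,000 in cameras, which matches the ballpark quoted above.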
Steve: So what do you see as the most exciting use cases in the future?
Ryan: So I'm going to go crazy on you here.
I think in 10 years, it's all going to be light fields. I think our TV's going to be holographic. I think that your phone's going to have these holographic displays, and I think the entire signal chain is going to have to be multi-perspective.
The display hardware's actually getting there. I've seen stuff behind closed doors of holographic panels that are really convincing. Nothing commercially available yet, but it's coming. And I think unlike 3D TV, a holographic panel's going to be a big upgrade. People are going to want to be able to hang a window on their wall that they can tune in to different places, right? So that's the long term crazy thought. Everything's going to go holographic.
And at that point, all the work that's gone into these 2D codecs are going to have to go into these four-dimensional codecs and light fields codecs. That's where I start to get intimidated by how much work there remains to be done. Like, "oh my god, there's so much to do." We're like barely baby steps right now. Like, "hey, we got 30 frames to play back on a laptop with a GTX 1080 in it! We're there!" It's like, no, we can't capture, we can't stream, we can't edit. There's so many things to do still. So 10 years is not a long time. We've gotta get to work. So I don't know if that's a satisfying answer or not. But that's the big idea, I think.
Steve: Yeah, well, you touched on something that's kind of interesting. What does it look like, so if everything's light fields, what does the lean back experience look like? If you're sitting on the couch, how does that impact you. I can understand you've got the VR goggles on. How does it impact like, you're sitting on the couch and you're looking at something that's of course light field up on the wall?
Ryan: So I think it may look like a minor improvement over a lot of lean back entertainment today. The main advantage will just be that it will seem really real, right? And it sounds minor, but even the tiny bit of perspective change as you shift around slightly makes it feel like you're not looking at a flat image anymore. And we don't really appreciate how much mental training has gone into understanding a 2D image as an image and trying to transport yourself there. None of that cognitive load is involved when you're looking at a holographic image. You really feel like you're looking through a window, and it feels very natural. That could be pleasant or unpleasant. I don't think I want to watch horror films that way.
But even minor movements actually give you a really strong sense of depth, much more than just stereopsis. It turns out that the tiny micro-movements, we're seeing how objects shift relative to one another, tiny amounts of parallax. When those are correlated with our proprioception, it feels really real. So short of, I'm not sure if I can verbalize it. But trust me, when you experience it, it's a different level entirely. So I think it's just, things are going to look better, right? It's not as easy to convey as when television went from black and white to color. But it might be a similar upgrade in perceptual quality.
Steve: This is a really dumb example, but we try and get our dog to use FaceTime. And she has no idea what's going on. Like, you can tell she can't tell what the image actually is. And so I wonder with light field, if it actually looked real.
Ryan: Should be closer. I mean, she's probably way more interested in if it smelled like you, unfortunately. Their experience of the world is probably a bit different. But yeah.
Matt: So I had a note that I wanted to ask about how this is going to affect cinematography in the future. And I think we're already touching on that a little bit, but what comes to mind is the jump from 30 fps to 60 fps and how it kind of feels, at least in my opinion, I kind of hate it. Like, I hate the, how do you call it.
Ryan: The soap opera look.
Matt: I didn't coin this phrase, I'm sure somebody else did. Yeah, the soap opera effect that you have on TVs that upscale it. And it can make otherwise good actors just look terrible. Because everything looks more real, you judge everything a lot more. So do you see similar things happening with this?
This feels like the next level of that. And then on top of that, so my other question would be I've heard some people say that they see one of the benefits of light field for cinematographers is similar to what people see as the real benefit of 360 video, whereas if you have 360 degrees of capture, you have more freedom with what you do with your camera. So if I'm snowboarding with a 360 degree camera on my head, I don't have to worry about looking at all the interesting things. I can edit the video later to capture what's cool.
Ryan: Yeah, even like a two-dimensional cutout from the 360.
Matt: Right, exactly.
Ryan: Okay, all right.
Matt: So I've heard people say similar things about light fields, is you have more freedom later to go in and adjust the perspective a little bit from a director's standpoint.
Ryan: Those are two very complex questions. Let me tackle the cinematography one first. I think that on sets in general, and this is just anecdotal conversations I've had. There has been some heartburn from on set directors of photography, lighting DPs and so on about the number of choices and creative choices that are actually made in post-production, right? Color I think is the most recent, where color decisions used to be baked in basically by how you exposed the film or how you painted it into a camera system that wasn't raw. Now that decision's being deferred to post when a colorist might be making these decisions who wasn't on set and the DP's not there.
I think there's a little unease and some risk around light fields that a lot of the decisions you would normally make, what we consider the art of cinematography, is now being completely deferred to the user.
How do you frame the shot, the perspective, the framing, it's closer to theater, where the blocking actually determines the action, than cinematography, where the frame directs your attention. Because it's not a 2D medium anymore. Now, I think there's always uncertainty around new technology. And some people are diving in and really excited. But we still don't know what the language is going to look like. And I'd be perfectly comfortable if people didn't want to call it cinematography anymore. It's a very different discipline. The creative language hasn't really been formulated yet.
That's punting a bit, but in truth I think there's a lot of work to do on the creative side. And to address the first question, actually, I had a deep conversation with a friend about 120 frame a second shooting. And I haven't seen it yet, but apparently Billy Lynn's Long Halftime Walk, Ang Lee's latest film, shot 120 frame a second, stereo 4K. And people said it radically changed acting. And the theory's now we're at high enough frame rates at that point that you can actually, all the tiny subconscious cues you get in how people's faces move, you're actually picking up on these tiny microexpressions that are incredibly telling.
And it changes the art of acting. And so I think what we're seeing is this isn't just an extension of cinematography or an extension to sort of traditional filmmaking. It's kind of a new discipline. And that's going to take us a few years of fumbling around to figure it out. I should say, though, I don't think it's going to be some hybrid of video games, like, "oh, it's all going to be interactive now." People have been talking about interactive stuff for 20 years, and it's never really quite panned out. I think people like lean back entertainment. But yeah, it'll change everything. That can be good, it can be bad.
Phil: So what does this mean for the video industry as a whole? We've got file formats now, we've got delivery technologies, we know how to do what we're doing now. Is that all completely changed? Is there just a fundamental disruption we're going to see over the next 10 years?
Ryan: The two are going to live alongside each other for quite a while. I should say, there's going to be some extensions on top of 2D video that will sort of bring it in these new mediums. I'm hearing a lot of people talking about RGB plus depth, ways of augmenting traditional 2D images so that they've got a little bit of depth to them, so someone can move around a tiny bit. I haven't seen a great implementation of it, but I think that's going to live kind of at the low end and be a transitional format for a long time. So I would say that in traditional video chains, there's going to be a lot of grafting stuff on to support these new media.
At the same time, light fields and volumetric approaches and these sort of new holographic approaches are so raw that there's going to be a lot of work where people take the lessons learned over the last 20 years of online video and start applying it to these new representations. The challenge is the data points themselves are radically different. I mean, people are playing around with form a whole lot, but literally the way images are represented in bytes has changed completely. There has been work on polygon compression kind of in the video game industry already.
You may see people having to kind of hybridize a bit, start to get smart about that stuff.
On the light field side, I'd say it's still in the realm of PhD math right now.
So I think we'll eventually get to the point where formats are locked and people can start to do real things on top of it. But it's pretty hard to move that ball forward for a little while.
Phil: I think formats is a really interesting one to talk about, right? I was at an MPEG meeting about 12 months ago now where there was initial talk about trying to come up with some sort of codec for light field representation of data. Do you know about where that's got to? What are you using to represent light field data and is that going to be an industry standard?
Ryan: Yeah, I can't talk in detail about how we're representing light fields, for two really important reasons. One is that it's a bit of trade secret, and the other one is that I don't understand it at all. It's completely over my head. So I'm not going to mess things up by trying to give you a garbled version of it. In terms of the standards bodies, though, I can say this. I think we're still at the very early days when people are trying to agree on terminology. And if you look at some of the proposals that are out there, the JPEG Pleno working group's doing a lot of good stuff. And they're sort of broadcasting some ideas.
There's an MPEG working group. VRIF, I think, is meeting in December, which we may be attending. But we're still at the phase where everyone's kind of throwing everything at the wall and seeing what's going to stick because there's nothing around light fields, anyway, that's practical yet. And I think whatever becomes practical may well be the way towards a standard. But all the proposals right now are a grab bag. They're saying well, we should support point clouds and polygon-based approaches and light field rasters and, I think if you just look at the proposals, I don't see anything on the light fields side that's ready for prime time.
Now, we think our representation can be made quite small. We've got some early proof of concept. That, by the way, is the big barrier. You want to get light fields into, like, some sort of workable standard, you have to have some way to compress it down. The data rates we're talking about, in theory an uncompressed light field is like, petabytes in size. So how do you deal with that? As soon as someone gets this to work, I think that's going to probably be the first down payment on the standard. But don't hold your breath. I don't think it's going to be hashed out in an MPEG meeting in the next two months.
Matt: That was actually a big question that I was going to ask, is, to Phil's point, we kind of know how to work with the formats that we have today. So throwing a new format into the mix isn't the end of the world. But with the explosion of video on the internet in general, in some places we're already kind of starting to test the edges of our current network capacity already.
So all of a sudden, if we're talking 10x-ing, 20x-ing, 30x-ing what we're delivering, and the infrastructure hasn't caught up there yet, what does that even mean? What are we talking about here in terms of, say we can get some sort of standard. What do you think the lower bounds of file size and what we're delivering looks like there?
Ryan: We haven't found the bottom yet, but I can tell you we're aiming for 100 megabit, which is incredibly aggressive, you know, for a good, high-resolution, multi-megapixel light field.
I think we're going to hit it. We've got a great team. We're working really hard at it. We've seen really good progress, and we're on the right track for it. But we're still not ready to come to market with it yet. I think at 100 megabit, it will work with today's infrastructure. If it's significantly above that, I was talking to someone who's big in the industry who's saying 500 megabit is sort of his lower bound for practical applications. If you can't get to there, it's just a curiosity. It's never going to make it to market. So I guess to punt to the answer, it is somewhere between 100 and 500 megabit is required, and until we get there, it's going to remain a curiosity. But I think we'll be there within a year. Now, that's my aggressive projection. We'll be there within a year.
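To put the 100 and 500 megabit figures in storage terms, here is a small back-of-the-envelope conversion. It assumes "megabit" means megabits per second of sustained bitrate and uses decimal units (1 GB = 10^9 bytes). The function name is hypothetical.

```python
def gigabytes_per_hour(megabits_per_second):
    """Data volume of one hour of content at a constant bitrate."""
    bits = megabits_per_second * 1_000_000 * 3600
    return bits / 8 / 1_000_000_000


for rate in (100, 500):
    print(rate, "Mbit/s ->", gigabytes_per_hour(rate), "GB per hour")
```

At 100 Mbit/s an hour of content is 45 GB, and at 500 Mbit/s it is 225 GB, which gives a sense of why the gap between those two targets matters for delivery.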
Phil: So realistically, this isn't going to be over IP for a long time, right? Maybe 100 megabit, I mean, how much of the world is connected at 100 megabit? Not a huge amount outside of major built-up areas, right? Maybe in 10 years, yeah, maybe. I think it's going to be challenging, right?
Ryan: Yeah, I mean, so how many people have high-end VR headsets that can even use a light field right now? We're still at the foundational stages. We are keeping our eye on 5G, I think to the point where we can stream over next-gen cell networks. Fortunately, install base on phones turns over really quickly. So we may see sort of realistic bandwidth down to the consumer jumping up a lot in the next two, three years, I think. There's been a big push behind that roll out. So I wouldn't say we're counting on 5G, but we're hopeful. I mean, you know, to frame this a bit, 100 megabit is insane.
I just want to say that right off the bat. There's no way we should be talking about that.
An uncompressed light field is completely preposterously large. And we're not dealing with pixel-based formats anymore.
I think it's sort of an interesting technical point the audience may appreciate, there is a ground truth when you're dealing with a 2D video. You look at an uncompressed video, you know that you've got some raster, you got some value at each raster point. True light fields are actually, could be perspective free or arbitrary perspective. You can't require the VR user to be standing on a grid and exactly turn their head in 0.1 degree increments or something, right? It's any arbitrary angle. And so you've got this almost analog space, the data space is so large, if you want to be high fidelity. So we've made some breakthroughs at Visby where we feel we can get down to 100 megabit, and that alone, we should be celebrating.
Matt: So for reference here, so some people, to try to improve delivery of 360 video, for example, they try to guess where your head's going, encode your field of vision and then maybe encode like a little bit around it and then just keep up with your head movement. Sounds like that's such a minor, in comparison to what we're talking about now, that is such a small optimization.
Ryan: Well, there has to be an adaptive bit rate version of this, right? So you have to have some way in which you can sort of spend your bits where you think someone's going to look. For sure, that's going to be an issue. We're still at the stage where people are worrying about representation, so we're keeping in mind that we're going to want to do adaptive bit rate streaming and some sort of prediction as we build this thing out, but that's probably two steps away from now, to be frank.
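The idea of spending bits where the viewer is predicted to look can be illustrated with a tile-based split of a fixed bit budget. This is a minimal sketch under stated assumptions, not Visby's method or any particular product's: the frame is divided into hypothetical equal-width yaw tiles, a gaze predictor (not shown) supplies `predicted_yaw_deg`, and the Gaussian falloff and `floor` weight are illustrative choices.

```python
import math


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two yaw angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def allocate_bitrate(predicted_yaw_deg: float, total_mbit_s: float,
                     n_tiles: int = 8, falloff_deg: float = 60.0,
                     floor: float = 0.05) -> list[float]:
    """Split total_mbit_s across n_tiles yaw tiles.

    Tiles near the predicted gaze direction get most of the budget;
    `floor` (an assumed tuning value) keeps a little quality everywhere
    in case the prediction is wrong.
    """
    tile_width = 360.0 / n_tiles
    centers = [(i + 0.5) * tile_width for i in range(n_tiles)]
    weights = [
        floor + math.exp(-(angular_distance(c, predicted_yaw_deg) / falloff_deg) ** 2)
        for c in centers
    ]
    total = sum(weights)
    return [total_mbit_s * w / total for w in weights]


# Example: viewer predicted to look toward yaw 100 degrees, 100 Mbit/s budget.
for i, rate in enumerate(allocate_bitrate(100.0, 100.0)):
    print(f"tile {i}: {rate:.1f} Mbit/s")
```

The tile facing the predicted gaze gets the largest share and the tile behind the viewer gets the smallest, while the shares always add up to the full budget.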
Our goal is to get something that someone can download overnight and then watch a beautiful cinematic experience at home and feel like, okay, that's a really important milestone. The next step would be, okay, I can kind of stream it, and after a little bit of buffering, watch it. And then I think a few years from now, we're going to be at the point where someone will just be livestreaming, set up cameras and livestream it out to end clients.
That might be five years away, to be completely honest.
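For a sense of scale, the bitrates mentioned above can be turned into file sizes and download times with some back-of-the-envelope arithmetic. This is a rough sketch only: the 90-minute runtime and the 25 Mbit/s home connection are assumptions chosen for illustration, not figures from the conversation, and the stream is treated as constant bitrate.

```python
def stream_size_gb(bitrate_mbit_s: float, duration_s: float) -> float:
    """Size in gigabytes (10^9 bytes) of a constant-bitrate stream."""
    return bitrate_mbit_s * 1e6 * duration_s / 8 / 1e9


def download_hours(size_gb: float, link_mbit_s: float) -> float:
    """Hours needed to fetch size_gb over a link of link_mbit_s."""
    return size_gb * 1e9 * 8 / (link_mbit_s * 1e6) / 3600


RUNTIME_S = 90 * 60   # assumed 90-minute piece
HOME_LINK = 25.0      # assumed home connection, Mbit/s

for bitrate in (100.0, 500.0):  # the two figures discussed above
    size = stream_size_gb(bitrate, RUNTIME_S)
    hours = download_hours(size, HOME_LINK)
    print(f"{bitrate:.0f} Mbit/s: {size:.1f} GB, {hours:.0f} h to download")
```

Under these assumptions a 100 megabit piece is about 67.5 GB and takes roughly six hours to fetch, which fits an overnight download; at 500 megabit the same piece would take more than a day.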
Phil: I think that's a really exciting prospect, right, like getting to a true livestreaming experience for this. I love the idea of being able to watch, like watch Glastonbury but through a light field camera, where I can then kind of move around, look around, and stuff. I think that would be an amazing use of the technology.
Ryan: Yeah, well, it's on our roadmap. It's unfortunately a few years off. So, you know, sit tight.
Ryan: We actually believe there's going to have to be a change in kind of camera technology before that's possible. One of the big challenges you have, if you've got a 50 camera, 200 camera array, is the raw data coming off those cameras is just so much to process. It's just really hard to see it getting to real time anytime soon. So we have some IP around ways to re-configure cameras that might make it easier to do these things, but that's an R&D project for us.
Phil: How distributable is that kind of workload? If you've got some form of stream coming off, say, a grid of 50, 100 cameras, right, you have to, presumably can do some level of parallel post-processing on that, but at some point it's gotta get down to like one data stream, right?
Ryan: We haven't really invested in saying, "oh, you got 200 cameras, how can we pull all this data into one pipe and then start to reshuffle it so it can be processed quickly." The way we handle light fields, I don't want to edge into trade secret territory, but it's highly parallelizable, right, you could in theory just throw a ton of parallel compute at that thing and maybe use it and conclude we can get close to real time. But the whole thing just feels so impractical that we're not investing in that part of it yet. I think there are other breakthroughs that will make it a lot easier. So again, I think it's a five-year timeline for that. Yeah, that's as far as I'll go on that one, how's that?
Phil: Nice, absolutely, that's fascinating.
Matt: I think the interesting point of reference here coming from the traditional transcoding delivery playback world, when we're talking about 100 megabit being kind of the lower end of phenomenal that we can expect, if we're getting there and everybody's able to handle it, then do we even need to encode content anymore, or are we just shoving ProRes files straight down the pipe to people? How many transcoding businesses are going to go out of business if the file size just does not matter in a traditional sense?
Ryan: Well, the other way of looking at it is like consumer expectations are going to get higher, resolutions are going to go up. So relative advantage is always powerful. Sweet, sweet 16K video. Well, and if you look at proposals, I should say, I'm a crazy outlier. The stuff I'm talking about is nuts compared to I think where the center of mass of conversation in the VR industry is today.
So if you look at what most people are talking about, they're talking about well, it'll be great when we get to like 20K, 2D video that we can make into stereo or whatever. So there are still people talking about these preposterously high resolutions for kind of traditional, pixel-based images. And that might be part of the mix. I think that that plus some amount of depth augmenting will be important.
So all the hardcore algorithms people are using to compress 2D rasters right now will be repurposed and augmented and be really important.
If we want to get to livestreaming Glastonbury in two years, there are half a dozen companies working on that right now, sitting on top of the existing video stack and doing extensions to it.
So I think that might be where the real heat and light is for the next couple of years.
Ryan: It's important that I not say that everyone who's listening to this, your job is irrelevant in five years, right? You don't want to like, thank you very much.
Matt: Get your retraining in now.
Steve: Would it be possible in the future that we record one of these podcasts using light fields?
Ryan: How many hard drives do you have?
Steve: We have the cloud.
Ryan: As long as your app is big enough, we can take care of it. Actually, last shoot we did, the way we moved the data to the cloud is we got one of those Snowball appliances.
Steve: Oh, yeah, the Amazon ones?
Ryan: Yeah, yeah, yeah. So to give you some idea of the data we're talking about here, it's not small. But it'd be fun. It would be a lot of fun, wouldn't it? Right now, us getting a shoot together is still a bit of a chore. But I've got a contract engineer working on just like wiring things together so I don't curse the world and all of my poor career choices every time we decide to go shoot something. So probably a couple months, that won't be an impossible ask.
Steve: Cool, that'd be fun.
Ryan: Yeah, it'd be a lot of fun. It's kind of intimidating to stare at 50 lenses, by the way. It's like, something to get used to.
Matt: I can't wait for this to be the action cam, somebody going down a mountain on a snowboard with 50 cameras strapped to their back.
Ryan: I have to say, actually, speaking of poor career choices, I actually used to film, I used to strap a digital cinema camera to my chest and ride an electric mountain bike down slopes to produce video content for the fitness industry. So I've been there, I've crashed with hundreds of pounds of video equipment cabled in and taped to my body, and it's not very pleasant. So that's why I now just record podcasts and talk about technology.
Matt: That video's going to be available on Demuxed.com.
Ryan: Actually my first VR content I created was taking some of that footage and playing it back in a VR headset. Far and away the most nauseating thing I have ever experienced.
Steve: Oh, yeah.
Matt: So no thanks, on that one.
Phil: Something that did occur in my mind was do we think this is going to give more ability for tweaking to the end user? So one of the things that I've seen a lot more of is kind of picking your own experience, right? Like there are websites out there giving people the ability to pick a particular piece of the orchestra they want to hear more clearly or now mixing how they want to on the client end.
Is this going to open this sort of window for cinematography? Am I going to be able to kind of realistically play with the brightness I see and the contrast and the color space I see, do we think?
Ryan: Yeah, so in theory you could do that with 2D video right now, right? You could give an end user some sort of lighting control and they can change what they see. The thing that opens up that's not possible is that the end client really is the cinematographer. They're the ones who are choosing the framing and where they're looking and so on. And to be honest, I think that again, there are some mixed feelings about that in the traditional cinematography community.
And they're saying, "well, how do I direct someone's attention to the subject matter if they're staring off in the corner somewhere?" So I don't think we've really figured out the language for that, so it's a double edged sword, right? Yes, there's a ton of control on the end client side. We're not really sure how to use that. And there's some really kind of experiments in formalism right now, I would say, where people are turning off the lights or doing things to really manipulate attention.
And that has that feel of train pulling into the station kind of stuff, just trying to understand what the boundaries are and what the new language is. I should say also, though, to the extent this is going to be interactive, I'm a big skeptic. I think there's going to be a clear divide between video games and, for lack of a better word, call it cinematic content. And there'll be some formal experiments in the middle, and maybe someone finds a model there that works. But I think there's a reason that the touchstone for the interactive narrative is choose your own adventure books. There's a reason those aren't a raging success still. It's a quirk, it's a weird thing.
Steve: Yeah, we were at a VR meetup the other day, and they brought up an example, this is the same discussion here, is how are you going to deal with this option for people to look around and make their own story. And they pointed to an example where somebody had, basically if you're looking straight, you see this couple who looks happy, and they're in a good place, but if you look around, you're actually in this drab apartment, and it's in squalor, and you get more of the story.
Sorry, I haven't seen the example myself. But this is what they were describing. It made a lot of sense. It's like, oh, okay, so you can actually start to, if you do it right, you can maybe inform the story by allowing the person to look around more as opposed to just being a freedom thing.
Matt: Right, that layer on context.
Matt: Which is pretty cool.
Ryan: Yeah, that's a good point actually. So I'm already going to contradict myself here, just because, I don't know, that leads to interesting thought, which is the one change we have now is we actually have, for free, knowledge of where the user's looking. That's actually the structural change. And so it's possible, some of the informal experimentation's been done, I think the first I was aware of it was the Wild VR experience.
Very subtle, I didn't realize it was true, but it was a branching narrative based on where you happened to be looking. And again, because the SDK includes your gaze direction, you can do that invisibly, whereas I think it's really hokey if it's like, hey, here's a button. Do you want the person to go right or go left, do you want the person to, you know, open a cabinet or shoot a gun?
It doesn't feel like a narrative. But if it's really subtle, then it feels like you're just watching a narrative. You don't know it's interactive. The downside is you don't know it's interactive, right? So is the experience actually any different than watching a simple linear narrative? You'd have no idea. If you're really into it and you go back and rewatch it and it's different, that's interesting, but it's sort of meta-interesting.
It doesn't actually change the viewed experience. So as I dive into the muddle, you can see, this is kind of where we are, right, is people have theories and they're experimenting and train pulling into the station, you know? It was fun to watch.
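The invisible, gaze-driven branching described here can be sketched as a dwell-time tally. This is a hypothetical illustration, not how any particular experience or headset SDK does it: it assumes the runtime delivers gaze samples as `(yaw_deg, dt_seconds)` pairs, and the region names and angle ranges are made up for the example.

```python
from collections import defaultdict


def choose_branch(gaze_samples, regions, default):
    """Pick the story branch for the region the viewer looked at longest.

    gaze_samples: iterable of (yaw_deg, dt_seconds) pairs (assumed format).
    regions: dict mapping branch name -> (min_yaw, max_yaw) in degrees;
             ranges are assumed not to wrap around 0/360.
    default: branch used when the viewer never looked at any region.
    """
    dwell = defaultdict(float)
    for yaw, dt in gaze_samples:
        yaw %= 360.0
        for name, (lo, hi) in regions.items():
            if lo <= yaw < hi:
                dwell[name] += dt
    if not dwell:
        return default
    return max(dwell, key=dwell.get)


# Hypothetical scene: the couple is straight ahead, the apartment is behind.
regions = {"couple": (0.0, 60.0), "apartment": (180.0, 240.0)}
samples = [(10.0, 0.5), (200.0, 1.0), (210.0, 1.0)]
print(choose_branch(samples, regions, default="couple"))  # prints "apartment"
```

Because the choice is made from accumulated gaze alone, the viewer never sees a prompt, which is the trade-off Ryan points out: the narrative stays seamless, but the viewer may never know it branched.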
Matt: This has been fascinating. I thought about watching the podcast again so I could ask really awesome, informative, or not podcast, watch the meetup again and ask really awesome, informative questions, but this was awesome. It was a lot of new information from the meetup itself. So if you are interested in this and you want to see Ryan's talk at SF Video, you can find it on YouTube under the Heavybit library. Just look up Demuxed and Ryan Damm. I'm sure it'll come up eventually. Thanks again, Ryan. This is really great.
Ryan: Thanks for having me. This was a fun conversation.