Jamstack Radio
30 MIN

Ep. #113, AI-driven Audio with Peadar Coyle of Aflorithmic

  • Artificial Intelligence (AI)
  • Content Creation
  • Developer Relations
  • Machine Learning
Guests: Peadar Coyle

about the episode

In episode 113 of JAMstack Radio, Brian speaks with Peadar Coyle of Aflorithmic. They discuss exciting new use cases for AI-powered audio, the recent explosion of synthetic media, and the changing landscape of audio production at scale.

Peadar Coyle is Co-Founder and CTO of Aflorithmic, building audio infrastructure for developers and business. He’s excited about developer tools, and helped build PyMC3 enabling developers and ML to build explainable statistical models without knowing Bayesian Stats.

transcript

Brian Douglas: Welcome to another installment of JAMstack Radio. On the line we've got Peader Coyle from Aflorithmic. That's a tongue twister.

Peader Coyle: Yeah, it's a bit of a tongue twister. Yeah, so really, really excited to be here. Thank you very much. Really, really excited to talk to the JAMstack community. I think it's an awesome podcast.

Brian: Yeah. And say the name of the... since I flubbed it so poorly.

Peader: Yeah, so we're Aflorithmic, we're an API first solution for audio development or audio at scale.

Brian: Yeah, for sure. So I guess, Peader, can you intro yourself and tell us what you do? And how you got there?

Peader: Okay. So I'm CTO and co-founder of Aflorithmic, and I've been working on developer tools for a long time. I was working on PyMC3, which is a noted machine learning library in Python, and I did a lot of open source work, and I worked a lot in the machine learning, data science space. But I gradually gravitated more towards data engineering and DevOps.

But now, unfortunately, like many CTOs, I just manage a team day to day so I don't really get my time to do a lot of coding which I miss and love. But I do try to keep myself involved, and I'll talk some more, some tools I like to use and stuff. I think it's very important that you're aware. I'm very, very excited about opening up human creativity, right?

So I think fundamentally, one of the most exciting things about software, and I've been building software since I was 16 and I'm 35 now, so it's a long time, is about building tools that users love that allow them to do something they couldn't do before. I think that is just a beautiful thing and that is literally what gets me excited first thing in the morning. It got me excited to set up this company, it gets me excited every day, and we're like three and a half years in now.

We've raised a bit of money, we raised a small seed round, we have some wonderful customers, I can talk some more about them later on. Meanwhile, it's a very interesting space, of course a very competitive space as well because there's a lot of developer tools, there's a lot of tools out there. It can be quite difficult to get developer mind share and stuff like that, so all these challenges can be very fun.

Brian: Excellent. So I would love to talk about the product and the approach that y'all are doing as well, so can you walk us through what you're working on as well as far as product goes?

Peader: Yeah, sure. So we're API first, so we work with any programming language. But fundamentally, we're for automating audio production at scale and we use a lot of AI, so we use synthetic speech generation, we use AI music generation. But fundamentally, what we really try to do is, you may go and hire a voice artist, you may go to a recording studio, these are quite technical tools.

One of my co-founders, Timo, was a musician, so there's a lot of technical tools there. I don't know if you know about the whole audio ecosystem, there are classically things called digital audio workstations, and the gap in the market we saw was these things are just too hard to learn to do. Fundamentally, what's very interesting is it's not so much competing at the high end, it's more like at the low end.

So you have the hobbyist, someone like myself who has no musical ability whatsoever, ironically for an audio company. It's unlocking that latent creativity. Let's give a couple of examples, for example, in this podcast you might want to generate an advert or generate many adverts and put them through to your podcast.

You may want a synthetic avatar or a games application, you may want to add voiceovers. You may want to add voiceovers to videos, and those are the kind of areas we see a lot of our interest and such.

Brian: Okay, yeah. And I'm familiar with the digital audio workstations. I spent a lot of time as a bedroom musician and I record a lot of content, GarageBand and Pro Tools. In comparison to what you're working on, Aflorithmic, I use a tool which is called Descript, and very similar. I don't really have a strong use case for it, it was installed when I was doing some quick video editing, really.

But what I liked about that, and I guess this is your similar approach, is it more of you're not educated in how audio works, but you know you have to edit a podcast. Is that what y'all are trying to solve?

Peader: I mean, it's an interesting example. I think that Descript is great if you're a podcast host and you want to edit content, I think Descript is an awesome tool. I think where we add more value is where you want to scale or dynamically generate audio. A lot of what actually happens is you might have something like localisation, so you might want to, in a game, add various parameters.

You say something like, "Hey, Brian. Hey, Peader. You're in the UK, you're in Spain, et cetera, et cetera." Right? And you see examples of advertising and stuff like this, samples in games. But I think what's probably more interesting is how that unlocks latent creativity, stuff you just couldn't do before because it was just too expensive to do. Sometimes you'll have customers who say, "I wanted to add a voiceover to this video, over and over again, and I just didn't have the time."

Because sitting and manually editing these things is quite cumbersome. Sometimes we see people who want to do walkthrough videos, a classical SaaS use case, you record a video. But what happens, for example, if your content slightly changes, right? So either your language changes, or your location, or you make slightly subtle changes to the product, you change the name of the feature, but your video still works. You don't want to be recording new voices each time.

Normally what actually happens is people just leave the video and, fundamentally, we see every video asset often needs some sort of audio asset to attach to itself. The other thing that's very exciting to me is things like localisation. You might have multiple languages, multiple accents, of course a very important thing in the United States of America, you might want to have multiple accents to get your message across in different parts. Things like multiple topics and stuff like this. Does that make any sense to you? I can dive a little bit deeper in that, for whatever.
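
The parameterised greeting Peader describes ("Hey, Brian... You're in the UK, you're in Spain") can be sketched as a script template that is filled in once per listener before any speech is synthesised. This is a minimal illustration using only the Python standard library; the template wording, the placeholder names, and the `render_scripts` helper are assumptions made for the example, not Aflorithmic's API.

```python
from string import Template

# Hypothetical script template; the placeholder names are illustrative only.
SCRIPT = Template("Hey, $name. Welcome back. You're listening from $location.")


def render_scripts(listeners):
    """Return one personalised script per listener record."""
    return [
        SCRIPT.substitute(name=person["name"], location=person["location"])
        for person in listeners
    ]


listeners = [
    {"name": "Brian", "location": "the US"},
    {"name": "Peadar", "location": "the UK"},
]

# Each rendered script would then be handed to a text-to-speech step.
for script in render_scripts(listeners):
    print(script)
```

Keeping the template separate from the listener data is what makes re-rendering cheap when a language, location, or feature name changes.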

Brian: Yeah, so the AI side, am I using your product to basically train the AI to learn my voice, so that way when I create my tutorials on Screencast, I can-

Peader: You can. We have a functionality for that. But often, you might not want to use that. You might just find a voice, we have like 600 plus voices, 50 plus languages, we're adding more day by day. You might want one that you particularly like because it particularly suits your brand or your use case. Unless you're a professional voice actor or actress, you might not have the capability or an ability to really record your voice well. You might want to do it, and we encourage you to do it, but you may want to just use something off the shelf instead.

Brian: Yeah. I got it. You provided us some links before, one of the links you provide is the Digital Humans, where it sounds like you could chat to Albert Einstein. I guess it's Einstein himself, because I'm not sure if his name is actually trademarked. But to talk to humans or this AI. Can you explain a bit about that product?

Peader: Yeah. That's for one of our partners, UneeQ, and they wanted to build... They build digital humans for the Metaverse, there's a couple companies like that out there. We reproduced or we were inspired by Einstein, we got a voice actor, we did a whole process, we documented it. There's a lot of care and stuff involved in doing that. You want to represent it as well as possible.

But people loved it, still heavily used. People use it as an exploratory tool, people learning about science, someone with a physics degree getting a lot of joy that people were interacting with Einstein, a digital Einstein. You get this chat bot experience. I think in the future when we have more Metaverse-like experiences, I think you'll just see more and more of these things. Some of these things will be entirely synthetic, but some of these things will be based on real life characters, and some of these things will be for educational needs which I think is a very interesting use case.

Some of these things will be for possibly even use cases that I would have trouble coming up with right now. We've seen tech in the last 10 or 20 years, the rise of things like NFTs, et cetera. I wouldn't have predicted that, for example. I think something like the Metaverse, I'll probably not try to make any strong prediction about where things are going to go.

Brian: Yeah. But I would say by investing in the AI part of this technology, it seems like AI, it's a big thing right now. I don't know if you saw recently, maybe you were part of it, but the AI Grant which Nat Friedman, former CEO of GitHub, is pushing out and asking for folks to apply to this so it can advance the technology and the entire space?

Peader: Yeah. That's something that I'm personally very interested in, and we spend a lot of time as a team thinking about this. So I think fundamentally, one of the problems, I don't mean this to criticize, I've worked in AI for a long time, I don't want to criticize earlier products. But I think a problem a lot of earlier AI products had was they... It was like AI was the magic and they never thought about product-market fit. Product-market fit is always essential.

One question, and I've been thinking this with the rise of Stable Diffusion and these prompts and these AI generated models, large language models, what kind of products does that open up? With ours, with the rise of text to speech, your WaveNet based models and similar other deep learning models, you basically had the capability to build products that you couldn't do before. My CEO, Timo, said, "What do you do whenever AI can make a human-level synthetic speech? What applications does that open up?"

We spent like two years just doing heavy R&D to really explore what we were doing, what we were building. Of course that can be a bit frustrating because when you're exploring it's not really clear. I remember, I found the whole AI Grant thing by Nat Friedman was awesome. He talked a lot about where is the intersection? That's a lot of the challenge we have.

Where do you do something purely AI generated? Where do you do a template? Where do you use old fashioned technology because that's a better user experience? How do you allow someone to record their voice in an easy to use manner? How do you make that an easy experience?

Because a lot of that stuff is UX, some of that is AI, like filtering, digital signal processing, whatever you want to do. If you're writing content, how do you be inspired by things like large language models or whatever, to write good content to get past writer's block so that you can produce a great audio experience for your customers?

There's a lot of things you can do wrong there, if you just throw AI models at it and not really think from a product point of view, I think you're going to run into problems because you're going to do something that hits the Uncanny Valley, or doesn't really fit into the workflow that a developer does. Because whenever you're building a developer tool or developer-first tool, you don't live just in their head by yourself, you live in their head with many, many other tools, right?

So you have to think about all those sort of things. I think what's really exciting about this space is we're only getting started at figuring out what all those levers are, and I think that's why it's such a platform shift that people are very excited about. Of course a lot of venture capitalists like Nat Friedman are very excited about these things for that reason. But I think there's no playbook, right? And that I think is the risk and the opportunity there.

Brian: Is that something that you're looking to solve with your product?

Peader: How do you mean?

Brian: Yeah, so you mentioned it was selling magic versus actually having product-market fit, and I appreciate when... Stable Diffusion, being able to understand, "Okay, type in here and it'll go to image." That is closer to product-market fit than generating just random images on the fly through generative art, I guess was the previous use case.

So when it comes to what you're doing, you're providing a path for people to pick a language model or pick a digital human, which is what UneeQ was doing, but using a similar technology. So I guess the question is, are you creating that onramp for a bunch of folks who are uneducated in the AI side of the audio piece?

Peader: So what do you mean by uneducated in AI? Are you saying that the product is not for someone who understands AI, the product is just for someone who can code, right?

Brian: Yes.

Peader: Yeah. I think we've solved a lot of this, and I encourage everyone listening to give it a play, send me some emails if you have some issues. You should be able to get it up and running quite quickly, but there are still a lot of, I think, UX challenges. We've always been a very product company from day one. We were never really thinking about just doing deep tech for the sake of it, we always wanted to solve real world problems. I think that's why we're quite user focused, I spend a lot of time even as CTO talking to users and trying to understand how we fit in with their tools, but also how to build better onramps.

Some of that can be using things like large language models or whatever, but some of that can be just clever templating or good user experience, or even subtle stuff like what do you name the endpoints of your API? Because that actually has a huge effect on your adoption rate, right? If you call your API endpoint something slightly esoteric, you hurt the user experience because a developer expects their tools to work in a certain way. So we think a lot about this and we're working on that.

I just want to be humble, that I don't think we're 100% there yet, but I'm very interested in this whole explosion of synthetic media, the rise of things like GitHub Copilot, and all these sort of tools, and what insights they give us about how this product space is. Because I think fundamentally, technology is indistinguishable from magic, right?

So fundamentally, AI enables you to do things that you couldn't do. We're seeing, for example, the rise of prompt engineers in the Stable Diffusion space, for example. We could have the rise of Aflorithmic engineers writing prompts so that you can write beautiful audio or something like this. A new understanding and have marketplaces like that, I could see that happening in the next couple of years.

Brian: Yeah. There's a use case that you actually provided in the shownotes that I wanted to actually talk about, because I've already fumbled through saying the name, but also I mispronounced your name prior. Your name is pronounced, it looks like Peter but it's pronounced Patter. Can you talk about how to overcome that sort of thing? Teaching robots on how to pronounce properly?

Peader: You're talking about our voice intelligence features?

Brian: Yeah.

Peader: I'm very excited about this, I think it's awesome. It was such a nitty gritty feature to build and we got it wrong the first two or three times. It's like any iterative approach. But fundamentally, names of cities, names of place names, human names, these all can be quite esoteric, right?

So a lot of classical text to speech models by themselves will often get these things wrong, and that can be very upsetting because a human doesn't want to feel that they're talking to a robot or whatever. Or they don't even recognize it because there's something slightly wrong. We use a bunch of techniques, we use a bunch of dictionaries, we use a lot of phoneme analysis, we use a lot of natural language processing.

We do also a certain amount of human curation because one thing we have discovered is it's not purely automated, for quality control it's quite important. But it can pronounce my name quite well, and we've implemented these things in many... It often comes up when you're dealing with things like brands. A good example is Renault, which most robot voice systems-

Brian: Renault?

Peader: Yeah, Renault. So you might want to make sure, that can be very confusing. So that's the kind of use cases that we see on that. We're constantly looking to make that easier, right? We're constantly on a mission. In the future we'll probably have something like automatic tagging of documents that you upload so people can at least process things well.

Another thing that we recently encountered was the pronunciation of centimeters was quite difficult for some of these models, so that was quite... If you've got a news use case, and we've got a bunch of publishers as customers, pronouncing important stuff like that is quite confusing because it can be quite... And of course it just hurts the user experience, and of course keeping a good user experience is central to what we do.
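
The dictionary step Peader mentions can be sketched as a lexicon lookup that rewrites hard-to-pronounce words before the text reaches a speech model. This is only a rough illustration: the respellings below are made up for the example, and the pipeline described in the conversation also relies on phoneme analysis, natural language processing, and human curation, none of which is modelled here.

```python
import re

# Illustrative respellings only; a real lexicon would likely store phonemes
# and be reviewed by a human, as described in the conversation.
LEXICON = {
    "Renault": "Reh-no",
    "Peadar": "Patter",
    "cm": "centimeters",
}

# Longest keys first, so a longer entry wins over a shorter one that
# happens to share its prefix.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(LEXICON, key=len, reverse=True))
    + r")\b"
)


def apply_lexicon(text):
    """Replace every whole-word lexicon entry with its respelling."""
    return _PATTERN.sub(lambda match: LEXICON[match.group(1)], text)


print(apply_lexicon("Peadar drives a Renault that is 400 cm long."))
```

The word-boundary anchors matter: without them, the `cm` entry would also rewrite the middle of unrelated words.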

Brian: Yeah. I like the idea of the developer experience, the user experience, making sure that pathway to being productive is quicker because as you approach new products or new ideas or new tools, there's always that onramp. It's something that I pride myself in the stuff that I touch and the products that I work on.

I always think about onboarding. If someone found this podcast or found something else that pointed them to this product, how quickly can they get started? And I'm curious of the developer angle in this use case too as well, so I imagine, you coming from the Python world, are there a lot of open source and stuff like that that's building this?

Peader: In terms of our SDKs, we currently only support JavaScript and Python. We have plans to do more, if you want to send suggestions I'll be open to that. But we also have developers using other libraries because it's just an API underneath it. You can just use cURL requests and PHP and stuff like this, we have a few developers using that and some Java developers. I think Java will be one of our next SDKs, actually, to work on. I just haven't got around to hiring a Java developer yet. It's on my to do list.

But we think a lot about documentation and how to make those things well, and I'm doing a lot of thinking at the moment about minimizing that onramp, what code sandboxes do people use, because a lot of the friction of learning a new tool is just installation.

Even a lot of Python installation is quite difficult because people have a lot of issues like, "I forgot to initialize my virtual environment," or, "I didn't run my Docker container correctly." All that is unnecessary friction. I'm very excited about tools like Replit, for example, and Gitpod, which I think solves some of these problems. It's about figuring out which of those. We use ReadMe.io a lot on the documentation side, and it solves a lot of these problems. It's getting better and better, but it's still not quite the level that, as a developer purist, I would love for the adoption of tools out there to be.
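
Peader's earlier point that the SDKs are optional because "it's just an API underneath" can be illustrated with a plain HTTP request built from the Python standard library. The endpoint URL, the JSON field names, and the bearer-token header below are all placeholders invented for this sketch; the real values would come from the service's documentation. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/v1/speech"


def build_speech_request(api_key, text, voice):
    """Build (but do not send) a JSON POST request for a speech job."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check the provider's docs for the real one.
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_speech_request("YOUR_API_KEY", "Hello from the Jamstack.", "example-voice")
print(req.get_method(), req.full_url)
```

Because the call is nothing more than an HTTP request with a JSON body, the same shape carries over to cURL, PHP, or Java without an SDK.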

Brian: Excellent. So curious, we had mentioned a couple times, but even broader towards it, how do folks get started in leveraging the tool?

Peader: So if you just go to www.API.audio, there'll be a thing in the top right to sign up. There's a free trial, and you get a one month free trial which generally is enough to do about one hour worth of audio, and then there's a pay monthly subscription. I think our first subscription is like $39 a month and that's generally how people get started. Have a look at the tutorials, have a play around, play around with some pronunciation or build something yourself.

Also, I'm very interested in things people build, right? For example, we have one customer who's building a system for building the signs of pedestrian crossings, right? You might think, "Wow, there's a niche use case." But they have lots of different signs and they have lots of different volumes, and they have all these kinds of idiosyncrasies and he just wanted to build his own tool because that was easier to do his job.

I find this staggeringly fascinating, that someone would have that as a problem. So I'd be very interested in just what people do and how easy it is. And also, like any developer, I'm very interested in how do you make that simpler. At the moment, it's coming up very soon on our roadmap, we're working on a new design of our API and a lot of that is based on developer feedback, small problems with getting started, small problems with error messages.

I'm a big believer that the error message should try to put you onto a happy path. I think we have that down in parts of it, but it's like anything, there's such a big space to explore that it can be difficult to get all those things. So I'm very open to feedback and curiosity. My email should be in the shownotes as well, or my Twitter or whatever.
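
One way to read "the error message should try to put you onto a happy path" is that an error names a likely fix as well as the failure. The sketch below is a generic illustration of that idea, not Aflorithmic's implementation; the `VoiceNotFoundError` class, the `validate_voice` helper, and the voice names are invented for the example.

```python
import difflib


class VoiceNotFoundError(ValueError):
    """Raised when a requested voice name is not in the catalogue."""


def validate_voice(voice, available):
    """Return the voice if it exists, otherwise raise with a suggestion."""
    if voice in available:
        return voice
    # Suggest the closest known name so the caller can fix the typo directly.
    close = difflib.get_close_matches(voice, available, n=1)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    raise VoiceNotFoundError(f"Unknown voice '{voice}'.{hint}")


VOICES = ["sarah", "joanna", "miguel"]

try:
    validate_voice("sara", VOICES)
except VoiceNotFoundError as error:
    print(error)
```

The suggestion costs one standard-library call and saves the developer a trip to the documentation.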

Brian: Yeah, for sure. Yeah, definitely your Twitter. I don't know if we put emails out publicly on the website.

Peader: Okay. Twitter is fine.

Brian: That'd be perfect for scraping though, if we did.

Peader: Yeah, true. Sorry, I didn't mean to open up myself to a security issue.

Brian: Yes, no worries. We're here to ensure your security is intact, at least for this podcast. Yeah, so I want to transition us to picks, these are JAM picks, things that we're jamming on. Could be anything, books, music, food, technology picks. Since you've come prepared with quite a few of them, did you want to go ahead and get us started?

Peader: Yeah. Actually, one I didn't put in the shownotes, but ReadMe I quite like as a documentation platform, ReadMe.io. I find that quite useful.

Brian: Yeah. We actually had them on the podcast, the founder.

Peader: Yeah. I think I did listen to that one actually, I am a fan. The other is Warp, we chatted about this before, Warp is www.Warp.dev. It's this fantastic terminal. One of my developers was like, "This is amazing. Look at this, it's so fast!" And I was like, "It can't be that cool." And then I was like, "Okay." And now I've been hooked since, I just find it so much easier to use.

Podcast-wise, I've really been into Matt Clifford's Thoughts In Between, Matt Clifford's one of the co-founders of Entrepreneurs First. He talks about all sorts of topics, it's quite general, but a lot of it is about technology. A book recommendation, I've been thinking a lot as I build teams, I have like eight or nine developers working on my team at the moment. How do you build good engineering management systems?

A lot of this is a systems issue, there's a great book by Will Larson, it's called An Elegant Puzzle. It's full of lots of discussions of restructuring or reorgs, what happens whenever your team gets too big, how do you interact with product, how do you do some product management. Which, of course, a lot of engineering managers end up doing.

Sometimes you have to wear the product management hat, which I wear badly, but sometimes I have to wear it. He has good frameworks for that, like doing good one to ones, and also just thinking about things like how do you run a migration? How do you ensure that you have a good on call rota? Et cetera, et cetera. It's just full of lots of stuff. Also, the book is beautiful, it's published by Stripe Press, it's just a beautifully well bound book. I don't know if it's been mentioned before in the podcast.

Brian: No, it hasn't actually. I'm actually looking at the Amazon link right now. I've read a couple of Stripe Press books and they do a really good job. I always walk away with some knowledge base. The Nadia Eghbal book, which is Working In Public-

Peader: I haven't read that, it's on my list. Yeah.

Brian: Yeah, definitely worth a read if you're interested in approaching open source, or even just improving the open source ecosystem that you might touch and maintain.

Peader: I'm a big believer in that, it's giving back to the communities as well, and keeping those things open. We contribute quite a lot to open source, so it's something that we take seriously as a team.

Brian: Excellent. Yeah, so I've got two picks. First pick is This Week In Startups. I just recently started listening to this, I think it's for folks who are founders or aspiring founders. I think there's a lot of good nuggets and wisdom, and I think that what I've got a lot of value from is their Sundays Sessions, which they usually teach something new for founders, to teach them about venture capital or things that they didn't think about.

Recently, I learned about the 83(b) election. So as a founder in a Delaware C Corp, very useful to know that knowledge and it literally never came up prior to listening to this podcast. So yeah, worth a listen if you're a founder or an aspiring founder. The other pick is the camera I'm using. It's the Lumina Webcam. I don't know if it's going to activate, I'm trying to give a demo and it's not working.

Peader: That's always the curse of a live demo, isn't it? We've all been there.

Brian: Yeah. What I was trying to demo is the cameraman feature, which is like if you move back or if you come closer, it keeps you in frame the entire time which is pretty nice. Then it's got this auto blur, Bokeh situation going. Yeah, I've got a DSLR that's actually mounted over here to the left of my screen and I love it for doing content and videos, but I have to be sitting down and I have to be looking this way.

What I love about the Lumina cam is that I could be using the standing desk, I could be sitting down here or I could be... Oh, it just zoomed out a little bit. But it's pretty useful, it's a nice go to webcam if I don't want to use the heavy rig that I use for content creation, so definitely worth a try.

Peader: Awesome. I'll give it a look as well. Thank you for those picks.

Brian: Yeah, appreciate it. And thank you for the conversation. Folks, check out Peader's work and, listeners, keep spreading the jam.