Fix Bad Audio Instantly? Inside Waves' Voice Regeneration Tool
The Pro Audio Suite · March 30, 2026
00:59:07 · 108.28 MB


What happens when audio is so bad… even the best plugins can't save it? In this episode of The Pro Audio Suite, we're joined by Michael "Gomez" Pearce-Adams from Waves Audio to talk about Voice Regeneration, a new AI-driven tool designed to rebuild poor-quality dialogue into something actually usable. And this isn't just noise reduction or EQ… it's a complete rethink of how broken audio gets fixed. We dig into:
  • Why this will never be a plugin
  • How AI is actually "rebuilding" voice recordings
  • Real-world use cases, from podcast disasters to phone interviews
  • What this means for audio engineers and post-production
  • The limits of AI audio, and where it still falls over
There's also a bigger conversation here about where the industry is heading, and whether tools like this are helping or hurting the craft. If you've ever been handed unusable audio and told "can you fix this?", this one's for you. 🎙️ What You'll Hear in This Episode
  • Why traditional tools fail on truly bad audio
  • The rise of content creators as the biggest audio market
  • Cloud processing vs plugin workflows
  • Ethical concerns around AI and audio data
  • Accent bias in AI models and why it matters
  • Real examples of voice regeneration in action
  • How fast AI processing is changing expectations
🔗 Links & Resources
  • 🌐 Waves Voice Regeneration: https://www.waves.com/voice-regeneration
  • 🎧 More episodes: https://www.proaudiosuite.com
  • 💬 Join the conversation: https://www.facebook.com/groups/357898255543203
🙌 Sponsors Thanks to our mates at:
  • Tri-Booth – Portable vocal booths built for voice pros.
    Use code TRYPAS200 for $200 off.
  • Austrian Audio – Making passion heard with world-class microphones like the OC818 and OC18.
Welcome to the Pro Audio Suite. These guys are professional and motivated, with tech: the VO stars, George Whittam; founder of Source Elements, Robert Marshall; international audio engineer Darren "Robbo" Robertson; and global voice Andrew Peters. Thanks to Tri-Booth, Austrian Audio (making passion heard), Source Elements, George the Tech Whittam, and Robbo and AP's international demos. To find out more about us, check theproaudiosuite.com. And welcome to another Pro Audio Suite, thanks to Tri-Booth. Don't forget the code TRYPAS200 to get $200 off yours, and by the time this goes to air I'll have mine set up somewhere remote in Australia. And Austrian Audio, making passion heard. We're hearing from our special guest today, Michael Pearce-Adams, or as we like to call him down here, Gomez, from Waves Audio, who has a brand-new product which, I have to admit, I think I looked at, but I can't remember; it's all to do with the hormone treatment that's messing with my brain. I think the funniest thing about the entire intro is it sounds like an infomercial on Channel Ten at eleven o'clock at night. Funny you should mention that. Me and a guy called Brad Turner used to film those in Melbourne years and years ago, like 1990: "Hey, we've got Brad and Michael here from Double T FM to talk about a new product." Hang on a cotton-pickin' minute, were you doing that? Yeah, I used to go down there. I was Brad. It was Double T, yeah, BT and me. It was fun. Oh my god, we must have crossed paths. John Dromanus was also involved at some point. We always used to make money like that. Those were the years indeed. Anyway, thanks guys for having me here, it's been a pleasure; I guess I'll see you next time. Right, thanks. And a shout-out to Brad if you're listening. Now, tell us about this new product, because for some reason you thought I would be good to help you with the beta testing.
So I know all about it, but you should probably fill our listeners in. One of the first things I need to mention about what we call Voice Regen is that it's not a plugin, and it will never be a plugin, and we can talk about that later down the line. One of the largest and fastest-growing audiences of people who create some kind of audio in the world are not DAW users or video editors; they're content creators. And one of the things I thought really long and hard about was: they might not understand how to fix bad audio, but they do understand what bad audio sounds like. So let's give them a solution. We spent about three and a half years building a very large language model and training it on what good audio sounds like, and we've come up with this product called Voice Regen, which can make even the worst, nearly unretrievable dialogue sound almost like a podcast mic, very, very usable. On the website, some of the demos I use, actually most of the demos, are very Australian, because I have access to a lot of content creators here. Literally one of the cases was a podcast recording from Riverside where, on video, the guest looked like they were recording into a nice microphone, but in reality they were recording on their MacBook microphone, which was across the room. That kind of classic mistake. It's a hugely common problem, but it's also one that is emotionally very, very stressful for somebody who thinks they've got great content and realizes it could be completely and utterly useless because of that. And this is why I built Voice Regen: not for the people who mostly control the content, but for all of those out-of-control scenarios where bad things happen and you have to work out, how do I retrieve this? Do I pay somebody? Do I learn how to use a tool I've never seen before? Or is this something I can just drag and drop onto and have it fixed for me? Which is what we built.
Quite seriously, with some of those examples you have up there, a normal plugin would probably be of no help anyway. They're so far gone that you would only be detrimental to the audio you were trying to keep. And obviously, think about the amount of time it would take, say, for example, Robbo, if you'd been paid to fix one of those pieces of audio: you'd spend a few hours on it, and you'd still go back to it and go, no, I'm still not happy with it. One of the first things that happened when we put the first demos out was people from our existing audience, which are all plugin users, which we're very grateful for, saying, make this a plugin. And we're constantly explaining that the reason this will never be a plugin is because it would break the CPU and GPU on any computer. It doesn't matter if it's an M5 that came out last week or a really powerful PC; it just would break it. It's not possible. This takes too much energy and too much power. So it's in the cloud. Tell us, what are we doing to the audio? I mean, it's called Voice Regen. We're literally regenerating. We're listening to what's there, taking as much of the tonality as we can find, and literally rebuilding it so that it is what the LLM expects it to be. One of the examples that displays that fairly clearly on the website is a Creative Commons public recording from one of the sixties trips to the moon, where we take the NASA radio from the moon and make it sound like he's talking into a podcast mic, and you can tell it's the same person. It's just, one minute he's on the moon, and the next minute he's not. But was he? But was he on the moon? I wasn't going to go there, but you did. And my point would be, Robert, there's hope for our podcast yet, mate. Yes, well.
I think I'll just start recording right now and you can capture this directly off the phone, right? Please, please, please. But take into account, Robert, that I actually can do that with Voice Regen, right? That's what I'm saying. Live by the sword, die by the sword. Let's do it, Robbo, come on. I'm already doing it with Gomez's audio, actually. And there's a couple of different ways to use this, and a couple of different things I've built in and that we're about to add. One of them is, I've given people the ability to record directly in the app, either on their phone or on a computer, and I've given them a place to put their script. You can either use it as a teleprompter and have it scroll, or you can just set the speed to zero and the script will stay there in front of you, so you can actually look at the script while you're talking into a mic, the same way a voiceover artist does. One of the reasons we did this is because a lot of the time, if talent is remote, whether it's a podcast guest or somebody else, you can still make them sound like they're in front of a better microphone and de-room them, with just drag, drop, and process. In two weeks we're about to add a pop-out teleprompter, so you can put the teleprompter on a separate screen and make it really large, just underneath your camera. And we're also adding record-video to the recording page, so you can select your video camera, select your microphone, and even if it's the MacBook microphone and you're in the kitchen, we'll make it sound like you're wearing a lapel. So this all sits up on a server, is that correct? Yep. And is this part of the, what's the bundle, you know, the subscription? Do you get the... No. No, if you're paying for Creative Access plugins, you're not getting Regen.
If you want Regen, you just go and subscribe to it. I mean, I've made it super cheap: to get five hours of processing per month, I'm charging $4.99 US right now. Oh, that's true. But I also give everybody on a free account, which needs no billing address, no credit card or anything, five minutes free per day. So think about all the short-form content: people could literally record on their phone with the video, not have to wear a lapel, put the videos straight into Voice Regen, and then post it on TikTok, et cetera. And by the way, we're finding that video files are about fifty percent of the files being uploaded to Voice Regen right now. So it takes in video, does the audio processing, and returns it as a video? Absolutely, we give you back an MP4 with clean audio. Here's a question, and look, I know it probably doesn't need to be answered by you guys, but I'm sure plenty of people are asking it, given the AI temperature at the moment, shall we say: what happens to all these audio samples once they've been processed? Your audio stays on your dashboard for fourteen days, and at that point it's automatically deleted. If you look at the user dashboard, it says, I think I've got it in like two places right now, it says anything you have in your files is deleted after fourteen days. One of the things that we do not do is use any user's files to try and improve the model. We improve the model by paying for content from specific companies that do that kind of thing, for two reasons. Number one, we want to stay one hundred percent legal with where our content comes from. And number two, scraping is a really, really bad idea. It's unethical, it's immoral, and it's lying. So we do not use any user's audio to help educate that model at all. And by the way, there's another reason for that.
When you've got a large language model like this, you don't educate it by giving it terrible audio; you educate it by giving it very high-quality audio. It's not like we can educate this by giving it Regen-processed audio, because that's a bit like, you know, OpenAI's ChatGPT scraping and finding more information that it created that was wrong. The beast is eating its own tail. Yeah, exactly. It's not the way to do it. But yeah, nobody's information gets used. Is the $4.99 limited-time? Because it's normally $9.99, right? Well, ultimately, as far as I'm concerned, I will keep going on the $4.99 until I feel like we've reached a point where we're established. We released this on January 26th, and in a very quiet launch so far we have about thirty-five thousand members, which, for a month and a half, I'm pretty proud of. Wow. Yeah, we've processed a ridiculous amount of audio in that time, which I also keep tabs on. But I want people to keep on joining, because one of the things that happens when you have a free account and you're using it is that occasionally an error happens, or a file that you upload doesn't work, and those kinds of errors get flagged and they help me and my team. I'm looking at the data every day: I can't hear the file, I can't see the file, but if it fails, I can see what codec it was and roughly what length, so that I can learn and improve how we deal with errors, and also keep on adding: oh, there's that codec from that, okay, cool, let's add that. I want to keep the price down so we keep on bringing more people in. By the way, processing is really fast. And it immediately tells you to download a WAV file; I guess you did that so you could ensure there'll be a local copy for the user?
Absolutely. Once you've done that, effectively what will happen is: you press record, as you just did, you can preview it and go, yeah, I like that, I'm going to process that. As soon as that happens, if you press okay, it takes a copy and puts it into your default downloads folder so that you have a redundancy. And from my perspective, anybody who wants to record anything should have a backup that's not touched by any server. If you haven't got a backup, you haven't got it at all, right? Yeah, exactly. I mean, remember, this is developed by somebody whose theory is there's no such thing as a backup unless there's a backup of the backup. You don't have a copy unless you have two copies, exactly. Yeah. And the other thing is, once you click on a file, you have a toggle between the before and after audio, and most of the time you can tell just by looking at the waveform that something dramatic has happened. Is the processing basically faster than real time? Way faster. Really quick, really quick. Yeah. So how'd you go, Robert? Have you got something you can play, George? Sorry, here's Robert on the phone: "I don't know. I'm going to ask Gomez. Is the $4.99 limited-time? No? Because it's normally $9.99, right?" All right, and here's the processed version: "I don't know. I'm going to ask Gomez. Is the $4.99 limited-time? No? Because it's normally $9.99, right?" Isn't that bizarre? It sounds nothing like a phone to me. Yeah, that's your technology, man. Sorry, yeah, it is freaky. The first time I did that, I was not on the phone; I haven't done a phone one like that, but fucking hell.
But the thing is, we've got a couple of radio networks in the States and a radio network in Australia using this now, so that phone interviews, even if it's just an artist doing "Hey, it's such-and-such, and you're listening to...", suddenly it's not a phone interview anymore. Suddenly you've got them sitting in a studio recording the tags, and they can be used in proper promos. Take all your old phone IDs, and you could have Jim Morrison saying, "G'day, this is Jim Morrison," sounding like he was sitting across the room. I don't think Jim Morrison would say g'day. Just as an experiment, because I don't really have much of a life, about three weeks ago I went looking for the oldest John Lennon interviews, and I found a video of one that had so much ground hum, and the interviewer was in a waistcoat and all very, very proper, and I put it through Voice Regen and then through a visual sharpener called Topaz Video, and I ended up watching this kind of Netflix-HD video of John Lennon, and it sounded amazing. This has a lot of use cases. Yeah. It's probably a bit advanced for where you're at with the software, but just thinking as an audio engineer now: with that two-hander John Lennon interview we were just talking about, can I pan left and right, or is it all straight up the middle? So right now, it's all at the center. And, for example, if you put in a multi-channel file right now: if it's stereo, we'll mono it; if it's more than two channels, at this point in time we'll politely tell you we can't do it, or if we think we can do it, we're only doing the first channel. We're working on that. That's another reason why I study the data so much, so that we can learn from how the system deals with different kinds of files.
It's for spoken word? This is for dialogue only? Yes, speech. I can see, like, Peter Jackson... I'm surprised he hasn't been on the horn saying, can I have a go. We already have a couple of post-production studios who have said, we used this for boom mics from a film when one of the mics failed. That's one of the reasons why we've got a pro account, which has thirteen hours, and we'll end up with an enterprise account. The other thing we're doing is we're about to launch an API version of Voice Regen, so that you can actually plug Voice Regen into your own funnel and kind of create your own service out of it. The only part of the industry I can see suffering from this is if you were into automated dialogue replacement, because you won't be needed anymore. Well, yes and no. There's a couple of things. Firstly, a lot of the situations where Voice Regen is useful are for users who would never go to a dialogue editor in the first place. They don't even know what the term means. All they know is their audio is terrible and they need a solution. What I'm not selling them is a hammer and a nail; I'm selling them a nail hammered into a wall. They don't need the process, they don't understand the process, and they will never know who to go to or who to ask for. So I'm not really risking anybody's career. The other thing that's important here is that the content creator market is like a hundred and fifty million times bigger than the pro audio market, and I feel like I'm going to get to maybe one percent of it, so I'm not losing sleep feeling like I'm taking anybody's job away. I'll tell you who could get a good benefit from this. There's a podcast I was watching earlier, and I'll tell you exactly what it's called; I think it's RacingNews365, and they do a Formula One podcast.
Every time there's someone beaming in to them, like a guest, or the person's on location, they compress the shit out of it to a point where it's so gated that half the stuff is missing. It's like, what the hell did you... like, words are gone. Yeah. And I've sent them a message saying, your gating's too heavy, I can't listen, half your dialogue's gone. Well, you raise an interesting point I was going to make, which is that for content creators, there actually is a direct correlation between the quality of your audio and how much people believe you. It's actually been studied, at the University of Queensland and one in California; we've talked about this before. You can draw a line, almost, in how listener perception of the information you're giving them falls as your audio quality gets worse. It doesn't matter if you've got an FX3 camera with a Sigma lens, anamorphic, perfect lighting and sharp focus. If your audio sounds like you're recording on a Logitech C920 webcam in a kitchen, which you probably are... Just a random side note: the highest-selling webcam, still, in 2026, is the Logitech C920. I've got one; I think I'm using it right now. Literally. I have report software that I pay a lot for every month that tells me how many units sell on Amazon per product. This thing, just on Amazon, sells at least twenty thousand a day. It's insane. It's not a bad camera. Mine's the C920. Oh, you've got the posh version. Anyway. But yeah, good audio is confidence, and that's one of the things I wanted with Voice Regen: to give people that confidence. Because if you look amazing but your sound is really rough, it does lower your self-esteem, and you think, do I really want to put that out there?
What I want is for somebody to feel like they can just literally pick up their phone, record something, hit process, and send it out into the world, knowing that it sounds amazing without knowing how it was achieved. Focus on the content and let us look after your polish. I agree. What you've come up with, for people who are doing content, is: stop worrying about all the stuff you have no idea about. Just do what you do best and let the machine do the work for you. The other thing, and this is something I made a deliberate decision about, was how large a file I wanted to enable people to play with. You can upload anything to Voice Regen up to 1.5 gig in size. Now, 1.5 gig, if you're at 720p or something like that with video... I mean, I ran one of the webinars I did, with Andrew Scheps or somebody: I ran the entire webinar at 720p, after I ripped it down off YouTube, through Voice Regen, and that was like two hours, and it cleaned up our vocals something chronic. It was amazing. But 1.5 gig, I think, is enough. And if you're just doing audio, that's something like four hours of audio right there. And if you're doing four hours of podcasting, you're not going to have an audience, right? They're gone, they're asleep. Yeah, we should know. Yeah, that's right, about not having an audience. So where do you think this leads, Gomez? What's the next horizon, if content-creator focus is added to what Waves is already doing? I mean, not specifically, but in the broadest sense: do you think, as an industry in general, that focus is going to move in that direction? Not really. I mean, let me give you some perspective. There are three product managers at Waves who deal specifically in live sound, because we have one of the most popular digital live consoles in the world with the LV1.
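As a side note, the "1.5 gig is about four hours of audio" figure quoted in the conversation roughly checks out if you assume uncompressed 48 kHz / 16-bit mono WAV, which is an assumption on our part; the episode doesn't specify the format. A quick sketch of the arithmetic:

```python
# Sanity-check the "1.5 GB ≈ 4 hours of audio" claim from the episode.
# Assumes uncompressed PCM (WAV) at 48 kHz, 16-bit, mono — the format is
# our assumption, not something the service documents.

def wav_hours(size_gb: float, sample_rate: int = 48_000,
              bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Hours of uncompressed PCM audio that fit in size_gb gigabytes."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    return size_gb * 1_000_000_000 / bytes_per_second / 3600

print(round(wav_hours(1.5), 1))  # 4.3
```

At 48 kHz mono that's about 4.3 hours; at 44.1 kHz it stretches to roughly 4.7, so "about four hours" is a fair round number, and compressed formats would fit far more.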
There are ten product managers who deal specifically with the plugin market, coming up with ideas for plugins, developing plugins, et cetera, because that's what a product manager does: comes up with the concept for something, fleshes it out, and then works with the development team all the way through to make it happen. There's one product manager at Waves who deals with the content creator market. That's me. And for a company that has about five million plugin users who have purchased plugins over the years in our database, our focus is still very, very much on looking after the market that is our heritage. However, my goals are set on how we can take advantage of the processing, the skills, and the technology that we have to make a content creator's life easier, so that they can focus on content and not on trying to learn a new skill. And that does two things. Number one, it helps us retain the people in the industry like yourselves, voiceovers, production experts, et cetera, and means there are fewer people out there on Fiverr saying, hey, I can prompt you a radio ad; but it also means the audience gets a better product regardless. Is there a user-feedback functionality? I remember, in the earlier days of this type of technology, I'd watch YouTube videos that had occasional audio randomness: you'd be listening to dialogue and then one or two words were just... gone. Yeah, yeah. Was it that, like, Discord, or one of those? Discord used to bitcrush the crap out of things. So can you give feedback if you get bad output, which I would imagine you've done a lot to prevent from ever happening, but can you give feedback? Well, yeah: there's a support page with a contact form, and I get a report from tech support literally six days a week on anything that comes from them, and we escalate it.
If it's something that's really easy and it's a user issue, and eighty percent of the time a problem that comes to us is a user issue, it's something they didn't quite understand that we clear up. But if there's a real problem with audio, then I will contact them directly and say, are you happy to let me listen to the audio file and give you an explanation? Because at this point, we still want to make sure that everybody who has the chance to really understand why their file didn't work should know why. And one of the other things we're about to do is open a whole new set of social media channels specifically for the content creator market, because we can't really run tutorials on Voice Regen on the Waves YouTube, where somebody will be scrolling and go from "ah, that was really useful" to the next one being Andrew Scheps talking about high-pass and low-pass, and we lose them. Exactly. So we're opening a set of new channels. We've just employed a whole new marketing department to deal specifically with this, so that we can give content creators the attention they deserve without affecting anything else at Waves or lessening the attention we give to our heritage audience. Did you say Discord? Was that the... Discord used to bitcrush everything too. I was going to say, because it used to do this thing where you'd put audio in and the system would try to clean it up and it would just... Discord, or, sorry, Descript. Yes, Descript, Descript. Studio Sound. Studio Sound. Do you remember the one you sent us, George, where it was a real estate thing, and all of a sudden it went and started doing gibberish? What I would like to do is get that plugin, and then get the chef from Sesame Street and run him through it. That's really...
So, obviously I haven't managed to get anybody from Descript to talk to me about this, for obvious reasons probably, but I have, actually, genuinely, and you guys know me, I'm pretty genuine and an open book, I have reached out and said I would love to talk to you about helping you improve that, because from my perspective, what I want is to make sure that the content creators who are using anything get the best. What I feel their audio process is doing is, number one, giving the users a bit too much control without explaining what the toggle is doing. And when it's on full, which I think most people just set it to without really thinking, I think it bitcrushes to a point where, as you said, some words just disappear. Now, one of the interesting things I found when I was investigating this kind of problem, around the same time, was AI-generated video with dialogue. The dialogue with most video models comes out in the codec at 96k, but bitcrushed to hell and back. If you drag that AI video onto Voice Regen, it makes it sound like they're talking into a boom mic or a lapel. So suddenly the AI video that was given away by the audio, irritatingly to some people, really impressive to others, now looks really realistic, because the audio sounds realistic. I think all AI video should always put Waldo somewhere. Yes, you mean the Waldo watermark. Yeah, that's how you know it's AI, because Waldo is somewhere in the world. Seeing as you brought up Waldo, I'll share something which I share with the team at Waves once a week when we have our engineering meeting: I create a new Where's Waldo, but with me in it, using Google's Nano Banana Pro, and I share it in the team's chat every week, and the product managers are like, okay, where's MPA? Where's MPA? It's funny, I've got stripey pajamas like that.
Bananas in Pajamas. We were in a meeting today, and I was wearing a collared shirt, and I had three people say, you look like you're a prisoner. Well, because we all are prisoners. For me, as a user of Waves, the hardest thing to wrap my head around, when I hadn't looked at it, was just understanding that it's not a plugin. That's number one: you have to get out of the Waves Central ecosystem, and then your brain clicks and goes, oh yeah. The first video I did on Voice Regen was to the larger Waves market, and it literally is all about: hey, we've just released this thing, and just so you know, it's not a plugin. That's the whole idea of the video. We're not degrading anyone or saying our users couldn't understand stuff, but we wanted to make it clear that, for the first time ever, we're releasing an audio processor that isn't a plugin. I wouldn't have wanted to release this without letting the bigger audience know first. Is it also mobile-friendly? I nearly forgot to ask. Oh, absolutely, to the point that in the next push, in the next few days, we're adding a little banner that says, hey, do you want to kind of shortcut this so it looks like an app on your phone? The teleprompter works on the phone. You can audio-record to a script on your phone. You can drag a video or an audio file from your phone straight in and press process. I'm not letting people record video on a mobile phone when we release that, because a mobile phone is perfectly good at recording video by itself, and you're holding it. Just record the video with your standard app and upload that. It's easy. So here's a question then, this just occurred to me: if I were a voiceover artist and I was putting together my travel rig, and I had an interface like...
...you know, maybe something that a certain podcast may have put together. Yeah, and sold for a while. But I needed something to record onto. Could I use it just as a recorder? I guess the question is, do I have to process, or can I record on it there and then just download? You don't have to process; you can just use it that way if you want. So does it burn your hours if you use the teleprompter and the recorder, or is that all without using the hours? The only thing that burns your minutes is the processing. So it's a free teleprompter and recorder right there. Yeah. And honestly, that's intentional. One of the reasons I did that is because of my heritage and my background: not just twenty years in music production in the States, but fifteen before that in radio production in Australia. When you've got headphones on and you've recorded a script and you listen back, it's when you listen back that you realize, yep, sounds good, but I'm in a hotel room. You know what, I'm just going to click process and de-room this. Exactly right. I love that you integrated the script reader. I had to. I mean, I've got two users who are on pro accounts because they are professional audiobook readers, and they love me to death because I got them to beta test this. They were like, wait, so I can have literally chapters in this and just have it right there in front of me, scrolling, and just do it at my own speed? It's like, yeah, and just pause if you want to. This is going to be controversial, but I guarantee you could literally record an entire audiobook on the phone, full stop. If you give this really good audio, like really good audio, will it still touch it, or will it go, that's good, leave it?
I'm so glad you asked that, because this is a very large model that's trained on high quality files. It's not trained on the highest quality; it's trained on quality files. So when somebody from a studio puts in a file that's recorded in a soundproofed room, EQ'd, compressed... exactly. If you put that into it, it will degrade the file. Interesting. Yeah, okay, that makes sense, because what it does is it's still trying to go, okay, I have a quality that I'm trying to get this file to, and I don't know what to do with this, so it's going to degrade it for you. But only if you hit process, right? You can still record. Yes. At some point you decide this model is baked, let's build a new model and start all over. We're on the third model now, because, as I said earlier, we're also about to enable this as an API that people can use as a feature to put into their own website and charge for, or whatever. So what this will end up being is: there'll be the initial model, which will end up being one of the most affordable ones, and then as we build new models that are more dynamic or brighter or whatever, we'll decide how we want to move forward with the consumer app, the one we just released. Could it say, this is a voice that sounds like it was recorded with a U87? I don't want to do that. There's a couple of reasons I don't want to go that exact. Number one, it's a huge claim. But number two, one of the things that happens if you're creating a tool that content creators use is you run a risk: people who are focused on recording their content and creating content don't want to be an audio engineer or a video editor. You're asking them to make too many decisions without the experienced knowledge to back it up, and that becomes what I call feature creep. You're asking them to make a decision on a control for something that they may not be able to hear well.
Even hearing it is one thing, but understanding what you're hearing is another. Whereas my goal is still: I don't want to give you a hammer and a nail. I want to give you a nail that's already in the wall. I would imagine ninety-nine percent of content creators would ask, what is a U87? Exactly. Yeah, there's been a lot of experience over the years with different pro audio companies entering the content creator market in one way or another, and I've studied most of them. I'm not saying I or we are perfect in any way, but one of the things I have learned from others' mistakes is, number one, never use pro audio nomenclature when you're talking to a market that could be a wedding event planner doing a vlog. At what point in her life or his life have they come across a high-pass or low-pass filter, or a two-to-one compression ratio? Even the terms compression or EQ, I want to stay away from them. Well, even with Clarity Vx you have, I think, and I haven't checked in the last two months, three models, right? And there will be more. The Clarity development team, and it's a team, is constantly at work improving and tweaking, and obviously there have been some questions out in the marketplace. It's like, well, you made Clarity a plug-in; is ReGen just a website version of Clarity? No. Completely different models, completely different processes. Clarity can be a plug-in. ReGen will never be a plug-in, because it will melt. They are different products. I mean, Clarity is not there to change the character of the sound; it's there to isolate the voice in the character that's given to it. Right, whereas we literally regenerate a voice. You change the character... I wouldn't say we're not changing the character; we change the sonic character. Like, this was originally recorded on a whatever, like a carbon microphone.
Yeah, and now it sounds like it's coming out of an SM7. Yeah, you know, like the character of the recording, not the person. Would rebuilding it be a better way to say it? Renovation? No, we're kind of regenerating in a lot of cases, because we're looking at, in a way, I guess, the DNA of a vocal tone and going, okay, let's put some work in and improve this. So in the end... you know how you go to various websites, we won't name them, and you put your voice in there, and there's the whole idea of a one-shot model? Yeah, no, we don't do that cloning. So this is not a clone. Just to be clear, it's not learning the voice of the person that someone uploads, and those people have nothing to worry about. Like, oh, now you've uploaded my voice to some website I don't want my voice uploaded to. No, we don't retain it. It goes through our process, it goes out the other end, it doesn't even stay in our cloud. It goes back to your dashboard, and after fourteen days we just delete the files from the system. You know, I had one case a year ago, and because everything's happening so fast, whenever I say I tried this thing, you sort of have to say when, because everything's changing. The client had sent me some audio from, like, a Catholic priest or something, and it was all this archival audio, recorded on very bad technology, probably a twelve-bit, thirty-two-kilohertz or worse recorder that was built into... I don't know, something pretty bad. They sent it to me, and the audio was pretty bad. I threw it into one of these AI tools that essentially regenerate, and when she listened back to it, she was kind of... I think she didn't know what to expect, because it was a reconstruction, right? And so she thought it didn't sound authentic. So then I was like, well, I don't know what to tell you.
This is probably what it did sound like, but you've only ever heard it, for the last thirty years, sounding like it comes through a tin can, you know? It's exactly the same thing that happens with... I was talking to a producer-composer, a guy called Greg Wells, a few months ago, and we were talking about a soundtrack he'd just composed. One of the biggest problems with composing a soundtrack, and making the film director and the company happy with it, is that when they're actually doing the edit of the film, they use a temp soundtrack, which is usually an existing soundtrack from something else that kind of fits. But they spend so many months in the cutting room with the temp soundtrack that when the new soundtrack, composed specifically for the film, comes in, a lot of the time they're like, well, that sucks. I think the funniest example of this is: some clients and I were talking about a session, and basically the scratch track was done with an AI, and then they were recording the real voice, and they began asking the voice talent to read it more like the scratch track, which was basically saying, please read it more like the AI. Like, how to lose your soul in five seconds. Yeah, I had that exact thing happen to me. We talked about the session before, where there was a female read for this short film that they wanted me to voice, and I was kind of thinking, that sounds okay anyway, but they said, oh no, no, she can't... a few words are not pronounced correctly. And I'm thinking, well, why don't you just get her back to redo those lines? It wasn't until later I realized it was an AI. They were trying to get me to match the inflection on the guide track: can you try and do that? And it wasn't until the next day, I think, and I mentioned this in another episode, that I was playing it to my son, who's seventeen. I said, you know, this is what happened, and he goes, that's AI. Like, yeah, I can hear it a mile away.
Usually the kids can hear it a mile away. I'm fifty-six, but I can hear it a mile away too, because I've spent the last few years literally deep in AI research. One of the things that's really hard, even if you're using something as good as ElevenLabs: you have to make sure that words, especially in English, words that sound the same, say for example wander or wonder, w-a-n-d-e-r and w-o-n-d-e-r, you have to work out a way of spelling them phonetically so that they are separated. And there are so many other words in the English language that AI just does not get its perceptive head around, because it doesn't have a perception of the context between this word, this word, and this word when it's spelt one way but pronounced another. So I remember doing this with the phoneme chips in college, and just to get the thing to say electronics, you had to put like ten e's in a row, because if you wanted it to say electronics, you had to put, like, e-e-e-e-e-e-l, like eeelectronics. Yeah, you have to force it to get out what you want, because it's not always right. But it's also a dialect thing. I mean, I've just heard a couple of things, like I heard it say "dayta" instead of "dahta"; it's spelt the same but pronounced differently, like "dance" and "dahnce", you know. Yeah, it's an American thing. Yeah, yeah, tomato, tomahto, exactly. It's like when you get someone whose first language isn't English, and they're writing on the script their phonetic version of how you pronounce that word, but they forget that their phonetics are going to be different from yours, because they say certain things differently from the way you would say it. So when you read their phonetic version of that word, it's completely wrong, and they come back and go, no, no, that's not what I want, I want it blah blah blah.
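The workaround described above — phonetically respelling troublesome words before handing a script to a TTS engine — can be sketched as a simple preprocessing pass. The respellings in the table are illustrative guesses, not taken from any real engine's documentation:

```python
import re

# Illustrative phonetic-respelling table for TTS preprocessing.
# Engines sometimes mispronounce homographs and certain words, so the
# script text is rewritten before synthesis. Entries are assumptions.
RESPELLINGS = {
    "wander": "wonn-der",            # keep it distinct from "wonder"
    "data": "day-tuh",               # force one regional pronunciation
    "electronics": "ee-lek-tronics", # the "ten e's in a row" trick
}

def respell_for_tts(script: str) -> str:
    """Replace each listed word (case-insensitively) with its respelling."""
    pattern = re.compile(r"\b(" + "|".join(RESPELLINGS) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: RESPELLINGS[m.group(0).lower()], script)

print(respell_for_tts("Send the data to the electronics lab."))
# -> Send the day-tuh to the ee-lek-tronics lab.
```

Modern engines that support SSML offer a cleaner route (a phoneme or substitution tag per word), but the brute-force respelling above is exactly the kind of hack being described.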
It's like, well, that's what I just did, but they will pronounce it differently. So now this brings up something which I think is interesting. I found it fascinating when the model for ReGen was being built. With the first model, there was a term we realized early on we had to rectify, which is accent bias. If a model is trained on too much of a specific accent, then it doesn't understand how to deal with other pronunciations or accents. So initially, early on, an Australian accent was an anomaly, and some of the first tests we did were frustrating, because the Americans would sound amazing, the Brits would sound amazing, but any time I tried it, it would sound glitchy as hell. And we worked out it just doesn't understand this half-drunk way we talk. Hey, careful, we're not half drunk. It's like, we tend to take shortcuts with the way we speak. So we had to make sure that we spent a lot of time avoiding accent bias. So is this only for English? No, it works with Hebrew, works with French, German, Spanish, Japanese. But is it language specific? Basically, no, it's human-voice specific. So, like, does it work with Pig Latin? I have no idea. It's not high on my list of priorities, to be honest. Like, if it's some unknown language that it's never ever heard before... Here's the key with Voice ReGen. If you're putting a file into Voice ReGen, and in the language that you speak you can understand the words that are coming out of the speaker's mouth, regardless of the noise level, we have a chance of saving that file for you. If there are words in that file that you cannot comprehend yourself, then our chance goes down by about eighty percent. Okay, is this forensics-proof? Like, could someone go to court and be like, here's this awful recording, but look, he said "I murdered her", and then Voice ReGen's like, perfect, can't you hear that?
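The accent-bias problem described above is, at root, a training-data balance problem: if one accent dominates the corpus, the model generalizes poorly to the rest. A minimal sketch of the kind of sanity check a data pipeline might run — the accent labels and the 50% dominance threshold are illustrative assumptions, not Waves' actual process:

```python
from collections import Counter

# Sketch of a corpus-balance check for accent bias: compute each accent's
# share of the training set and flag any accent that dominates.
# Labels and threshold below are assumptions for illustration.
def accent_share(samples: list[str]) -> dict[str, float]:
    """Fraction of the corpus carrying each accent label."""
    counts = Counter(samples)
    total = len(samples)
    return {accent: n / total for accent, n in counts.items()}

def flag_bias(samples: list[str], max_share: float = 0.5) -> list[str]:
    """Return accents whose share of the corpus exceeds the cap."""
    return [a for a, share in accent_share(samples).items() if share > max_share]

corpus = ["US"] * 70 + ["UK"] * 20 + ["AU"] * 10
print(flag_bias(corpus))   # -> ['US']
```

In a corpus like the one above, Australian speech is the anomaly Gomez describes: one sample in ten, so the model rarely sees those "shortcut" pronunciations during training.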
It's like, yeah, absolutely... no. We're not making things up. There's no point in the system where we make stuff up. We regenerate what the system understands. So a word that is really garbled, because there was a technical problem, is still going to stay really garbled. Yeah. So, one of the early cases... one of the things we did is we spent a lot of time throwing files at it that were really, really challenging. Like two people filmed on an iPhone, but they're thirty feet apart on the edge of a cliff with wind. One's talking, one's further away; she's yelling back at him, and you've got the wind. And we learned very quickly that if you're listening to a file, listening to the audio, and you can't understand what they're saying, then our chance of getting there is a lot lower. It's not every case; there are some where you go, oh, that's what she said. But most of the time, it's at the point where the dialogue frequencies are so mixed down into the noise frequencies that we can't regenerate, because we have nothing there to work with. So what it does not do: it does not interpolate. No, it's not using context. There's no decoding the language and re-encoding the language. It's just decoding the voice, the human voice in there. And the examples I have on the landing page on the website are very intentional. They're all bad quality, but if you listen to them, you can understand the words; you're just like, oh, that's terrible. And the reason for that is exactly what I just said. There is no point in the system where we just guess, because we already have enough of that with ChatGPT and other AI systems. What I did was, we created a model that can bring back nearly irretrievable audio, but it's also not going to happen in every single case.
There are some cases, obviously, where people put in audio and go, yeah, it didn't help. So, as a possible future pro version or advanced version: would it literally regenerate a word that was totally garbled, so the user could type in the word "foxy" and it would regenerate the word "foxy" from that? It's been discussed. One of the things that comes up with this is a big problem for me, because as a product manager, the question you're asking isn't, is something possible? You're asking, is this possible in a way that the result will be something the user can take for granted, something that just happened, so they can move on with their life without worrying about it? And my answer at this point in time is still no. Because what you're doing is saying, okay, we have to first of all clone a certain amount of the voice around the word that's illegible. We then have to make sure that, okay, the word they're typing in is what the word was. We have to get that voice to say that. We then have to get it in the right intonation, the right emotion, the right quality, to fit the rest of the regenerated audio around it. And if you're creating one word, it's actually a lot harder than it sounds to create one that's average quality, that blends evenly with the rest of it, than it is to create one that's perfect quality. Yeah, it's a different product, really. I mean, it's adding a feature to the product. I'm going to drop in an old-school solution here. Remember back in the day, and it can only work if it's a small, like one-word problem, but you would get a male or a female, not the same voice, to do it, and then you would cut it in, and it's so fast that the mind doesn't actually perceive it. And who were the people that were best at doing that? It was always the musicians, because the musicians could follow the pitch of the person. I've done it a few times, I'll be honest, to save things. I can't remember...
...if I got paid for it, like the voice talent. I'm sure AP could do it, and I know Paul Davies and I used to do it all the time. Yeah, I used to do it, changing promos. I've done it with Robert, where the audio had, like, the end of a word chopped off, and I've recorded the rest of it, and you can't pick it. In the old days, if you dropped in and replaced the end, like an "ing" off the end of a word or something... Yeah, I know. My daughter can match pitch in, like, a supernatural way. When my girlfriend first introduced herself, my daughter immediately picked up her name. I said, hey honey, do you want to do a drawing for... and I pronounced the name my way, and my daughter immediately says, no, Dad, it's pronounced this way. She, like, immediately had a tape recorder play back in her brain the way it's supposed to sound. And I was like, holy crap, my twelve-year-old can play back audio from her brain. That's the key. That's the key to mimicry; you can mimic stuff. Like, I remember... my other half speaks Italian, and it was like, you should learn Italian so when we go there we can both communicate. And I was like, all right, okay. So I got convinced that I would go and do this thing. So I went to my first Italian lesson in Melbourne, and when I finished, they go, oh wow, you were really good, you're really good. And I was just mimicking. But when it came to actually building structure and knowing what the words meant and how you put them all together, I had no fucking idea. So I've just got to ask AP: to learn Italian in Melbourne, what do you do? Just go to a restaurant on Lygon Street or something? Is that where the Italian school was, Lygon Street? Yeah, exactly. Order an appetizer and... that's cool.
Yeah, I'm just going to do your lesson, then go and have something after. Yeah. Well, you know, I've been doing more and more teaching of production for podcasters and creators, so I'll definitely be mentioning this as a tool and letting people know to try it out, because I'm working more with corporate creators. That's another category, you know, people that do content for corporate. Sweet. Let's talk for a second, just quickly, about that. Say, for example, somebody has done a course where they've recorded their screen with the little hover-over camera in Loom or something like that, but the camera is a webcam and their microphone is the MacBook's or something. Literally, just drag the course video onto Voice ReGen, hit process, and you're done. Yeah. Now, quick question for you: where can people go and check this out? Waves.com/voice-regen, or just type "voice regen Waves" into your Google or whatever you want and go off and find it. This could be an integral part of my road case. Well, I was going to suggest that even if you're not a content creator, if you're a voiceover artist going, no, I'd never need it: just do yourself a favor and go have a play with it, because you'll be blown away. One of the reasons I put the teleprompter in there is because of voiceover artists like you, AP, and also how often you find yourself in a position where you're not with your rig and you need to record something, get it back to the station or to the studio, and have a way to do it now. In that case, it's not taking a producer's job away from them. What it does is it gives you, Robbo, less work to do at the other end than you would have had.
And that's also why I put a teleprompter script reader in there with zero scroll, because if you're pacing a thirty-second script, you don't need it to scroll; you just need it in front of your face. Auditions too, AP: if you got one while you were out at a restaurant, for example, you could just jump in the bathroom with your phone, record it, process it, and send it straight back. You could, but I wouldn't, because I don't think it gets rid of slurring. I'm guessing the red wine... yeah, it's not going to cut it. See, that's the other thing. I worked very hard, and I did a lot of the coding for the teleprompter here in Brisbane, to make sure that the scroll, especially on a mobile phone, was not jittery, and that it's slow enough that you can look at it on a normal phone, which is like a tablet these days, and read things without constantly moving your eyes up and down. I'm really proud of this. And if anybody out there is interested in using an API version of this for your own service, reach out to these wonderful hosts and they'll get in contact with me. That picture on the floor behind you... yeah? Can you do me a favor? Can you hang the fucking thing up? You have no idea how much it frustrates me, looking at that picture just sitting on the floor, because it's really cool. It is really cool, but it's also really heavy. It's actually a wooden cargo slat crate that was painted on. If I hang this on the wall, it'll bring the wall down. So, yeah. I looked at the picture of the girl, and of course, in the context of what we're doing here, podcasting, I thought it was... I know what it is, but, like, her hair... now I'm looking at it, I won't be able to unsee that. You can't. That's right, sorry, man.
The Pro Audio Suite, with thanks to Tri-Booth and Austrian Audio, recorded using Source Connect, edited by Andrew Peters and mixed by Robbo. Got your own audio issues? Just ask Robbo. Tech tips from George. Don't forget to subscribe to the show and join the conversation on our Facebook group. To leave a comment, suggest a topic, or just say g'day, drop us a note at our website, proaudiosuite.com.