Discover the Latest in Noise Reduction with iZotope RX11 | The Pro Audio Suite
The Pro Audio Suite · June 24, 2024 · 00:50:08 · 91.97 MB


Description: Welcome to The Pro Audio Suite! In this episode, we chat with Mike Rozette from iZotope about the latest features of their Noise Reduction Suite, RX11. Whether you're a seasoned audio engineer or just starting out, you’ll find valuable insights into making your audio cleaner and more professional. Episode Highlights:
  • Introduction to iZotope RX11
  • New features and improvements
  • Practical applications for noise reduction
  • Tips for achieving the best results
Listen to this episode to learn:
  • How RX11 can enhance your audio projects
  • Key features that set RX11 apart from previous versions
  • Real-world examples of RX11 in action
Prefer to watch the video version? Check it out on our YouTube channel here. Connect with Us:
  • Subscribe to The Pro Audio Suite: [Your preferred podcast platform link]
  • Follow us on YouTube: [YouTube Channel Link]
Keywords: iZotope RX11, noise reduction, audio engineering, audio production, noise reduction tips, The Pro Audio Suite, Mike Rozette, audio cleanup

#AudioEngineering #NoiseReduction #iZotopeRX11 #TheProAudioSuite #AudioProductionTips

A big shout out to our sponsors, Austrian Audio and Tri Booth. Both these companies are providers of QUALITY audio gear (we wouldn't partner with them unless they were), so please, if you're in the market for some new kit, do us a solid and check out their products, and be sure to tell 'em "Robbo, George, Robert, and AP sent you"...

As part of their generous support of our show, Tri Booth is offering $200 off a brand-new booth when you use the code TRIPAP200. So get onto their website now and secure your new booth: https://tribooth.com/

And if you're in the market for a new mic or a killer pair of headphones, check out Austrian Audio. They've got a great range of top-shelf gear: https://austrian.audio/

We have launched a Patreon page in the hopes of being able to pay someone to help us get the show to more people, and in turn help them with the same info we're sharing with you. If you aren't familiar with Patreon, it's an easy way for those interested in our show to get exclusive content and updates before anyone else, along with a whole bunch of other "perks", just by contributing as little as $1 per month. Find out more here: https://www.patreon.com/proaudiosuite

George has created a page strictly for Pro Audio Suite listeners, so check it out for the latest discounts and offers for TPAS listeners: https://georgethe.tech/tpas

If you haven't filled out our survey on what you'd like to hear on the show, you can do it here: https://www.surveymonkey.com/r/ZWT5BTD

Join our Facebook page here: https://www.facebook.com/proaudiopodcast
And the FB group here: https://www.facebook.com/groups/357898255543203

For everything else (including joining our mailing list for exclusive previews and other goodies), check out our website: https://www.theproaudiosuite.com/

"When the going gets weird, the weird turn professional." - Hunter S. Thompson


[00:00:00] And welcome to another Pro Audio Suite, thanks to Tri Booth, you can see it in the background there. Take it to a hotel room and scare the cleaner. And also Austrian Audio, making passion heard. Anyway we have a special guest today because we're talking iZotope and RX 11. Welcome Hi

[00:00:24] Hi Hello everyone To the Pro Audio Suite These guys are professional, they're motivated With tech for the VO stars, George Whittam, founder of Source Elements Robert Marshall, international audio engineer Darren Robbo Robertson and Global Voice Andrew Peters, thanks to Tri Booth, Austrian Audio, making passion heard

[00:00:41] Source Elements, George the Tech Whittam and Robbo and AP's international demos To find out more about us, check the Pro Audio Suite dot com Hi Mike Rozette from iZotope has joined us and will give us a bit of a demo of what this new RX looks like.

[00:00:59] Welcome Mike Hey everybody, hey gentlemen, thank you so much for having me. I am really pumped to be here So listen, you're talking to someone who's used RX for a few years now as an audio engineer and I believe Robert has too Is that right?

[00:01:14] I go back to RX 3 No, I've wanted RX, but I've not really... Back to 3 for you George, really Yeah Wow, that is a long time That's very cool So give us the latest and greatest, mate. What's been happening with the new product? What's different

[00:01:33] And yeah, more importantly, what's different And why should we buy it? Well, I'm hoping I'm going to provide all those reasons We launched RX 11 on May 15th, and it's been about 20 months since RX 10

[00:01:49] So it was a while and a lot of things have happened as you all know, the audio industry is moving at a... It's always moved quickly but it's moving blazingly fast now with all kinds of sort of machine learning processing

[00:02:00] And people really embracing AI tools, everything now from generating ideas and songs to all kinds of sophisticated cleanup In the space that we're in with RX with denoising and dereverbing stuff

[00:02:13] So we really had to spend some time on going back to basically machine learning and really catching up with what happened over the 20 months Where we didn't have our new release out

[00:02:27] So some of what I want to show you and what I'm really excited about is the machine learning stuff that we brought in Kind of improvements to a bunch of our processing

[00:02:35] So I think maybe the first place to start is with Dialogue Isolate because we've done some real work on that And we're pretty excited about what it does now So anyway gentlemen, just stop me if I get into non-stop demo mode

[00:02:51] Like please jump in if something needs clarifying in one direction Because I realize my tendency is to... I can just go like a wind-up clock and then I'm like, wait nobody's spoken We let him loose baby, let him loose

[00:03:03] Well Dialogue Isolate is definitely one of the things that for our audience is going to be interesting Because a lot of our audience are voiceover actors or voice producers And so obviously the goal is always clean audio, clean audio But not like totally sanitized or like antiseptic audio

[00:03:23] Right? So that's the golden goose is like getting the dialogue clean But still having a presence of a room tone I think it's a room tone, just a... well, a semblance of a power drill We need iZotope at your joint, Robert Yeah, if you do there's some silence

[00:03:42] It's the air around it so it doesn't sound so choked like Jesus A lot of other noise reduction can kind of make something... It's like it gets rid of the noise but it gets rid of the life Right, yeah

[00:03:53] Yeah, that's a really... I'm really glad you started off with that Because that was going to be one of the points that I made And it's come to us from all kinds of different users

[00:04:01] From high-end dialogue editors in the post world to folks who are just starting out But with very powerful tools, if you push things to their extremes You can suddenly take an interview recorded on the street Or dialogue recorded on location

[00:04:19] And turn it into something that kind of doesn't exist in space and time anymore You've turned it into what sounds like a studio recording And sort of sterile ADR by overly denoising or dereverbing Or taking the background out

[00:04:33] And so you already got to one of my main points So I'll show you in just a sec what Dialogue Isolate can do And what I'll do first is push it to the ridiculous level

[00:04:42] Where it's lost all time and space and then back off on it But I think that's an important characteristic of using the processing properly Is not losing time and space Not losing a sense of what makes the space unique or what makes a place unique

[00:04:54] So with this example, this is actually from a podcast So let me just queue up a little bit of this raw And you'll hear why we would want to work on it Improv is acting without a script The main rule of improv is yes and Could we?

[00:05:11] What's wrong with that? Sorry? Oh my god Like that's great for Robo That's like really good That's Robert O'Matrain doing our show Yeah exactly So we have very good material but the background is not great

[00:05:28] So for Dialogue Isolate, for folks who haven't used it before Or if they've used it but don't know about the new version We've done two things We totally updated the neural net so there's a brand new neural net

[00:05:39] And it runs in two modes The one that I'm in right now is sort of Good slash Real Time And this is the mode that now runs in a real time plugin So you can run dialogue isolate in real time

[00:05:51] And you get a really good, a very high quality result But if you're working for example where I am now Which is in the standalone editor You have the option to bring up best or offline That will take longer, you know with better processing

[00:06:06] Comes more demands on the CPU and takes a little bit longer But that's where you get your sort of your best result For demo purposes or for moving quickly Put in good real time, put it in preview And then you can dial

[00:06:20] Which means that you agree with what the person tells you And you add to it It's also a great skill for learning how to pivot on the spot Think outside the box, think creatively, team build We've taken everything out of the background, right

[00:06:35] And we were just talking about how that's not ultimately How you probably want to go Unless you had a special situation where you were trying to Maybe take a voice and I mean sometimes for animation Or maybe for gaming you want to take a voice

[00:06:48] You want to strip away any of real life And then you want to add back in the background But for most dialogue editors You're probably going to want to play more along And all those things that we need in our real lives as well

[00:07:01] When you work with others And you work off of their ideas Not only is it fun and stress busting But it's comical and therapeutic It's just a great opportunity to laugh Improv is a- And so one of the cool things about this version

[00:07:22] Besides, and I'm going to just plug it again Because we're psyched about it Having the real time plug-in that you can use For folks who are using the advanced version So this version you're seeing is what it looks like In the standard version

[00:07:33] We've never had Dialogue Isolate in RX Standard before We did add it this time in RX 11 But if you go to the RX 11 Advanced version You get a little something extra And that is a multi-band interface

[00:07:51] That allows you to dial in whatever processing You've set over on the reverb and noise side Or you could just do reverb or noise If you do them both The multi-band processing affects Both of your reverb and noise settings If you've done something really aggressive

[00:08:07] And you've brought down your reverb and noise You now have the ability to have settings Across these four bands So if you felt, for example, that the Dereverbing that you were doing Was really kind of landing in the low mids

[00:08:20] You might want to target all of your processing To the low mids And then you might want to be a little bit less aggressive In the lows And if maybe you felt like the processing Was taking away some of the high end

[00:08:29] You could really back off on the highs And the mids- High mids and the highs And you can also set your bands Different sort of crossover points If you wanted to do that And it allows you to kind of dial in

[00:08:42] What you think is going to be right For your particular sound So that's a new feature That one's only in advanced That's very cool So I guess the upside of that is It will allow you to have less coloring There'll be less artifacts

[00:08:56] In the areas where you need less processing Kind of the idea Exactly right So in many cases Just showing up and ignoring this And working with your reverb and noise settings Could get you somewhere Really, you know, get really strong results Really quickly

[00:09:13] And you might be good to go And print or render and be done But, you know, every recording situation Is different And yes, if you really want to get more specific About where you're targeting The dereverbing and the denoising Feel free to kind of move those as well
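For readers who want to see the multi-band idea in something concrete: the sketch below splits a signal into four bands at configurable crossover points, applies a different gain to each, and re-sums. This is a minimal brickwall-FFT illustration of the concept only, not iZotope's implementation (in RX 11 Advanced the bands scale the reverb and noise reduction amounts, and real crossover filters are involved); the crossover frequencies here are arbitrary examples.

```python
import numpy as np

def multiband_gain(x, sr, edges=(200, 2000, 8000), gains_db=(0.0, 0.0, 0.0, 0.0)):
    """Split a signal into four bands at the given crossover frequencies,
    apply a per-band gain, and re-sum. Brickwall FFT masks keep the sketch
    short; a real multiband processor would use proper crossover filters."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    bounds = (0.0, *edges, sr / 2 + 1)           # contiguous band edges
    out = np.zeros_like(X)
    for lo, hi, g_db in zip(bounds[:-1], bounds[1:], gains_db):
        band = (freqs >= lo) & (freqs < hi)
        out[band] = X[band] * 10 ** (g_db / 20)  # dB -> linear gain
    return np.fft.irfft(out, n=len(x))
```

With all gains at 0 dB the split is transparent; pulling the 200 Hz to 2 kHz band down is the "target all of your processing to the low mids" move from the discussion, while leaving the highs at 0 dB is the "back off on the highs" move.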

[00:09:28] Can you reverse the process? So I mean Like so far this seems a lot like As you said, you kind of like Approached it similarly to what some other Companies may have done And you see that multi-band approach Can you reverse this and say Don't isolate the dialogue

[00:09:48] But get rid of it I only want the background Yeah It's literally as simple as just deciding Where your voice needs to be done Yeah, you can do that And any, you can do any combo Of the three that you'd like Yeah, so in the

[00:10:03] Think of how you want to do it In the multi-band select mode Is that will work You said in real time Does that work in real time as well Or is it just the basic You know The noise and the reverb Or can I use the multi-band

[00:10:19] In real time as well You can use it in real time Again, it's only in advanced So don't look for it in standard But you can do that The thing that we find And just to be really direct here The fact that I've got other stuff running

[00:10:32] On the CPU for the session that we're doing You'll see performance fall off If you have, for example Let's say you were doing this in real time And you were playing with the bands You'll probably see a little bit of a lag

[00:10:43] Depending on what kind of a computer you're on I'm actually, I don't have my M-series At this point it's on the way I'm running an older Intel And you can truly see The difference in what modern day Machine learning is making computers do In terms of processing

[00:11:01] If you're on an M-series You're in a different world If you're, I'm still kind of On the tail end of an Intel You can see that performance And so I like to say that to people Not because in any way I'm trying to push hardware

[00:11:12] Because I can't stand it When you're forced to buy With a laptop But the truth of the matter Is I want people to be aware of this Because with where everybody's going AI and machine learning is going to be More and more prevalent In all kinds of things

[00:11:25] Not just audio And a lot of the new machines Have that, you know, that NPU That neural processing unit That, you know, comes with the machine So just to note that Your mileage is really going to vary With anything machine learning So... What's the sensitivity now?

[00:11:42] Is that just like your basic Kind of threshold approach to things? Or is it? Yeah, sensitivity in this case Does a couple of things Higher sensitivity So if you swing it this way And the numbers go up Higher sensitivity is basically Telling the separation That it wants to

[00:12:02] Basically look at Sorry, I just want to grab one note here I want to get this right for everybody And for the viewers and listeners It will allow you to remove more noise But the trade-off is You sometimes get a sacrifice on clarity

[00:12:18] So if you're going more separation You're taking out the noise But that can sacrifice Particularly at the high end If you're really pushing it You can sacrifice dialogue For example, clarity When you go the other way When you go all the way down To the low end

[00:12:32] The opposite happens You will get more clarity in the dialogue But then you're letting through Potentially more noise So lower is more clarity But more noise Higher is less noise But you risk clarity And so this is a... Does it end up exposing A bit of a gating effect

[00:12:52] Around the words if you Like can that affect that kind of quality To some noise reduction Where you hear it go Oh, there's dialogue I'm going to clamp down on the noise Or you kind of hear it breathing, you know It manifests more as artifacting

[00:13:07] So you'll start to hear edginess Potentially or some issues in the voice If you're pushing it Particularly if you're going In the higher direction You'll start to hear the voice Sound a little bit less crisp And again, it varies by the

[00:13:21] You know, the audio that you're putting in But in some cases, if you really push it Excuse me, up to 10 You'll hear... You'll start to hear some of... It's not distortion But it's a little bit of a quality Of the voice that sounds like

[00:13:33] Oh, somebody is running their processing Too hot, you know And you want to back off Or back down from that That's the fake heat sound You know it when you hear it Yeah These settings are... People are like They want a number

[00:13:47] Like a number of my clients They don't want to get too lost In the details But it really is A tune-it-by-ear thing You really have to listen And make a judgment call, right? That is absolutely true I mean one of the things with our

[00:14:02] RX that I always say to folks Is what's cool about the spectrogram Is you can see your audio And you can see problems When you can see them You target them But that's no replacement For your ears, right? I mean your ears are telling you stuff first

[00:14:14] And then as you get With the spectrogram Sometimes you can spot stuff Before you've even played it back Yeah But the two are supposed to work together, right? And so your ears are Your ears are your friend And you know people will say

[00:14:26] Oh I insist on running things at You know 2.3 And if your ears And the project you're on is really 5.0 Go with what your ears are telling you There's no hard and fast rules with that stuff Depending on the variability of the audio That's really nice
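As a rough mental model of what a "sensitivity" control trades off, here is a classic spectral-gate sketch: time-frequency bins whose magnitude fails to rise above sensitivity times an estimated noise floor are discarded. Dialogue Isolate itself is neural-net based, so this is an analogy only, but the trade-off is the same one described above: raising the sensitivity removes more noise and risks shaving off low-level speech detail. Every numeric choice here (frame size, window, thresholding rule) is invented for the illustration.

```python
import numpy as np

def spectral_gate(x, noise, sensitivity=1.0, n_fft=512):
    """Toy spectral gate: zero any STFT bin whose magnitude is below
    sensitivity * (average noise-floor magnitude for that frequency).
    Higher sensitivity = stricter gate = less noise, less low-level detail."""
    hop = n_fft // 2
    win = np.hanning(n_fft)

    def frames(sig):
        n = (len(sig) - n_fft) // hop + 1
        return np.stack([sig[i * hop:i * hop + n_fft] for i in range(n)])

    # per-bin noise-floor estimate from a noise-only recording
    N = np.abs(np.fft.rfft(frames(noise) * win, axis=1)).mean(axis=0)
    F = np.fft.rfft(frames(x) * win, axis=1)
    F_clean = F * (np.abs(F) > sensitivity * N)   # keep only bins above floor
    # overlap-add resynthesis (Hann at 50% overlap sums to ~1)
    out = np.zeros(len(x))
    for i, frame in enumerate(np.fft.irfft(F_clean, n=n_fft, axis=1)):
        out[i * hop:i * hop + n_fft] += frame
    return out
```

Pushing `sensitivity` up behaves like the control described in the episode: steady hiss vanishes first, and only when you over-push does the voice itself start to lose its quiet components.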

[00:14:41] I've found this just This whole like Look before you listen thing And you see young editors Not really understanding The power of scrubbing Scrubbing is like audio zoom Without having to like click There and go click, click, click in Click, click, click out

[00:14:55] You can just find where you need to be And you can be looking at it From a big fat ugly Non-detailed waveform And you can get in there Pretty specifically And yeah like But with the I mean God I remember The original RX

[00:15:11] The first time it came out You could lasso around A frequency band And it would kind of do the Photoshop thing And like whatever that mode is Where it's like the magic wand Just delete And it just took that chain Or whatever it was

[00:15:26] At the high end It's pretty cool So the visual is More useful I think Especially when you get into noise And being able to clarify What it is that you're hearing They do get kind of You look like forensic stuff for sure Yeah You know

[00:15:42] Speaking to you before we started recording You were telling me about your Saturation so you can tell me If you actually had anything to do with this But I did see On socials over the weekend An ad for the new version Of Isotope and you've got someone standing

[00:15:57] On a train platform With a train rolling in the background Let's do that And pristine audio I mean did you use Did you use RX In this mode to achieve that Or what did you use to achieve that Yeah that's supposed to be That

[00:16:14] Let me put it this way First of all about the train thing I had nothing to do with that Even though I fully support The use of the train Come on Mike, you were sitting in a meeting And I went I've got an idea

[00:16:22] Yeah and I had to be on location No seriously I didn't do anything That's right Let me check And I need to pick the train too Yeah yeah They need to be No that is a dialogue isolate From this version from RX 11 And we've

[00:16:38] They shot a couple of spots like that Where you know really using it In a real life setting And then taking it out in a post-production setting Because that's quite impressive Well thank you Like we're pretty fired up about This neural net and And you mentioned this

[00:16:54] One of you guys had mentioned it earlier But with dialogue isolate as you said You know we pulled in dialogue dereverb Right and you see that three kind of slider Denoising dereverbing approach In a number of places And the reason for that

[00:17:07] Is the neural nets are getting so good now That you can train it to do multiple things So we trained it to do What was traditional dialogue isolate Right bringing non-stationary Moving background noise out of the background But then it was like wow this net is good enough

[00:17:23] To do dialogue dereverb as well And many of us who are experimenting With different neural nets are finding that And so it logically lands That you can put those two pieces of processing together In the neural net And you can do two things
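Training one net to do several separation jobs usually comes down to predicting a per-bin weighting of the spectrogram. As an illustration of the kind of target such nets are commonly trained against, here is the so-called ideal ratio mask: "ideal" because it is computed from the already-separated speech and noise, which a real denoiser never has at run time. Whether RX 11's net uses exactly this target is not stated in the episode; this is the textbook version of the idea.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag):
    """Per time-frequency-bin weight in [0, 1]: the speech share of each bin.
    A neural denoiser is trained to predict something like this from the
    mixture alone; computing it from known speech and noise is the 'oracle'
    version used as a training target in the separation literature."""
    p_s, p_n = speech_mag**2, noise_mag**2
    return p_s / (p_s + p_n + 1e-12)  # epsilon avoids 0/0 in silent bins
```

Applied to a mixture spectrogram (`cleaned = mask * mixture`), the mask behaves like a scalpel rather than a sledgehammer: a bin shared by voice and noise is attenuated in proportion instead of being deleted outright.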

[00:17:38] And the sky's the limit with that technology Like as we go further down the road And the processing is becoming Particularly with Apple stuff blazingly fast You mean the SkyNet There you go I mean there is a I am SkyNet SkyNet is the end of the limit

[00:17:57] I'll be back We joke about that But I won't digress too far on that But yeah the technology for tools That are way beyond just sort of like I don't know predicting a stock price Or what the weather is going to be

[00:18:11] Or cleaning denoising is pretty staggering But I was talking about like the You know the happy stuff With audio and that's really That'll blow our minds In six months The AI will denoise all of us eventually But that's fine Well it's getting to the point where

[00:18:29] You'll type into chat like Yeah here's you know A street interview and you just Basically say to it make this sound nice And we're gonna The tech's gonna do that and that's That's on the way You know So a more important question is

[00:18:47] Did you take the chance to take all the knobs And make them go to 11 We have been using "This RX goes to 11" In a few spots Mostly sort of internally For closed-door demo stuff But we did not Actually make anything go to 11

[00:19:07] You should have made it go to 11 Because I do have a question Actually Turning back to something you just said about You would just say to the AI Make it sound nice The question is what defines Nice Well part of What happens with The training that you can do

[00:19:29] Over time is as you had more and More data sets and so for example Let's get specific if we're trying to Train dialogue isolate We're gonna use audio That sounds like clean speech Like in really great shape and we're gonna use

[00:19:43] The kind of noise that you would want to pull out of Speech and then you would do the same thing with reverb You'd take a reverberant Spaces with an echo in the background And you train on that And the thing that's kind of amazing about

[00:19:55] AI and machine learning and neural nets Essentially what you're doing is you're just using A lot of an example that's relevant to what You're doing so as I mentioned you could be Training on weather data, you could be Training on stock market data You could be training on

[00:20:09] You know modeling What happens when a car hits a wall For example and as you Train on enough data you begin to see The patterns and then those patterns Allow you to build them into the algorithm And so I could take a piece of audio

[00:20:23] That we've never trained on And when we throw it into Dialogue Isolate The patterns that are Characteristic of the difference between Dialogue and noise Get recognized. So As sort of chat stuff's coming Into the world and we have qualifying Words like nice or

[00:20:41] Good you can Start to build a set of Responses behind that. Nice Is probably a little flaky at this point You might want to say Probably a more useful prompt would be Denoise this piece of dialogue Or denoise this Interview or clean up The background audio

[00:21:01] You know what I mean it'll be a little more directive But the whole plan is Clean it up and then Do the rap version without the swear words too Yeah, that could be included in there Literally A clean version, literally Get rid of the

[00:21:17] Hammering in the background of Robert's audio So So I guess The question that comes from that From being able to Say hey just do this And this is kind of me folding Two questions in that I had into one Because I kind of see the possibilities

[00:21:37] When you say that is I also use Nectar occasionally And So the two questions I have Is firstly is sort of will Some of the noise reduction stuff fall Into Nectar and then secondly If you take that a step further in that case Sort of will

[00:21:57] You know EQ and compression And all that sort of stuff be able to say hey Hey Nectar I want my audio to sound like This I want this voice to sound like this And the reason I ask is because Even with Podcasts for example

[00:22:13] I have a distinct sound That I give to this podcast, the way I Do the mastering and all that sort of stuff And The other podcast that I do isn't produced By me I do one about radio imaging

[00:22:25] And Adam who is in the UK Who puts that show together He gives it a very FM sound It's very compressed, very Over-EQ'd and compressed and all that sort of stuff So do you see a time When all that may well come Together Yes nothing you described

[00:22:43] Is out of the realm of possibility Again it's What you decide to train on So you've got The data essentially the patterns of What any one of those things Sounds like and then you can put them together Which is what I was kind of hitting on

[00:22:59] Hinting at with dereverb And Dialogue Isolate But you know I think The grail for folks is Sort of I'm a little focused on my world Because of RX but let me expand it a little bit So I'm always talking about like oh you got

[00:23:13] A sound or an interview Or something you need to clean that up How do you do that fast? Could you drop it into RX Could you give it a command, could you clean it up And spit that out but if you're doing more complicated mixing

[00:23:23] I mean in theory you should be able To sit down and say Okay I'm going to do You know jazz combo With four musicians these are the folks Playing this is acoustically Recorded I have a stereo pair on the drums I need to think

[00:23:39] About denoising I need to think about EQ I need to you wouldn't say But you know you need to handle EQ and compression in theory To be able to do that and I don't think That's far off So when you train it

[00:23:53] And then it finds it's like okay that's what Dialogue sounds like that's what noise sounds like That's what a train sounds like And then you Is it re-synthesizing With what it knows in mind Or is it somehow applying Some sort of never-created-before DSP

[00:24:11] To isolate these things Like what is the end Process in the end I guarantee It's not a bunch of expanders anymore But I imagine It's re-synthesizing The voice on some level In some cases like we have We have a process called Spectral Recovery and what that

[00:24:31] Does is it uses machine Learning to look at What you have and then let's say Above 6K maybe it was a weird Zoom call Or maybe something was compromised One way or another so the machine Learning in that case looks at what you have

[00:24:47] And then because it recognizes patterns And the pattern in that case is like There's a harmonic structure here but it stops Like what do I think Would be the missing pieces in the harmonic Structure or there's a transient But I know transients Usually go up to 20K

[00:25:03] When someone like a snare hit or something's dropped As part of a scene I don't see that From 6K and above So what the neural net has done Or what we've trained it to do in that case Is to look at the patterns of the audio

[00:25:03] That's present and basically predict What's coming and in that case Yes, it is synthesizing new audio That didn't exist before And adding it In the case of noise reduction It's not re-synthesizing But like you said

[00:25:37] It's no longer a fast Fourier transform It's not a gate It's all new stuff I'd imagine As far as like In this case it is So we distinguish between digital signal Processing which is sort of the more traditional Stuff like writing algorithms To de-clip and to remove clicks

[00:25:55] You can do that stuff sort of the pretty traditional way The machine learning again Is I'm probably Talking about it too much but it is super exciting But that's where you are training on it And then that training is built into the neural net
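Going back to Spectral Recovery for a moment: the "predict the missing harmonics above 6 kHz" behavior Mike described can be caricatured in a few lines. The sketch below finds a single obvious fundamental and synthesizes continuation harmonics above the cutoff with a guessed amplitude rolloff. RX's version is learned from data and works on arbitrary real speech; this only shows the shape of the idea, and the `rolloff` factor and phase handling are invented for the illustration.

```python
import numpy as np

def extend_harmonics(x, sr, cutoff=6000, n_new=5, rolloff=0.5):
    """Toy bandwidth extension: estimate the fundamental from the strongest
    spectral peak, then synthesize the harmonics lost above `cutoff` with a
    simple geometric amplitude rolloff. Only meaningful for a clean,
    strongly harmonic signal -- a caricature of learned spectral recovery."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    f0 = freqs[np.argmax(np.abs(X))]       # crude fundamental estimate
    a0 = 2 * np.abs(X).max() / len(x)      # amplitude of that fundamental
    t = np.arange(len(x)) / sr
    y = x.copy()
    k = int(cutoff // f0) + 1              # first harmonic number above cutoff
    for i in range(n_new):
        y += a0 * rolloff ** (i + 1) * np.sin(2 * np.pi * (k + i) * f0 * t)
    return y
```

Feeding it a harmonic tone that has been band-limited at 6 kHz puts energy back above the cutoff, which is the audible effect being described: the harmonic structure "continues" where the recording stopped.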

[00:26:07] And the neural net has a little bit This is kind of fudging it But it's a little bit like a brain DSP doesn't really have a brain But machine learning kind of does And when it sees patterns that it recognizes It's looking to pull out

[00:26:21] The things that are noise But try to leave the frequencies That are dialogue behind So it's not going in with like a sledgehammer And smashing away at the noise It's almost like it's going in with a scalpel And making a distinction between

[00:26:35] Okay, there's a voice that covers this frequency Range. I don't want to take any of that Voice out of that frequency range Or feel like it's missing But if there's noise in the background At the same frequency range The learning and seeing the patterns

[00:26:49] It's able to make a distinction Without re-synthesizing And kind of pull out the stuff that you want And kind of push back or block the stuff That's processing. It's Like it's either EQs Compressors, expanders, gates Fast Fourier transform Those kinds of traditional things Or it's

[00:27:09] Re-synthesizing without the drill in the background Like how? Because it looks at the waveform And says, I mean even when I De-click like I'm in Pro Tools And I see a click And I'm like, oh crap, I know the waveform Didn't want to go there. I am drawing

[00:27:25] A waveform where I knew it should have gone Had the click not been there And I'm literally By drawing, re-synthesizing That little portion of the waveform Give her that. Yeah, it's never the drill No, it's essentially Knowledge of what the differences are

[00:27:43] And kind of being able to reapply it But I think If you really want to go deep on it I'm going to hit the wall pretty quickly As a product manager where I can go I can talk to my PhD colleagues Who do this stuff

[00:27:57] And they can really tell you more about how far I can go, but That basic concept of being able To train on something, recognize patterns Bake that into the algorithm The neural net is a sophisticated algorithm And then be able to apply that

[00:28:11] Works in a lot of cases, but it is Very different from digital signal processing And it does not have to involve re-synthesis Actually that's about as far as I can take it Maybe this could be new processes New methods of compression Like neural compression

[00:28:27] It's no longer just a threshold And a thing, it's a I know what you're saying And I know that vowels have You know, like I'm going to go Hard, I'm going to compress a lot here I'm not going to compress a lot here Being able to control transients

[00:28:43] Almost like compression Like an audio engineer Riding the fader Instead of a threshold And a slope A very hard set of math equations Compared to something you might automate Exactly I'm thinking of something like Particularly for voice people Who are working from the home studio

[00:29:05] and it's not actually perfect. I could actually see this being used as a pre-tape kind of plug-in, so you dial it all in so it's working on the noise floor and everything in your booth, and correcting it before it actually hits the tape. Would that be...

[00:29:23] Would you bake it in before recording, though? Before it gets to your DAW? I would say, if you're finding the faults with that room... I mean, if there are constant faults with that room, then why wouldn't you bake it in before it hits tape?

[00:29:35] At that point you would want to bake it in. But it might be that someone has to bake it in who's got the knowledge of with and without the room. I was going to say you'd want to do it so the AI can set itself properly.

[00:29:45] You know, like it can solve all that with data. That's a big risk, though: sending some audio off and someone goes, what did you have to say? Well, you always record a backup, right? Yeah, exactly. You could use the VO passport to input two signals, on the left and right side,

[00:30:03] one processed, the other one not. Exactly. So, Mike, we've got a bit off track. What else are we looking at in the new version? What else is new and improved? Yeah, let me show you some other stuff. So, Dialogue Contour. This we've had since...

[00:30:21] excuse me, since RX7 in 2018. Dialogue Contour is a tool that is used to basically shape either entire reads, or sections, or syllables, little tiny pieces of a phrase or something like that. So that isn't new, but for folks who aren't as familiar with it,

[00:30:41] what you can do is... and this is Command-A, for folks keeping score... you can select an entire piece of audio and put it in, or if you wanted to target processing to a small piece, you could do that, and... spectral... I'm sorry,

[00:30:57] Dialogue Contour picks that up. So let me play this. This is a simulated answering-service sort of voicemail message. "Please listen carefully as your menu options have changed. Your time in queue is four hours." So if you were cleaning this, or anything like this,

[00:31:15] where you've got dialogue and you're getting slightly, I don't know, weird reads on a particular syllable... This syllable up in the front, I'm just going to grab it right about here. "Please. Please." That "please" sounds a little weird to me. It was like "puh-lease," kind of.

[00:31:35] And so if I wanted to go in, I could put some nodes in here on the pitch curve, and I could go up on the pitch, or down if I wanted to. What I want to do, though, is bring it down a little bit,

[00:31:47] because I don't want it to go up. I think the upward read is a little bit weird. So we just do something simple like that: we just drew in the line over that particular characteristic of the speech. "Please. Please. Please. Please. Please."

[00:32:03] "Please listen carefully as your menu options..." That's a little lower now. "Please listen carefully as..." You'll hear that the original is higher. "Please listen carefully as your menu options..." And so that's one example

[00:32:19] of sort of working a little bit at a microscopic level, where you can go in and select any part of any speech, and if you're feeling like somebody went a little too high in their read, you can bring that down a little bit lower.

[00:32:31] Same kind of situation if the intonation's wrong, for like, "Hey, I'm really happy!" Right, you can bring up the pitch on that. And then, the two new things I want to talk about are the formant and the variation controls.

[00:32:45] You can start to lift things and also change the quality of what somebody is doing. So I'm going to do this in sort of broad demo mode, just to show you the range of things, and then we'll talk more about subtlety in a second.

[00:32:57] So if we go back... I'm going to select this entire piece of audio, and then we just play it in preview mode and do some extreme things, so you get a very broad sense of what the controls do.

[00:33:11] So, pitch we've had for a while, and that's going to be, I think, pretty obvious to folks what it does. "Please listen carefully as your menu options have changed. Your time in queue is four hours." "Please listen carefully as your menu options have changed."
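For orientation, the semitone adjustments being demonstrated map onto frequency ratios by the standard equal-temperament rule, ratio = 2^(n/12). A quick sketch of just the arithmetic:

```python
def semitones_to_ratio(semitones: float) -> float:
    """Equal-temperament pitch ratio: 12 semitones doubles the frequency."""
    return 2.0 ** (semitones / 12.0)

# One semitone up is roughly a 5.9% rise in fundamental frequency;
# an octave (12 semitones) doubles it.
print(semitones_to_ratio(1))
print(semitones_to_ratio(12))
```

This is why "very subtle semitone adjustments" are the right scale for dialogue repair: even a single semitone is an audible shift in a voice.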

[00:33:27] So again, I'm showing this in broad strokes. With the dial I've got in here, I want to make very subtle semitone adjustments. But imagine you were doing sound design for... I'm going to reference gaming and animation again... if you were really trying to create a new character voice,

[00:33:41] pitch could be one of the tools that you have. Formant will change the quality of the voice. "Please listen carefully as your menu options have changed. Your time in queue is four hours." And then variation

[00:33:57] is basically looking at the sort of natural pitch fluctuations that are already in the read, and if you push it up... at 100% you're adding twice as much expressivity. So if somebody was really talking like this, and going up and down,

[00:34:11] you would really emphasize that. If you didn't like that and you wanted to go more monotone, you could go in the other direction. Let's hear the extreme. "Please listen carefully as your menu options have changed. Your time in queue is four hours."

[00:34:29] "Please listen carefully as your menu options have changed. Your time in queue is four hours." That's like the "make it sound like AI said it" setting. That's the Gen Z read.
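The variation control, as described here, reads like a scaling of the pitch contour's excursions around its average: above 1.0 exaggerates the read's natural ups and downs, 0.0 flattens it toward monotone. A hedged sketch of that idea (the function and the math are our guess at the described behavior, not iZotope's actual algorithm):

```python
import numpy as np

def scale_pitch_variation(f0_hz, amount):
    """Scale a pitch contour's deviations around its mean.

    amount = 1.0 leaves the read alone, 2.0 doubles the expressivity,
    0.0 collapses it to a monotone at the average pitch.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    mean = f0.mean()
    return mean + amount * (f0 - mean)

contour = np.array([110.0, 130.0, 100.0, 120.0])  # toy f0 track in Hz
print(scale_pitch_variation(contour, 2.0))  # exaggerated ups and downs
print(scale_pitch_variation(contour, 0.0))  # flat monotone at the mean
```

A production implementation would more likely work in log-frequency (semitones) rather than raw Hz, so that up and down excursions scale symmetrically to the ear; this linear version is only meant to show the shape of the control.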

[00:34:47] Oh, I like that. I'm going to have some fun with that. (They push the controls to the extremes; the processed playback is mostly unintelligible.) That's over. That's it.

[00:35:19] It's over. This is great fun to play with, though. Yeah, on the extreme settings, the sound designer in me is jumping out of his skin right now. Yeah, really? Yeah. And that's... First of all, that was very fun.

[00:35:37] So thank you for allowing me to do that, because in certain settings people are like, no, no, I only want to tweak it two semitones. But the thing that's really important about this is it's really designed for a wide range of applications, right? And so again, if you're working on...

[00:35:51] Let's go back to Star Wars. If you're working on the dialogue in a Star Wars film, you're going to really be tweaking pitch. You're going to be doing individual syllables. You're probably going to be literally working in semitones, just making final adjustments to a dialogue performance.

[00:36:06] But if you're a sound designer, super creative, it's a really fun tool to really push the voice of whoever is speaking. The radio imaging guy in me is coming out right now, because I'm thinking about zips and zaps

[00:36:21] and bangs and washes, and throwing those in there, and the possibilities of what might come out the other end too. Especially layering it under a straight voice. Yeah, totally. Absolutely. Take the original, put that in there, screw around with it. Telephone voice is so dated.

[00:36:36] The new telephone voice is the digitally mangled voice. And not necessarily like, oh, no low end, no high end. So did that come from a request? Or was that just something that someone dreamt up? This came from dialogue editors who said... and again, they're not the ones pushing the

[00:36:57] range; that's the sound designers. But the dialogue editors were saying, hey, listen, we like where you were going with Dialogue Contour a few years ago, but I want formant control. That was a specific request. It reminds me... what was the one back in the day?

[00:37:11] Was it Pitch Doctor, I think? It was one of the first ones to come out with formant control, and it was just awesome. You could do all kinds of crazy stuff and make all kinds of... Look, the go-to for radio is Serato Pitch 'n Time.

[00:37:31] They're two different beasts, don't get me wrong, but I can see a use for that with the same sort of things I'm using Pitch 'n Time for right now. Absolutely. That's awesome. I'm really glad to hear that. Yeah.

[00:37:45] Let's see, I'll jump to a few other things. The Mouth De-click is always super impressive with you guys. That's the thing that, in the voiceover world, is the killer app. You guys have that market for sure. Yeah, Mouth De-click. Oh, awesome. Thank you.

[00:38:00] Yeah, no, that's a real staple. I mean, people really can't live without that. Has there been much done in regards to Mouth De-click in terms of optimizing it? Is that still done in the relatively traditional way? Yeah, it really is.

[00:38:20] It's still... I would say high-end, but high-end digital signal processing. We haven't put machine learning into that. That is one of those things that's been pretty stable. We've improved it periodically.

[00:38:31] Actually, you know when we touched it last, I think it was a few versions ago... I'm going to do a full-disclosure thing here, but it's an interesting story I'm going to share.

[00:38:39] One of the things we found out with Mouth De-click was that we had mostly worked on it in the context of English and more Eurocentric languages. What we found from some of our colleagues in Japan was that it was beginning to go after a couple of...

[00:38:59] I don't know how to describe the part of speech. Legitimate syllables, yeah. Yes, it was removing... Well, there are languages in Africa that literally have clicks. Yes, yes. The clicks are part of their... Yeah, yeah. So that was exactly...

[00:39:13] So this was one of those moments... not as extreme as that example, and that is a very valid example... but this was us getting great notes from our colleagues, the team over there, and some of the folks that we sell the software to, and they said, hey, sometimes

[00:39:27] this works great, and other times you're literally mangling my language. And we're like, okay, all right, we've got to take that seriously. So we're going to make a mode specifically for French. Just the French language. You think that one specifically? I think that fits pretty well.

[00:39:43] It's very clicky. I was going to say, that's for when I do the super mouth-clicky read. Yeah, like they need more mouth de-clicking than other languages. For yourself. Yeah, that makes sense. With the Japanese language stuff, we went and studied a bunch of samples, and

[00:40:00] we actually tweaked the DSP to fix that, because we'd really gotten that as a note: hey, you're missing this. This is crazy for me to say out loud, but one of the things that we want to be working on for the next version is

[00:40:14] breath control, because breath control is one of those pieces of processing that would be immensely helped by machine learning in particular. Yeah. So I'm hoping to get to that down the road. Now, I can't... it's not a promise, but that's something I really want to look into.

[00:40:28] Every week, somebody's asking me how to make the editing easier on my audiobook. How do I get this done faster? How do I do this? The problem with most of the de-breath tools is that they also take out the S's, and

[00:40:42] they go after a bunch of other stuff that they shouldn't go after. Yeah, a lot more things than they should. There needs to be a lot of machine learning for a de-breather to really do a good

[00:40:55] job, because the best de-breather is a literal human editor going through and managing it. The same goes for the de-clicker too, I would say. Right. Ultimately. Well, to a lesser extent, but yeah. Yeah, to a lesser extent.

[00:41:08] Yeah, but the trouble with de-breath is that the way we talk is based on air movement. Breathing and talking both involve inhaling or exhaling air, and it's trying to get some AI to... But sometimes you're absolutely right.

[00:41:27] Sometimes leaving a breath on the end of something is necessary to make it sound normal. Some ends of sentences need to have a little breath to them. Yes, and some of the breath is part of the actual performance.

[00:41:40] And so when you have characters in a role, breathing can actually be part of the performance. So this is why I'm so careful when I hear somebody saying, well, the first thing I do is remove all the breaths. I'm like, whoa, whoa, whoa.

[00:41:53] Only if it's an announcer-style thing. Yeah, but one of the greatest giveaways for an AI voice is missing all those little breaths, you know, and all those little sort of breath sounds that

[00:42:05] happen at the end of certain words. That's one of the biggest giveaways for me. Or overly obvious, overly obvious breaths. You know the Apple speak thing? You can just type something, select the text, and have Apple speak it, and it's

[00:42:18] like blah, blah, blah, blah. Blah, blah, blah, blah. No breath in between sentences. Yeah, yeah. It's hard to listen to. Well, that'll be pretty amazing when that becomes reality. Not that you've shown your hand or anything, but I mean, it's technology.

[00:42:35] We all expect progress, and frankly, we just expect these tools to... you know, as you're well aware, as AI is progressively becoming part of our everyday lives...

[00:42:49] Not everybody's, but certainly for a lot of us who produce content... I mean, it is for me. The expectation level of what software should be able to do is going to rapidly increase at the same time.

[00:43:06] So yeah, as people can do more... It's just like when computers came along in the 80s: it's going to save you time. No, all it did was make them expect you to produce more widgets per hour.

[00:43:19] That's what computers did. So with AI, it's just taking that and multiplying it by ten. Now, because your tool can do X, there's no reason why it shouldn't be able to do Y and Z. And when is it going to be able to do that?

[00:43:32] You know? Yeah. And that's what's so hard about developing in this current climate. Yeah, absolutely. I mean, that's true on so many levels, because all of us who are on the software side, we're all scrambling to make sure we can deliver on that stuff.

[00:43:46] And then the folks who are on the content creation side... exactly what you just said. They're all kind of like, all right, so now I can see what some of these tools can do. Then you should be able to solve five other problems for me.

[00:43:58] Like, that was really impressive for about three days; I'm no longer impressed. It's not enough that I didn't have to do any work for that one and I can still charge my client the same amount of money. I need to do even less work.

[00:44:14] Well, that's the other side of the coin. I mean, that's the dream: to do less work for the same amount of money. I mean, come on. But we hear a lot about the money side. Yeah, for the same amount of money.

[00:44:25] Right. But what we hear now from a lot of folks out there, because software has gotten so good... and this was even before AI was all over the place, but now certainly including AI... you know, for folks who are editing on big TV

[00:44:37] and film productions, the amount of time that the audio department is getting is always minuscule. Always at the end, and it's less and less. And then people are kind of like... we've literally had people say to us that if they're showing something to a producer,

[00:44:50] whoever the client is on the project, they're just kind of like, yeah, just RX that. They don't even say go fix that. The old fix-it-in-the-mix. It still exists, even in a digital landscape. Exactly. Fix it in the mix.

[00:45:07] And by the way, you don't get six days to do it. We're going to give you a budget for four days, because you've got this software. So go do that. And it's hard on those folks, for sure. All right.

[00:45:15] And if you have an excuse, you guys will make it three. Yeah, exactly. Has anybody asked you the opposite? Like, please don't make that, because I'd still like to have a job. Literally no one. It's always the other thing, like,

[00:45:30] can you do stuff so I can go faster and faster and do more projects and, you know, get stuff done? It's always speed with the highest quality you can deliver. But I'd love that. I would love for someone to come to me and be like,

[00:45:41] can you slow down the process? We have had people come to us about Source Connect and say, please don't make it less expensive. Like, I miss the days when ISDN was completely unobtainable and there were only five of us in here.

[00:45:54] I don't think I can do that. Here's a question, probably the last question, because we're going to have to let you go. You've been with us for a long time. But just a quick question about the pitch

[00:46:05] and all that stuff you were demonstrating before. I can see that if you're doing ADR, automated dialogue replacement, you could actually synthesize a voice if you have to drop in part of a word, or a whole word, to match the original dialogue, using that plug-in.

[00:46:22] Totally, I would think. Yeah. That area of voice synthesis, which, you know, you're starting to see people release some things in, and there are some issues about deep fakes and all that stuff out there... that's still relatively new. So the things that we're doing

[00:46:40] that are a little bit older now, like stem separation, whether it's musical stem separation or dialogue and noise and reverb, those have been around longer. Voice synthesis is a little bit newer. But everything you said is... I mean, in some cases,

[00:46:53] we're already seeing versions that are pretty respectable. I've done it, even with traditional pitch shifting. Like, you have two different words that you need to edit together, and the person was low in pitch here, coming in high in pitch there,

[00:47:07] and then you just, you know, crowbar it into place, and maybe it sounds a little bit funny for a split second of a vowel, but you get those things to merge. I've literally had clients ask me to take a statement and turn it into a question.

[00:47:22] Yeah, that's a little bit tougher, but you can start to approach it with just... yeah, traditional pitch shifting. There's that. Yeah. I mean, in some cases, you can do that with Dialogue Contour. You can shift a question to a statement. It's tricky, but you can do that.

[00:47:40] In some cases you probably can't. But the idea of voice synthesis... I mean, truly being able to... and again, I'm not advocating, you know, stealing or anything, but we're talking about situations where, let's say, you have a production and you have all the dialogue

[00:47:53] from a lead actor, and the lead actor has given permission and said, oh, there are a couple of things in there that you all can fix, allowing you, permission-wise, to re-synthesize that actor's voice for a blown line.

[00:48:10] So you don't have to go back and do ADR, or, you know, fill in... certainly the stuff that they come back and do sometimes, the action scenes where they're yelling or whatever. You know, used responsibly, and in conjunction with the people

[00:48:24] who are giving permission for that, you can do some pretty incredible things that don't require bringing actors back in, and that don't require setting up a ton of looping, which is often really expensive and hard to do. Used irresponsibly... well, you all know what can be done with people

[00:48:39] saying things they never said, and that kind of stuff. But there are some really cool, genuine use cases in there that I think are going to really help anybody who works in audio. Those are on the way. And you know, those are exciting. Beautiful. On that note...

[00:48:53] RX11 from iZotope. Check it out. Hopefully this has been an informative podcast for you. Check it out. I really enjoyed it, and I really appreciate you giving me a chance to talk about RX11. Mike, it was really fun to have you. Yeah, it's great hearing it from

[00:49:08] someone on the inside, and hearing what's going on and what might be coming. So yeah, that's even more exciting. Well, indeed. My pleasure. Well, that was fun. Is it over? The Pro Audio Suite, with thanks to Tri Booth

[00:49:23] and Austrian Audio. Recorded using Source Connect, edited by Andrew Peters, and mixed by Voodoo Radio Imaging, with tech support from George "The Tech" Whittam. Don't forget to subscribe to the show and join in the conversation on our Facebook group

[00:49:37] to leave a comment, suggest a topic or just say g'day. Drop us a note at our website, theproaudiosuite.com. Take it to a hotel room and scare the cleaner. And also Austrian Audio, making pleasure... What? Making what? Making pleasure?

[00:50:01] Not that hotel room. Not that hotel room. Making Passion Heard. Oops.