Re: Boz - The future and AI plugins

Jonathan · July 15, 2017, 12:54am

Anyone tried Izotope Neutron? The automatic mixing plugin?

I would have bought this program in a heartbeat if it actually worked. And I would pay a ton of cash for it, because the amount of time it could have saved is enormous.

Izotope already owns the mixing tools they need to make a plugin like this work. Their problem is purely on the coding and AI end… If they integrated the RX tools (that’s their sound restoration suite) into an AI they’d turn the audio industry upside down.

…Now if they could just take that one step further then automate that with an AI, then they’ll own this industry for years.

…If I owned a fully functional super smart AI…if this thing could also learn and adapt to your mixing style as Alexa and Siri adapt to your speech style, I could finish days worth of nasty, dry, boring dialogue and SFX editing in hours

… Imagine how many audio engineers you wouldn’t have to pay, if it could balance and navigate an Atmos object panner and balance a digital summing matrix algorithmically? Wouldn’t that be amazing?

Anyone tried Izotope Neutron? The automatic mixing plugin?

I think the main problem with plugins like these is that it’s taking a backwards approach to “smart plugins.” I don’t think it makes sense to try to make an algorithm mix like a person, because there’s too much variability in how that’s done.

I mean, look at Photoshop. It’s not like Adobe tried to create a computer version of how a brush works. They took the basic idea of painting/drawing and added all sorts of completely different things to it. That’s what makes it powerful. But nobody, as far as I know, is trying to make smart filters that apply the right settings to white girls in bras. That idea doesn’t even make sense, so why do we try to do it with audio?

To me, that magic of AI when it comes to music won’t be in the mixing department, it will be in the creating and arranging department. For an algorithm to tell me “hey, if you add this chord transition here, it might be cool,” would be way more useful.

Or for an algorithm that listens to the music and automatically creates a drum beat for it, or a bass line, or whatever. Those things are all possible and match the strengths of AI really well.

We already have “dumb” versions of this (meaning not AI, but hard coded performances) with stuff like EZDrummer and EZKeys. But doing an AI version of that would be awesome.


Jonathan · July 15, 2017, 1:25am

[quote=“bozmillar, post:9, topic:1599, full:true”]
I think the main problem with plugins like these is that it’s taking a backwards approach to “smart plugins.” I don’t think it makes sense to try to make an algorithm mix like a person, because there’s too much variability in how that’s done. [/quote]

Isn’t that what complex AI algorithms account for though? Think about how much variation there is in human speech, but voice recognition software manages to consolidate patterns of consonants and vowels into words and sentences based on the common thread between words in a similar language. I know that detecting speech is different than treating it, but couldn’t you take the same waveform analysis that you’d run in Rosetta Stone or Amazon Alexa and use that to drive responses to a piece of software in a DAW?

Remember that last thread where AJ told me I needed to dump swatches of 400hz in a vocal? Why couldn’t a computer figure that out and do it for you? If he can hear it, surely an AI can…the question then isn’t hearing it. It’s knowing what to do with it…right?
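As a rough illustration of the "hearing it" half of that question, here is a minimal sketch that measures how much energy a signal has around 400 Hz compared with neighbouring frequencies. It uses the Goertzel algorithm in plain Python. The function names, the reference frequencies (200 Hz and 800 Hz) and the synthetic test tone are all assumptions made for the example, not anything Neutron or any other plugin is known to do. It only detects the buildup; deciding how much to cut is the part the thread is arguing about.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Estimate signal power at the DFT bin nearest `freq` (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq / sample_rate)      # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def buildup_ratio(samples, sample_rate, target=400.0, references=(200.0, 800.0)):
    """Ratio of power at `target` to the mean power at the reference frequencies.

    The reference frequencies are a hypothetical choice for this sketch.
    """
    target_power = goertzel_power(samples, sample_rate, target)
    ref_power = sum(goertzel_power(samples, sample_rate, f) for f in references)
    ref_power /= len(references)
    return float("inf") if ref_power == 0 else target_power / ref_power

# Synthetic "vocal" with an exaggerated 400 Hz component (illustrative only).
RATE = 8000
tone = [
    math.sin(2 * math.pi * 400 * t / RATE)
    + 0.2 * math.sin(2 * math.pi * 200 * t / RATE)
    + 0.2 * math.sin(2 * math.pi * 800 * t / RATE)
    for t in range(RATE)
]
print("400 Hz buildup ratio:", buildup_ratio(tone, RATE))
```

A large ratio flags a possible buildup at 400 Hz. Whether that buildup is a problem or part of the sound is still a judgment call.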

There may already be something like that out there, but I’d bet a lot of money that if someone made a filter that analyzed a body shape then ~removed~ a bra, it’d quickly become the most pirated software on the planet.

Do you think that might just have to do with demand?

So the software that’s already out there can go head to head with any human, but only in areas of music where the rules are clearly defined. So these things can write inventions, fugues, toccatas, in the style of Bach, Handel, or Vivaldi based on the hundreds of works we have from them.

As for pop/rock/country/blues stuff…there’s already stuff on the market that can make suggestions. Hell…Band-in-a-Box did that.

Also…for lyrics…there’s a case headed to the US Supreme Court over who would own the rights to lyrics that an AI generates. lol. My understanding of the case is that the software owner can claim them in terms of use agreements all day long, but the agreement may not mean anything at the end of the day.

Hell dude…I suck at writing lyrics. I would love to have a computer spit out lyrics to a melody I write, then be able to sell and license them without having to pay a co-writer on the split sheet.

Why would it be good at that vs listening to a mishmash of frequencies then creating an EQ for it?

What’s the difference? I mean…not with EZdrummer or EZkeys, but the difference between an AI and something that has a hard coded performance?

anon51315799 · July 15, 2017, 6:20am

Fascinating thread, watching Jonathan talking to himself.
It’ll be fine, JK: someone WILL talk to you.

Jonathan · July 15, 2017, 6:58am

Well, I guess I now get the gracious honor of explaining how a forum works. We were talking on a previous thread about AI with Izotope. Boz started talking about AI’s in general, so I copy pasted what @bozmillar said, with a response in the new thread. I sort of had to quote him there, because without his response, there wouldn’t be enough context to continue the discussion.

On this forum, when you see the little dot with someone’s logo, it means they responded.

Uhh…ok. Sure.

anon51315799 · July 15, 2017, 11:20am

No JK, you don’t need to explain forums. The primary poster on this thread is you. The blue stripe down the left of a post indicates it is a quote.
You have quoted yourself six times and answered YOURSELF five times.
Until I posted, no-one else had posted anything in this thread. Boz’s reply comes from the original AI thread and YOU posted it as a quote.

Oh, and that little dot next to Boz’s name. Is it or is it -n?

See, teaching grandad to suck eggs is a waste of time, especially when you are unable to understand the subject you’re explaining.

bozmillar · July 15, 2017, 3:35pm

@Coquet-Shack, do you not have anything better to do? Maybe we need a separate section of the forum for you to post your irrelevant tirades.

Definitely a discussion I want to take part in, but responding to any of these questions would require both a lot of research and a lot of typing to do it any justice, both of which I want to do. I’m out today, but I definitely want to talk about it more.

redworks · July 16, 2017, 11:25pm

Yeah i am interested in this line of thought as well. I am currently too busy to take the time to comment too heavily but i do like the kind of thinking that boz is getting at. i like it when we can take the known and mix it (pun intended) and come up with something familiar and distinctly different. Anyway as i said no real time now but i will soon (I HOPE) and then i would love to read what others have to say about the subject.

LazyE · July 17, 2017, 7:10pm

the only way i can see the program working and being able to mix for you is if it automatically compares the mix to hundreds of proof mixes in its memory banks. it would need to match genre, style, tempo, effects, etc all manner of things till it creates its own reference track made up from it taking bits from hundreds of other tracks. then it would match all the likes for likes and spit out a mix that sonically matches the bespoke reference. the programming for that would probably be larger than what first took man to the moon!

LazyE · July 17, 2017, 7:15pm

otherwise wouldn’t it just be working as an automated gate? filtering out set frequencies. i don’t know. i’m pretty much out of my depth with this to be honest.
cool thread though Jonathan
:beerbang:

Jonathan · July 17, 2017, 7:58pm

[quote=“LazyE, post:10, topic:1601, full:true”]
the only way i can see the program working and being able to mix for you is if it automatically compares the mix to hundreds of proof mixes in its memory banks. it would need to match genre, style, tempo, effects, etc all manner of things till it creates its own reference track made up from it taking bits from hundreds of other tracks. [/quote]

I’d envision teaching this thing the same way you would any other AI. Dump several hundred reference mixes into its database then let it adapt. Just like the Amazon Alexa. You train it by teaching it to listen to something over and over again. And if it keeps track of what you change (after it makes its suggestions), that data should be usable for improving the suggestions it makes. I realize that Amazon has a lot more money than Izotope or Waves, but we’re heading into an era where companies that create AI are poised to be the wealthiest entities on the planet. As AI technology becomes cheaper and more generic, why would the music industry not adopt it just like it did when Pro Tools first came out? And then dump it all the way down to freeware and our $70 Reaper licenses.
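The "keep track of what you change" idea can be sketched in a few lines. This is only a toy under stated assumptions: the class name `SuggestionTuner`, the single gain parameter and the learning rate are all hypothetical, and a real plugin would have far more than one number to adapt. It shows the mechanism, which is nudging future suggestions toward whatever the user keeps correcting them to.

```python
class SuggestionTuner:
    """Toy model: learn a per-user offset from corrections to a suggested gain.

    Everything here is hypothetical; no real plugin is known to work this way.
    """

    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        self.offset_db = 0.0  # learned bias added to every suggestion

    def suggest(self, base_gain_db):
        """Return the stock suggestion shifted by what the user has taught us."""
        return base_gain_db + self.offset_db

    def record_correction(self, suggested_db, final_db):
        """Move the offset a fraction of the way toward the user's correction."""
        error = final_db - suggested_db
        self.offset_db += self.learning_rate * error


tuner = SuggestionTuner()
for _ in range(50):
    suggested = tuner.suggest(-6.0)          # stock suggestion: cut 6 dB
    tuner.record_correction(suggested, -3.0)  # user always settles on a 3 dB cut
print("adapted suggestion (dB):", round(tuner.suggest(-6.0), 2))
```

After enough sessions the suggestion drifts toward the value this one user prefers. That also hints at the difficulty raised later in the thread: a tool tuned to one person's habits is not automatically useful to anyone else.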

I don’t think it will stay like that forever. Unity and Unreal are freeware engines that allow wannabe game makers such as myself to design playable functional games having only studied about 3 months of coding. If a platform comes out like Unity, but for creating AI’s instead of video games, the heavy lifting is taken care of by the engine, all the person has to do is learn the language the engine runs on. So in 5-10 short years from now, you or I could be sitting around designing apps that learn shit all using a computer program that is designed to make AI’s accessible to dummies like me. You think?

bozmillar · July 17, 2017, 8:01pm

Well, neural networks don’t have a lot of code. But you do have to decide which features to send in to the NN to be trained. The hard part is that it’s really hard to know what features actually help the NN do a better job and which features make no difference. And there needs to be some sort of universally accepted outcome.

NNs can get stuck in ruts where they find the local maximum, but not the global maximum, which means it may discover a pathway that creates really great mixes for one set of tracks, or one style of tracks, but completely destroys a separate set of tracks.
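The local-versus-global maximum problem is easy to show with a toy. The sketch below is not a neural network. It is a plain hill climber on a made-up one-dimensional "mix quality" curve with a small peak near 1 and a taller peak near 5 (the curve and the names `quality` and `hill_climb` are assumptions for illustration). Where the search starts decides which peak it ends up on.

```python
import math

def quality(x):
    """Made-up score with a small peak near x=1 and a taller peak near x=5."""
    return math.exp(-(x - 1.0) ** 2) + 2.0 * math.exp(-(x - 5.0) ** 2)

def hill_climb(start, step=0.01, max_iters=10_000):
    """Greedy search: keep moving to the better neighbour until neither improves."""
    x = start
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=quality)
        if best == x:
            break  # no neighbour is better: we are on top of *a* peak
        x = best
    return x

stuck = hill_climb(0.0)   # starts on the slope of the small peak
lucky = hill_climb(4.0)   # starts on the slope of the tall peak
print(f"start 0.0 -> x={stuck:.2f}, quality={quality(stuck):.2f}")
print(f"start 4.0 -> x={lucky:.2f}, quality={quality(lucky):.2f}")
```

Both runs stop because every small change makes things worse, yet only one of them found the best answer. Training a network has the same flavour: it can settle on settings that look optimal from where it stands.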

People fall into the same issues though. That’s where superstitions come from. Just because we attribute one thing to another when we find a pattern doesn’t mean the pattern actually exists. But our brains are far more complex than NN, and we can naturally break the patterns when we recognize there is an issue.

For example, it’s completely possible for a NN to spit out white noise and think it’s ok. It needs to receive the feedback telling it it’s wrong in order for it to know.

Also, NN are much better at doing their thing when the results have a statistical element to it. For example, if you have a zillion pictures, and you want to use a NN to do facial recognition to find somebody’s face, it works really well. In Google Photos, I can enter my name and see a large list of pictures with my face in it from my personal library of pictures. It almost never gets it wrong. If that’s all you see, then you would think it’s 100% accurate. But what you don’t see are all the pictures it’s not showing me with my face on it because it got them wrong. Even if it’s only 20% accurate, as long as the cutoff is set such that it only shows pictures of me that it is 95% positive are me, then it looks to me like it’s 100% accurate.
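The cutoff effect described above is the difference between precision (how many of the shown pictures are right) and recall (how many of the right pictures get shown). Here is a minimal sketch with invented confidence scores; the data and the function name `precision_recall` are assumptions for the example, not how any photo service is known to work internally.

```python
def precision_recall(scored_photos, cutoff):
    """Return (precision, recall) when only photos scoring >= cutoff are shown.

    scored_photos: list of (confidence, actually_me) pairs.
    """
    shown = [is_me for score, is_me in scored_photos if score >= cutoff]
    total_me = sum(1 for _, is_me in scored_photos if is_me)
    true_hits = sum(shown)
    # If nothing is shown, nothing shown is wrong, so report precision as 1.0.
    precision = true_hits / len(shown) if shown else 1.0
    recall = true_hits / total_me if total_me else 0.0
    return precision, recall

# Invented scores: 10 photos really contain my face, 2 do not.
photos = [
    (0.99, True), (0.97, True), (0.90, True), (0.80, False),
    (0.75, True), (0.60, True), (0.55, False), (0.40, True),
    (0.30, True), (0.20, True), (0.10, True), (0.05, True),
]

for cutoff in (0.95, 0.50):
    p, r = precision_recall(photos, cutoff)
    print(f"cutoff {cutoff:.2f}: precision {p:.0%}, recall {r:.0%}")
```

With the strict cutoff every photo shown is correct, but only a fifth of the real matches appear. That trade works for a photo search. It does not work for a mixing plugin, which has to return one result for every track it is given.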

So while a NN can be really good at sorting through songs and picking out features that have good qualities, it’s much harder to throw stuff at it and have it come back with 1 thing that is good every time.

I’m not saying it’s not possible, I’m just saying that it’s a far harder problem to solve than working the other way. Nobody wants a plugin that delivers really well 30% of the time but delivers pure junk 30% of the time.

Jonathan · July 17, 2017, 8:07pm

Hey man, I’m out of my depth too. The only reason this even crossed my mind is that I was sadly disappointed by Izotope Neutron. But heavily inspired by Mark Cuban’s speech at Oxford on how automating automation is the future of technology.

Jonathan · July 17, 2017, 8:10pm

Why isn’t it then just a matter of teaching it the difference between a country track and metal track?

When you disable the white noise plugin every time it tries, can’t the AI make note of that and not do it anymore?

So what if we backed off a bit and gave it an easier task. How about clip, isolate, and crop used vs unused regions in the DAW. Then apply crossfades based on the type of instrument and speed of the transient. Then clean up unused regions from the audio pool or bin. Then detect the types of instruments used in the session. Then apply a picture graphic, a color scheme and a label to said instrument.
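The first step in that list, separating used from unused regions, does not need AI at all. A minimal sketch, assuming the audio is already a list of sample values between -1 and 1 and that "unused" simply means "below a level threshold for long enough": the function name, the threshold and the `min_gap` parameter are hypothetical choices for the example.

```python
def find_used_regions(samples, threshold=0.01, min_gap=3):
    """Return (start, end) index pairs (end exclusive) where audio is present.

    Quiet stretches shorter than `min_gap` samples are treated as part of the
    surrounding region, so a brief dip does not split a take in two.
    """
    regions = []
    start = None      # index where the current region began, if any
    quiet_run = 0     # consecutive quiet samples seen inside a region
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:
                regions.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Close a region still open at the end, trimming trailing quiet.
        regions.append((start, len(samples) - quiet_run))
    return regions

clip = [0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0, 0.3, 0.2, 0]
print(find_used_regions(clip))
```

In a real session `min_gap` would be measured in milliseconds, not samples. The crossfade and instrument-detection steps are where the harder judgment calls begin.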

I mean, forget mixing stuff completely. If it’s really about sorting through features, can’t you at least have it clean up and organize a DAW session?

bozmillar · July 17, 2017, 10:43pm

What is the difference between metal and country? I mean, you could use a NN to detect twang in a person’s voice. But even if we were sticking to strict genres, does that really give you many hints on how to mix a song?

Mixing often requires decisions based on decisions based on decisions. You have to take into account all of the instruments used and their purpose for being used. I think it would be really hard for a NN to hear a song and say “This song would really benefit from the piano driving the chorus, but not the verse.”

And in order to train it, something has to tell it whether it did it right or wrong. It’s hard enough for Pandora to know what kind of music I like because the act of training it is very binary. I have to tell it that I either like a song or I don’t. Often times, I can’t quantify why I don’t like a song.

I’m not talking about a white noise plugin. I’m talking about the fact that NNs process things in very unpredictable ways. You can get some hints about what features might have a significant impact on the NN’s ability to give you good results. They are really really good for some tasks, and really really bad for other tasks. And for some tasks, they could be really good if we could provide it with enough data. Add to that the fact that there’s no real agreed upon end point, and how do you even go about training it?

That seems like a better job for hard coding than NN. Mixing in its current form is more of a game of setting things up in a way that you are comfortable. I already have actions in Reaper that do this for me, but they only make sense to me. Doing something that is useful for lots of people would be really hard, because everyone has different requirements. How do you train something when everybody wants it trained differently?

Again, I’m not saying it can’t be done. But I do think there are better and easier ways to utilize AI in music production.

Jonathan · July 18, 2017, 3:43am

Just to be clear, is there a difference between neural networking and machine learning?

bozmillar · July 18, 2017, 3:44am

no, NN is a common method of doing machine learning, but there are other methods as well.

Jonathan · July 18, 2017, 4:35am

So I see 3 different problems? Making distinctions, making correct decisions and prioritizing processes based on an ‘intuition’, and then arriving at an end result is far more complicated than Netflix figuring out which shows you probably won’t want to watch.

I think I’m starting to see where my own misconceptions are confusing the issue. It has to do with most machine learning being focused on arriving at a specific concrete solution, and under more controlled circumstances, with far fewer variables to account for.

bozmillar · July 18, 2017, 4:48am

Machine learning is basically really good at pattern recognition. And not just obvious patterns, but patterns that may not look like patterns. Basically any patterns that will help it arrive at the correct outcome the most often.

For example, it would be really really hard to program code that could pick out a telephone booth in a picture. Telephone booths come in all sorts of shapes and sizes, but under normal circumstances, a person can look at an object and know whether or not it’s a telephone booth intuitively. It’s not something you need to think about. You just know it because you know it.

NN are the same way. You can train it to know what a telephone booth looks like and it just knows it. Then you can give it a picture of a random telephone booth and it will know that it is one. That kind of thing just couldn’t be done with brute force programming. And there are many similar problems like that that brute force programming just couldn’t do in the past. It really opens up a lot of possibilities with new things that can be done.

But it’s still not easy. You can’t just give it raw data and expect it to work well. Too much data and it might find irrelevant patterns. Too little data and it might not be able to find any meaningful patterns. So you really have to spoon feed it the right information so that it can process it reliably. And that information often doesn’t make that much sense to the programmer.

Also, keep in mind that we live in a world where if something is failing 1% of the time, it drives us nuts. We expect high levels of reliability. It would take a ton of research and testing to be able to come up with something that worked reliably.

LazyE · July 18, 2017, 2:28pm

No matter how good a programmer or how good a program it will never be able to make artistic decisions based on what sounds nice. It’s just crunching data. Soulless

Jonathan · July 18, 2017, 6:12pm

I wasn’t assuming it would make artistic decisions. I had merely hoped (Izotope Neutron in particular) would be able to make surgical corrective ones based on frequency analysis, which at first I thought would be comparable to ‘soulless data’.