Post by Liebezeit on May 22, 2020 18:25:11 GMT
This post is about using machine learning to train a neural network to imitate someone else's speech patterns, represented as mel spectrograms, and make it work as a text-to-speech synthesizer (think DECtalk) by passing the result through a decoder such as WaveNet, which can then speak any words in that voice.
This is some terrifying, scary and interesting stuff. And it's also complicated.
So basically somebody compiled many hours of celebrities' recorded speech, along with transcripts, and used it to train the network that learns the celebrities' speaking patterns (commonly known as Tacotron). Then they trained WaveNet (in a second pass, I'm thinking) so it can generate audio from text, reminiscent of Google Translate's text-to-speech. They got in trouble at some point because Jay-Z and Roc Nation filed claims against the YouTube channel Vocal Synthesis to have its videos taken down. However, the DMCA claim was incomplete and the Jay-Z synthesis videos were brought back. This is unprecedented, and it overwhelms me greatly as a dude who's been constantly exposed to technology (cellphones and TikToks don't count).
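To make the "mel spectrogram" part a bit more concrete, here's a minimal sketch in plain Python. This is not the actual Tacotron code, just the frequency-scale idea behind it. The function names are made up for illustration, and the numbers (80 bands between 125 Hz and 7.6 kHz, a 12.5 ms hop between frames) are what I recall from the linked paper, so treat them as assumptions.

```python
import math

def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (the common HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_mels=80, fmin=125.0, fmax=7600.0):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale.

    The defaults are assumed values, not something this post confirms.
    """
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

def frames_for(seconds, hop_ms=12.5):
    """Rough number of spectrogram frames for a clip of the given length."""
    return int(seconds * 1000.0 / hop_ms)

centers = mel_band_centers()
# Bands are packed tightly at low frequencies and spread out at high ones,
# which roughly matches how human hearing resolves pitch.
print("low-end spacing (Hz): ", centers[1] - centers[0])
print("high-end spacing (Hz):", centers[-1] - centers[-2])
print("frames per second of audio:", frames_for(1.0))
```

So each short slice of audio becomes a column of 80 numbers, and the first network only has to predict those columns from text. Turning the columns back into an actual waveform is the part left to WaveNet.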
If there's one thing I'd like to see in an ABBAtar, it's Björn Ulvaeus using the same technology so the ABBAtars have their own kind of voice. It would be a fair deal for the fans who desperately want to see ABBA sing their new songs, and for the members, who are clearly exhausted and moving on from their younger days, content with their retirement or cozy living.
Most of the videos presented here come from the channel Vocal Synthesis, and most of them are too coarse and ridiculous to be included, so here's a much more digestible example of what I'm talking about. And, of course, this channel is not the only one that does this. There are a lot more copycats (ahem, inspired people) doing this. As others would say, this could be used as a weapon either for or against humanity. The future is looking a bit odd today.
UNCANNY!!!
Here's the full paper if you want to read it. I may have worded something wrong, so feel free to correct me.
arxiv.org/abs/1712.05884
web.archive.org/web/20200415005845/https://arxiv.org/pdf/1712.05884.pdf