Transformed or synthetic voice, a new playground for the creative industries

We are entering a new era that belongs to voice.

From a technological point of view, the deployment of 5G, combined with new computational power, is turning years of research into voice reproduction and transformation into tangible results.

Voice applications are growing, boosted by their ease of use. As each person interacts with others through more and more channels, the constant renewal of content calls for better listening experiences.

More broadly speaking, after decades of domination by images and text, the sensory and emotional dimension conveyed by the voice is becoming a new frontier for the transmission of feelings and for the creation of new multi-sensory experiences. Could it be a new playground for personalising one’s avatar in the metaverse or in video games? A way to give historical figures back the power of speech by reconstructing their voices for documentary and audiovisual productions?

Voice cloning, and voice synthesis more generally, must therefore respond to this rapid change while remaining accurate and accountable.

A mutually beneficial partnership between human and machine to achieve more expressive speech

Many tools are available today for producing brief voice content: a simple search on voice cloning will bring up a dozen companies. Most of them actually offer only simple creation of voice content from text. This approach allows the machine neither to understand nor to interpret what it reads, so the emotion in the result is uncontrolled.

Another way is possible. First, rely on the virtuous partnership between human and machine in the design of algorithms and in post-production, in order to correct possible biases linked to inclusiveness (gender, accents, etc.). Second, rely on an actor to convey the intention: the actor's prosody forms the base for the transformed or synthesised voice, which generates a high level of sensitivity and unparalleled realism.

Thus, the main opportunity for voice cloning lies at the high end of this technology: the reproduction of a person’s voice must reflect their real character and emotions.

Voice cloning opens up new possibilities for designers. Whether it’s recreating speech that hasn’t been properly recorded or making disruptive use of the voices of prominent public figures, the possibilities are endless. But they must be mastered.

This is why IRCAM amplify supports documentary and audiovisual projects of high historical value with tailor-made production processes.

Voice Cloning

“Voice Cloning” is a technological process of voice reconstruction capable of determining and “learning” the elements that enable the automatic reproduction of all the emotion and dynamic articulations of an existing voice.

Thanks to machine learning, it takes only a few dozen minutes to analyse the original voice recording and characterise it precisely and uniquely.

In parallel, the speech is recorded by an actor who supplies the intonations and emotions, providing the prosody for the voice being recreated.
