From Virtual Singing to Deepfakes: How is Speech Used Today

27/04/2022

IRCAM’s first steps into research on human speech stemmed from the music world, but today the applications for audio research and vocal analysis extend far beyond the music industry. Focus on the new use cases of voice.

Playing with Voice: A New Instrument

From producing synthetic voices and re-creating voices that have disappeared to transforming or analyzing voices in real time, researchers are developing ever more realistic solutions, adapted to the uses that technological innovation makes possible.

However, the first projects that gave rise to this work came from the cultural sector. Many artists regularly collaborate with audio specialists to rethink composition, using the voice as a new instrument. More recently, the artist DeLaurentis, in collaboration with Ircam Amplify, designed an entirely new sound experience: a virtual choir that creates a choir effect in real time and harmonizes voices on different musical scales, controlled via connected gloves.

Listening to the Voice: The Societal Challenge of Deepfakes

By putting the best of IRCAM’s audio research and sound creation at the service of markets, new applications and organizations, Ircam Amplify is at the forefront of technological trends related to voice.

For the MGEN Vocal’iz application, Ircam Amplify has developed an algorithm that analyzes the user’s voice and then suggests exercises adapted to the user’s profile and state of health. The result is a genuine pocket vocal coach, used preventively to help users take better care of their voices.

Connected speakers are also developing into daily support systems, along with all the devices that are rapidly becoming more ingrained in our lives and consumer habits.

At this stage, most technology is building towards more natural human-machine interactions, but there are often still hurdles at the level of user requests. As Nathalie Birocheau, CEO of Ircam Amplify, explains: “First a machine must handle the issue of voice analysis (intelligibility, comprehension of the content, analysis of the sound environment, etc.) before even thinking about establishing a more qualitative and emotional human-machine dialogue. Next will come the improvement of the interaction, to go towards more contextualisation and personalisation (if this is useful and relevant for the expected application!). This is where Ircam Amplify is positioning itself and developing technologies for the coming years.”

Listening to the Voice: The Societal Challenge of Deepfakes

The digital revolution is now moving towards a post-digital period where sound is taking back an essential place in usage, for sociological, economic and technological reasons.

Voice-based human-machine interfaces are developing at breakneck speed, with usage driven by the younger generation, who are ditching keyboards and starting to converse with their computers and smartphones. Forecasts predicted that nearly 30% of Web browsing would be done without a screen by 2020, and that voice assistants will be present on 8 billion devices by 2023.

These figures are dizzying, but they suggest that the coming century will be the century of sound and voice (and above all the century of multi-sensoriality). They also mark a reversal: over the course of a few centuries, we went from a society of oral tradition to a world of visual dominance, and sound is now reclaiming its place.

Modeled on image-based deepfakes, audio deepfakes are the logical continuation of the work initiated in speech synthesis. The question is not if audio deepfakes will become ubiquitous, but when. Work on these audio filters, built with artificial intelligence and deep learning, is already well advanced, but the obvious risks of manipulation and misuse remain.

This underscores the emotive power of human speech. Last year, with the help of OpinionWay, Ircam Amplify established that nearly 70% of people believe that a speaker’s voice affects how easily they are convinced.

The trend is clear: the 21st century will be one of speech, not just in conversations between humans, but also in conversations with machines. For that to work, machines have to hear us and understand what we are saying. That’s a big part of the challenge for Ircam Amplify.

Check out Voice Cloning, the solution that allowed Thierry Ardisson to reconstruct the voice of personalities from the past.

A History of Synthetic Voices

03/01/2022

In the age of AI, deep learning, and voice assistants, IRCAM is a pioneer in the field of synthetic voices. IRCAM’s research now makes it possible to humanize voices by infusing them with emotion, individuality, and subtlety.

How do sound and vocal interfaces build new relationships with our everyday objects?

17/06/2021

We live in a world where humans interact with a vast range of connected devices, and these interactions increasingly happen through vocal and sound interfaces. Today, we are already talking with our smartphones, cars, speakers, computers… Tomorrow, sounds and voices will be at the center of our relationship with every technology.

Forbes

06/10/2021

VIKTOR & ROLF’S NEW SPICEBOMB FRAGRANCE HAS ITS VERY OWN SOUND
If a fragrance had a sound, what would it be? We’re not talking soundtracks for 30- or 60-second television commercials, or music that captures the joie de vivre of Chanel Chance or the jaunty, flirty spirit of Guerlain La Petite Robe Noire.

