Transcript of Tomorrow Will Be Heard (S2E3), the podcast that deciphers the new uses of audio in our daily lives
Voice research will profoundly change our relationship to objects. One example is Vocal’iz, a virtual vocal coach imagined by MGEN that helps you analyze and work on your voice, your vocal tone, your breath control and even your public speaking every day via your smartphone.
Mélusine Harlé, Director of Prevention at MGEN: “It’s an app that you can download to your phone. You only have to register and then you can instantly take a test to discover the quality and tone of your voice. Is your voice quiet? Are you a soprano or an alto? Once you have taken the test, the app suggests exercises tailored to the result. Typically, if you have a tired voice, Vocal’iz will tell you: ‘Try some breathing exercises today.’ If your voice is feeling really good one day, Vocal’iz might suggest singing exercises or something much more powerful that will allow you to have a lot more fun during the day.”
Initially designed for teachers, the application is open to everyone and is based on research carried out at Ircam and developed by Ircam Amplify.
Mélusine Harlé: “Vocal health is important for MGEN, which is why we developed this application with Ircam Amplify. We said to ourselves, ‘No one has taken on this subject in a way that is both playful and educational.’ Naturally, we went to Ircam Amplify to see how they could help us, with Ircam’s research, to build a technological foundation that would allow us to realize our dream for this type of preventative care.”
Technology at the service of speech and health
Frederic Amadu, CTO of Ircam Amplify: “Alongside software functions developed in the Ircam laboratory, a signal analysis algorithm provides data on, for example, the frequency at which we speak, that is, our pitch, or overall tonality. Frequency is the easiest parameter to understand of those we examine. We also analyze other vocal parameters such as vocal power: is the user whispering or speaking too loudly? Our system also counts the number of syllables spoken in a given time frame, which makes it possible to tell whether a user is speaking quickly or slowly.
By looking at all these parameters, the main analysis we can provide is to decide whether the speaker’s diction and delivery are calm and understandable, and thus effective. Our technology provides raw analysis parameters, then we set thresholds in order to define whether the user’s speech is, for example, too high-pitched or too low-pitched, or too fast. Speech therapists helped to decide on those thresholds. We worked together with MGEN and speech therapists to define the rules and exercises used in Vocal’iz. The objective of the coaching is for a user to improve their speech scores by repeating an exercise several times, following the advice that the application gives based on the results of a given exercise.”
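To make the threshold idea concrete, here is a minimal sketch in Python of how raw analysis parameters (pitch, loudness, speaking rate) might be compared against coaching thresholds. The parameter names, threshold values, and messages are illustrative assumptions for this transcript, not the actual rules defined by Ircam Amplify, MGEN, or their speech therapists.

```python
# Illustrative sketch only: hypothetical thresholds, not Vocal'iz's real rules.
from dataclasses import dataclass

@dataclass
class SpeechMeasures:
    mean_pitch_hz: float          # average fundamental frequency of the recording
    level_db: float               # overall vocal power (loudness)
    syllables_per_second: float   # speaking rate over the analysis window

def assess_speech(m: SpeechMeasures) -> list[str]:
    """Compare raw parameters against assumed comfort thresholds and return
    coaching hints, mirroring the pitch, loudness, and speed checks described
    in the interview."""
    hints = []
    if m.mean_pitch_hz > 300:
        hints.append("Pitch is quite high; try relaxing and lowering your voice.")
    elif m.mean_pitch_hz < 90:
        hints.append("Pitch is quite low; try adding a little more energy.")
    if m.level_db < 50:
        hints.append("You are almost whispering; project a little more.")
    elif m.level_db > 75:
        hints.append("You are speaking loudly; ease off to protect your voice.")
    if m.syllables_per_second > 6:
        hints.append("You are speaking fast; slow down and mark your pauses.")
    return hints or ["Calm, intelligible delivery; keep it up."]

# Example: a fast, high-pitched reading of a text passage
print(assess_speech(SpeechMeasures(mean_pitch_hz=320, level_db=62, syllables_per_second=7.2)))
```

In the real application, as Frederic Amadu explains, those thresholds were set with speech therapists, and the resulting advice feeds the exercises the user repeats to improve their scores.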
Vocal’iz not only analyzes your voice, it also helps improve your public speaking.
Mélusine Harlé: “There is a whole series of exercises on prosody that let you work, in particular, on tone, rhythm, and pauses, which are very important. For example, there is a series of exercises built around great classics of French literature, such as the tirade from Cyrano de Bergerac that we all know. With Vocal’iz, we combine pleasure and vocal training.”
Corinne Loie, Prevention Officer at MGEN and an opera singer, was among the speech therapists who worked on the project: “Why take care of your voice? Because it will help you know yourself better. Most of the time, it also helps improve relationships and social interactions, and increases confidence and comfort in carrying out professional tasks. These improvements are MGEN’s goal as an occupational risk prevention organization. Most of the time, understanding our own voices helps us to bring ourselves into the world.”
Today, technology makes it possible to better ourselves through our interactions with others. Tomorrow, research offers the possibility of improving our interactions with objects.
Nathalie Birocheau, CEO of Ircam Amplify:
“It’s a huge field of research and there are different areas of application. There is the analysis, synthesis, and cloning or transformation of voices in real time. These uses often arise from the arts world, which is Ircam’s primary domain. The research then gradually finds use cases in other sectors. Ircam Amplify’s objective is to apply these technological foundations to industrial use cases, broadly related to interfaces with voice assistants, robots, and connected objects.
This is a very important field of possibility for us, especially since there will be 8 billion voice assistants in circulation by 2023 and it is estimated that roughly 30% of web browsing is already done without a screen. These vocal interactions have to be high-quality, otherwise we won’t want to use these technologies. But above all, the technology has to work. Devices have to understand when there are several speakers or how to interpret our speech differently based on the way we express ourselves. We know that between people, the delivery and the way communication is perceived are often more important than the content.
Today, objects do not yet analyze vocal prosody, the way in which we articulate a sentence, and therefore they cannot tell whether we are sad or in a hurry, whether there are children in the room, or how old the speaker is. If the user is an elderly person, a voice assistant may have to speak slower, more calmly and louder, and make different adjustments if the user is a child. Devices must learn to absorb all this information and then adapt their output appropriately. In a car there are a lot of sounds; in the middle of a storm, rain is beating on the windows, so the car’s voice assistant should speak louder. For now, these abilities are not yet integrated into the technologies that communicate with us as human beings.”
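As a thought experiment only, the kind of adaptation described here could be sketched as a simple rule table: a detected listener profile and ambient noise level drive the assistant’s speaking rate and volume. Every category, decibel figure, and function name below is a hypothetical illustration, not a description of any existing Ircam Amplify system.

```python
# Illustrative sketch of adaptive output: assumed categories and values only.
from dataclasses import dataclass

@dataclass
class OutputSettings:
    speaking_rate: float   # 1.0 = normal speed
    volume_gain_db: float  # boost applied to the synthesized voice

def adapt_output(listener_age_group: str, ambient_noise_db: float) -> OutputSettings:
    """Pick output settings from a detected speaker profile and noise level."""
    settings = OutputSettings(speaking_rate=1.0, volume_gain_db=0.0)
    if listener_age_group == "elderly":
        # Speak slower, more calmly, and louder for an elderly listener.
        settings.speaking_rate = 0.85
        settings.volume_gain_db += 3.0
    elif listener_age_group == "child":
        # Slightly slower, simpler delivery for a child.
        settings.speaking_rate = 0.9
    if ambient_noise_db > 70:
        # Rain on the windows, road noise: raise the voice to stay intelligible.
        settings.volume_gain_db += 6.0
    return settings

# Example: an elderly passenger in a noisy car during a storm
print(adapt_output("elderly", ambient_noise_db=78))
```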