Google’s Translatotron can translate speech into another language in the speaker’s voice
Lately, Google has focused on speech-related products. A few weeks ago, it announced that it was training its AI to help people with speech impairments. Last year, it announced interpreter mode for Google Assistant, and in the same period it added more accents and languages to the Google Translate app. The most recent release is Translatotron, a speech translation model that can convert a person’s speech directly from one language to another while keeping their own voice.
Note that this is not yet available as an app or as a feature in Google Translate; Google is currently testing it internally.
It is the first translation model of this type and should make it easier to speak in other languages: it converts the user’s speech into another language while preserving the user’s voice. In general, translation applications convert speech into text, translate that text, and then convert the result back into speech.
Although this cascaded method is widely used, errors made at one stage carry over into the next and can accumulate during translation. That is why end-to-end translation could open the door to many future developments.
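To illustrate why a multi-stage pipeline can accumulate errors, here is a minimal sketch in Python. It assumes, purely for illustration, that each stage fails independently of the others; the per-stage accuracy figures are hypothetical and are not taken from Google.

```python
def cascade_success_rate(stage_accuracies):
    """Chance that every stage of a pipeline is correct,
    assuming (as a simplification) that stage errors are independent."""
    rate = 1.0
    for accuracy in stage_accuracies:
        rate *= accuracy
    return rate


# Hypothetical per-stage accuracies, chosen only to show the arithmetic.
stages = {
    "speech_to_text": 0.95,
    "text_translation": 0.90,
    "text_to_speech": 0.98,
}

overall = cascade_success_rate(stages.values())
print(f"Overall success rate of the cascade: {overall:.3f}")
```

Under these assumed numbers, the cascade as a whole is less reliable than its weakest stage, which is the motivation for a single end-to-end model.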
How does it work?
According to Google, the algorithm is based on a sequence-to-sequence network model. A sequence-to-sequence model takes an input sequence and produces an output that is also a sequence. In this case, the sequence is a spectrogram, a visual representation of the voice. The algorithm takes the spectrogram of the source speech and generates a target spectrogram in the desired language.
This reduces the risk of losing information along the way and is also faster than the speech-to-text-to-speech method. The generated voice is slightly robotic, but as the model is still in development, we have high hopes for the future.
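For readers unfamiliar with spectrograms, the sketch below shows one common way to turn audio samples into a sequence of frequency frames, using a short-time Fourier transform. The article does not say how Google computes its spectrograms, so the function name, frame sizes, and the test tone are all assumptions made for illustration; this covers only the input representation, not the translation model itself.

```python
import cmath
import math


def spectrogram(signal, frame_length=64, hop_length=32):
    """Magnitude spectrogram: a list of frames, each a list of
    magnitudes per frequency bin (illustrative, not Google's code)."""
    # Hann window to soften the edges of each frame.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_length - 1))
        for n in range(frame_length)
    ]
    num_frames = 1 + (len(signal) - frame_length) // hop_length
    num_bins = frame_length // 2 + 1  # non-negative frequencies only
    frames = []
    for i in range(num_frames):
        start = i * hop_length
        frame = [signal[start + n] * window[n] for n in range(frame_length)]
        # Naive discrete Fourier transform of the windowed frame.
        magnitudes = [
            abs(sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_length)
                for n in range(frame_length)
            ))
            for k in range(num_bins)
        ]
        frames.append(magnitudes)
    return frames


# Hypothetical input: 0.1 s of a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(800)]

spec = spectrogram(tone)

# Find the frequency bin with the most energy, averaged over all frames.
mean_per_bin = [
    sum(frame[k] for frame in spec) / len(spec) for k in range(len(spec[0]))
]
peak_bin = max(range(len(mean_per_bin)), key=mean_per_bin.__getitem__)
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames,", len(spec[0]), "bins, peak near", peak_hz, "Hz")
```

Each frame here is one step of the sequence; a sequence-to-sequence model of the kind described above would consume such frames and emit a new sequence of frames for the target language.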
To preserve the speaker’s voice, the model also includes an optional speaker encoder component. Samples of the translated speech are available on GitHub.
Via Google AI Blog