Yandex.Browser users can thus now watch English-language videos with a multi-voice voice-over translation into Russian.
Initially the technology used two synthesized voices for speech translation, one male and one female; there are now twelve, six for each gender.
The neural network reportedly “distributes” sounds to different speakers and then “remembers” them, using AI models developed at Yandex.
Moreover, it all works in several stages: first, one neural network converts speech into text, restores punctuation, and marks sentence boundaries; then another analyzes the spectrogram of the voice and marks the segments spoken by different people.
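The voice-assignment step described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not Yandex's implementation: the segment format, the speaker and gender labels, the voice names, and the helpers `VoiceAssigner` and `dub_plan` are all hypothetical. It assumes an earlier stage has already split the transcript into segments tagged with a speaker, and it simply gives each new speaker the next free voice of the matching gender and reuses it afterwards.

```python
from dataclasses import dataclass, field

# Hypothetical voice pool: six synthesized voices per gender, as the article reports.
VOICE_POOL = {
    "male": [f"male_{i}" for i in range(1, 7)],
    "female": [f"female_{i}" for i in range(1, 7)],
}


@dataclass
class VoiceAssigner:
    """Gives each speaker a voice once and "remembers" the choice."""

    assigned: dict = field(default_factory=dict)  # speaker id -> voice name
    used: dict = field(default_factory=lambda: {"male": 0, "female": 0})

    def voice_for(self, speaker_id: str, gender: str) -> str:
        if speaker_id not in self.assigned:
            pool = VOICE_POOL[gender]
            # Assumption: wrap around if there are more speakers than voices.
            self.assigned[speaker_id] = pool[self.used[gender] % len(pool)]
            self.used[gender] += 1
        return self.assigned[speaker_id]


def dub_plan(segments):
    """Map (speaker, gender, text) segments to (text, voice) pairs."""
    assigner = VoiceAssigner()
    return [(text, assigner.voice_for(spk, gender)) for spk, gender, text in segments]


# Hypothetical output of the earlier transcription and speaker-marking stages.
segments = [
    ("spk_a", "female", "Welcome to the show."),
    ("spk_b", "male", "Thanks for having me."),
    ("spk_a", "female", "Let's begin."),
]
plan = dub_plan(segments)
```

In this sketch the same speaker keeps the same voice for the whole video, which is the “remembering” behaviour the article mentions; how the real system detects speakers or genders is not stated in the source.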
Source: Ferra
