Wolf howl created by artificial intelligence
Google DeepMind

DeepMind on Tuesday showed off the latest results from its generative AI research into video-to-audio conversion: a new system that combines a video’s on-screen content with a user’s written prompt to create synchronized soundscapes for a given clip.

V2A can be combined with video generation models like Veo to create soundtracks, sound effects, and even dialogue for on-screen action, DeepMind’s generative audio team wrote in a blog post. The team also claims the new system can generate “an unlimited number of soundtracks for any video input,” steering the model with positive and negative prompt signals that encourage or discourage the use of a particular sound, respectively.
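DeepMind has not published how its prompt steering works internally, but the idea of positive and negative signals can be illustrated with a toy example: reward candidate sounds that match wanted terms and penalize those that match unwanted ones. Everything here (the function name, the tag-based candidates) is a hypothetical sketch, not DeepMind’s API.

```python
def rank_soundscapes(candidates, positive, negative):
    """Toy illustration of positive/negative prompt steering:
    score each candidate soundscape by how many of its tags appear
    in the positive set, minus how many appear in the negative set,
    then return the highest-scoring candidate."""
    def score(tags):
        return sum(t in positive for t in tags) - sum(t in negative for t in tags)
    return max(candidates, key=lambda c: score(c["tags"]))

# Hypothetical candidate soundscapes with descriptive tags
candidates = [
    {"name": "city_ambience", "tags": {"traffic", "horns", "crowd"}},
    {"name": "forest_night", "tags": {"wind", "wolf_howl", "crickets"}},
]

# A positive prompt asking for a howling wolf, a negative prompt
# suppressing traffic noise, selects the forest soundscape.
best = rank_soundscapes(
    candidates,
    positive={"wolf_howl", "wind"},
    negative={"traffic"},
)
```

In a real diffusion model the prompts would condition the denoising process itself rather than rank pre-made candidates, but the encourage/discourage trade-off is the same.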

V2A cars

The system works by first encoding and compressing the input video, which the diffusion model then uses to iteratively refine the desired audio out of random noise, guided by the visual input and any additional text prompt from the user. The audio output is finally decoded into a waveform, which can then be combined with the video input.
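The pipeline described above, encode the video, iteratively refine audio from noise under that conditioning, then decode, can be sketched in miniature. This is a heavily simplified stand-in with made-up function names and a trivial "refinement" rule, not DeepMind’s actual model, which is not public.

```python
import random

def encode_video(frames):
    # Stand-in for V2A's video encoder: compress each frame
    # (a list of pixel values) into one conditioning number.
    return [sum(frame) / len(frame) for frame in frames]

def iterative_refinement(conditioning, steps=50, seed=0):
    # Toy diffusion-style loop: start from random noise and nudge
    # the sample a little closer to the conditioning signal on each
    # step, mimicking iterative denoising.
    rng = random.Random(seed)
    audio = [rng.gauss(0.0, 1.0) for _ in conditioning]
    for _ in range(steps):
        audio = [a + 0.2 * (c - a) for a, c in zip(audio, conditioning)]
    return audio

# Three tiny "frames" of fake pixel data
frames = [[0.1, 0.3], [0.5, 0.7], [0.9, 0.9]]
cond = encode_video(frames)
waveform = iterative_refinement(cond)
```

After enough steps the sample converges toward the conditioning signal; in the real system each step would be a learned denoising network, and the output would be decoded back into an audio waveform.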

The best part is that the user does not have to go in and manually (read: tediously) synchronize the audio and video tracks, since the V2A system does this automatically. “By training on video, audio, and additional annotations, our technology learns to associate specific audio events with various visual scenes, while responding to the information provided in the annotations or transcripts,” the DeepMind team writes.

V2A Wolf

However, the system is not yet perfect. For one, the quality of the audio output depends on the fidelity of the video input, and the system degrades when the input contains visual artifacts or other distortions. And according to the DeepMind team, synchronizing generated dialogue with on-screen lip movements remains an ongoing challenge.

“V2A attempts to generate speech from input transcripts and synchronize it with the characters’ lip movements,” the team explained. “But the paired video generation model may not be conditioned on transcripts. This creates a mismatch, often resulting in uncanny lip-syncing, as the video model doesn’t generate mouth movements that match the transcript.”

The system still must undergo “rigorous safety assessments and testing” before the team will consider releasing it to the public, and every video and soundtrack the system produces will carry DeepMind’s SynthID watermark. V2A is far from the only sound-generating AI currently on the market: Stability AI launched a similar product just last week, and ElevenLabs launched its sound-effects tool last month.

Source: Digital Trends

I am Garth Carter and I work at Gadget Onus. I have specialized in writing for the Hot News section, focusing on topics that are trending and highly relevant to readers. My passion is to present news stories accurately, in an engaging manner that captures the attention of my audience.

