Sber has introduced Kandinsky Video, a generative neural network that creates complete videos from text descriptions, the company said in a statement.
The model can generate video sequences up to eight seconds long at 30 frames per second, at a resolution of 512 × 512 pixels. Generation takes up to three minutes.
The Kandinsky Video architecture consists of two key blocks: the first creates the key frames that form the plot structure of the video, and the second generates the interpolation frames that make the motion in the finished video smooth.
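The two-stage pipeline described above can be sketched in miniature. This is a purely illustrative toy, not Sber's actual model or API: all function names are hypothetical, and each "frame" is reduced to a single number so that the keyframe-then-interpolate structure is easy to follow.

```python
# Hypothetical sketch of a two-stage keyframe + interpolation pipeline
# (illustrative names; not the real Kandinsky Video API).

def generate_keyframes(prompt: str, num_keyframes: int) -> list[float]:
    # Stand-in for the text-conditioned keyframe block: in the real model
    # these would be images forming the plot structure of the video.
    return [float(i) for i in range(num_keyframes)]

def interpolate(frame_a: float, frame_b: float, steps: int) -> list[float]:
    # Stand-in for the interpolation block: insert `steps` in-between
    # frames by linear blending to smooth the motion.
    return [frame_a + (frame_b - frame_a) * (s + 1) / (steps + 1)
            for s in range(steps)]

def generate_video(prompt: str, seconds: int = 8, fps: int = 30,
                   keyframe_fps: int = 2) -> list[float]:
    keyframes = generate_keyframes(prompt, seconds * keyframe_fps)
    steps = fps // keyframe_fps - 1  # frames to insert per keyframe gap
    video = []
    for a, b in zip(keyframes, keyframes[1:]):
        video.append(a)
        video.extend(interpolate(a, b, steps))
    video.append(keyframes[-1])
    return video

frames = generate_video("a cat walking on the beach")
```

With 16 keyframes and 14 interpolated frames per gap, the sketch produces 226 smoothly varying frames; the real model does the same structurally, but with full images instead of scalars.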
The generated video is a continuous scene in which both the subject and the background move. This is a key difference between videos synthesized by Kandinsky Video and animated videos, where dynamics are achieved by simulating a camera flight over a static scene.
The model was trained on a dataset of more than 300,000 text–video pairs. Both blocks of the Kandinsky Video architecture are built on Kandinsky 3.0, a new model that synthesizes images from text descriptions.
Sber presented Kandinsky 3.0 on November 22. Compared to previous versions, the improved neural network understands the user's text request better, creates more photorealistic images, and can generate full artistic paintings and sketch art.
Author:
Anastasia Marina
Source: RB
