Enthusiasts Seth Forsgren and Hayk Martiros have developed the Riffusion neural network, which generates music from text prompts. It learns and accumulates information about music with the help of sonograms – visual representations of sound. This format turns out to be remarkably well suited to the task.
A sonogram is a two-dimensional image in which every pixel has its own color, encoding the intensity of the sound at a given moment in time and at a given frequency. Riffusion generates sonograms with a fine-tuned version of the Stable Diffusion image model, so they are easy to associate with text. For example, you can ask the AI to compose and voice “jazz with notes of summer rain.”
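To make the pixel-per-time-and-frequency idea concrete, here is a minimal Python sketch (not Riffusion's actual code) that computes such an image from an audio clip using the librosa library; the example file, FFT size, and hop length are illustrative assumptions:

```python
import numpy as np
import librosa

# Load a short example clip bundled with librosa (any WAV/MP3 works).
y, sr = librosa.load(librosa.ex('trumpet'), duration=5.0)

# Short-time Fourier transform: rows = frequency bins, columns = time frames.
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Convert amplitude to decibels so the values span a useful visual range.
S_db = librosa.amplitude_to_db(S, ref=np.max)

# Each cell of this matrix is one "pixel": how loud frequency f is at time t.
print(S_db.shape)
```

Rendering this matrix as a color image yields exactly the kind of picture a text-to-image diffusion model can be trained to produce.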
Riffusion then converts the generated sonogram back into sound automatically, and the results are stored in a database. Strictly speaking, most of the melodies Riffusion creates are rather unpleasant to listen to, but this is a vivid example of a very promising technology for manipulating sound in space and time.
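The step of turning a sonogram image back into audio can be approximated with the Griffin-Lim algorithm, which estimates the phase information the image does not contain. The sketch below, again assuming librosa plus the soundfile package, is a simplified stand-in for that conversion, not Riffusion's actual pipeline:

```python
import numpy as np
import librosa
import soundfile as sf

# In a real pipeline the magnitude sonogram would come from the generated
# image; here we compute one from an example clip so the sketch runs as-is.
y, sr = librosa.load(librosa.ex('trumpet'), duration=5.0)
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# The image stores only magnitudes; Griffin-Lim iteratively estimates a
# plausible phase and inverts the STFT back into a waveform.
y_rec = librosa.griffinlim(S, n_iter=32, hop_length=512)

sf.write('reconstructed.wav', y_rec, sr)
```

The lossy phase estimation is one reason the output can sound rough: the image pins down *what* frequencies play *when*, but the fine waveform detail has to be guessed.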

Source: Tech Cult
