Meta*, continuing its research into generative AI models, has presented its latest development: CM3leon (pronounced like “chameleon”), a multimodal model that converts text to images and images to text.
A counterpart to the popular tools Stable Diffusion, DALL-E and Midjourney, the newcomer CM3leon achieves better results, according to its developers, by using a “token-based autoregressive model” instead of the now more common diffusion model, VentureBeat reports.
“Diffusion models have recently dominated image generation work due to their high performance and relatively modest computational cost,” the Meta* research paper says. “But token-based autoregressive models also give good results, although they are much more expensive to train and use for inference.”
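To make the term “token-based autoregressive” concrete, here is a minimal Python sketch of the general idea: the output is built one token at a time, and each new token is chosen based on what came before. This is a toy illustration, not Meta*'s implementation. The lookup table, the token names and the `generate` function are all made up for the example, and a real model would use a trained transformer rather than a table.

```python
# Toy illustration only: a hand-written lookup table stands in for a trained
# transformer, and the token names are made up for this example.
NEXT_TOKEN_SCORES = {
    "<text:cat>": {"<img:12>": 0.9, "<img:7>": 0.1},
    "<img:12>": {"<img:7>": 0.6, "<img:3>": 0.4},
    "<img:7>": {"<img:3>": 0.8, "<eos>": 0.2},
    "<img:3>": {"<eos>": 1.0},
}


def generate(prompt_tokens, max_new_tokens=8):
    """Greedy autoregressive decoding: each new token depends on earlier ones."""
    sequence = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # A real model would condition on the whole sequence; this toy
        # table looks only at the most recent token.
        scores = NEXT_TOKEN_SCORES[sequence[-1]]
        next_token = max(scores, key=scores.get)  # pick the highest score
        if next_token == "<eos>":  # end-of-sequence marker stops generation
            break
        sequence.append(next_token)
    return sequence


# The text prompt comes first; the tokens generated after it would be
# decoded into image patches by a separate component in a real system.
image_tokens = generate(["<text:cat>"])[1:]
print(image_tokens)
```

Because tokens are produced strictly one after another, generation cannot be parallelized across the sequence, which is one reason this family of models is usually described as costly at inference time.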
“CM3leon achieves superior text-to-image performance despite using five times fewer computational resources to train than previous methods,” the company's blog post says.
In broad outline, CM3leon works much like existing text generation models. During development, however, Meta* researchers paid close attention to legal and ethical issues: “The ethical implications of sourcing image data in the field of text-to-image generation have been the subject of considerable debate.”
As a result, only images licensed from Shutterstock were used to train CM3leon. After pretraining, the model goes through a supervised fine-tuning (SFT) stage, an approach that OpenAI also uses to train ChatGPT.
According to Meta* researchers, this approach produces “highly optimized results” in terms of both resource usage and image quality. The model learns to understand even complex prompts, which is useful for generative tasks, and responds to multi-step queries with relevant, high-quality, high-resolution images.
“We found that instruction tuning markedly improves the performance of the multimodal model in various tasks, such as image caption generation, visual question answering, text-based editing, and conditional image generation,” the developers write.
For now, CM3leon is still being tested, and there is no information on whether Meta* will make the technology available to the public. If it does, the company will very likely want to monetize the model, given its power and efficiency.
* Recognized as extremist and banned in the Russian Federation