Microsoft AI can draw objects based on detailed text descriptions
Google may have taught an AI how to doodle, but drawing something more complex is tough for a computer. Imagine asking a computer to draw a “yellow bird with black wings and a short beak”; it sounds a little tricky.
Researchers at Microsoft, though, have been developing an AI-based technology to do just that. It generates images from text descriptions with a surprising amount of accuracy, according to the most recent paper posted by the team.
The system doesn’t find an existing image based on your input; it creates a new one. “If you go to Bing and you search for a bird, you get a bird picture. But here, the pictures are created by the computer, pixel by pixel, from scratch,” said principal researcher Xiaodong He in a statement. “These birds may not exist in the real world; they are just an aspect of our computer’s imagination of birds.”
While the current form of this drawing technology isn’t perfect, it’s not hard to imagine a future where it could function as a sketch assistant for painters and interior designers or a tool to refine photos based on voice input. Farther out, researcher He imagines animated movies generated from a written script.
The team began its research into computer vision and natural language processing with the CaptionBot, an AI system that automatically writes captions for photos. It then created SeeingAI, a system that answers questions people ask about images, which can be especially helpful for people who are blind. The current technology is built on a Generative Adversarial Network (GAN), which consists of two parts: a generator that produces images from the text description, and a discriminator that judges the quality of the images generated.
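To make the two-part setup concrete, here is a minimal sketch of how an image-generating part and a discriminator are scored against each other in a standard GAN. It is not Microsoft's implementation: the tiny linear models, the function names and the random weights are all assumptions made for illustration, and the text conditioning that the drawing bot relies on is left out.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def generator(noise, g_weights):
    """Map a noise vector to a fake 'image' (a list of pixel values in [-1, 1])."""
    return [math.tanh(sum(w * z for w, z in zip(row, noise))) for row in g_weights]

def discriminator(image, d_weights):
    """Return the estimated probability that the image is real."""
    return sigmoid(sum(w * p for w, p in zip(d_weights, image)))

def gan_losses(real_image, noise, g_weights, d_weights):
    """Compute the standard GAN losses for one real image and one generated image."""
    fake_image = generator(noise, g_weights)
    p_real = discriminator(real_image, d_weights)
    p_fake = discriminator(fake_image, d_weights)
    # The discriminator is rewarded for calling real images real and fakes fake.
    d_loss = -math.log(p_real) - math.log(1.0 - p_fake)
    # The generator is rewarded when the discriminator mistakes its output for real.
    g_loss = -math.log(p_fake)
    return fake_image, d_loss, g_loss

# Hypothetical toy sizes: 3 noise values in, 4 "pixels" out.
rng = random.Random(0)
g_weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
d_weights = [rng.uniform(-1, 1) for _ in range(4)]
real_image = [0.9, -0.2, 0.4, 0.1]
noise = [rng.gauss(0, 1) for _ in range(3)]

fake_image, d_loss, g_loss = gan_losses(real_image, noise, g_weights, d_weights)
print("discriminator loss:", d_loss)
print("generator loss:", g_loss)
```

In training, the two losses are minimized in alternation, so the image-generating part improves only by producing images the discriminator finds harder to reject.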
The drawing bot was trained on pairs of images and captions, which teach the AI which words go with which images. The team also created a mathematical representation of human attention, which is what we all use when we draw pictures from complex descriptions, focusing on one detail at a time: a red wing, a sharp beak, a yellow wing. “Attention is a human concept; we use math to make attention computational,” said He.
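One common way to "make attention computational" is to score every word of the description against the part of the image being drawn and turn the scores into weights with a softmax. The sketch below assumes that formulation; the article does not spell out the team's exact math, and the word vectors, the region vector and the `attend` helper are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(region_vec, word_vecs):
    """Return attention weights over the words, and the weighted word context, for one image region."""
    scores = [dot(region_vec, w) for w in word_vecs]
    weights = softmax(scores)
    dim = len(word_vecs[0])
    context = [sum(wt * w[i] for wt, w in zip(weights, word_vecs)) for i in range(dim)]
    return weights, context

# Hypothetical 3-dimensional embeddings for four words of the description.
words = ["yellow", "bird", "black", "wings"]
word_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.5, 0.5],
]
# Hypothetical feature vector for the image region where a wing is being drawn.
wing_region = [0.0, 0.2, 2.0]

weights, context = attend(wing_region, word_vecs)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

With these toy numbers, the wing region puts most of its weight on "black" and "wings" and very little on "yellow", which is the kind of word-by-word focus the description calls for.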