Apple shakes up open source AI with MGIE image editor
The Cupertino-based tech giant has partnered with the University of California, Santa Barbara to develop an artificial intelligence model that can edit images from natural-language instructions, much as people interact with ChatGPT. Apple calls it MLLM-Guided Image Editing (MGIE).
MGIE interprets the text instructions users provide and expands them into precise image-editing commands. A diffusion model then carries out the edits, conditioned on the properties of the original image.
Multimodal large language models (MLLMs), which can process both text and images, form the basis of the MGIE method. Unlike traditional single-mode AI systems that handle only text or only images, MLLMs can follow complex instructions and work in a wider range of situations. For example, a model might understand a text instruction, analyze certain image elements, then remove a specific element and synthesize a new image without it.
To perform these procedures, the AI system needs several capabilities working together, including text generation, image generation, segmentation, and CLIP-based analysis, all within a single pipeline.
The introduction of MGIE brings Apple closer to realizing capabilities similar to OpenAI’s ChatGPT Plus, allowing users to engage in conversational interactions with AI models to create personalized images based on text input. Using MGIE, users can provide detailed natural language instructions – “Remove traffic cone from foreground” – which are translated into image editing commands and executed.
In other words, users can start with an image of a blond person and turn them into a redhead simply by saying "make this person red." Under the hood, the model interprets the instruction, segments the person's hair, derives an expressive prompt like "red hair, highly detailed, realistic, ginger color," and then applies the change via inpainting.
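The staged flow described above (interpret the instruction, segment the target, derive an expressive prompt, then inpaint) can be sketched roughly as follows. Every function name here is a hypothetical stand-in, not Apple's actual API, and the "inpainting" step just tags masked pixels with the prompt instead of generating new ones:

```python
# Hypothetical sketch of an MGIE-style editing pipeline.
# None of these functions correspond to Apple's real code.

def expand_instruction(instruction: str) -> str:
    """Stand-in for the MLLM step: turn a terse user instruction
    into an explicit, expressive editing prompt."""
    templates = {
        "make this person red": "red hair, highly detailed, realistic, ginger color",
    }
    return templates.get(instruction, instruction)

def segment_target(image: list, instruction: str) -> list:
    """Stand-in for segmentation: return a boolean mask of pixels
    to edit. Here every pixel is flagged; a real model would
    localize only the target region (e.g. the hair)."""
    return [[True for _ in row] for row in image]

def apply_edit(image: list, mask: list, prompt: str) -> list:
    """Stand-in for diffusion inpainting: rewrite masked pixels.
    Here masked pixels are simply replaced by the prompt string."""
    return [[prompt if masked else pixel
             for pixel, masked in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]

def edit(image: list, instruction: str) -> list:
    prompt = expand_instruction(instruction)
    mask = segment_target(image, instruction)
    return apply_edit(image, mask, prompt)

# Toy 2x2 "image" of labeled pixels.
image = [["blond", "blond"], ["blond", "blond"]]
edited = edit(image, "make this person red")
```

The point of the sketch is the division of labor: the language model only rewrites the instruction, the segmenter only decides *where* to edit, and the generative model only fills in the masked region.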
Apple’s approach is comparable to existing tools built around Stable Diffusion, which can be augmented with a rudimentary text-guided image-editing interface. With third-party tools like InstructPix2Pix, users can issue natural-language commands to a Stable Diffusion interface and see the effects on the edited image in real time.
However, Apple’s approach proved more accurate than comparable methods in the paper’s evaluations.
[Figure: Natural-language image editing results compared across InstructPix2Pix, LGIE, Apple’s MGIE, and the ground-truth image. Image: Apple]
Besides generative edits, Apple’s MGIE can perform traditional image-editing tasks such as color grading, resizing, rotating, changing styles, and drawing.
Why did Apple make it open source?
Apple’s forays into open source represent a clear strategic move, with a scope that goes beyond mere licensing requirements.
To build MGIE, Apple used open source models such as LLaVA and Vicuna. Because the licenses of these models restrict commercial use by large corporations, Apple likely had to share its improvements openly on GitHub.
But open-sourcing also lets Apple tap a global pool of developers to enhance MGIE’s power and flexibility. This kind of collaboration moves things forward much faster than Apple working entirely on its own from scratch. The openness also invites a wider range of ideas and attracts diverse artistic talent, allowing MGIE to evolve faster.
Apple’s involvement in the open source community through projects like MGIE also boosts the brand’s standing among developers and technology enthusiasts. The strategy is no secret: both Meta and Microsoft are investing heavily in open source AI.
Releasing MGIE as open source software could give Apple a head start in setting the still-evolving industry standards for AI and AI-based photo editing in particular. With MGIE, Apple has likely given AI artists and developers a solid foundation from which to build the next big thing, offering greater accuracy and efficiency than is available anywhere else.
MGIE could certainly make Apple products better: it wouldn’t be difficult to take a voice command spoken to Siri and use the resulting text to edit an image on the user’s smartphone, computer, or headset.
AI developers with the technical expertise can try MGIE now by visiting the project’s GitHub repository.
Edited by Ryan Ozawa.