The Washington Post conducted an extensive test to determine which artificial intelligence model generates images and responds to user requests most effectively, focusing on models available to the general public without advanced technical skills or the need to work with open-source tools.
The test covered five leading models in their paid, professional versions to ensure the highest possible output quality: Adobe Firefly, ByteDance Seedream 4.0, Gemini 3 Pro, ChatGPT-5, and Meta AI.
The experiment relied on a variety of prompts simulating users' everyday needs, such as modifying image details, adding new elements, and removing people from photos, along with testing the models' ability to render hands and faces accurately, areas where artificial intelligence still struggles significantly.
The test was part of coverage prepared by Geoffrey A. Fowler, the Washington Post's technology columnist, and Kevin Schaul, an editor specializing in artificial intelligence coverage, published as generative AI image tools spread widely and users increasingly rely on them.
Professional Testing Methodology
The newspaper said it adopted a rigorous methodology to ensure objective results: the evaluation was not limited to the editors' opinions but was referred to an independent jury of three prominent figures in photography, who assessed the images' quality, accuracy, and realism as a measure of how the models perform in real-world use.
The panel included David Carson, a professional photojournalist and winner of several awards including a 2015 Pulitzer Prize; Dahlia Dresser, a digital artist focused on the creative side of image-making with the latest technologies; and Pratik Naik, a professional retoucher who refines and polishes images without altering their substance.
The test consisted of five different tasks:
- The first involved modifying faces: adding hair to a photo of actor Dwayne Johnson.
- The second asked for an image of a deer covered in dazzling colors, presented as an artwork worthy of awards.
- The third tested removing a person from a photo, focusing on an image of actors Kristen Stewart and Robert Pattinson.
- The fourth asked for an image of an actor crying tears of joy after winning an Oscar.
- The fifth and final task was an image of two hands clasping a head from behind, fingers interlaced.
The results varied significantly from model to model.
Final Verdict
Google's artificial intelligence model "Gemini," built on the "Nano Banana Pro" technology, topped every test the newspaper ran, prompting jury member Pratik Naik to praise the remarkable progress the model has made.
"Gemini" excelled particularly at image editing, whether removing people from a scene or adding new elements such as hair and changing details with high precision. In one example, the model removed Pattinson from a photo with Stewart, then rebuilt an entirely different photographic composition, leading jury member Carson to say it was hard to tell the edited image apart from a real photograph.
Although Gemini excelled in most of the tests, it stumbled notably on copyright. When generating the image of an Oscar-winning actor, the model used the facial features of actor Leonardo DiCaprio and added a fictional signature below the image attributing it to a real photographer who works with the Associated Press, drawing clear criticism from Carson.

Adobe's "Firefly," by contrast, ranked last among the participating models, a result attributed to its training on openly licensed images available for free use, which limited its output quality compared with competitors.
For her part, Dresser observed that AI-generated art is not inherently bad, but that it still requires human creative intervention to reach a higher standard of excellence. She added that Gemini produced the best image in terms of technical quality, while ChatGPT's image felt more artistically inventive.
Generating hands and fingers remains one of the biggest obstacles for artificial intelligence models. Although Gemini outperformed the others by rendering the correct number of fingers without glaring errors, its result still lacked full realism and could easily be identified as AI-generated.