Prompt transformation makes ChatGPT OpenAI’s covert moderator for DALL-E 3



summary
Summary

OpenAI’s new image AI DALL-E 3 is currently being deployed in ChatGPT and Bing Image Creator. OpenAI is documenting its efforts to prevent users from generating potentially harmful or offensive images.

The documentation shows that the integration of DALL-E 3 into ChatGPT is as much a safety measure as it is a convenience. This is because ChatGPT can use so-called “prompt transformations” to check user prompts for possible content violations, and then rewrite them in such a way that the violations can be bypassed if they appear unintentional. It’s essentially a stealth moderation system.

In particularly blatant cases, ChatGPT will block the prompt, especially if the prompt contains terms that are on the OpenAI block lists. These lists come from experience with DALL-E 2 and beta testing of DALL-E 3.

They include, for example, names of famous people or artists whose style should be used as a template – a popular prompt method in DALL-E 2 and similar image generation systems, which is seen as controversial by some artists. In addition, ChatGPT’s moderation API intervenes in prompts that violate OpenAI’s content guidelines.

Ad

Ad

For boundary setting and testing, OpenAI also relied on red teaming, in which assigned individuals attempted to give DALL-E 3 the wrong ideas through targeted prompts. Red teaming helps identify new risks and evaluate mitigations for known risks.

Proprietary image classifier for “racy” images

For sexist or otherwise “racy” content, OpenAI has trained an image output classifier to detect suspicious patterns in images and stop generation. A previous version of DALL-E 3 without this filter was able to generate violent, religious, and copyright-infringing images – all in one image.

An image from an earlier DALL-E 3 version without protections. | Image: Kaamalauppias, Discord

The function of the filter is illustrated in the following example, where an earlier version of DALL-E 3 generated an image of “an individual enjoying a leisurely picnic in the park”. In a previous version of DALL-E 3, this person was a muscular, almost naked man because, well, why not? In the launch version, the food is the focus of the image, not the person.

Image: OpenAI

According to OpenAI, the risk of such unwanted (the prompt did not ask for a muscular, naked man) or offensive images has been reduced to 0.7 percent for the launch version of DALL-E 3. However, such filters are controversial. With DALL-E 2 and similar systems, parts of the AI art scene complained about too much interference in the generation, limiting artistic freedom. According to OpenAI, the classifier will be further optimized to achieve an optimal balance between risk mitigation and output quality.

Sexual contexts are moderated away or constrained by DALL-E 3’s image classifier. | Image: OpenAI

Bias and disinformation

Like DALL-E 2, DALL-E 3 contains cultural biases, usually in favor of Western culture, OpenAI writes, especially in unspecified prompts where parameters such as nationality, culture, or skin color are not determined by the user.

Recommendation

generate the Nickelodeon mascot “Spongebob” flying in a plane toward two skyscrapers that resemble the World Trade Center. Meta has similar problems with its new AI stickers, such as a sticker of Mickey Mouse holding a machine gun.

Image: Bing Image Creator, DALL-E 3 / via 404media.co

Acknowledging the complexity of the problem, OpenAI says that while the risk mitigation measures it has put in place would limit misuse in these scenarios as well, it is “not able to anticipate all permutations that may occur.”

“Some common objects may be strongly associated with branded or trademarked content, and may therefore be generated as part of rendering a realistic scene,” OpenAI writes.

On the other hand, OpenAI says that using DALL-E 3 to generate potentially dangerous images, such as for building weapons or visualizing harmful chemical substances, is unproblematic. Here, the Red Team found such inaccuracies in the tested disciplines of chemistry, biology, and physics that DALL-E 3 is simply unsuitable for this application.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top