NaHyeon Park, Namin An, Kunhee Kim, Soyeon Yoon, Jiahao Huo, Hyunjung Shim
This paper shows that system prompts in large vision-language model (LVLM)-based text-to-image systems are a significant source of social bias, and proposes FairPro, a framework that reduces this bias while preserving text-image alignment.
The study investigates how large vision-language models used for text-to-image generation can perpetuate social biases. These models, now a standard interface for turning text descriptions into images, can produce biased outputs because of their system prompts, the predefined instructions that guide their behavior. The researchers developed FairPro, a training-free framework that lets a model check itself for bias and adjust its behavior to be fairer without any additional training. This yields more socially responsible text-to-image systems while keeping the generated images accurately aligned with the input text.
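To make the training-free, self-checking idea concrete, here is a minimal sketch of one way such a self-audit loop could look: the model is asked to flag potentially bias-inducing language in its own system prompt and then to propose a revised prompt that keeps the original intent. This is an illustrative assumption, not the paper's actual FairPro procedure; the `call_model` hook, function names, and prompt wording are all hypothetical placeholders.

```python
from typing import Callable

# Hypothetical LVLM chat hook: (system_prompt, user_prompt) -> response text.
# Wire this to whatever model API you actually use.
LLMCall = Callable[[str, str], str]


def audit_system_prompt(call_model: LLMCall, system_prompt: str) -> str:
    """Ask the model to flag language in its system prompt that could steer
    generation toward social stereotypes (illustrative self-audit step)."""
    audit_instruction = (
        "Review the following system prompt for wording that could bias image "
        "generation with respect to gender, race, age, or other social "
        "attributes. List any issues, or reply 'NONE'.\n\n"
        f"System prompt:\n{system_prompt}"
    )
    return call_model("You are a careful fairness auditor.", audit_instruction)


def revise_system_prompt(call_model: LLMCall, system_prompt: str, audit_report: str) -> str:
    """Ask the model to rewrite the system prompt to avoid the flagged issues
    while preserving its original purpose, so text-image alignment is kept."""
    if audit_report.strip().upper() == "NONE":
        return system_prompt
    revise_instruction = (
        "Rewrite the system prompt below so it avoids the listed bias issues "
        "but keeps its original instructions and intent.\n\n"
        f"System prompt:\n{system_prompt}\n\nBias issues:\n{audit_report}"
    )
    return call_model("You are a careful prompt editor.", revise_instruction)


if __name__ == "__main__":
    # Dummy model call so the sketch runs standalone; replace with a real API.
    def dummy_call(system: str, user: str) -> str:
        return "NONE"

    original = "You are an assistant that expands user captions into vivid image prompts."
    report = audit_system_prompt(dummy_call, original)
    print(revise_system_prompt(dummy_call, original, report))
```

Because both steps are plain prompting, no fine-tuning or retraining of the underlying model is required, which matches the training-free property described above.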